Stage (OpenIMAJ master project 1.3.10 API)

java.lang.Object
- org.openimaj.hadoop.mapreduce.stage.Stage<INPUT_FORMAT,OUTPUT_FORMAT,INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>

Type Parameters:

INPUT_FORMAT - The job's input format. Must be a FileInputFormat. Used to call FileInputFormat.setInputPaths(Job, Path...) with the stage's input locations

OUTPUT_FORMAT - The job's output format. Must be a FileOutputFormat. Used to call FileOutputFormat.setOutputPath(Job, Path) with the stage's output location

INPUT_KEY - The key format of the input to the map task

INPUT_VALUE - The value format of the input to the map task

MAP_OUTPUT_KEY - The key format of the output of the map task (and therefore the input of the reduce)

MAP_OUTPUT_VALUE - The value format of the output of the map task (and therefore the input of the reduce)

OUTPUT_KEY - The key format of the output of the reduce task

OUTPUT_VALUE - The value format of the output of the reduce task

Direct Known Subclasses:

MultipleOutputStage, SequenceFileStage, SequenceFileTextStage, TextDaySplitStage, TextSequenceFileStage, TextSomethingTextStage, TextStage
```
public abstract class Stage<INPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<INPUT_KEY,INPUT_VALUE>,OUTPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<OUTPUT_KEY,OUTPUT_VALUE>,INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>
extends Object
```
A stage in a multi-step job. Each stage is told where the job's data will come from and where the output should be directed, and is then expected to produce a job. The job is configured and set up based on the generic types assigned to the stage. For most jobs, these generics together with the mapper and reducer classes should be enough. If any further settings need to be configured, override setup(Job), which is called before the job is returned.
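As an illustration of how a class can configure itself from "the generic types assigned to the stage", the following is a minimal sketch using plain Java reflection. It is not OpenIMAJ's implementation: `MiniStage`, `LongToStringStage`, `inputKeyClass()` and `outputValueClass()` are hypothetical names, and the stand-in has only two type parameters instead of eight. It assumes the concrete stage directly extends the generic base class and fixes each parameter to a plain class.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class GenericStageSketch {
    /** Hypothetical stand-in for Stage, cut down to two type parameters. */
    public static abstract class MiniStage<INPUT_KEY, OUTPUT_VALUE> {
        private final Class<?>[] typeArgs;

        protected MiniStage() {
            // getGenericSuperclass() keeps the type arguments written in the
            // "extends MiniStage<...>" clause of a direct subclass.
            Type superType = getClass().getGenericSuperclass();
            if (!(superType instanceof ParameterizedType)) {
                throw new IllegalStateException("Subclass must fix the type parameters");
            }
            Type[] args = ((ParameterizedType) superType).getActualTypeArguments();
            typeArgs = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                if (!(args[i] instanceof Class)) {
                    throw new IllegalStateException("Type argument " + i + " is not a plain class");
                }
                typeArgs[i] = (Class<?>) args[i];
            }
        }

        /** Class bound to INPUT_KEY by the concrete subclass. */
        public Class<?> inputKeyClass() { return typeArgs[0]; }

        /** Class bound to OUTPUT_VALUE by the concrete subclass. */
        public Class<?> outputValueClass() { return typeArgs[1]; }
    }

    /** Example subclass that pins both parameters to concrete classes. */
    public static class LongToStringStage extends MiniStage<Long, String> {}

    public static void main(String[] args) {
        MiniStage<Long, String> stage = new LongToStringStage();
        System.out.println(stage.inputKeyClass().getName());    // java.lang.Long
        System.out.println(stage.outputValueClass().getName()); // java.lang.String
    }
}
```

A real stage would pass classes recovered this way to the job's configuration, which is why a subclass that declares its generics often needs no further setup code.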

Author:

Sina Samangooei (ss@ecs.soton.ac.uk)

Constructor Summary

Constructors
Constructor and Description
`Stage()` Initialise all the classes based on the generics

Method Summary

Modifier and Type	Method and Description
`Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>>`	`combiner()` By default this method returns the `IdentityReducer` class.
`void`	`finished(org.apache.hadoop.mapreduce.Job job)` Called when the stage's job is completed.
`boolean`	`lzoCompress()`
`Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>>`	`mapper()` By default this method returns the `IdentityMapper` class.
`String`	`outname()`
`Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>>`	`reducer()` By default this method returns the `IdentityReducer` class.
`void`	`setCombinerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner)` For stages which require more fine grained control of how a job's combiner is set.
`void`	`setMapperClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper)` For stages which need more fine grained control of how a job's mapper is set.
`void`	`setReducerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer)` For stages which require more fine grained control of how a job's reducer is set.
`void`	`setup(org.apache.hadoop.mapreduce.Job job)` Add any final adjustments to the job's config
`org.apache.hadoop.mapreduce.Job`	`stage(org.apache.hadoop.fs.Path[] inputs, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration conf)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Stage
```
public Stage()
```
    Initialise all the classes based on the generics
- Method Detail
  - outname
```
public String outname()
```
    Returns:
    
    the name of the output directory of this stage. If the name is null the directory itself is used.
  - stage
```
public org.apache.hadoop.mapreduce.Job stage(org.apache.hadoop.fs.Path[] inputs,
                                             org.apache.hadoop.fs.Path output,
                                             org.apache.hadoop.conf.Configuration conf)
                                      throws Exception
```
    Parameters:
    
    inputs - the input paths to be expected
    
    output - the output location
    
    conf - the job configuration
    
    Returns:
    
    the job to be launched in this stage
    
    Throws:
    
    Exception
    
    IOException
  - setCombinerClass
```
public void setCombinerClass(org.apache.hadoop.mapreduce.Job job,
                             Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner)
```
    For stages which require more fine-grained control of how a job's combiner is set. This method is called with the job being constructed by this stage and the result of combiner().
    
    Parameters:
    
    job - the job being constructed by this stage
    
    combiner - the result of combiner()
  - setReducerClass
```
public void setReducerClass(org.apache.hadoop.mapreduce.Job job,
                            Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer)
```
    For stages which require more fine-grained control of how a job's reducer is set. This method is called with the job being constructed by this stage and the result of reducer().
    
    Parameters:
    
    job - the job being constructed by this stage
    
    reducer - the result of reducer()
  - setMapperClass
```
public void setMapperClass(org.apache.hadoop.mapreduce.Job job,
                           Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper)
```
    For stages which need more fine-grained control of how a job's mapper is set. For example, MultithreadedMapper stages should override this method, set the job's mapper to MultithreadedMapper with Job.setMapperClass(Class), and set the MultithreadedMapper's mapper class with MultithreadedMapper.setMapperClass(Job, Class). This method is called with the result of mapper().
    
    Parameters:
    
    job - the job being constructed by this stage
    
    mapper - the result of mapper()
  - setup
```
public void setup(org.apache.hadoop.mapreduce.Job job)
           throws IOException
```
    Add any final adjustments to the job's config
    
    Parameters:
    
    job -
    
    Throws:
    
    IOException
  - mapper
```
public Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper()
```
    By default this method returns the IdentityMapper class. This mapper outputs the values handed as they are.
    
    Returns:
    
    the class of the mapper to use
  - reducer
```
public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer()
```
    By default this method returns the IdentityReducer class. This reducer outputs the values handed as they are.
    
    Returns:
    
    the class of the reducer to use
  - combiner
```
public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner()
```
    By default this method returns the IdentityReducer class. This combiner outputs the values handed as they are.
    
    Returns:
    
    the class of the combiner to use
  - finished
```
public void finished(org.apache.hadoop.mapreduce.Job job)
```
    Called when the stage's job is completed. Might never be called in some cases, for example when the stage runner is told specifically not to wait for the job to finish.
    
    Parameters:
    
    job -
  - lzoCompress
```
public boolean lzoCompress()
```
    Returns:
    
    Whether this stage should LZO compress its output
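The methods above form a set of overridable hooks: the class-returning methods (mapper(), reducer()) supply defaults, the setter methods receive those results together with the job under construction, and setup(Job) runs last before the job is returned. The sketch below shows that pattern with Hadoop-free stand-ins so it can be run on its own. `MiniJob`, `MiniStage` and `CountingStage` are hypothetical, class names are modelled as strings, and the relative order of the mapper and reducer calls is an assumption; only "setup is called before the job is returned" comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class StageLifecycleSketch {
    /** Hypothetical stand-in for a Hadoop Job: it only records what was configured. */
    public static class MiniJob {
        public final List<String> log = new ArrayList<>();
    }

    /** Hypothetical stand-in for Stage, reusing the hook names from this page. */
    public static abstract class MiniStage {
        public String mapper() { return "IdentityMapper"; }
        public String reducer() { return "IdentityReducer"; }

        public void setMapperClass(MiniJob job, String mapper) { job.log.add("mapper=" + mapper); }
        public void setReducerClass(MiniJob job, String reducer) { job.log.add("reducer=" + reducer); }

        /** Final adjustments; does nothing unless overridden. */
        public void setup(MiniJob job) {}

        /** Builds the job, feeding each hook's result to its matching setter. */
        public MiniJob stage() {
            MiniJob job = new MiniJob();
            setMapperClass(job, mapper());
            setReducerClass(job, reducer());
            setup(job); // last step before the job is handed back
            return job;
        }
    }

    /** Example stage: swaps the mapper and adds one extra setting in setup. */
    public static class CountingStage extends MiniStage {
        @Override public String mapper() { return "TokenMapper"; }
        @Override public void setup(MiniJob job) { job.log.add("reduceTasks=4"); }
    }

    public static List<String> run() {
        return new CountingStage().stage().log;
    }

    public static void main(String[] args) {
        // prints [mapper=TokenMapper, reducer=IdentityReducer, reduceTasks=4]
        System.out.println(run());
    }
}
```

Overriding only `mapper()` and `setup` keeps the subclass small, which matches the description's point that the generics plus mapper and reducer classes are enough for most jobs.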
