Type Parameters:

- INPUT_FORMAT: The job's input format. Must be a FileInputFormat. Used with FileInputFormat.setInputPaths(Job, Path...) to set the stage's input locations.
- OUTPUT_FORMAT: The job's output format. Must be a FileOutputFormat. Used with FileOutputFormat.setOutputPath(Job, Path) to set the stage's output location.
- INPUT_KEY: The key format of the input to the map task.
- INPUT_VALUE: The value format of the input to the map task.
- MAP_OUTPUT_KEY: The key format of the output of the map task (and therefore of the input to the reduce task).
- MAP_OUTPUT_VALUE: The value format of the output of the map task (and therefore of the input to the reduce task).
- OUTPUT_KEY: The key format of the output of the reduce task.
- OUTPUT_VALUE: The value format of the output of the reduce task.

public abstract class Stage<INPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<INPUT_KEY,INPUT_VALUE>,OUTPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<OUTPUT_KEY,OUTPUT_VALUE>,INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>
extends Object
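To make the chaining of the key/value type parameters concrete, here is a minimal Hadoop-free sketch for a word count: (INPUT_KEY, INPUT_VALUE) = (Long, String), (MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE) = (String, Integer) and (OUTPUT_KEY, OUTPUT_VALUE) = (String, Integer). The MapFn and ReduceFn interfaces and the in-memory run method are hypothetical stand-ins, not Hadoop's Mapper and Reducer, and the two format parameters are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hadoop-free sketch of how a stage's type parameters chain:
// (INPUT_KEY, INPUT_VALUE) -> map -> (MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE)
//   -> group by key -> reduce -> (OUTPUT_KEY, OUTPUT_VALUE)
class TypeFlowSketch {

    // Hypothetical mapper/reducer shapes; these are not the Hadoop classes.
    interface MapFn<IK, IV, MK, MV> {
        void map(IK key, IV value, BiConsumer<MK, MV> emit);
    }

    interface ReduceFn<MK, MV, OK, OV> {
        void reduce(MK key, List<MV> values, BiConsumer<OK, OV> emit);
    }

    // Runs one in-memory "stage": map every record, group by map-output key
    // (sorted, as Hadoop's shuffle does), then reduce each group.
    static <IK, IV, MK extends Comparable<MK>, MV, OK, OV> List<Map.Entry<OK, OV>> run(
            Map<IK, IV> input,
            MapFn<IK, IV, MK, MV> mapper,
            ReduceFn<MK, MV, OK, OV> reducer) {
        TreeMap<MK, List<MV>> groups = new TreeMap<>();
        input.forEach((k, v) -> mapper.map(k, v,
                (mk, mv) -> groups.computeIfAbsent(mk, x -> new ArrayList<>()).add(mv)));
        List<Map.Entry<OK, OV>> output = new ArrayList<>();
        groups.forEach((mk, mvs) -> reducer.reduce(mk, mvs,
                (ok, ov) -> output.add(Map.entry(ok, ov))));
        return output;
    }

    // Word count with Long/String input, String/Integer map output and String/Integer output.
    static List<Map.Entry<String, Integer>> wordCount(Map<Long, String> lines) {
        MapFn<Long, String, String, Integer> mapper = (offset, line, emit) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) emit.accept(word, 1);
            }
        };
        ReduceFn<String, Integer, String, Integer> reducer = (word, counts, emit) -> {
            int sum = 0;
            for (int c : counts) sum += c;
            emit.accept(word, sum);
        };
        return run(lines, mapper, reducer);
    }

    public static void main(String[] args) {
        Map<Long, String> lines = new TreeMap<>();
        lines.put(0L, "a b a");
        lines.put(6L, "b c");
        System.out.println(wordCount(lines)); // [a=2, b=2, c=1]
    }
}
```

The map output types are the only link between the two functions, which is why MAP_OUTPUT_KEY and MAP_OUTPUT_VALUE appear in both the mapper and reducer signatures.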
Any further job settings can be configured in setup(Job), which is called before the job is returned.

Constructor and Description |
---|
Stage(): Initialise all the classes based on the generics. |
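One common way for a generic base class to initialise classes from its generics is to read the type arguments a concrete subclass binds, via Class.getGenericSuperclass() and ParameterizedType.getActualTypeArguments(). The sketch below assumes this approach; it is not taken from the Stage source, and GenericStageSketch and WordCountStageSketch are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Sketch: a generic base class recovers its type arguments at construction time.
// Works only when the concrete class directly extends the base and binds every
// type parameter to a concrete type.
abstract class GenericStageSketch<IK, IV, OK, OV> {
    final Class<?> inputKeyClass;
    final Class<?> inputValueClass;
    final Class<?> outputKeyClass;
    final Class<?> outputValueClass;

    GenericStageSketch() {
        // getClass() is the runtime subclass; its generic superclass carries the bindings.
        Type superType = getClass().getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalStateException("Subclass must bind the type parameters");
        }
        Type[] args = ((ParameterizedType) superType).getActualTypeArguments();
        inputKeyClass = toClass(args[0]);
        inputValueClass = toClass(args[1]);
        outputKeyClass = toClass(args[2]);
        outputValueClass = toClass(args[3]);
    }

    // Accepts plain classes and parameterized types (e.g. List<String> -> List).
    private static Class<?> toClass(Type t) {
        if (t instanceof Class) return (Class<?>) t;
        if (t instanceof ParameterizedType) return (Class<?>) ((ParameterizedType) t).getRawType();
        throw new IllegalStateException("Unbound type argument: " + t);
    }
}

// A concrete stage that binds all four type parameters.
class WordCountStageSketch extends GenericStageSketch<Long, String, String, Integer> {
    public static void main(String[] args) {
        WordCountStageSketch s = new WordCountStageSketch();
        System.out.println(s.inputValueClass.getSimpleName());  // String
        System.out.println(s.outputValueClass.getSimpleName()); // Integer
    }
}
```

The design relies on Java keeping the type arguments of a class's declared superclass in its metadata, even though type arguments of ordinary instances are erased.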
Modifier and Type | Method and Description |
---|---|
Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> | combiner(): By default this method returns the IdentityReducer class. |
void | finished(org.apache.hadoop.mapreduce.Job job): Called when the stage's job is completed. |
boolean | lzoCompress() |
Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> | mapper(): By default this method returns the IdentityMapper class. |
String | outname() |
Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> | reducer(): By default this method returns the IdentityReducer class. |
void | setCombinerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner): For stages that require finer-grained control over how a job's combiner is set. |
void | setMapperClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper): For stages that need finer-grained control over how a job's mapper is set. |
void | setReducerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer): For stages that require finer-grained control over how a job's reducer is set. |
void | setup(org.apache.hadoop.mapreduce.Job job): Add any final adjustments to the job's configuration. |
org.apache.hadoop.mapreduce.Job | stage(org.apache.hadoop.fs.Path[] inputs, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration conf) |
public Stage()

public String outname()

public org.apache.hadoop.mapreduce.Job stage(org.apache.hadoop.fs.Path[] inputs, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration conf) throws Exception

Parameters:
- inputs: the expected input paths
- output: the output location
- conf: the job configuration

Throws:
- Exception
- IOException
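The way stage(Path[], Path, Configuration) ties the other methods together can be sketched as a template method. This is a Hadoop-free illustration under stated assumptions: FakeJob only records calls, classes are represented by name strings, the order mapper, combiner, reducer, setup is assumed rather than taken from the Stage source, and the driver runStage, which runs the job and then calls finished(job), is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of the stage lifecycle (assumed call order).
class StageLifecycleSketch {

    // Stand-in for org.apache.hadoop.mapreduce.Job: it only records what was configured.
    static class FakeJob {
        final List<String> log = new ArrayList<>();
    }

    abstract static class SketchStage {
        // Defaults mirroring the documented identity classes.
        String mapper() { return "IdentityMapper"; }
        String combiner() { return "IdentityReducer"; }
        String reducer() { return "IdentityReducer"; }

        // Hooks that receive the results of mapper(), combiner() and reducer().
        void setMapperClass(FakeJob job, String mapper) { job.log.add("mapper=" + mapper); }
        void setCombinerClass(FakeJob job, String combiner) { job.log.add("combiner=" + combiner); }
        void setReducerClass(FakeJob job, String reducer) { job.log.add("reducer=" + reducer); }

        // Final adjustments; no-op by default.
        void setup(FakeJob job) { }

        // Called once the job has completed; no-op by default.
        void finished(FakeJob job) { }

        FakeJob stage(String[] inputs, String output) {
            FakeJob job = new FakeJob();
            job.log.add("inputs=" + String.join(",", inputs));
            job.log.add("output=" + output);
            setMapperClass(job, mapper());
            setCombinerClass(job, combiner());
            setReducerClass(job, reducer());
            setup(job); // last step before the job is returned
            return job;
        }
    }

    // A stage that overrides only what it needs.
    static class CountingStage extends SketchStage {
        @Override String reducer() { return "SumReducer"; }
        @Override void setup(FakeJob job) { job.log.add("setup"); }
        @Override void finished(FakeJob job) { job.log.add("finished"); }
    }

    // Hypothetical driver: build the job, "run" it, then report completion.
    static List<String> runStage(SketchStage stage) {
        FakeJob job = stage.stage(new String[] {"in/a", "in/b"}, "out");
        job.log.add("run"); // a real driver would call job.waitForCompletion(true) here
        stage.finished(job);
        return job.log;
    }

    public static void main(String[] args) {
        System.out.println(runStage(new CountingStage()));
    }
}
```

Subclasses override only the pieces they care about, while the base class keeps the order of configuration steps in one place.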
public void setCombinerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner)

Called with the result of combiner().

Parameters:
- job
- combiner

public void setReducerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer)

Called with the result of reducer().

Parameters:
- job
- reducer
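Because setCombinerClass(Job, Class) receives whatever combiner() returned, a stage can use it to decide whether a combiner is registered at all. As one hypothetical use, the sketch below skips the default identity combiner, which would only add work. It is Hadoop-free: a Map stands in for the Job, classes are represented by name strings, and all class names are made up. With a real Job, the default branch corresponds to job.setCombinerClass(combiner).

```java
import java.util.HashMap;
import java.util.Map;

// Hadoop-free sketch of finer-grained control over how the combiner is set.
class CombinerHookSketch {

    static class BaseStage {
        String combiner() { return "IdentityReducer"; }

        // Default behaviour: always register whatever combiner() returned.
        void setCombinerClass(Map<String, String> job, String combiner) {
            job.put("combiner", combiner);
        }

        Map<String, String> stage() {
            Map<String, String> job = new HashMap<>();
            setCombinerClass(job, combiner()); // called with the result of combiner()
            return job;
        }
    }

    // Hypothetical override: never register the identity combiner.
    static class SkipIdentityCombinerStage extends BaseStage {
        @Override
        void setCombinerClass(Map<String, String> job, String combiner) {
            if (!"IdentityReducer".equals(combiner)) {
                super.setCombinerClass(job, combiner);
            }
        }
    }

    // A real combiner still gets through the override.
    static class SummingStage extends SkipIdentityCombinerStage {
        @Override String combiner() { return "SumCombiner"; }
    }

    public static void main(String[] args) {
        System.out.println(new BaseStage().stage());                 // {combiner=IdentityReducer}
        System.out.println(new SkipIdentityCombinerStage().stage()); // {}
        System.out.println(new SummingStage().stage());              // {combiner=SumCombiner}
    }
}
```

Hadoop only runs a combiner when one has been set on the job, so leaving it unset is equivalent to having no combiner.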
public void setMapperClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper)

Stages that use a MultithreadedMapper should override this method: set the job's mapper to MultithreadedMapper with Job.setMapperClass(Class), and set the mapper class that MultithreadedMapper runs with MultithreadedMapper.setMapperClass(Job, Class). This method is called with the result of mapper().

Parameters:
- job
- mapper
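The override pattern described above can be sketched without Hadoop as follows. A LinkedHashMap stands in for the job configuration; its keys, the SlowFeatureMapper name and the thread count are hypothetical. In real Hadoop code the override would call job.setMapperClass(MultithreadedMapper.class), MultithreadedMapper.setMapperClass(job, mapper) and, optionally, MultithreadedMapper.setNumberOfThreads(job, n) from org.apache.hadoop.mapreduce.lib.map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hadoop-free sketch of the MultithreadedMapper override pattern.
class MultithreadedHookSketch {

    static class BaseStage {
        String mapper() { return "IdentityMapper"; }

        // Default: the job runs mapper() directly.
        void setMapperClass(Map<String, String> job, String mapper) {
            job.put("job.mapper", mapper);
        }

        Map<String, String> stage() {
            Map<String, String> job = new LinkedHashMap<>();
            setMapperClass(job, mapper()); // called with the result of mapper()
            return job;
        }
    }

    static class MultithreadedStage extends BaseStage {
        @Override String mapper() { return "SlowFeatureMapper"; } // hypothetical mapper

        @Override
        void setMapperClass(Map<String, String> job, String mapper) {
            // 1. The job's mapper becomes the multithreaded wrapper...
            job.put("job.mapper", "MultithreadedMapper");
            // 2. ...and the wrapper is told which mapper to run on each thread.
            job.put("multithreaded.inner.mapper", mapper);
            job.put("multithreaded.threads", "4"); // hypothetical thread count
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseStage().stage());
        // {job.mapper=IdentityMapper}
        System.out.println(new MultithreadedStage().stage());
        // {job.mapper=MultithreadedMapper, multithreaded.inner.mapper=SlowFeatureMapper, multithreaded.threads=4}
    }
}
```

mapper() still names the class that does the real work; only the hook decides whether it runs directly or inside the wrapper.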
public void setup(org.apache.hadoop.mapreduce.Job job) throws IOException

Parameters:
- job

Throws:
- IOException
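Since setup(Job) runs before the job is returned, an override gets the last word on the configuration and may fail with an IOException. A minimal Hadoop-free sketch, assuming setup runs after the stage's own settings; the Map stands in for the job configuration and the keys reduce.tasks and model.path are hypothetical. With a real Job, the first adjustment would be job.setNumReduceTasks(8).

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hadoop-free sketch of overriding setup(job) for final adjustments.
class SetupHookSketch {

    static class BaseStage {
        // No-op default, declared to throw IOException like Stage.setup(Job).
        void setup(Map<String, String> job) throws IOException { }

        Map<String, String> stage(Map<String, String> conf) throws IOException {
            Map<String, String> job = new HashMap<>(conf);
            job.put("reduce.tasks", "1"); // a value chosen by the base stage
            setup(job);                   // subclasses get the last word
            return job;
        }
    }

    static class TunedStage extends BaseStage {
        @Override
        void setup(Map<String, String> job) throws IOException {
            // Final adjustment: override a value chosen earlier in stage().
            job.put("reduce.tasks", "8");
            // Fail early if a required (hypothetical) setting is missing.
            if (!job.containsKey("model.path")) {
                throw new IOException("model.path must be set");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        conf.put("model.path", "/tmp/model");
        System.out.println(new TunedStage().stage(conf).get("reduce.tasks")); // 8
    }
}
```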
public Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper()

By default this method returns the IdentityMapper class. This mapper outputs the values it is handed unchanged.

public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer()

By default this method returns the IdentityReducer class. This reducer outputs the values it is handed unchanged.

public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner()

By default this method returns the IdentityReducer class. This combiner outputs the values it is handed unchanged.

public void finished(org.apache.hadoop.mapreduce.Job job)

Parameters:
- job

public boolean lzoCompress()
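The identity defaults of mapper(), reducer() and combiner() can be illustrated with a small Hadoop-free sketch: the identity mapper emits each (key, value) pair unchanged, and the identity reducer emits (key, value) once for every value grouped under a key. IdentitySketch and its methods are hypothetical names, and the TreeMap grouping stands in for Hadoop's sort-and-shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of identity map and reduce behaviour.
class IdentitySketch {

    // Every record passes through untouched.
    static <K, V> List<Map.Entry<K, V>> identityMap(List<Map.Entry<K, V>> records) {
        return new ArrayList<>(records);
    }

    // Group by key (sorted, like the shuffle), then emit every value as-is.
    static <K extends Comparable<K>, V> List<Map.Entry<K, V>> identityReduce(List<Map.Entry<K, V>> mapped) {
        TreeMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> e : mapped) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<Map.Entry<K, V>> out = new ArrayList<>();
        groups.forEach((k, vs) -> vs.forEach(v -> out.add(Map.entry(k, v))));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
        System.out.println(identityReduce(identityMap(input))); // [a=2, b=1, b=3]
    }
}
```

Even with all identity defaults, the records come out grouped and sorted by key, because the shuffle between map and reduce still runs; no values are changed or dropped.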