INPUT_FORMAT - The job's input format. Must be a FileInputFormat. Used with FileInputFormat.setInputPaths(Job, Path...) to set the stage's input locations
OUTPUT_FORMAT - The job's output format. Must be a FileOutputFormat. Used with FileOutputFormat.setOutputPath(Job, Path) to set the stage's output location
INPUT_KEY - The key format of the input to the map task
INPUT_VALUE - The value format of the input to the map task
MAP_OUTPUT_KEY - The key format of the output of the map task (and therefore the input of the reduce)
MAP_OUTPUT_VALUE - The value format of the output of the map task (and therefore the input of the reduce)
OUTPUT_KEY - The key format of the output of the reduce task
OUTPUT_VALUE - The value format of the output of the reduce task

public abstract class Stage<INPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<INPUT_KEY,INPUT_VALUE>,OUTPUT_FORMAT extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<OUTPUT_KEY,OUTPUT_VALUE>,INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE> extends Object
Stages may override setup(Job), which is called before the job is returned.

| Constructor and Description |
|---|
| Stage() Initialise all the classes based on the generics |
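To make the generics concrete, here is a minimal sketch of a hypothetical subclass. WordCountStage, WordCountMapper and WordCountReducer are invented names used purely for illustration; only Stage and the Hadoop types come from the API documented on this page, and if the real Stage declares further abstract methods (e.g. outname()) those would also need implementing.

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
// The import for Stage itself is omitted: its package is not shown on this page.

// Hypothetical stage: reads lines of text, counts words, writes text output.
// The eight type arguments line up with the generics documented above.
public class WordCountStage extends
        Stage<TextInputFormat,                      // INPUT_FORMAT
              TextOutputFormat<Text, LongWritable>, // OUTPUT_FORMAT
              LongWritable, Text,                   // INPUT_KEY, INPUT_VALUE
              Text, LongWritable,                   // MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE
              Text, LongWritable> {                 // OUTPUT_KEY, OUTPUT_VALUE

    @Override
    public Class<? extends Mapper<LongWritable, Text, Text, LongWritable>> mapper() {
        return WordCountMapper.class;  // replaces the default IdentityMapper
    }

    @Override
    public Class<? extends Reducer<Text, LongWritable, Text, LongWritable>> reducer() {
        return WordCountReducer.class; // replaces the default IdentityReducer
    }

    // Invented mapper: emits (word, 1) for each whitespace-separated token.
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), new LongWritable(1));
                }
            }
        }
    }

    // Invented reducer: sums the counts emitted for each word.
    public static class WordCountReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
}
```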
| Modifier and Type | Method and Description |
|---|---|
| Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> | combiner() By default this method returns the IdentityReducer class. |
| void | finished(org.apache.hadoop.mapreduce.Job job) Called when the stage's job is completed. |
| boolean | lzoCompress() |
| Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> | mapper() By default this method returns the IdentityMapper class. |
| String | outname() |
| Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> | reducer() By default this method returns the IdentityReducer class. |
| void | setCombinerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner) For stages which require more fine grained control of how a job's combiner is set. |
| void | setMapperClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper) For stages which need more fine grained control of how a job's mapper is set. |
| void | setReducerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer) For stages which require more fine grained control of how a job's reducer is set. |
| void | setup(org.apache.hadoop.mapreduce.Job job) Add any final adjustments to the job's config |
| org.apache.hadoop.mapreduce.Job | stage(org.apache.hadoop.fs.Path[] inputs, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration conf) |
public Stage()
public String outname()
public org.apache.hadoop.mapreduce.Job stage(org.apache.hadoop.fs.Path[] inputs, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration conf) throws Exception
Parameters:
inputs - the input paths to be expected
output - the output location
conf - the job configuration

Throws:
Exception
IOException
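As a rough sketch of how a driver might use this method, assuming the returned Job is configured but not yet submitted. The paths are placeholders and WordCountStage is the invented subclass from the earlier sketch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RunWordCountStage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        WordCountStage wordCount = new WordCountStage(); // invented subclass from the earlier sketch

        // Build the job from the stage's input locations and output location.
        Job job = wordCount.stage(
                new Path[] { new Path("/data/input") }, // placeholder input paths
                new Path("/data/output"),               // placeholder output location
                conf);

        // Assumption: the caller is responsible for launching the returned job.
        boolean success = job.waitForCompletion(true);

        // finished(Job) is documented as "called when the stage's job is completed";
        // invoking it from the driver here is an assumption of this sketch.
        wordCount.finished(job);

        System.exit(success ? 0 : 1);
    }
}
```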
public void setCombinerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner)

For stages which require more fine grained control of how a job's combiner is set. This function is called with the result of combiner().

Parameters:
job -
combiner -

public void setReducerClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer)

For stages which require more fine grained control of how a job's reducer is set. This function is called with the result of reducer().

Parameters:
job -
reducer -

public void setMapperClass(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper)
For stages which need more fine grained control of how a job's mapper is set. Stages whose job mapper is not the class returned by mapper() (e.g. MultithreadedMapper) should overwrite this method, set the job's mapper to MultithreadedMapper with Job.setMapperClass(Class) and set the MultithreadedMapper's mapper class with MultithreadedMapper.setMapperClass(Job, Class). This function is called with the result of mapper().

Parameters:
job -
mapper -
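A sketch of the override described above, written as if inside the hypothetical WordCountStage from the earlier example; MultithreadedMapper is org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and the thread count is an arbitrary illustration:

```java
// Inside the hypothetical WordCountStage: run the real mapper on several
// threads by routing it through MultithreadedMapper, as described above.
@Override
public void setMapperClass(Job job,
        Class<? extends Mapper<LongWritable, Text, Text, LongWritable>> mapper) {
    // The job's mapper becomes MultithreadedMapper...
    job.setMapperClass(MultithreadedMapper.class);
    // ...and the class handed in (the result of mapper()) becomes the mapper
    // that MultithreadedMapper runs on its worker threads.
    MultithreadedMapper.setMapperClass(job, mapper);
    MultithreadedMapper.setNumberOfThreads(job, 8); // arbitrary thread count
}
```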
public void setup(org.apache.hadoop.mapreduce.Job job) throws IOException

Add any final adjustments to the job's config.

Parameters:
job -

Throws:
IOException
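For instance, a stage might use setup(Job) for last-minute configuration, as in this sketch; the specific settings are arbitrary illustrations, not requirements of this API:

```java
// Hypothetical override: final adjustments to the job before it is returned.
@Override
public void setup(Job job) throws IOException {
    job.setNumReduceTasks(1); // arbitrary example: force a single reducer
    // arbitrary example: disable speculative execution for the map tasks
    job.getConfiguration().setBoolean("mapreduce.map.speculative", false);
}
```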
public Class<? extends org.apache.hadoop.mapreduce.Mapper<INPUT_KEY,INPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> mapper()

By default this method returns the IdentityMapper class. This mapper outputs the values handed to it unchanged.

public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,OUTPUT_KEY,OUTPUT_VALUE>> reducer()

By default this method returns the IdentityReducer class. This reducer outputs the values handed to it unchanged.

public Class<? extends org.apache.hadoop.mapreduce.Reducer<MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE,MAP_OUTPUT_KEY,MAP_OUTPUT_VALUE>> combiner()

By default this method returns the IdentityReducer class. This combiner outputs the values handed to it unchanged.
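Because the combiner's input and output types are both the map-output types, a common pattern is to reuse the reducer as the combiner when the reduce operation is associative and commutative; a sketch in terms of the hypothetical WordCountStage from earlier:

```java
// Hypothetical override inside WordCountStage: reuse the word-count reducer as
// the combiner. This is only safe because summing counts is associative and
// commutative and its input and output types match the map-output types.
@Override
public Class<? extends Reducer<Text, LongWritable, Text, LongWritable>> combiner() {
    return WordCountReducer.class;
}
```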
public void finished(org.apache.hadoop.mapreduce.Job job)

Called when the stage's job is completed.

Parameters:
job -

public boolean lzoCompress()