public class CountWordsAcrossTimeperiod extends StageProvider
A StageProvider wrapping the functionality of CountWordsAcrossTimeperiod.Map and CountWordsAcrossTimeperiod.Reduce.
The Map expects time periods as keys and maps of word counts as values. The mapper
emits words as keys, with pairs of (time period, count within that time period)
as values.
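The map step described above can be sketched in plain Java without Hadoop. This is a minimal illustration, not the class's actual Map implementation: the names `WordTimeMapSketch`, `TimeCount`, `Emit`, `map` and `groupByWord` are hypothetical, and `groupByWord` merely stands in for Hadoop's shuffle, which gathers every (time period, count) pair emitted for the same word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the map step; not the real Map class.
public class WordTimeMapSketch {

    // Hypothetical value type: a time period paired with a word's count in it.
    record TimeCount(long timePeriod, long count) {}

    // Hypothetical key/value pair standing in for one Hadoop emit.
    record Emit(String word, TimeCount value) {}

    // Map step: one input record (time period -> word counts) yields one emit per word.
    static List<Emit> map(long timePeriod, Map<String, Long> wordCounts) {
        List<Emit> emits = new ArrayList<>();
        for (Map.Entry<String, Long> e : wordCounts.entrySet()) {
            emits.add(new Emit(e.getKey(), new TimeCount(timePeriod, e.getValue())));
        }
        return emits;
    }

    // Stand-in for the shuffle: collect every (time, count) pair under its word.
    static Map<String, List<TimeCount>> groupByWord(List<Emit> emits) {
        Map<String, List<TimeCount>> grouped = new TreeMap<>();
        for (Emit e : emits) {
            grouped.computeIfAbsent(e.word(), k -> new ArrayList<>()).add(e.value());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Emit> emits = new ArrayList<>();
        emits.addAll(map(0L, Map.of("storm", 3L, "rain", 1L)));
        emits.addAll(map(1L, Map.of("storm", 5L)));
        // Each word now carries its counts across all time periods seen.
        System.out.println(groupByWord(emits));
    }
}
```

After grouping, a reducer receives each word once together with its counts in every time period, which is the per-word view the DFIDF construction needs.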
The mapper's output, along with the TimeIndex generated by
CountTweetsInTimeperiod, is used to construct a DFIDF score per word per
time period.
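This documentation does not state the exact DFIDF formula. As a sketch only, assuming a simple form in which df is a word's count in a time period divided by the number of tweets in that period (the kind of per-period total a TimeIndex would supply) and idf is the log of total tweets over the word's total count, the per-word, per-period score could be computed as below. The class name `DfIdfSketch`, the method `dfidf` and the formula itself are hypothetical, not taken from this class.

```java
// Hypothetical DF-IDF sketch; the formula is an assumption, not this class's code.
public class DfIdfSketch {

    // ASSUMED form:
    //   df  = wordCountInPeriod / tweetsInPeriod
    //   idf = log(totalTweets / totalWordCount)
    static double dfidf(long wordCountInPeriod, long tweetsInPeriod,
                        long totalWordCount, long totalTweets) {
        if (tweetsInPeriod == 0 || totalWordCount == 0) {
            return 0.0; // no evidence for this word or period: zero weight
        }
        double df = (double) wordCountInPeriod / tweetsInPeriod;
        double idf = Math.log((double) totalTweets / totalWordCount);
        return df * idf;
    }

    public static void main(String[] args) {
        // "storm" appears 5 times among 10 tweets in this period,
        // and 8 times among 100 tweets overall.
        System.out.println(dfidf(5, 10, 8, 100));
    }
}
```

The tweet totals come from the separate tweet-counting stage, while the word totals come from grouping the mapper's (time period, count) pairs per word; the two are only combined at this final step.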
One interpretation of the DFIDF score is a weighting based on counts up to a
particular time period. This functionality is encoded in
CountWordsAcrossTimeperiod.NonCombinedTimesReducer.
Another interpretation is that the DFIDF score weights words based on word
occurrences at the end of some period of time, i.e. the count at some END.
Modifier and Type | Class and Description |
---|---|
static class | CountWordsAcrossTimeperiod.Map: function(timePeriodLength) map input: |
static class | CountWordsAcrossTimeperiod.NonCombinedTimesReducer: reduce input: |
static class | CountWordsAcrossTimeperiod.Reduce: reduce input: |
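The two DFIDF interpretations in the class description differ in which word count is used for a given time period. The sketch below is a hypothetical illustration of that difference only; it does not reproduce NonCombinedTimesReducer or Reduce. The names `CumulativeCountSketch`, `TimeCount`, `countsUpTo` and `countAtEnd` are assumptions. `countsUpTo` gives the running total up to and including each period, while `countAtEnd` gives one total over the whole span.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "counts up to a period" versus "count at the END".
public class CumulativeCountSketch {

    // Hypothetical value type: a word's count in one time period.
    record TimeCount(long timePeriod, long count) {}

    // Interpretation 1: running total of the word's count at each period.
    static Map<Long, Long> countsUpTo(List<TimeCount> pairs) {
        // Sum duplicates per period and sort periods ascending.
        TreeMap<Long, Long> perPeriod = new TreeMap<>();
        for (TimeCount tc : pairs) {
            perPeriod.merge(tc.timePeriod(), tc.count(), Long::sum);
        }
        Map<Long, Long> cumulative = new TreeMap<>();
        long running = 0;
        for (Map.Entry<Long, Long> e : perPeriod.entrySet()) {
            running += e.getValue();
            cumulative.put(e.getKey(), running);
        }
        return cumulative;
    }

    // Interpretation 2: a single total at the END, shared by every period.
    static long countAtEnd(List<TimeCount> pairs) {
        long total = 0;
        for (TimeCount tc : pairs) {
            total += tc.count();
        }
        return total;
    }

    public static void main(String[] args) {
        // Pairs may arrive unordered, as reducer values often do.
        List<TimeCount> storm = List.of(
                new TimeCount(1, 5), new TimeCount(0, 3), new TimeCount(2, 2));
        System.out.println(countsUpTo(storm)); // {0=3, 1=8, 2=10}
        System.out.println(countAtEnd(storm)); // 10
    }
}
```

Sorting by period before accumulating matters because reducer values are not guaranteed to arrive in time order.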
Modifier and Type | Field and Description |
---|---|
static String | ARGS_KEY: argument key |
static String | WORDCOUNT_DIR: where the intermediate word counts should be stored |
Constructor and Description |
---|
CountWordsAcrossTimeperiod(String[] nonHadoopArgs) |
CountWordsAcrossTimeperiod(String[] nonHadoopArgs, boolean combinedTimes) |
Modifier and Type | Method and Description |
---|---|
SimpleSequenceFileStage<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable> | stage() |
public static final String ARGS_KEY
public static final String WORDCOUNT_DIR
public CountWordsAcrossTimeperiod(String[] nonHadoopArgs)
Parameters:
nonHadoopArgs
public CountWordsAcrossTimeperiod(String[] nonHadoopArgs, boolean combinedTimes)
Parameters:
nonHadoopArgs
combinedTimes - whether the mapper expects time entries with values for each
word, i.e. combined times.
public SimpleSequenceFileStage<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable> stage()
stage in class StageProvider