KEY - Type of dataset class key
INSTANCE - Type of instances in the dataset

public class GroupedRandomSplitter<KEY,INSTANCE> extends Object implements TrainSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>, TestSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>, ValidateSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>
Splits a GroupedDataset into subsets for training, validation and testing. The number of instances required for each subset can be chosen independently. Within each group, instances are assigned to subsets randomly without replacement.

The GroupedRandomSplitter class allows the splits to be recomputed at any time. This makes it easy to generate new splits (for cross-validation, for example). Static methods are provided to simplify the generation of such data.
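The behaviour described above (per-group random assignment without replacement, plus on-demand recomputation) can be illustrated with a minimal stand-alone sketch. It uses only `java.util` and plain `Map`/`List` types in place of `GroupedDataset` and `ListDataset`; the class name `GroupedSplitSketch`, the explicit `Random` argument, and the exception thrown for undersized groups are assumptions made for illustration, not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Stand-alone illustration only; this is not the OpenIMAJ implementation. */
class GroupedSplitSketch<K, T> {
    private final Map<K, List<T>> groups;
    private final int numTraining;
    private final int numValidation;
    private final int numTesting;
    private final Random rng;

    private Map<K, List<T>> training;
    private Map<K, List<T>> validation;
    private Map<K, List<T>> testing;

    public GroupedSplitSketch(Map<K, List<T>> groups, int numTraining,
                              int numValidation, int numTesting, Random rng) {
        this.groups = groups;
        this.numTraining = numTraining;
        this.numValidation = numValidation;
        this.numTesting = numTesting;
        this.rng = rng;
        recomputeSubsets();
    }

    /** Redraw all three subsets from the original groups. */
    public void recomputeSubsets() {
        training = new LinkedHashMap<>();
        validation = new LinkedHashMap<>();
        testing = new LinkedHashMap<>();
        int needed = numTraining + numValidation + numTesting;
        for (Map.Entry<K, List<T>> e : groups.entrySet()) {
            if (e.getValue().size() < needed) {
                // Assumption: reject undersized groups; the real class may behave differently.
                throw new IllegalArgumentException("Group " + e.getKey() + " is too small");
            }
            // Shuffling a copy and slicing it draws each subset without replacement.
            List<T> shuffled = new ArrayList<>(e.getValue());
            Collections.shuffle(shuffled, rng);
            int endTrain = numTraining;
            int endValid = endTrain + numValidation;
            training.put(e.getKey(), new ArrayList<>(shuffled.subList(0, endTrain)));
            validation.put(e.getKey(), new ArrayList<>(shuffled.subList(endTrain, endValid)));
            testing.put(e.getKey(), new ArrayList<>(shuffled.subList(endValid, needed)));
        }
    }

    public Map<K, List<T>> getTrainingDataset() { return training; }

    public Map<K, List<T>> getValidationDataset() { return validation; }

    public Map<K, List<T>> getTestDataset() { return testing; }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        groups.put("cats", Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        groups.put("dogs", Arrays.asList(11, 12, 13, 14, 15, 16));

        GroupedSplitSketch<String, Integer> splitter =
                new GroupedSplitSketch<>(groups, 3, 1, 2, new Random(0));
        System.out.println("training:   " + splitter.getTrainingDataset());

        // Same subset sizes, newly drawn membership.
        splitter.recomputeSubsets();
        System.out.println("recomputed: " + splitter.getTrainingDataset());
    }
}
```

Because each group is shuffled once and then sliced, no instance can appear in more than one subset of the same split, and any instances beyond the requested counts are simply left out.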
Constructor and Description |
---|
GroupedRandomSplitter(GroupedDataset<KEY,? extends ListDataset<INSTANCE>,INSTANCE> dataset, int numTraining, int numValidation, int numTesting) Construct the dataset splitter with the given target number of instances per group for the training, validation and testing data. |
Modifier and Type | Method and Description |
---|---|
static <KEY,INSTANCE> CrossValidationIterable<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>> | createCrossValidationData(GroupedDataset<KEY,? extends ListDataset<INSTANCE>,INSTANCE> dataset, int numTraining, int numValidation, int numIterations) Create a CrossValidationIterable from the dataset. |
GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> | getTestDataset() |
GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> | getTrainingDataset() |
GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> | getValidationDataset() |
void | recomputeSubsets() Recompute the underlying splits of the training, validation and testing data by randomly picking new subsets of the input dataset given in the constructor. |
public GroupedRandomSplitter(GroupedDataset<KEY,? extends ListDataset<INSTANCE>,INSTANCE> dataset, int numTraining, int numValidation, int numTesting)

dataset - the dataset to split
numTraining - the number of training instances per group
numValidation - the number of validation instances per group
numTesting - the number of testing instances per group

public void recomputeSubsets()

public GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> getTestDataset()
Specified by: getTestDataset in interface TestSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>

public GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> getTrainingDataset()
Specified by: getTrainingDataset in interface TrainSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>

public GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE> getValidationDataset()
Specified by: getValidationDataset in interface ValidateSplitProvider<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>>

public static <KEY,INSTANCE> CrossValidationIterable<GroupedDataset<KEY,ListDataset<INSTANCE>,INSTANCE>> createCrossValidationData(GroupedDataset<KEY,? extends ListDataset<INSTANCE>,INSTANCE> dataset, int numTraining, int numValidation, int numIterations)

Create a CrossValidationIterable from the dataset. Internally, this method creates a GroupedRandomSplitter to split the dataset into subsets of the requested size (with no test instances) and then produces a CrossValidationIterable that recomputes the subsets on each iteration through recomputeSubsets().

KEY - Type of dataset class key
INSTANCE - Type of instances in the dataset
dataset - the dataset to split
numTraining - the number of training instances per group
numValidation - the number of validation instances per group
numIterations - the number of cross-validation iterations to create

Returns: the CrossValidationIterable
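
The recompute-on-each-iteration pattern described for createCrossValidationData can be sketched as a plain `Iterable` that redraws a training/validation split every time `next()` is called. This is a simplified illustration, not the library's code: a single ungrouped `List` stands in for the grouped dataset, and the names `CrossValidationSketch`, `Fold` and `crossValidation`, as well as the explicit `Random` argument, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

/** Stand-alone illustration only; none of these names come from OpenIMAJ. */
class CrossValidationSketch {
    /** One round: a training subset and a disjoint validation subset. */
    static final class Fold<T> {
        final List<T> training;
        final List<T> validation;

        Fold(List<T> training, List<T> validation) {
            this.training = training;
            this.validation = validation;
        }
    }

    /**
     * Returns an Iterable whose iterator draws a fresh random
     * training/validation split on every call to next(), for
     * numIterations rounds. No test subset is produced.
     */
    static <T> Iterable<Fold<T>> crossValidation(List<T> data, int numTraining,
                                                 int numValidation, int numIterations,
                                                 Random rng) {
        if (numTraining + numValidation > data.size()) {
            throw new IllegalArgumentException("Not enough instances for the requested sizes");
        }
        return () -> new Iterator<Fold<T>>() {
            private int round = 0;

            @Override
            public boolean hasNext() {
                return round < numIterations;
            }

            @Override
            public Fold<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                round++;
                // The "recompute" step: reshuffle a copy, then slice without overlap.
                List<T> shuffled = new ArrayList<>(data);
                Collections.shuffle(shuffled, rng);
                return new Fold<>(
                        new ArrayList<>(shuffled.subList(0, numTraining)),
                        new ArrayList<>(shuffled.subList(numTraining, numTraining + numValidation)));
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        int round = 0;
        for (Fold<Integer> fold : crossValidation(data, 6, 2, 3, new Random(0))) {
            System.out.println("round " + (round++) + ": train=" + fold.training
                    + " validate=" + fold.validation);
        }
    }
}
```

Note that splits drawn this way are independent between rounds, so the same instance may land in the validation subset of more than one round; this differs from a fixed k-fold partition.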