java.lang.Object
com.rapidminer.operator.Operator
com.rapidminer.operator.OperatorChain
com.rapidminer.operator.learner.meta.AbstractMetaLearner
com.rapidminer.operator.learner.meta.BayesianBoosting
public class BayesianBoosting
This operator trains an ensemble of classifiers for boolean target
attributes. In each iteration the training set is reweighted, so that
previously discovered patterns and other kinds of prior knowledge are
"sampled out" [Scholz/2005b]. An inner classifier,
typically a rule or decision tree induction algorithm, is sequentially
applied several times, and the models are combined to a single global model.
The maximum number of models to be trained is specified by the parameter
iterations.
If the parameter rescale_label_priors is set, then the example
set is reweighted, so that all classes are equally probable (or frequent).
For two-class problems this turns the problem of fitting models to maximize
weighted relative accuracy into the more common task of classifier induction
[Scholz/2005a]. Using a rule induction algorithm as the inner
learner makes subgroup discovery possible. This option is also recommended for
data sets with class skew, if a "very weak learner" like a decision
stump is used. If rescale_label_priors is not set, then the
operator performs boosting based on probability estimates.
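The rescaling described for rescale_label_priors can be illustrated with a minimal sketch. It is not taken from the operator's source: the class and method names are hypothetical, the label is assumed to be boolean, and both classes are assumed to carry positive total weight.

```java
/** Hypothetical helper for illustration only; not part of the RapidMiner API. */
final class LabelPriorRescaling {

    /**
     * Returns new weights such that the positive and the negative class each
     * carry half of the total weight. The total weight itself is unchanged.
     */
    static double[] rescale(double[] weights, boolean[] labels) {
        double pos = 0.0;
        double neg = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (labels[i]) {
                pos += weights[i];
            } else {
                neg += weights[i];
            }
        }
        if (pos <= 0.0 || neg <= 0.0) {
            throw new IllegalArgumentException("both classes need positive total weight");
        }
        double targetPerClass = (pos + neg) / 2.0;
        double[] rescaled = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            double classWeight = labels[i] ? pos : neg;
            // Scale every example of a class by the same factor.
            rescaled[i] = weights[i] * targetPerClass / classWeight;
        }
        return rescaled;
    }
}
```

With one positive and three negative examples of unit weight, the positive example receives weight 2 and the three negatives share a total weight of 2.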
The estimates used by this operator may either be computed using the same set
as for training, or in each iteration the training set may be split randomly,
so that a model is fitted based on the first subset, and the probabilities
are estimated based on the second. The first solution may be advantageous in
situations where data is rare. Set the parameter
ratio_internal_bootstrap to 1 to use the same set for training
as for estimation. Set this parameter to a value of lower than 1 to use the
specified subset of data for training, and the remaining examples for
probability estimation.
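The two modes of ratio_internal_bootstrap can be sketched as follows. This is an assumption-laden illustration, not the operator's implementation: the class name, the seed argument, and the rounding of the training subset size are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of the internal split; not part of the RapidMiner API. */
final class InternalSplit {
    final List<Integer> training;
    final List<Integer> estimation;

    private InternalSplit(List<Integer> training, List<Integer> estimation) {
        this.training = training;
        this.estimation = estimation;
    }

    /**
     * Splits the example indices 0..n-1. A ratio of 1 uses all examples for
     * both training and estimation; a lower ratio trains on a random subset
     * and estimates probabilities on the remaining examples.
     */
    static InternalSplit split(int n, double ratio, long seed) {
        if (ratio <= 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in (0, 1]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        if (ratio == 1.0) {
            return new InternalSplit(indices, indices);
        }
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(ratio * n); // assumed rounding rule
        return new InternalSplit(
                new ArrayList<>(indices.subList(0, cut)),
                new ArrayList<>(indices.subList(cut, n)));
    }
}
```

For very small ratios or very small example sets the training subset in this sketch can be empty, so a caller would need to guard against that case.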
If the parameter allow_marginal_skews is not set,
then the support of each subset defined in terms of common base model
predictions does not change from one iteration to the next. Analogously the
class priors do not change. This is the procedure originally described in
[Scholz/2005b] in the context of subgroup discovery.
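One reweighting rule with exactly this property multiplies each example weight by the inverse lift of its (prediction, label) combination, P(prediction) * P(label) / P(prediction, label). The sketch below assumes this rule for a single boolean base model. It is an illustration rather than the operator's code, the class name is hypothetical, and the marginals are preserved exactly only if all four prediction/label combinations have positive weight.

```java
/** Hypothetical illustration of inverse-lift reweighting; not part of the RapidMiner API. */
final class LiftReweighting {

    private static int idx(boolean b) {
        return b ? 1 : 0;
    }

    /**
     * Multiplies each weight by P(pred) * P(label) / P(pred, label), where all
     * probabilities are estimated from the current weights.
     */
    static double[] reweight(double[] weights, boolean[] predictions, boolean[] labels) {
        double[][] joint = new double[2][2]; // indexed as [prediction][label]
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            joint[idx(predictions[i])][idx(labels[i])] += weights[i];
            total += weights[i];
        }
        double[] predMarginal = {joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]};
        double[] labelMarginal = {joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]};

        double[] reweighted = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            int p = idx(predictions[i]);
            int y = idx(labels[i]);
            if (weights[i] == 0.0) {
                continue; // zero weights stay zero and avoid a division by zero
            }
            reweighted[i] = weights[i] * predMarginal[p] * labelMarginal[y]
                    / (joint[p][y] * total);
        }
        return reweighted;
    }
}
```

After this step the total weight of each prediction subset and of each class is the same as before, while prediction and label are statistically independent under the new weights, so the pattern found by the base model has been "sampled out".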
Setting the allow_marginal_skews option to true
leads to a procedure that changes the marginal weights/probabilities of
subsets, if this is beneficial in a boosting context, and stratifies the two
classes to be equally likely. As for AdaBoost, the total weight upper-bounds
the training error in this case. This bound is reduced more quickly by the
BayesianBoosting operator, however.
In sum, to reproduce the sequential sampling, or knowledge-based sampling,
from [Scholz/2005b] for subgroup discovery, two of the
default parameter settings of this operator have to be changed:
rescale_label_priors must
be set to true, and allow_marginal_skews must
be set to false. In addition, a boolean (binomial) label
has to be used.
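The two non-default settings named above can be collected as plain key/value pairs. The parameter keys are the ones given in this description. The class and method names are hypothetical, and how the values are passed to the operator depends on the surrounding RapidMiner code, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the settings described above; not part of the RapidMiner API. */
final class KbsSettings {

    /** Returns the two settings that differ from the operator defaults. */
    static Map<String, String> forSubgroupDiscovery() {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("rescale_label_priors", "true");
        parameters.put("allow_marginal_skews", "false");
        return parameters;
    }
}
```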
The operator requires an example set as its input. To sample out prior knowledge of a different form, another model can be provided as an optional additional input. The predictions of this model are used to produce an initial weighting of the training set. The output of the operator is a classification model applicable for estimating conditional class probabilities or for plain crisp classification. It contains up to the specified number of inner base models. If an optional initial model is provided, it is also stored in the output model, so that the same initial weighting is produced during model application.
| Field Summary | |
|---|---|
| protected int | currentIteration: Field for visualizing performance. |
| static double | MIN_ADVANTAGE: Discard models with an advantage of less than the specified value. |
| static java.lang.String | PARAMETER_ALLOW_MARGINAL_SKEWS: Boolean parameter that switches between KBS (if set to false) and a boosting-like reweighting. |
| static java.lang.String | PARAMETER_ITERATIONS: Name of the variable specifying the maximal number of iterations of the learner. |
| static java.lang.String | PARAMETER_RESCALE_LABEL_PRIORS: Boolean parameter to specify whether the label priors should be equally likely after first iteration. |
| static java.lang.String | PARAMETER_USE_SUBSET_FOR_TRAINING: Name of the flag indicating internal bootstrapping. |
| Constructor Summary | |
|---|---|
| BayesianBoosting(OperatorDescription description) | Constructor. |
| Method Summary | |
|---|---|
| java.util.List<ParameterType> | getParameterTypes(): Adds the parameters "number of iterations" and "model file". |
| Model | learn(ExampleSet exampleSet): Constructs a Model by repeatedly running a weak learner, reweighting the training example set accordingly, and combining the hypotheses using the available weighted performance values. |
| protected double[] | prepareWeights(ExampleSet exampleSet): Creates a weight attribute if not yet done. |
| protected double | reweightExamples(WeightedPerformanceMeasures wp, ExampleSet exampleSet): This method reweights the example set with respect to the WeightedPerformanceMeasures object. |
| boolean | supportsCapability(LearnerCapability lc): Overrides the method of the super class. |
| protected Model | trainBaseModel(ExampleSet exampleSet): Runs the "embedded" learner on the example set and returns a model. |
| Methods inherited from class com.rapidminer.operator.learner.meta.AbstractMetaLearner |
|---|
| apply, applyInnerLearner, getEstimatedPerformance, getInnerOperatorCondition, getInputClasses, getInputDescription, getMaxNumberOfInnerOperators, getMinNumberOfInnerOperators, getOutputClasses, getWeights, shouldCalculateWeights, shouldEstimatePerformance |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface com.rapidminer.operator.learner.Learner |
|---|
| getName |
| Field Detail |
|---|
public static final java.lang.String PARAMETER_ITERATIONS
public static final java.lang.String PARAMETER_USE_SUBSET_FOR_TRAINING
public static final java.lang.String PARAMETER_RESCALE_LABEL_PRIORS
public static final java.lang.String PARAMETER_ALLOW_MARGINAL_SKEWS
public static final double MIN_ADVANTAGE
protected int currentIteration
| Constructor Detail |
|---|
public BayesianBoosting(OperatorDescription description)
| Method Detail |
|---|
public boolean supportsCapability(LearnerCapability lc)
Overrides the method of the super class.
Specified by:
supportsCapability in interface Learner
Overrides:
supportsCapability in class AbstractMetaLearner
public Model learn(ExampleSet exampleSet)
throws OperatorException
Constructs a Model by repeatedly running a weak learner,
reweighting the training example set accordingly, and combining the
hypotheses using the available weighted performance values. If the input
contains a model, then this model is used as a starting point for
weighting the examples.
Throws:
OperatorException
protected double[] prepareWeights(ExampleSet exampleSet)
Creates a weight attribute if not yet done.
Parameters:
exampleSet - the example set to be prepared
Returns:
a double[] array containing the class priors.
protected Model trainBaseModel(ExampleSet exampleSet)
throws OperatorException
Runs the "embedded" learner on the example set and returns a model.
Parameters:
exampleSet - an ExampleSet to train a model for
Returns:
a Model
Throws:
OperatorException
protected double reweightExamples(WeightedPerformanceMeasures wp,
ExampleSet exampleSet)
throws OperatorException
This method reweights the example set with respect to the
WeightedPerformanceMeasures object. Please note that the
weights will not be reset at any time, because they continuously change
from one iteration to the next. This method does not change the priors of
the classes.
Parameters:
wp - the WeightedPerformanceMeasures to use
exampleSet - ExampleSet to be reweighted
Throws:
OperatorException
public java.util.List<ParameterType> getParameterTypes()
Adds the parameters "number of iterations" and "model file".
Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator