com.rapidminer.operator.validation
Class XValidation

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.OperatorChain
              extended by com.rapidminer.operator.validation.ValidationChain
                  extended by com.rapidminer.operator.validation.XValidation
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class XValidation
extends ValidationChain

XValidation encapsulates a cross-validation process. The example set S is split into number_of_validations subsets S_i. The inner operators are applied number_of_validations times, using S_i as the test set (input of the second inner operator) and S \ S_i as the training set (input of the first inner operator).
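
The splitting scheme described above can be sketched in plain Java. This is a minimal illustration, not RapidMiner's implementation: the class KFoldSketch and its methods folds and trainingIndices are hypothetical names, and consecutive folds over example indices are assumed. Calling folds(n, n) gives one example per test set, which corresponds to the leave-one-out case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of RapidMiner): splits indices 0..n-1 into k consecutive folds. */
final class KFoldSketch {

    /** Returns k lists of example indices; the first (n % k) folds receive one extra example. */
    static List<List<Integer>> folds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<List<Integer>> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < k; i++) {
            int size = n / k + (i < n % k ? 1 : 0);
            List<Integer> fold = new ArrayList<>();
            for (int j = start; j < start + size; j++) {
                fold.add(j);
            }
            result.add(fold);
            start += size;
        }
        return result;
    }

    /** Training indices for iteration i: every index in S that is not in the test fold S_i. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int i) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != i) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3);
        for (int i = 0; i < folds.size(); i++) {
            System.out.println("test=" + folds.get(i) + " train=" + trainingIndices(folds, i));
        }
    }
}
```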

The first inner operator must accept an ExampleSet. The second must accept an ExampleSet together with the output of the first (in most cases a Model) and must produce a PerformanceVector.
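
The contract between the two inner operators can be illustrated without the RapidMiner API. In the sketch below, a Function stands in for the first inner operator (training examples in, model out) and a ToDoubleBiFunction stands in for the second (model and test examples in, a single score out). The class ValidationLoopSketch, the round-robin fold assignment, and the reduction of a performance vector to one number are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch (not RapidMiner code) of the learner/evaluator contract. */
final class ValidationLoopSketch {

    /**
     * Runs k iterations and returns the average score.
     * learner:   training examples -> model            (role of the first inner operator)
     * evaluator: (model, test examples) -> score       (role of the second inner operator)
     */
    static <E, M> double crossValidate(List<E> examples, int k,
            Function<List<E>, M> learner,
            ToDoubleBiFunction<M, List<E>> evaluator) {
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            List<E> train = new ArrayList<>();
            List<E> test = new ArrayList<>();
            for (int j = 0; j < examples.size(); j++) {
                // Example j belongs to fold j % k (an arbitrary choice for this sketch).
                (j % k == i ? test : train).add(examples.get(j));
            }
            M model = learner.apply(train);
            sum += evaluator.applyAsDouble(model, test);
        }
        return sum / k;
    }

    /** Toy run: the "model" is the training mean, the score is the mean absolute error. */
    static double demo() {
        List<Double> data = List.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0);
        return ValidationLoopSketch.<Double, Double>crossValidate(data, 3,
                train -> train.stream().mapToDouble(Double::doubleValue).average().orElse(0.0),
                (mean, test) -> test.stream().mapToDouble(v -> Math.abs(v - mean)).average().orElse(0.0));
    }

    public static void main(String[] args) {
        System.out.println("mean absolute error over 3 folds: " + demo());
    }
}
```

The model type M is left generic because the text only says the first operator's output is a Model in most cases, not always.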

Like other validation schemes, the RapidMiner cross-validation can use several types of sampling for building the subsets. Linear sampling simply divides the example set into partitions without changing the order of the examples. Shuffled sampling builds random subsets from the data. Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole example set.
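
The three sampling types can be compared with a small stand-alone sketch. It is one plausible way to realize each scheme and is not taken from RapidMiner's source: the class SamplingSketch, its method names, and the round-robin dealing used for stratification are assumptions. Each method returns, for every example index, the number of the subset it is assigned to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch (not RapidMiner code) of linear, shuffled and stratified sampling. */
final class SamplingSketch {

    /** Linear: consecutive blocks, original order preserved. */
    static int[] linear(int n, int k) {
        int[] fold = new int[n];
        for (int j = 0; j < n; j++) {
            fold[j] = (int) ((long) j * k / n);
        }
        return fold;
    }

    /** Shuffled: a random permutation of the examples is cut into consecutive blocks. */
    static int[] shuffled(int n, int k, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < n; j++) {
            order.add(j);
        }
        Collections.shuffle(order, rng);
        int[] blocks = linear(n, k);
        int[] fold = new int[n];
        for (int pos = 0; pos < n; pos++) {
            fold[order.get(pos)] = blocks[pos];
        }
        return fold;
    }

    /** Stratified: each class is shuffled, then its members are dealt round-robin to the subsets. */
    static int[] stratified(List<String> labels, int k, Random rng) {
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int j = 0; j < labels.size(); j++) {
            byClass.computeIfAbsent(labels.get(j), c -> new ArrayList<>()).add(j);
        }
        int[] fold = new int[labels.size()];
        int next = 0; // keeps counting across classes so subset sizes stay balanced
        for (List<Integer> members : byClass.values()) {
            Collections.shuffle(members, rng);
            for (int index : members) {
                fold[index] = next % k;
                next++;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("a", "a", "a", "a", "a", "a", "b", "b", "b");
        Random rng = new Random(42);
        System.out.println("linear:     " + Arrays.toString(linear(labels.size(), 3)));
        System.out.println("shuffled:   " + Arrays.toString(shuffled(labels.size(), 3, rng)));
        System.out.println("stratified: " + Arrays.toString(stratified(labels, 3, rng)));
    }
}
```

With six examples of class "a" and three of class "b", the stratified assignment places two "a" and one "b" in each of the three subsets, whereas linear sampling on the same ordered data would put all "b" examples into the last subset.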

The cross-validation operator provides several values which can be logged by means of a ProcessLogOperator. The number of the current iteration can be logged, which might be useful for ProcessLog operators wrapped inside a cross-validation. Besides that, all performance estimation operators of RapidMiner provide access to the average values calculated during the estimation. Since the operator cannot ensure the names of the delivered criteria, the ProcessLog operator can access the values via the generic value names:

Author:
Ingo Mierswa
Keywords:
cross-validation

Field Summary
static java.lang.String PARAMETER_AVERAGE_PERFORMANCES_ONLY
          The parameter name for "Indicates if only performance vectors should be averaged or all types of averagable result vectors"
static java.lang.String PARAMETER_LEAVE_ONE_OUT
          The parameter name for "Set the number of validations to the number of examples."
static java.lang.String PARAMETER_NUMBER_OF_VALIDATIONS
          The parameter name for "Number of subsets for the crossvalidation."
static java.lang.String PARAMETER_SAMPLING_TYPE
          The parameter name for "Defines the sampling type of the cross validation (linear = consecutive subsets, shuffled = random subsets, stratified = random subsets with class distribution kept constant)"
 
Fields inherited from class com.rapidminer.operator.validation.ValidationChain
PARAMETER_CREATE_COMPLETE_MODEL
 
Constructor Summary
XValidation(OperatorDescription description)
           
 
Method Summary
 void estimatePerformance(ExampleSet inputSet)
          This is the main method of the validation chain and must be implemented to estimate the performance of the inner operators on the given example set.
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
protected  MDInteger getTestSetSize(MDInteger originalSize)
           
protected  MDInteger getTrainingSetSize(MDInteger originalSize)
           
protected  void performIteration(SplittedExampleSet splittedES, int iteration)
           
 
Methods inherited from class com.rapidminer.operator.validation.ValidationChain
doWork, evaluate, executeEvaluator, executeLearner, learn, shouldAutoConnect
 
Methods inherited from class com.rapidminer.operator.OperatorChain
addOperator, addOperator, addSubprocess, areSubprocessesExtendable, assumePreconditionsSatisfied, checkDeprecations, checkIO, checkNumberOfInnerOperators, checkProperties, clear, cloneOperator, collectErrors, createProcessTree, createSubprocess, freeMemory, getAllInnerOperators, getAllInnerOperatorsAndMe, getImmediateChildren, getIndexOfOperator, getInnerOperatorCondition, getMaxNumberOfInnerOperators, getMinNumberOfInnerOperators, getNumberOfAllOperators, getNumberOfOperators, getNumberOfSubprocesses, getOperator, getOperatorFromAll, getOperators, getSubprocess, getSubprocesses, isEnabled, lookupOperator, notifyRenaming, performAdditionalChecks, processFinished, processStarts, propagateDirtyness, registerOperator, removeOperator, removeSubprocess, shouldAddNonConsumedInput, shouldReturnInnerOutput, unregisterOperator, updateExecutionOrder, walk
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, checkAll, checkAllExcludingMetaData, checkForStop, clearErrorList, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, makeDirty, makeDirtyOnUpdate, preAutoWire, producesOutput, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_NUMBER_OF_VALIDATIONS

public static final java.lang.String PARAMETER_NUMBER_OF_VALIDATIONS
The parameter name for "Number of subsets for the crossvalidation."

See Also:
Constant Field Values

PARAMETER_LEAVE_ONE_OUT

public static final java.lang.String PARAMETER_LEAVE_ONE_OUT
The parameter name for "Set the number of validations to the number of examples. If set to true, number_of_validations is ignored"

See Also:
Constant Field Values

PARAMETER_SAMPLING_TYPE

public static final java.lang.String PARAMETER_SAMPLING_TYPE
The parameter name for "Defines the sampling type of the cross validation (linear = consecutive subsets, shuffled = random subsets, stratified = random subsets with class distribution kept constant)"

See Also:
Constant Field Values

PARAMETER_AVERAGE_PERFORMANCES_ONLY

public static final java.lang.String PARAMETER_AVERAGE_PERFORMANCES_ONLY
The parameter name for "Indicates if only performance vectors should be averaged or all types of averagable result vectors"

See Also:
Constant Field Values
Constructor Detail

XValidation

public XValidation(OperatorDescription description)
Method Detail

estimatePerformance

public void estimatePerformance(ExampleSet inputSet)
                         throws OperatorException
Description copied from class: ValidationChain
This is the main method of the validation chain and must be implemented to estimate the performance of the inner operators on the given example set. The implementation can make use of the helper methods provided in this class.

Specified by:
estimatePerformance in class ValidationChain
Throws:
OperatorException

performIteration

protected void performIteration(SplittedExampleSet splittedES,
                                int iteration)
                         throws OperatorException,
                                ProcessStoppedException
Throws:
OperatorException
ProcessStoppedException

getTestSetSize

protected MDInteger getTestSetSize(MDInteger originalSize)
                            throws UndefinedParameterError
Specified by:
getTestSetSize in class ValidationChain
Throws:
UndefinedParameterError

getTrainingSetSize

protected MDInteger getTrainingSetSize(MDInteger originalSize)
                                throws UndefinedParameterError
Specified by:
getTrainingSetSize in class ValidationChain
Throws:
UndefinedParameterError

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained, and special parameters for those input objects which can be prevented from being consumed. ATTENTION! This will create new ParameterTypes. To access already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class ValidationChain


Copyright © 2001-2009 by Rapid-I