com.rapidminer.operator.preprocessing
Class AttributeSubsetPreprocessing

java.lang.Object
  extended by com.rapidminer.operator.Operator
      extended by com.rapidminer.operator.OperatorChain
          extended by com.rapidminer.operator.preprocessing.AttributeSubsetPreprocessing
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ParameterHandler, LoggingHandler

public class AttributeSubsetPreprocessing
extends OperatorChain

This operator selects one attribute (or a subset of attributes) by matching a regular expression against the attribute names and applies its inner operators to the resulting subset. Please note that this operator also uses special attributes, which makes it necessary for any preprocessing step that should be performed on special attributes (such steps are normally not performed on special attributes).

This operator is also able to deliver the additional results of the inner operator if desired.

Afterwards, the remaining original attributes are added to the resulting example set if the parameter "keep_subset_only" is set to false (default).

Please note that this operator is very powerful and can be used to create new preprocessing schemes by combining it with other preprocessing operators. However, there are two major restrictions (among some others). First, since the inner result will be combined with the rest of the input example set, the number of examples (data points) must not be changed inside the subset preprocessing. Second, attribute role changes will not be delivered to the outside: internally, all special attributes are changed to regular attributes for the inner operators, so role changes cannot be delivered afterwards.
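
The select, process and merge behaviour described above can be sketched in plain Java without the RapidMiner API. This is only an illustration under stated assumptions: the class SubsetPreprocessingSketch, the method process and the column map are hypothetical stand-ins for the operator and its example set, and the use of a full-string regular expression match is an assumption, not a confirmed detail of the implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch, not the RapidMiner implementation.
class SubsetPreprocessingSketch {

    /**
     * Applies the inner step to every column whose name matches the regular
     * expression, then adds the remaining columns unless keepSubsetOnly is true.
     */
    static Map<String, double[]> process(Map<String, double[]> columns, String regex,
            boolean keepSubsetOnly, UnaryOperator<double[]> inner) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, double[]> result = new LinkedHashMap<>();

        // Step 1: process the matching subset (assumption: full-name match).
        for (Map.Entry<String, double[]> entry : columns.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                double[] processed = inner.apply(entry.getValue().clone());
                // The number of examples must not change inside the subset.
                if (processed.length != entry.getValue().length) {
                    throw new IllegalStateException("inner step changed the number of examples");
                }
                result.put(entry.getKey(), processed);
            }
        }

        // Step 2: add the remaining original columns afterwards.
        if (!keepSubsetOnly) {
            for (Map.Entry<String, double[]> entry : columns.entrySet()) {
                if (!result.containsKey(entry.getKey())) {
                    result.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> columns = new LinkedHashMap<>();
        columns.put("att1", new double[] {1.0, 2.0});
        columns.put("label", new double[] {0.0, 1.0});

        // Inner step: double every value of the selected columns.
        Map<String, double[]> out = process(columns, "att.*", false, values -> {
            for (int i = 0; i < values.length; i++) {
                values[i] *= 2.0;
            }
            return values;
        });
        System.out.println(out.keySet()); // prints [att1, label]
    }
}
```

The length check mirrors the first restriction above: because the processed subset is merged with the untouched columns, an inner step that adds or removes examples cannot be combined with the rest of the data.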

Author:
Ingo Mierswa, Shevek

Field Summary
static java.lang.String PARAMETER_ATTRIBUTE_NAME_REGEX
          The parameter name for "A regular expression which matches against all attribute names (including special attributes)."
static java.lang.String PARAMETER_DELIVER_INNER_RESULTS
          The parameter name for "Indicates if the additional results (other than example set) of the inner operator should also be returned."
static java.lang.String PARAMETER_INVERT_SELECTION
          The parameter name for "Indicates if the attributes which did not match the regular expression should be removed by this operator."
static java.lang.String PARAMETER_KEEP_SUBSET_ONLY
          The parameter name for "Indicates if the attributes which did not match the regular expression should be removed by this operator."
static java.lang.String PARAMETER_PROCESS_SPECIAL_ATTRIBUTES
          The parameter name for "Indicates if special attributes like labels etc. should also be processed."
 
Constructor Summary
AttributeSubsetPreprocessing(OperatorDescription description)
           
 
Method Summary
 IOObject[] apply()
          Applies all inner operators.
 java.lang.Class<?>[] checkIO(java.lang.Class<?>[] input)
          Subclasses will throw an exception if something isn't ok.
 InnerOperatorCondition getInnerOperatorCondition()
          Must return a condition of the IO behaviour of all desired inner operators.
 java.lang.Class<?>[] getInputClasses()
          Returns the classes that are needed as input.
protected  IODescription getIODescription()
          If you find the getInputClasses() and getOutputClasses() methods for some reason not useful, you may override this method.
 int getMaxNumberOfInnerOperators()
          Returns the maximum number of inner operators.
 int getMinNumberOfInnerOperators()
          Returns the minimum number of inner operators.
 java.lang.Class<?>[] getOutputClasses()
          Returns the classes that are guaranteed to be returned by apply() as additional output.
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
protected  boolean shouldAddNonConsumedInput()
           
 
Methods inherited from class com.rapidminer.operator.OperatorChain
addAddListener, addOperator, addOperator, checkDeprecations, checkNumberOfInnerOperators, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createProcessTree, getAllInnerOperators, getIndexOfOperator, getInnerOperatorForName, getInnerOperatorsXML, getNumberOfAllOperators, getNumberOfOperators, getOperator, getOperatorFromAll, getOperators, performAdditionalChecks, processFinished, processStarts, registerOperator, removeAddListener, removeOperator, shouldReturnInnerOutput, unregisterOperator
 
Methods inherited from class com.rapidminer.operator.Operator
addError, addValue, addWarning, apply, checkForStop, createExperimentTree, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getEncoding, getErrorList, getExperiment, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getLog, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getProcess, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isEnabled, isExpanded, isParallel, isParameterSet, log, logError, logNote, logWarning, register, remove, rename, resume, setApplyCount, setBreakpoint, setEnabled, setExpanded, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_ATTRIBUTE_NAME_REGEX

public static final java.lang.String PARAMETER_ATTRIBUTE_NAME_REGEX
The parameter name for "A regular expression which matches against all attribute names (including special attributes)."

See Also:
Constant Field Values

PARAMETER_INVERT_SELECTION

public static final java.lang.String PARAMETER_INVERT_SELECTION
The parameter name for "Indicates if the attributes which did not match the regular expression should be removed by this operator."

See Also:
Constant Field Values

PARAMETER_PROCESS_SPECIAL_ATTRIBUTES

public static final java.lang.String PARAMETER_PROCESS_SPECIAL_ATTRIBUTES
The parameter name for "Indicates if special attributes like labels etc. should also be processed."

See Also:
Constant Field Values

PARAMETER_DELIVER_INNER_RESULTS

public static final java.lang.String PARAMETER_DELIVER_INNER_RESULTS
The parameter name for "Indicates if the additional results (other than example set) of the inner operator should also be returned."

See Also:
Constant Field Values

PARAMETER_KEEP_SUBSET_ONLY

public static final java.lang.String PARAMETER_KEEP_SUBSET_ONLY
The parameter name for "Indicates if the attributes which did not match the regular expression should be removed by this operator."

See Also:
Constant Field Values
Constructor Detail

AttributeSubsetPreprocessing

public AttributeSubsetPreprocessing(OperatorDescription description)
Method Detail

apply

public IOObject[] apply()
                 throws OperatorException
Description copied from class: OperatorChain
Applies all inner operators. The input to this operator becomes the input of the first inner operator. The latter's output is passed to the second inner operator and so on. Note to subclassers: subclasses (for example wrappers) that want to make use of this method must call exactly this method (super.apply()). Erroneously calling super.apply(IOContainer) instead will result in an infinite loop.

Overrides:
apply in class OperatorChain
Returns:
the last inner operator's output or the input itself if the chain is empty.
Throws:
OperatorException
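
The chaining behaviour described for apply() can be illustrated with a small generic sketch. It does not use the RapidMiner types: ChainSketch and applyChain are hypothetical names, and a plain UnaryOperator stands in for an inner operator. Each step receives the previous step's output, and an empty chain returns the input itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of sequential chaining, not the OperatorChain code.
class ChainSketch {

    /** Feeds the input through all inner steps in order. */
    static <T> T applyChain(List<UnaryOperator<T>> innerOperators, T input) {
        T current = input;
        for (UnaryOperator<T> operator : innerOperators) {
            // The output of one step becomes the input of the next.
            current = operator.apply(current);
        }
        // With an empty chain this is still the original input.
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        chain.add(String::trim);
        chain.add(String::toUpperCase);
        System.out.println(applyChain(chain, "  label  ")); // prints LABEL
    }
}
```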

getMaxNumberOfInnerOperators

public int getMaxNumberOfInnerOperators()
Description copied from class: OperatorChain
Returns the maximum number of inner operators.

Specified by:
getMaxNumberOfInnerOperators in class OperatorChain

getMinNumberOfInnerOperators

public int getMinNumberOfInnerOperators()
Description copied from class: OperatorChain
Returns the minimum number of inner operators.

Specified by:
getMinNumberOfInnerOperators in class OperatorChain

getInputClasses

public java.lang.Class<?>[] getInputClasses()
Description copied from class: Operator
Returns the classes that are needed as input. May be null or an empty array (no desired input). By default, all delivered input objects are consumed and must also be delivered as output in both Operator.getOutputClasses() and Operator.apply() if this is necessary. This default behavior can be changed by overriding Operator.getInputDescription(Class). Subclasses which implement this method should not make use of parameters since this method is invoked by getParameterTypes(). Therefore, parameters are not fully available at this point in time, and using them might lead to exceptions. Please use InputDescriptions instead.

Specified by:
getInputClasses in class Operator

getOutputClasses

public java.lang.Class<?>[] getOutputClasses()
Description copied from class: Operator

Returns the classes that are guaranteed to be returned by apply() as additional output. Please note that input objects which should not be consumed must also be defined by this method (e.g. an example set which is changed but not consumed in the case of a preprocessing operator must be defined in both the methods Operator.getInputClasses() and Operator.getOutputClasses()). The default behavior for input consumption is defined by Operator.getInputDescription(Class) and can be changed by overriding that method. Objects which are not consumed (defined by changing the implementation in Operator.getInputDescription(Class)) must not be defined as additional output in this method.

May deliver null or an empty array (no additional output is produced or guaranteed). Must return the class array of delivered output objects otherwise.

Specified by:
getOutputClasses in class Operator

getInnerOperatorCondition

public InnerOperatorCondition getInnerOperatorCondition()
Description copied from class: OperatorChain
Must return a condition of the IO behaviour of all desired inner operators. If there are no "special" conditions and the chain works similarly to a simple operator chain, this method should at least return a SimpleChainInnerOperatorCondition. More than one condition should be combined with the help of the class CombinedInnerOperatorCondition.

Specified by:
getInnerOperatorCondition in class OperatorChain

checkIO

public java.lang.Class<?>[] checkIO(java.lang.Class<?>[] input)
                             throws IllegalInputException,
                                    WrongNumberOfInnerOperatorsException
Subclasses will throw an exception if something isn't ok. Returns the output that this operator returns when provided with the given input.

Overrides:
checkIO in class OperatorChain
Throws:
IllegalInputException
WrongNumberOfInnerOperatorsException

getIODescription

protected IODescription getIODescription()
Description copied from class: Operator
If you find the getInputClasses() and getOutputClasses() methods for some reason not useful, you may override this method. Otherwise it returns a default IODescription containing the classes returned by the first.

Overrides:
getIODescription in class Operator

shouldAddNonConsumedInput

protected boolean shouldAddNonConsumedInput()
Overrides:
shouldAddNonConsumedInput in class OperatorChain

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained and special parameters for those input objects which can be prevented from being consumed.

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator


Copyright © 2001-2009 by Rapid-I