com.rapidminer.operator.preprocessing.outlier
Class EcodbOperator

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.AbstractExampleSetProcessing
              extended by com.rapidminer.operator.preprocessing.outlier.AbstractOutlierDetection
                  extended by com.rapidminer.operator.preprocessing.outlier.EcodbOperator
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class EcodbOperator
extends AbstractOutlierDetection

This operator performs a Class Outlier Factor (COF) search. Class outlier mining searches for observations (objects) that arouse suspicion when their class labels are taken into account, following the definition of a class outlier by Hewaihi and Saad in "A comparative Study of Outlier Mining and Class Outlier Mining", CS Letters, Vol 1, No 1 (2009), and "Class Outliers Mining: Distance-Based Approach", International Journal of Intelligent Systems and Technologies, Vol. 2, No. 1, pp 55-68, 2007.

It detects rare, exceptional, or suspicious cases with respect to a group of similar cases. The key factors in computing COF are the probability of the instance's class among its neighbors' classes, the deviation of the instance from the instances of the same class, and the distance between the instance and its k nearest neighbors.

The main concept of the ECODB (Enhanced Class Outlier - Distance Based) algorithm is to rank each instance in the dataset D given the parameters N (the number of top class outliers) and K (the number of nearest neighbors). The rank of each instance is computed with the formula COF = PCL(T,K) - norm(Deviation(T)) + norm(KDist(T)), where PCL(T,K) is the probability of the class label of the instance T with respect to the class labels of its K nearest neighbors, and norm(Deviation(T)) and norm(KDist(T)) are the normalized values of Deviation(T) and KDist(T), respectively, which fall into the range [0, 1]. Deviation(T) measures how much the instance T deviates from the instances of the same class and is computed by summing the distances between T and every instance belonging to the same class. KDist(T) is the sum of the distances between the instance T and its K nearest neighbors.
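The formula above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the operator's actual code: the class name CofSketch and its helper methods are hypothetical, the distance is plain Euclidean, the K nearest neighbors are assumed to exclude T itself, and normalization is assumed to be min-max scaling over the whole dataset. It expects at least two instances and K >= 1.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Illustrative sketch only; NOT the RapidMiner implementation.
class CofSketch {

    // Plain Euclidean distance between two numeric vectors.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // ASSUMPTION: min-max normalization into [0, 1]; constant input maps to 0.
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (max > min) ? (values[i] - min) / (max - min) : 0.0;
        }
        return out;
    }

    // Returns COF = PCL(T,K) - norm(Deviation(T)) + norm(KDist(T)) for every instance T.
    static double[] cof(double[][] points, int[] labels, int k) {
        int n = points.length;
        double[] pcl = new double[n];
        double[] deviation = new double[n];
        double[] kDist = new double[n];

        for (int t = 0; t < n; t++) {
            final int self = t;
            final double[] dist = new double[n];
            for (int j = 0; j < n; j++) {
                dist[j] = distance(points[t], points[j]);
                // Deviation(T): sum of distances to every instance of T's class
                // (the distance from T to itself is 0 and adds nothing).
                if (labels[j] == labels[t]) {
                    deviation[t] += dist[j];
                }
            }
            // ASSUMPTION: the K nearest neighbors exclude T itself.
            int[] neighbors = IntStream.range(0, n)
                    .filter(j -> j != self)
                    .boxed()
                    .sorted(Comparator.comparingDouble((Integer j) -> dist[j]))
                    .limit(k)
                    .mapToInt(Integer::intValue)
                    .toArray();
            int sameClass = 0;
            for (int j : neighbors) {
                if (labels[j] == labels[t]) {
                    sameClass++;
                }
                kDist[t] += dist[j]; // KDist(T): sum of distances to the K neighbors
            }
            // PCL(T,K): share of the K neighbors that carry T's class label.
            pcl[t] = (double) sameClass / neighbors.length;
        }

        double[] normDeviation = normalize(deviation);
        double[] normKDist = normalize(kDist);
        double[] cof = new double[n];
        for (int t = 0; t < n; t++) {
            cof[t] = pcl[t] - normDeviation[t] + normKDist[t];
        }
        return cof;
    }

    public static void main(String[] args) {
        // Two well-separated classes plus one class-1 instance (the last one)
        // placed in the middle of the class-0 cluster.
        double[][] points = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},
            {10, 10}, {10, 11}, {11, 10}, {11, 11},
            {0.5, 0.5}
        };
        int[] labels = {0, 0, 0, 0, 1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(cof(points, labels, 3)));
    }
}
```

In the sample data the last instance is surrounded by neighbors of the other class and lies far from its own class, so it receives the lowest COF and is the strongest class outlier candidate.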

The ECODB algorithm maintains a list of only the top N class outliers. The lower the COF value of an instance, the higher its priority to be a class outlier.
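One way to keep only the top N candidates is a bounded heap over precomputed COF values. The sketch below is an assumption about how such a list could be maintained, not the operator's actual data structure; the class name TopNSketch and the method topN are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch only; NOT the RapidMiner implementation.
class TopNSketch {

    // Returns the indices of the n instances with the lowest COF,
    // ordered from strongest class outlier (lowest COF) to weakest.
    static int[] topN(double[] cof, int n) {
        // Max-heap on COF: the head is the weakest candidate kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Integer i) -> cof[i]).reversed());
        for (int i = 0; i < cof.length; i++) {
            heap.add(i);
            if (heap.size() > n) {
                heap.poll(); // drop the instance with the highest COF
            }
        }
        // Polling yields the highest COF first, so fill the result back to front.
        int[] result = new int[heap.size()];
        for (int pos = result.length - 1; pos >= 0; pos--) {
            result[pos] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        double[] cof = {1.2, -1.0, 0.4, 1.7, 0.1};
        System.out.println(Arrays.toString(topN(cof, 2)));
    }
}
```

The heap never holds more than N + 1 entries, so memory stays proportional to N rather than to the size of the dataset.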

The operator supports mixed Euclidean distance. It takes an example set and passes it on with two new special attributes: a boolean-valued outlier attribute holding the top-n COF outlier status (true for outlier, false for no outlier), and a "COF Factor" attribute that measures the degree to which an object is a class outlier.

Author:
Motaz K. Saad

Field Summary
static java.lang.String PARAMETER_NUMBER_OF_Class_OUTLIERS
          The parameter name for "The number of top-n Class Outliers to be looked for."
static java.lang.String PARAMETER_NUMBER_OF_NEIGHBORS
          The parameter name for "Specifies the k value for the k-th nearest neighbours to be analyzed."
 
Constructor Summary
EcodbOperator(OperatorDescription description)
           
 
Method Summary
 ExampleSet apply(ExampleSet eSet)
          This method implements the main functionality of the operator, but it is essentially a wrapper that passes the RapidMiner operator chain data on to the search space class, so little processing happens here.
protected  java.util.Set<java.lang.String> getOutlierValues()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 ResourceConsumptionEstimator getResourceConsumptionEstimator()
          Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input.
 
Methods inherited from class com.rapidminer.operator.preprocessing.outlier.AbstractOutlierDetection
modifyMetaData, writesIntoExistingData
 
Methods inherited from class com.rapidminer.operator.AbstractExampleSetProcessing
doWork, getExampleSetInputPort, getExampleSetOutputPort, getInputPort, getRequiredMetaData, shouldAutoConnect
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, registerOperator, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, 
setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_NUMBER_OF_NEIGHBORS

public static final java.lang.String PARAMETER_NUMBER_OF_NEIGHBORS
The parameter name for "Specifies the k value for the k-th nearest neighbours to be analyzed."

See Also:
Constant Field Values

PARAMETER_NUMBER_OF_Class_OUTLIERS

public static final java.lang.String PARAMETER_NUMBER_OF_Class_OUTLIERS
The parameter name for "The number of top-n Class Outliers to be looked for."

See Also:
Constant Field Values
Constructor Detail

EcodbOperator

public EcodbOperator(OperatorDescription description)
Method Detail

apply

public ExampleSet apply(ExampleSet eSet)
                 throws OperatorException
This method implements the main functionality of the operator, but it is essentially a wrapper that passes the RapidMiner operator chain data on to the search space class, so little processing happens here.

Specified by:
apply in class AbstractExampleSetProcessing
Throws:
OperatorException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained and special parameters for those input objects which can be prevented from being consumed. ATTENTION! This will create new parameter types. To access already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator

getOutlierValues

protected java.util.Set<java.lang.String> getOutlierValues()
Specified by:
getOutlierValues in class AbstractOutlierDetection

getResourceConsumptionEstimator

public ResourceConsumptionEstimator getResourceConsumptionEstimator()
Description copied from class: Operator
Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input. The default implementation returns null.

Specified by:
getResourceConsumptionEstimator in interface ResourceConsumer
Overrides:
getResourceConsumptionEstimator in class Operator


Copyright © 2001-2009 by Rapid-I