com.rapidminer.operator.preprocessing.outlier
Class LOFOutlierOperator

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.AbstractExampleSetProcessing
              extended by com.rapidminer.operator.preprocessing.outlier.AbstractOutlierDetection
                  extended by com.rapidminer.operator.preprocessing.outlier.LOFOutlierOperator
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class LOFOutlierOperator
extends AbstractOutlierDetection

This operator performs a LOF outlier search. LOF outliers, i.e. outliers characterized by a local outlier factor per object, are density-based outliers according to Breunig, Kriegel, et al.

The approach to finding those outliers is based on measuring the density of objects and their relation to each other (referred to as local reachability density). Based on the average ratio of the local reachability density of an object and that of its k-nearest neighbours (i.e. the objects in its k-distance neighbourhood), a local outlier factor (LOF) is computed. The approach takes a parameter MinPts (which specifies the "k") and uses the maximum LOF of each object over a range of MinPts values (given by a lower bound and an upper bound for MinPts).
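
The quantities named above can be written out as follows. This is a sketch of the standard LOF definitions from the literature, not a transcription of this operator's code; the symbols d(p, o) for the chosen distance, N_k(p) for the k-distance neighbourhood of p, and MinPts_lb / MinPts_ub for the two bounds are notation introduced here for illustration.

```latex
\begin{align*}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\ d(p, o) \,\} \\
\text{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\text{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)} \\
\text{LOF}(p) &= \max_{\text{MinPts}_{lb} \,\le\, k \,\le\, \text{MinPts}_{ub}} \text{LOF}_k(p)
\end{align*}
```

A value of LOF(p) close to 1 means that p is about as dense as its neighbours; values well above 1 indicate that p lies in a sparser region than its neighbours.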

Currently, the operator supports cosine, sine and squared distances in addition to the usual Euclidean distance; the distance function is selected by the corresponding parameter. In the first step, the objects are grouped into containers. For each object, using a radius screening of all other objects, all the available distances between that object and another object (or group of objects) on the (same) radius given by the distance are associated with a container. That container then holds the distance information, the list of objects within that distance (usually only a few), and the number of objects in the container.
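
The container idea of the first step can be illustrated with a small, self-contained sketch. It is not the operator's SearchSpace code: the class name DistanceContainerSketch, the method containersFor, and the use of a sorted map as the container structure are assumptions made for illustration. The cosine distance is taken here as 1 minus the cosine similarity, which may differ from the operator's definition, and the sine distance is left out because its definition is not given in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of step one; not the RapidMiner implementation. */
final class DistanceContainerSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squared(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Euclidean distance. */
    static double euclidean(double[] a, double[] b) {
        return Math.sqrt(squared(a, b));
    }

    /** Cosine distance, assumed here to be 1 - cosine similarity (NaN for zero vectors). */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    /**
     * Groups all objects other than the one at index self by their distance to it.
     * Each map entry plays the role of one container: the key is the distance and
     * the value is the list of object indices found at exactly that distance.
     * The TreeMap keeps the containers sorted by ascending distance.
     */
    static TreeMap<Double, List<Integer>> containersFor(
            double[][] points, int self, ToDoubleBiFunction<double[], double[]> distance) {
        TreeMap<Double, List<Integer>> containers = new TreeMap<>();
        for (int j = 0; j < points.length; j++) {
            if (j == self) {
                continue; // an object is not placed in its own containers
            }
            double d = distance.applyAsDouble(points[self], points[j]);
            containers.computeIfAbsent(d, key -> new ArrayList<>()).add(j);
        }
        return containers;
    }
}
```

Calling containersFor(points, 0, DistanceContainerSketch::euclidean) returns the containers of the first object; the size of each list gives the number of objects in that container.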

In the second step, three things are done: (1) The containers for each object are counted in ascending order according to the cardinality of the object list within the container (= that distance) to find the k-distances for each object and the objects in that k-distance (all objects in all the subsequent containers with a smaller distance). (2) Using this information, the local reachability densities are computed by taking the maximum of the actual distance and the k-distance for each object pair (object and objects in k-distance), averaging it over the cardinality of the k-neighbourhood, and then taking the reciprocal value. (3) The LOF is computed for each MinPts value in the range (actually for all values up to the upper bound) by averaging the ratio between the MinPts-local reachability density of all objects in the k-neighbourhood and that of the object itself. The maximum LOF in the MinPts range is assigned to each object as its final LOF.
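
The three parts of the second step can be sketched as a brute-force computation. This is a minimal illustration under stated assumptions, not the operator's algorithm: it recomputes distances directly instead of using containers, uses the Euclidean distance only, takes the k-distance of the neighbour as in the standard LOF definition, and the names LofSketch, kDistance, neighbours, lrd, lof and maxLof are hypothetical. It assumes k is at most the number of objects minus one and that the data contain no clusters of duplicate points (which would make a reachability sum zero).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical brute-force LOF sketch; not the RapidMiner implementation. */
final class LofSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double dist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** (1) k-distance of object p: the distance to its k-th nearest other object. */
    static double kDistance(double[][] pts, int p, int k) {
        double[] d = new double[pts.length - 1];
        int n = 0;
        for (int j = 0; j < pts.length; j++) {
            if (j != p) {
                d[n++] = dist(pts[p], pts[j]);
            }
        }
        Arrays.sort(d);
        return d[k - 1];
    }

    /** (1) k-neighbourhood of p: all other objects within its k-distance, ties included. */
    static List<Integer> neighbours(double[][] pts, int p, int k) {
        double kd = kDistance(pts, p, k);
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            if (j != p && dist(pts[p], pts[j]) <= kd) {
                result.add(j);
            }
        }
        return result;
    }

    /** (2) Local reachability density: reciprocal of the mean reachability distance. */
    static double lrd(double[][] pts, int p, int k) {
        List<Integer> nb = neighbours(pts, p, k);
        double sum = 0.0;
        for (int o : nb) {
            // reachability distance: maximum of the neighbour's k-distance and the actual distance
            sum += Math.max(kDistance(pts, o, k), dist(pts[p], pts[o]));
        }
        return nb.size() / sum;
    }

    /** (3) LOF for one value of k: mean ratio of the neighbours' lrd to the object's own lrd. */
    static double lof(double[][] pts, int p, int k) {
        List<Integer> nb = neighbours(pts, p, k);
        double own = lrd(pts, p, k);
        double sum = 0.0;
        for (int o : nb) {
            sum += lrd(pts, o, k) / own;
        }
        return sum / nb.size();
    }

    /** Final LOF: the maximum LOF over all MinPts values from lower to upper (inclusive). */
    static double maxLof(double[][] pts, int p, int lower, int upper) {
        double best = Double.NEGATIVE_INFINITY;
        for (int k = lower; k <= upper; k++) {
            best = Math.max(best, lof(pts, p, k));
        }
        return best;
    }
}
```

For the four corners of a unit square plus one distant point, maxLof with bounds 2 and 3 stays at about 1 for each corner and is far larger for the distant point, which is the behaviour the outlier attribute is meant to expose. The sketch recomputes neighbourhoods on every call, so it is only suitable for small illustrations.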

Afterwards, the LOFs are added as values of a special real-valued outlier attribute in the example set that the operator returns.

Author:
Stephan Deutsch, Ingo Mierswa

Field Summary
static java.lang.String PARAMETER_DISTANCE_FUNCTION
          The parameter name for "choose which distance function will be used for calculating "
static java.lang.String PARAMETER_MINIMAL_POINTS_LOWER_BOUND
          The parameter name for "The lower bound for MinPts for the Outlier test "
static java.lang.String PARAMETER_MINIMAL_POINTS_UPPER_BOUND
          The parameter name for "The upper bound for MinPts for the Outlier test "
 
Constructor Summary
LOFOutlierOperator(OperatorDescription description)
           
 
Method Summary
 ExampleSet apply(ExampleSet eSet)
          This method implements the main functionality of the Operator, but it is essentially a wrapper that passes the RapidMiner operator chain data deeper into the SearchSpace class, so do not expect much to happen here.
protected  java.util.Set<java.lang.String> getOutlierValues()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 ResourceConsumptionEstimator getResourceConsumptionEstimator()
          Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input.
protected  MetaData modifyMetaData(ExampleSetMetaData metaData)
          Subclasses might override this method to define the meta data transformation performed by this operator.
 
Methods inherited from class com.rapidminer.operator.preprocessing.outlier.AbstractOutlierDetection
writesIntoExistingData
 
Methods inherited from class com.rapidminer.operator.AbstractExampleSetProcessing
doWork, getExampleSetInputPort, getExampleSetOutputPort, getInputPort, getRequiredMetaData, shouldAutoConnect
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, registerOperator, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, 
setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_MINIMAL_POINTS_LOWER_BOUND

public static final java.lang.String PARAMETER_MINIMAL_POINTS_LOWER_BOUND
The parameter name for "The lower bound for MinPts for the Outlier test "

See Also:
Constant Field Values

PARAMETER_MINIMAL_POINTS_UPPER_BOUND

public static final java.lang.String PARAMETER_MINIMAL_POINTS_UPPER_BOUND
The parameter name for "The upper bound for MinPts for the Outlier test "

See Also:
Constant Field Values

PARAMETER_DISTANCE_FUNCTION

public static final java.lang.String PARAMETER_DISTANCE_FUNCTION
The parameter name for "choose which distance function will be used for calculating "

See Also:
Constant Field Values

Constructor Detail

LOFOutlierOperator

public LOFOutlierOperator(OperatorDescription description)

Method Detail

apply

public ExampleSet apply(ExampleSet eSet)
                 throws OperatorException
This method implements the main functionality of the Operator, but it is essentially a wrapper that passes the RapidMiner operator chain data deeper into the SearchSpace class, so do not expect much to happen here.

Specified by:
apply in class AbstractExampleSetProcessing
Throws:
OperatorException

modifyMetaData

protected MetaData modifyMetaData(ExampleSetMetaData metaData)
Description copied from class: AbstractExampleSetProcessing
Subclasses might override this method to define the meta data transformation performed by this operator.

Overrides:
modifyMetaData in class AbstractOutlierDetection

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained, and special parameters for those input objects which can be prevented from being consumed. Attention: each call creates new ParameterType instances. To access the already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator

getOutlierValues

protected java.util.Set<java.lang.String> getOutlierValues()
Specified by:
getOutlierValues in class AbstractOutlierDetection

getResourceConsumptionEstimator

public ResourceConsumptionEstimator getResourceConsumptionEstimator()
Description copied from class: Operator
Subclasses can override this method if they are able to estimate the consumed resources (CPU time and memory), based on their input. The default implementation returns null.

Specified by:
getResourceConsumptionEstimator in interface ResourceConsumer
Overrides:
getResourceConsumptionEstimator in class Operator


Copyright © 2001-2009 by Rapid-I