com.rapidminer.operator.io
Class C45ExampleSource

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
              extended by com.rapidminer.operator.io.AbstractExampleSource
                  extended by com.rapidminer.operator.io.C45ExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class C45ExampleSource
extends AbstractExampleSource

Loads data given in C4.5 format (names and data file). Both files must be in the same directory. You can specify one of the C4.5 files (either the data or the names file) or only the filestem.
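
The three accepted forms of the file location can be pictured with a minimal sketch. The class C45FileStem and its resolve method are hypothetical names used for illustration only; this is not the operator's actual implementation, and it assumes the extensions are exactly ".names" and ".data".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of RapidMiner: maps any of the three accepted
// inputs (names file, data file, or bare filestem) to both C4.5 file paths.
final class C45FileStem {

    /** Returns an array {namesFile, dataFile} derived from the given location. */
    static Path[] resolve(String location) {
        String stem = location;
        if (location.endsWith(".names")) {
            stem = location.substring(0, location.length() - ".names".length());
        } else if (location.endsWith(".data")) {
            stem = location.substring(0, location.length() - ".data".length());
        }
        // Both files are expected in the same directory, so they share the stem.
        return new Path[] { Paths.get(stem + ".names"), Paths.get(stem + ".data") };
    }

    public static void main(String[] args) {
        for (String location : new String[] { "foo", "foo.names", "foo.data" }) {
            Path[] files = resolve(location);
            System.out.println(location + " -> " + files[0] + ", " + files[1]);
        }
    }
}
```

All three inputs in the main method resolve to the same pair of files, which is why specifying any one of them is sufficient.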

For a dataset named "foo", you will have two files: foo.data and foo.names. The .names file describes the dataset, while the .data file contains the examples which make up the dataset.

The files contain a series of identifiers and numbers with some surrounding syntax. A | (vertical bar) means that the remainder of the line should be ignored as a comment. Each identifier consists of a string of characters that does not include a comma, question mark or colon. Embedded whitespace is also permitted, but runs of whitespace are replaced by a single space.
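
The comment and whitespace rules above can be sketched as a small line-cleaning step. C45LineCleaner is a hypothetical name, and the sketch assumes that every vertical bar starts a comment (no escaping is handled); it is an illustration, not the parser used by this operator.

```java
// Hypothetical helper, not part of RapidMiner: applies the comment and
// whitespace rules to a single line of a .names or .data file.
final class C45LineCleaner {

    /** Drops everything from the first '|' onward and collapses whitespace runs. */
    static String clean(String line) {
        int bar = line.indexOf('|');
        String content = bar >= 0 ? line.substring(0, bar) : line;
        // Trim the ends, then replace each run of whitespace by a single space.
        return content.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(clean("wage   increase : continuous.   | first contract year"));
    }
}
```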

The .names file contains a series of entries that describe the classes, attributes and values of the dataset. Each entry can be terminated with a period, but the period can be omitted if it would have been the last thing on a line. The first entry in the file lists the names of the classes, separated by commas. Each successive line then defines an attribute, in the order in which they will appear in the .data file, with the following format:

   attribute-name : attribute-type
 

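The attribute-name is an identifier as above, followed by a colon, then the attribute type. The example below uses two attribute types: continuous for a numerical attribute, and a comma-separated list of identifiers that enumerates the values of a nominal attribute.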

Here is an example .names file:

   good, bad.
   dur: continuous.
   wage1: continuous.
   wage2: continuous.
   wage3: continuous.
   cola: tc, none, tcf.
   hours: continuous.
   pension: empl_contr, ret_allw, none.
   stby_pay: continuous.
   shift_diff: continuous.
   educ_allw: yes, no.
   ...
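
A minimal sketch of how such a .names file could be read, assuming one entry per line as in the example above and only the two attribute types that appear there (continuous and an enumerated value list). C45NamesSketch is a hypothetical class for illustration and not the operator's own parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of RapidMiner.
final class C45NamesSketch {

    /** Class labels from the first entry of the file. */
    final List<String> classes = new ArrayList<String>();

    /** Attribute name mapped to its nominal values; an empty list marks a continuous attribute. */
    final Map<String, List<String>> attributes = new LinkedHashMap<String, List<String>>();

    static C45NamesSketch parse(List<String> lines) {
        C45NamesSketch result = new C45NamesSketch();
        boolean classesRead = false;
        for (String raw : lines) {
            // Strip the comment part and the optional terminating period.
            int bar = raw.indexOf('|');
            String entry = (bar >= 0 ? raw.substring(0, bar) : raw).trim();
            if (entry.endsWith(".")) {
                entry = entry.substring(0, entry.length() - 1).trim();
            }
            if (entry.isEmpty()) {
                continue;
            }
            if (!classesRead) {
                result.classes.addAll(splitList(entry)); // first entry: class names
                classesRead = true;
                continue;
            }
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in entry: " + entry);
            }
            String name = entry.substring(0, colon).trim();
            String type = entry.substring(colon + 1).trim();
            List<String> values = type.equals("continuous") ? new ArrayList<String>() : splitList(type);
            result.attributes.put(name, values);
        }
        return result;
    }

    private static List<String> splitList(String text) {
        List<String> values = new ArrayList<String>();
        for (String value : text.split(",")) {
            values.add(value.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        C45NamesSketch names = parse(Arrays.asList(
                "good, bad.", "dur: continuous.", "cola: tc, none, tcf.", "educ_allw: yes, no."));
        System.out.println(names.classes + " " + names.attributes);
    }
}
```

A LinkedHashMap is used so that the attributes keep the order in which they were declared, since that order determines the column order in the .data file.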
 

The file foo.data contains the training examples in the following format: one example per line, with the attribute values separated by commas, the class last, and missing values represented by "?". For example:

   2,5.0,4.0,?,none,37,?,?,5,no,11,below_average,yes,full,yes,full,good
   3,2.0,2.5,?,?,35,none,?,?,?,10,average,?,?,yes,full,bad
   3,4.5,4.5,5.0,none,40,?,?,?,no,11,average,?,half,?,?,good
   3,3.0,2.0,2.5,tc,40,none,?,5,no,10,below_average,yes,half,yes,full,bad
   ...
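
Reading one such line can be sketched as follows. C45DataLineSketch is a hypothetical class for illustration; it represents a missing value ("?") as null and leaves all other values as strings, which is an assumption of this sketch and not necessarily how the operator stores data internally.

```java
// Hypothetical helper, not part of RapidMiner.
final class C45DataLineSketch {

    /** Splits one example line into its values; missing values ("?") become null. */
    static String[] parseLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            String value = fields[i].trim();
            fields[i] = value.equals("?") ? null : value;
        }
        return fields;
    }

    /** The class label is the last value of the line. */
    static String label(String[] fields) {
        return fields[fields.length - 1];
    }

    public static void main(String[] args) {
        String[] fields = parseLine(
                "2,5.0,4.0,?,none,37,?,?,5,no,11,below_average,yes,full,yes,full,good");
        System.out.println(fields.length + " values, class = " + label(fields));
    }
}
```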
 

Author:
Ingo Mierswa

Nested Class Summary
 
Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractReader
AbstractReader.ReaderDescription
 
Field Summary
static java.lang.String PARAMETER_C45_FILESTEM
          The parameter name for "The path to either the C4.5 names file, the data file, or the filestem (without extensions)."
static java.lang.String PARAMETER_DATAMANAGEMENT
          The parameter name for "Determines, how the data is represented internally."
static java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
          The parameter name for "Character that is used as decimal point."
 
Constructor Summary
C45ExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by Operator.apply().
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
protected  boolean supportsEncoding()
           
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
getGeneratedMetaData, read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
addAnnotations, canMakeReaderFor, createReader, doWork, getFileParameterForOperator, isMetaDataCacheable, registerOperator, registerReaderDescription
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, 
setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_C45_FILESTEM

public static final java.lang.String PARAMETER_C45_FILESTEM
The parameter name for "The path to either the C4.5 names file, the data file, or the filestem (without extensions). Both files must be in the same directory."

See Also:
Constant Field Values

PARAMETER_DATAMANAGEMENT

public static final java.lang.String PARAMETER_DATAMANAGEMENT
The parameter name for "Determines, how the data is represented internally."

See Also:
Constant Field Values

PARAMETER_DECIMAL_POINT_CHARACTER

public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
The parameter name for "Character that is used as decimal point."

See Also:
Constant Field Values
Constructor Detail

C45ExampleSource

public C45ExampleSource(OperatorDescription description)
Method Detail

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by Operator.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

supportsEncoding

protected boolean supportsEncoding()
Overrides:
supportsEncoding in class AbstractReader<ExampleSet>

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed. Note: each call creates new ParameterType instances. To access the already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class AbstractReader<ExampleSet>


Copyright © 2001-2009 by Rapid-I