com.rapidminer.operator.io
Class ExampleSource

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
              extended by com.rapidminer.operator.io.AbstractExampleSource
                  extended by com.rapidminer.operator.io.ExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>

public class ExampleSource
extends AbstractExampleSource

This operator reads an example set from one or more files. The default parameter values should work for most file formats (including the format produced by the ExampleSetWriter, CSV, ...). Please refer to section First steps/File formats for details on the attribute description file, which is set by the parameter attributes and specifies the attribute types. You can use the wizard of this operator or the Attribute Editor tool to create these .aml meta data files for your datasets.

This operator supports reading data from multiple source files. Each attribute (including special attributes like labels, weights, ...) can be read from a different file. Please note that the number of examples is limited by the shortest file: if one of the data source files has fewer lines than the others, only that many examples will be read.
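The "minimum number of lines" rule can be illustrated with plain Java collections. This is only a sketch of the described behaviour, not the operator's implementation; the class name MinLinesSketch and the method zipColumns are hypothetical, and each source file is modelled as an in-memory list of values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinLinesSketch {

    /** Combines one column per source, row by row, stopping at the shortest source. */
    public static List<List<String>> zipColumns(List<List<String>> sources) {
        int rows = sources.isEmpty() ? 0 : Integer.MAX_VALUE;
        for (List<String> source : sources) {
            rows = Math.min(rows, source.size());
        }
        List<List<String>> examples = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            List<String> example = new ArrayList<>();
            for (List<String> source : sources) {
                example.add(source.get(r));
            }
            examples.add(example);
        }
        return examples;
    }

    public static void main(String[] args) {
        // Hypothetical content of two data files: three attribute lines, two label lines.
        List<String> attributeFile = Arrays.asList("1.0", "2.0", "3.0");
        List<String> labelFile = Arrays.asList("yes", "no");
        // Only two examples result, because the label source has only two lines.
        System.out.println(zipColumns(Arrays.asList(attributeFile, labelFile)));
    }
}
```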

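The split points can be defined with regular expressions (please refer to the annex of the RapidMiner tutorial for an overview). The default split parameter ",\s*|;\s*|\s+" should work for most file formats. This regular expression describes the following column separators: a comma followed by any amount of whitespace (including none), a semicolon followed by any amount of whitespace (including none), or one or more whitespace characters.

The alternatives are separated by "|", which denotes a logical OR (alternation) in regular expressions. Other useful separators might be "\t" for tabs, " " for a single space, and "\s" for any whitespace character.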
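The default separator expression quoted above can be tried out with java.util.regex alone. The sketch below only shows how the expression splits a line; the class name ColumnSplitSketch and the sample line are made up for illustration, and the operator's own reading code may differ.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ColumnSplitSketch {

    // The default column separator expression from the description above.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern DEFAULT_SEPARATORS = Pattern.compile(",\\s*|;\\s*|\\s+");

    /** Splits one data line at every match of the separator expression. */
    public static String[] splitLine(String line) {
        return DEFAULT_SEPARATORS.split(line);
    }

    public static void main(String[] args) {
        // Mixed separators: comma plus space, semicolon, single space, and tab.
        String line = "5.1, 3.5;1.4 0.2\tsetosa";
        System.out.println(Arrays.toString(splitLine(line)));
        // prints [5.1, 3.5, 1.4, 0.2, setosa]
    }
}
```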

Values can also be quoted with ". A quote inside a quoted value can be escaped with a backslash, i.e. \". Please note that you can change both characters by adjusting the corresponding parameters.
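The effect of quoting and escaping can be sketched with a small hand-written splitter. This is a simplified illustration that assumes a single-character separator instead of a regular expression; the class QuotedSplitSketch and its split method are hypothetical and not part of RapidMiner.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplitSketch {

    /** Splits a line at the separator, except where the separator is inside quotes. */
    public static List<String> split(String line, char separator, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                current.append(quote); // an escaped quote is kept as a literal character
                i++;                   // skip the quote that was just consumed
            } else if (c == quote) {
                inQuotes = !inQuotes;  // quotes delimit a value but are not part of it
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The raw line is: 1,"a, b","say \"hi\""
        String line = "1,\"a, b\",\"say \\\"hi\\\"\"";
        for (String field : split(line, ',', '"', '\\')) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The quote and escape characters are passed in as arguments to mirror the fact that both are configurable.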

Additionally, you can specify comment characters, which may appear at arbitrary positions in a data line; any content after a comment character will be ignored. Unknown attribute values can be marked with an empty string (if your column separators allow this) or with a question mark (recommended).
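Comment stripping and unknown values might be handled along the following lines. This is a sketch under assumptions: the class and method names are hypothetical, and representing an unknown value as Double.NaN is an assumption made for this example rather than something stated on this page.

```java
public class CommentAndMissingSketch {

    /** Cuts the line at the first occurrence of any of the given comment characters. */
    public static String stripComment(String line, String commentChars) {
        int cut = line.length();
        for (int i = 0; i < commentChars.length(); i++) {
            int pos = line.indexOf(commentChars.charAt(i));
            if (pos >= 0 && pos < cut) {
                cut = pos;
            }
        }
        return line.substring(0, cut);
    }

    /** Parses a numeric token; an empty token or "?" is treated as unknown (NaN, by assumption). */
    public static double parseValue(String token) {
        String t = token.trim();
        if (t.isEmpty() || t.equals("?")) {
            return Double.NaN;
        }
        return Double.parseDouble(t);
    }

    public static void main(String[] args) {
        String data = stripComment("1.5,?,2.0 # measured on day 3", "#");
        for (String token : data.split(",")) {
            System.out.println(parseValue(token));
        }
    }
}
```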

Author:
Simon Fischer, Ingo Mierswa

Nested Class Summary
 
Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractReader
AbstractReader.ReaderDescription
 
Field Summary
static java.lang.String PARAMETER_ATTRIBUTES
          The parameter name for "Filename for the XML attribute description file."
static java.lang.String PARAMETER_COLUMN_SEPARATORS
          The parameter name for "Column separators for data files (regular expression)"
static java.lang.String PARAMETER_COMMENT_CHARS
          The parameter name for "Lines beginning with these characters are ignored."
static java.lang.String PARAMETER_DATAMANAGEMENT
          The parameter name for "Determines, how the data is represented internally."
static java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
          The parameter name for "Character that is used as decimal point."
static java.lang.String PARAMETER_PERMUTATE
          The parameter name for "Indicates if the loaded data should be permuted."
static java.lang.String PARAMETER_QUOTE_CHARACTER
          Specifies the quoting character to use.
static java.lang.String PARAMETER_QUOTING_ESCAPE_CHARACTER
          Specifies the character used for escaping quotes.
static java.lang.String PARAMETER_SAMPLE_RATIO
          The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"
static java.lang.String PARAMETER_SAMPLE_SIZE
          The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"
static java.lang.String PARAMETER_SKIP_ERROR_LINES
          Indicates if lines leading to errors should be skipped.
static java.lang.String PARAMETER_TRIM_LINES
          Indicates if the lines should be trimmed during reading.
static java.lang.String PARAMETER_USE_COMMENT_CHARACTERS
          The parameter name for "Indicates if a comment character should be used"
static java.lang.String PARAMETER_USE_QUOTES
          The parameter name for "Indicates if quotes should be regarded (slower!)."
 
Constructor Summary
ExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by Operator.apply().
 MetaData getGeneratedMetaData()
           
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
protected  boolean isMetaDataCacheable()
           
protected  boolean supportsEncoding()
           
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
addAnnotations, canMakeReaderFor, createReader, doWork, getFileParameterForOperator, registerOperator, registerReaderDescription
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, 
setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_ATTRIBUTES

public static final java.lang.String PARAMETER_ATTRIBUTES
The parameter name for "Filename for the XML attribute description file. This file also contains the names of the files to read the data from."

See Also:
Constant Field Values

PARAMETER_SAMPLE_RATIO

public static final java.lang.String PARAMETER_SAMPLE_RATIO
The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"

See Also:
Constant Field Values

PARAMETER_SAMPLE_SIZE

public static final java.lang.String PARAMETER_SAMPLE_SIZE
The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"

See Also:
Constant Field Values
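
The precedence between sample_size and sample_ratio described in the two fields above can be written down as a small function. This is only one possible reading: turning the ratio into a fixed, rounded row count is an assumption of this sketch (the operator may instead sample lines at random), and SamplingSketch and rowsToRead are hypothetical names.

```java
public class SamplingSketch {

    /**
     * Returns how many rows to read from a source with the given number of available rows.
     * A sampleSize other than -1 takes precedence; otherwise the ratio is applied.
     */
    public static int rowsToRead(int available, int sampleSize, double sampleRatio) {
        if (sampleSize != -1) {
            // An explicit size wins; sampleRatio has no effect in this case.
            return Math.min(sampleSize, available);
        }
        // sampleSize == -1: fall back to the ratio (1 = all rows).
        return (int) Math.round(available * sampleRatio);
    }

    public static void main(String[] args) {
        System.out.println(rowsToRead(100, -1, 0.25)); // ratio is used
        System.out.println(rowsToRead(100, 10, 0.25)); // size is used, ratio ignored
    }
}
```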

PARAMETER_PERMUTATE

public static final java.lang.String PARAMETER_PERMUTATE
The parameter name for "Indicates if the loaded data should be permuted."

See Also:
Constant Field Values

PARAMETER_COLUMN_SEPARATORS

public static final java.lang.String PARAMETER_COLUMN_SEPARATORS
The parameter name for "Column separators for data files (regular expression)"

See Also:
Constant Field Values

PARAMETER_USE_COMMENT_CHARACTERS

public static final java.lang.String PARAMETER_USE_COMMENT_CHARACTERS
The parameter name for "Indicates if a comment character should be used"

See Also:
Constant Field Values

PARAMETER_COMMENT_CHARS

public static final java.lang.String PARAMETER_COMMENT_CHARS
The parameter name for "Lines beginning with these characters are ignored."

See Also:
Constant Field Values

PARAMETER_DECIMAL_POINT_CHARACTER

public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
The parameter name for "Character that is used as decimal point."

See Also:
Constant Field Values
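
A configurable decimal point character matters for files written with a comma as decimal separator. One simple way to handle it, shown here as a sketch with hypothetical names (DecimalPointSketch, parse) and not as the operator's actual code, is to normalize the token before handing it to Double.parseDouble.

```java
public class DecimalPointSketch {

    /** Parses a number whose decimal point is written with the given character. */
    public static double parse(String token, char decimalPoint) {
        String normalized = decimalPoint == '.' ? token : token.replace(decimalPoint, '.');
        return Double.parseDouble(normalized.trim());
    }

    public static void main(String[] args) {
        System.out.println(parse("3,14", ',')); // comma as decimal point
        System.out.println(parse("2.5", '.'));  // default decimal point
    }
}
```

Note that a comma decimal point conflicts with the default column separators, which also split at commas, so the separator expression would have to be changed as well in that case.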

PARAMETER_USE_QUOTES

public static final java.lang.String PARAMETER_USE_QUOTES
The parameter name for "Indicates if quotes should be regarded (slower!)."

See Also:
Constant Field Values

PARAMETER_QUOTE_CHARACTER

public static final java.lang.String PARAMETER_QUOTE_CHARACTER
Specifies the quoting character to use.

See Also:
Constant Field Values

PARAMETER_QUOTING_ESCAPE_CHARACTER

public static final java.lang.String PARAMETER_QUOTING_ESCAPE_CHARACTER
Specifies the character used for escaping quotes.

See Also:
Constant Field Values

PARAMETER_TRIM_LINES

public static final java.lang.String PARAMETER_TRIM_LINES
Indicates if the lines should be trimmed during reading.

See Also:
Constant Field Values

PARAMETER_SKIP_ERROR_LINES

public static final java.lang.String PARAMETER_SKIP_ERROR_LINES
Indicates if lines leading to errors should be skipped.

See Also:
Constant Field Values
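
What skipping error lines means in practice can be sketched as follows. The example assumes purely numeric, comma-separated lines and treats a NumberFormatException as the "error"; the class ErrorLineSketch and its parseAll method are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ErrorLineSketch {

    /** Parses numeric lines; unparsable lines are either skipped or reported by an exception. */
    public static List<double[]> parseAll(List<String> lines, boolean skipErrorLines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            try {
                String[] tokens = line.split(",");
                double[] row = new double[tokens.length];
                for (int i = 0; i < tokens.length; i++) {
                    row[i] = Double.parseDouble(tokens[i]);
                }
                rows.add(row);
            } catch (NumberFormatException e) {
                if (!skipErrorLines) {
                    throw e; // strict mode: the first bad line aborts reading
                }
                // lenient mode: drop the line and continue with the next one
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,2", "x,3", "4,5");
        System.out.println(parseAll(lines, true).size() + " of " + lines.size() + " lines kept");
    }
}
```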

PARAMETER_DATAMANAGEMENT

public static final java.lang.String PARAMETER_DATAMANAGEMENT
The parameter name for "Determines, how the data is represented internally."

See Also:
Constant Field Values

Constructor Detail

ExampleSource

public ExampleSource(OperatorDescription description)

Method Detail

getGeneratedMetaData

public MetaData getGeneratedMetaData()
                              throws OperatorException
Overrides:
getGeneratedMetaData in class AbstractExampleSource
Throws:
OperatorException

isMetaDataCacheable

protected boolean isMetaDataCacheable()
Overrides:
isMetaDataCacheable in class AbstractReader<ExampleSet>

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by Operator.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

supportsEncoding

protected boolean supportsEncoding()
Overrides:
supportsEncoding in class AbstractReader<ExampleSet>

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained, and otherwise special parameters for those input objects which can be prevented from being consumed. ATTENTION! Each call creates new ParameterType objects. To access the already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class AbstractReader<ExampleSet>


Copyright © 2001-2009 by Rapid-I