com.rapidminer.operator.io
Class SimpleExampleSource

java.lang.Object
  extended by com.rapidminer.operator.Operator
      extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
          extended by com.rapidminer.operator.io.AbstractExampleSource
              extended by com.rapidminer.operator.io.SimpleExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ParameterHandler, LoggingHandler
Direct Known Subclasses:
CSVExampleSource

public class SimpleExampleSource
extends AbstractExampleSource

This operator reads an example set from a file. The default parameter values should work for most file formats (including CSV and the format produced by the ExampleSetWriter). In many cases this operator is more appropriate for CSV-based file formats than the CSVExampleSource operator itself, since it gives you finer control over settings such as the column separators.

Unlike the usual ExampleSource operator, this operator can read the attribute names from the first line of the data file. There is one restriction, however: the data can be read from only one file, not from multiple files. If you need a fully flexible operator for data loading, use the more powerful ExampleSource operator, which also provides more parameters for tuning, for example, the quoting mechanism and other specialized settings.

The column split points can be defined with regular expressions (please refer to the annex of the RapidMiner tutorial). The default split parameter ",\s*|;\s*|\s+" should work for most file formats. This regular expression describes the following column separators:

the character "," followed by any amount of whitespace (including none)
the character ";" followed by any amount of whitespace (including none)
one or more whitespace characters

A logical OR (alternation) is expressed by "|". Other useful separators might be "\t" for tabs, " " for a single space, and "\s" for any whitespace character.
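
The effect of the default split expression can be tried outside RapidMiner with plain java.util.regex. The following is a minimal sketch, not the operator's actual implementation: the class name ColumnSplitSketch, the method splitLine, and the sample line are made up for illustration, and quoting and comment handling are ignored.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ColumnSplitSketch {
    // The default split expression quoted in the description above.
    static final Pattern DEFAULT_SEPARATORS = Pattern.compile(",\\s*|;\\s*|\\s+");

    // Splits one data line into column values (hypothetical helper).
    static String[] splitLine(String line) {
        return DEFAULT_SEPARATORS.split(line.trim());
    }

    public static void main(String[] args) {
        // Mixed separators: comma plus space, semicolon, space, and tab.
        String line = "5.1, 3.5;1.4 0.2\tsetosa";
        System.out.println(Arrays.toString(splitLine(line)));
        // prints [5.1, 3.5, 1.4, 0.2, setosa]
    }
}
```

Because the expression is an alternation, a single file may mix all three separator styles and still be split into the same columns.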

Quoting is also possible with ". Escaping a quote is done with \". Additionally you can specify comment characters which can be used at arbitrary locations of the data lines and will skip the remaining part of the lines. Unknown attribute values can be marked with empty strings or a question mark.
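
The comment and missing-value rules described above can be illustrated roughly as follows. This is only a sketch under stated assumptions: the class LineCleanupSketch and both helper methods are hypothetical, and it is assumed here that a comment character inside a quoted value does not start a comment. The operator's real parsing code may behave differently.

```java
public class LineCleanupSketch {
    // Cuts the line at the first comment character found outside double quotes.
    // Assumption: comment characters inside a quoted value are kept as data.
    static String stripComment(String line, String commentChars) {
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                i++; // skip the escaped character, for example \"
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && commentChars.indexOf(c) >= 0) {
                return line.substring(0, i);
            }
        }
        return line;
    }

    // An empty string or a single question mark marks an unknown value.
    static boolean isMissing(String value) {
        String v = value.trim();
        return v.isEmpty() || v.equals("?");
    }

    public static void main(String[] args) {
        System.out.println(stripComment("1,2,\"a#b\" # trailing note", "#"));
        System.out.println(isMissing("?"));
    }
}
```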

Author:
Ingo Mierswa
Keywords:
csv

Field Summary
static java.lang.String PARAMETER_COLUMN_SEPARATORS
           
static java.lang.String PARAMETER_COMMENT_CHARS
          The parameter name for "Lines beginning with these characters are ignored."
static java.lang.String PARAMETER_DATAMANAGEMENT
          The parameter name for "Determines, how the data is represented internally."
static java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
          The parameter name for "Character that is used as decimal point."
static java.lang.String PARAMETER_FILENAME
           
static java.lang.String PARAMETER_ID_COLUMN
          The parameter name for "Column number of the id attribute (only used if id_name is empty; 0 = none; negative values are counted from the last column)"
static java.lang.String PARAMETER_ID_NAME
          The parameter name for "Name of the id attribute (if empty, the column defined by id_column will be used)"
static java.lang.String PARAMETER_LABEL_COLUMN
          The parameter name for "Column number of the label attribute (only used if label_name is empty; 0 = none; negative values are counted from the last column)"
static java.lang.String PARAMETER_LABEL_NAME
          The parameter name for "Name of the label attribute (if empty, the column defined by label_column will be used)"
static java.lang.String PARAMETER_READ_ATTRIBUTE_NAMES
           
static java.lang.String PARAMETER_SAMPLE_RATIO
          The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"
static java.lang.String PARAMETER_SAMPLE_SIZE
          The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"
static java.lang.String PARAMETER_SKIP_ERROR_LINES
           
static java.lang.String PARAMETER_TRIM_LINES
           
static java.lang.String PARAMETER_USE_COMMENT_CHARACTERS
          The parameter name for "Indicates if a comment character should be used"
static java.lang.String PARAMETER_USE_QUOTES
           
static java.lang.String PARAMETER_WEIGHT_COLUMN
          The parameter name for "Column number of the weight attribute (only used if weight_name is empty; 0 = none; negative values are counted from the last column)"
static java.lang.String PARAMETER_WEIGHT_NAME
          The parameter name for "Name of the weight attribute (if empty, the column defined by weight_column will be used)"
 
Constructor Summary
SimpleExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
apply, getInputClasses, getOutputClasses
 
Methods inherited from class com.rapidminer.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkForStop, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getEncoding, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getIODescription, getLog, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getProcess, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isEnabled, isExpanded, isParallel, isParameterSet, log, logError, logNote, logWarning, performAdditionalChecks, processFinished, processStarts, register, registerOperator, remove, rename, resume, setApplyCount, setBreakpoint, setEnabled, setExpanded, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, unregisterOperator, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_LABEL_NAME

public static final java.lang.String PARAMETER_LABEL_NAME
The parameter name for "Name of the label attribute (if empty, the column defined by label_column will be used)"

See Also:
Constant Field Values

PARAMETER_LABEL_COLUMN

public static final java.lang.String PARAMETER_LABEL_COLUMN
The parameter name for "Column number of the label attribute (only used if label_name is empty; 0 = none; negative values are counted from the last column)"

See Also:
Constant Field Values
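
The column numbering rule quoted above (0 = none; negative values are counted from the last column) could be resolved as in the following sketch. It assumes that positive values are 1-based and that -1 refers to the last column; neither detail is stated on this page. The class ColumnIndexSketch and the method resolveColumn are hypothetical. The same rule is described for the id and weight columns.

```java
public class ColumnIndexSketch {
    // Returns a zero-based array index, or -1 when the setting is 0 ("none").
    // Assumption: positive settings are 1-based, -1 means the last column.
    static int resolveColumn(int setting, int columnCount) {
        if (setting == 0) {
            return -1; // no such special attribute
        }
        int index = setting > 0 ? setting - 1 : columnCount + setting;
        if (index < 0 || index >= columnCount) {
            throw new IllegalArgumentException(
                "Column " + setting + " is out of range for " + columnCount + " columns");
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(resolveColumn(-1, 5)); // last of five columns
    }
}
```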

PARAMETER_ID_NAME

public static final java.lang.String PARAMETER_ID_NAME
The parameter name for "Name of the id attribute (if empty, the column defined by id_column will be used)"

See Also:
Constant Field Values

PARAMETER_ID_COLUMN

public static final java.lang.String PARAMETER_ID_COLUMN
The parameter name for "Column number of the id attribute (only used if id_name is empty; 0 = none; negative values are counted from the last column)"

See Also:
Constant Field Values

PARAMETER_WEIGHT_NAME

public static final java.lang.String PARAMETER_WEIGHT_NAME
The parameter name for "Name of the weight attribute (if empty, the column defined by weight_column will be used)"

See Also:
Constant Field Values

PARAMETER_WEIGHT_COLUMN

public static final java.lang.String PARAMETER_WEIGHT_COLUMN
The parameter name for "Column number of the weight attribute (only used if weight_name is empty; 0 = none; negative values are counted from the last column)"

See Also:
Constant Field Values

PARAMETER_SAMPLE_RATIO

public static final java.lang.String PARAMETER_SAMPLE_RATIO
The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"

See Also:
Constant Field Values

PARAMETER_SAMPLE_SIZE

public static final java.lang.String PARAMETER_SAMPLE_SIZE
The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"

See Also:
Constant Field Values
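
The precedence between sample_size and sample_ratio described above can be summarized in a few lines. This sketch only illustrates which setting wins; how the operator actually selects the rows is not documented here. The class SampleSettingSketch and the method rowsToRead are hypothetical, as is the assumption that a sample size larger than the file is capped at the number of rows.

```java
public class SampleSettingSketch {
    // Returns how many rows would be read under the documented precedence:
    // sample_size wins unless it is -1, in which case sample_ratio applies.
    static long rowsToRead(int sampleSize, double sampleRatio, long totalRows) {
        if (sampleSize != -1) {
            // Assumption: never more rows than the file contains.
            return Math.min(sampleSize, totalRows);
        }
        return Math.round(sampleRatio * totalRows);
    }

    public static void main(String[] args) {
        System.out.println(rowsToRead(-1, 0.5, 100)); // ratio applies
        System.out.println(rowsToRead(10, 0.5, 100)); // size overrides ratio
    }
}
```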

PARAMETER_DATAMANAGEMENT

public static final java.lang.String PARAMETER_DATAMANAGEMENT
The parameter name for "Determines, how the data is represented internally."

See Also:
Constant Field Values

PARAMETER_USE_COMMENT_CHARACTERS

public static final java.lang.String PARAMETER_USE_COMMENT_CHARACTERS
The parameter name for "Indicates if a comment character should be used"

See Also:
Constant Field Values

PARAMETER_COMMENT_CHARS

public static final java.lang.String PARAMETER_COMMENT_CHARS
The parameter name for "Lines beginning with these characters are ignored."

See Also:
Constant Field Values

PARAMETER_DECIMAL_POINT_CHARACTER

public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
The parameter name for "Character that is used as decimal point."

See Also:
Constant Field Values

PARAMETER_FILENAME

public static final java.lang.String PARAMETER_FILENAME
See Also:
Constant Field Values

PARAMETER_READ_ATTRIBUTE_NAMES

public static final java.lang.String PARAMETER_READ_ATTRIBUTE_NAMES
See Also:
Constant Field Values

PARAMETER_USE_QUOTES

public static final java.lang.String PARAMETER_USE_QUOTES
See Also:
Constant Field Values

PARAMETER_TRIM_LINES

public static final java.lang.String PARAMETER_TRIM_LINES
See Also:
Constant Field Values

PARAMETER_SKIP_ERROR_LINES

public static final java.lang.String PARAMETER_SKIP_ERROR_LINES
See Also:
Constant Field Values

PARAMETER_COLUMN_SEPARATORS

public static final java.lang.String PARAMETER_COLUMN_SEPARATORS
See Also:
Constant Field Values
Constructor Detail

SimpleExampleSource

public SimpleExampleSource(OperatorDescription description)
Method Detail

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed.

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator


Copyright © 2001-2009 by Rapid-I