com.rapidminer.operator.io
Class SimpleExampleSource

java.lang.Object
  extended by com.rapidminer.tools.AbstractObservable<Operator>
      extended by com.rapidminer.operator.Operator
          extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
              extended by com.rapidminer.operator.io.AbstractExampleSource
                  extended by com.rapidminer.operator.io.SimpleExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ResourceConsumer, ParameterHandler, LoggingHandler, Observable<Operator>
Direct Known Subclasses:
CSVExampleSource

Deprecated.

@Deprecated
public class SimpleExampleSource
extends AbstractExampleSource

This operator reads an example set from a file. The default parameter values should work for most file formats, including CSV and the format produced by the ExampleSetWriter. In many cases this operator is better suited to CSV-based file formats than the CSVExampleSource operator itself, because it gives finer control over settings such as the column separators.

In contrast to the usual ExampleSource operator, this operator can read the attribute names from the first line of the data file. There is one restriction: the data can only be read from a single file, not from multiple files. If you need a fully flexible operator for data loading, use the more powerful ExampleSource operator, which provides additional parameters for tuning, for example, the quoting mechanism and other specialized settings.

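The column split points can be defined with regular expressions (please refer to the annex of the RapidMiner tutorial). The default split parameter ",\s*|;\s*|\s+" should work for most file formats. This regular expression describes the following column separators: a comma followed by any amount of whitespace (including none), a semicolon followed by any amount of whitespace (including none), or one or more whitespace characters.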

Alternatives are separated by "|", which acts as a logical OR. Other useful separators might be "\t" for tabs, " " for a single space, and "\s" for any whitespace character.
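The effect of the default separator expression can be tried out with the standard java.util.regex classes. The following is a minimal sketch, not the operator's actual implementation: the class name ColumnSplitSketch and the method splitColumns are hypothetical, and quoting and comment handling are left out.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only; ColumnSplitSketch is not part of RapidMiner.
class ColumnSplitSketch {
    // The default column separator expression quoted in the description above.
    static final Pattern DEFAULT_SEPARATORS = Pattern.compile(",\\s*|;\\s*|\\s+");

    // Splits one data line into raw column values.
    // The limit -1 keeps trailing empty columns instead of dropping them.
    static String[] splitColumns(String line) {
        return DEFAULT_SEPARATORS.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "5.1, 3.5;1.4 0.2\tIris-setosa";
        // prints [5.1, 3.5, 1.4, 0.2, Iris-setosa]
        System.out.println(Arrays.toString(splitColumns(line)));
    }
}
```

With plain regex splitting as in this sketch, a space in front of a comma is matched as a separator of its own, so "a ,b" yields an empty column between a and b. Trimming values or choosing a narrower separator expression avoids this.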

Quoting is also possible with the " character; a quote inside a quoted value is escaped as \". Additionally, you can specify comment characters, which may appear anywhere in a data line and cause the rest of that line to be skipped. Unknown attribute values can be marked with an empty string or a question mark.
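The comment and unknown-value rules above could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the operator's code: the class LineCleanupSketch and its methods are hypothetical, and comment characters that occur inside quoted values are not protected here.

```java
// Illustrative sketch only; LineCleanupSketch is not part of RapidMiner.
class LineCleanupSketch {
    // Cuts the line at the first occurrence of any comment character.
    // Simplification: comment characters inside quoted values are not protected.
    static String stripComment(String line, char[] commentChars) {
        int cut = line.length();
        for (char c : commentChars) {
            int idx = line.indexOf(c);
            if (idx >= 0 && idx < cut) {
                cut = idx;
            }
        }
        return line.substring(0, cut);
    }

    // Treats an empty string or a single question mark as an unknown value.
    static boolean isUnknown(String value) {
        String v = value.trim();
        return v.isEmpty() || v.equals("?");
    }

    public static void main(String[] args) {
        String data = stripComment("1.0,?,red # measured twice", new char[] {'#'});
        for (String value : data.trim().split(",")) {
            System.out.println(value + " -> unknown: " + isUnknown(value));
        }
    }
}
```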

Author:
Ingo Mierswa
Keywords:
csv

Nested Class Summary
 
Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractReader
AbstractReader.ReaderDescription
 
Field Summary
static java.lang.String PARAMETER_COLUMN_SEPARATORS
          Deprecated.  
static java.lang.String PARAMETER_COMMENT_CHARS
          Deprecated.  
static java.lang.String PARAMETER_FILENAME
          Deprecated.  
static java.lang.String PARAMETER_ID_COLUMN
          Deprecated. The parameter name for "Column number of the id attribute (only used if id_name is empty; 0 = none; negative values are counted from the last column)"
static java.lang.String PARAMETER_ID_NAME
          Deprecated. The parameter name for "Name of the id attribute (if empty, the column defined by id_column will be used)"
static java.lang.String PARAMETER_LABEL_COLUMN
          Deprecated. The parameter name for "Column number of the label attribute (only used if label_name is empty; 0 = none; negative values are counted from the last column)"
static java.lang.String PARAMETER_LABEL_NAME
          Deprecated. The parameter name for "Name of the label attribute (if empty, the column defined by label_column will be used)"
static java.lang.String PARAMETER_QUOTES_CHARACTER
          Deprecated.  
static java.lang.String PARAMETER_SAMPLE_RATIO
          Deprecated. The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"
static java.lang.String PARAMETER_SAMPLE_SIZE
          Deprecated. The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"
static java.lang.String PARAMETER_SKIP_COMMENTS
          Deprecated.  
static java.lang.String PARAMETER_TRIM_LINES
          Deprecated.  
static java.lang.String PARAMETER_USE_FIRST_ROW_AS_ATTRIBUTE_NAMES
          Deprecated.  
static java.lang.String PARAMETER_USE_QUOTES
          Deprecated.  
static java.lang.String PARAMETER_WEIGHT_COLUMN
          Deprecated. The parameter name for "Column number of the weight attribute (only used if weight_name is empty; 0 = none, negative values are counted from the last column)"
static java.lang.String PARAMETER_WEIGHT_NAME
          Deprecated. The parameter name for "Name of the weight attribute (if empty, the column defined by weight_column will be used)"
 
Constructor Summary
SimpleExampleSource(OperatorDescription description)
          Deprecated.  
 
Method Summary
 ExampleSet createExampleSet()
          Deprecated. Creates (or reads) the ExampleSet that will be returned by Operator.apply().
static ExampleSet createExampleSet(java.io.File file, boolean firstRowAsColumnNames, double sampleRatio, int maxLines, java.lang.String separatorRegExpr, char[] comments, int dataRowType, boolean useQuotes, boolean trimLines, boolean skipErrorLines, char decimalPointCharacter, java.nio.charset.Charset encoding, java.lang.String labelName, int labelColumn, java.lang.String idName, int idColumn, java.lang.String weightName, int weightColumn)
          Deprecated.  
 CSVFileReader createReader(java.io.File file)
          Deprecated.  
 MetaData getGeneratedMetaData()
          Deprecated.  
 java.util.List<ParameterType> getParameterTypes()
          Deprecated. Returns a list of ParameterTypes describing the parameters of this operator.
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
addAnnotations, canMakeReaderFor, createReader, doWork, getFileParameterForOperator, isMetaDataCacheable, registerOperator, registerReaderDescription, supportsEncoding
 
Methods inherited from class com.rapidminer.operator.Operator
acceptsInput, addError, addError, addValue, addWarning, apply, apply, assumePreconditionsSatisfied, checkAll, checkAllExcludingMetaData, checkDeprecations, checkForStop, checkIO, checkProperties, clear, clearErrorList, cloneOperator, collectErrors, createExperimentTree, createExperimentTree, createFromXML, createFromXML, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, disconnectPorts, execute, fireUpdate, freeMemory, getAddOnlyAdditionalOutput, getApplyCount, getCompatibilityLevel, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getDOMRepresentation, getEncoding, getErrorList, getExecutionUnit, getExperiment, getIncompatibleVersionChanges, getInput, getInput, getInput, getInputClasses, getInputDescription, getInputPorts, getIODescription, getLog, getLogger, getName, getNumberOfBreakpoints, getOperatorClassName, getOperatorDescription, getOutputClasses, getOutputPorts, getParameter, getParameterAsBoolean, getParameterAsChar, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsRepositoryLocation, getParameterAsString, getParameterHandler, getParameterList, getParameters, getParameterTupel, getParameterType, getParent, getPortOwner, getProcess, getResourceConsumptionEstimator, getRoot, getStartTime, getTransformer, getUserDescription, getValue, getValues, getXML, getXML, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isDirty, isEnabled, isExpanded, isParallel, isParameterSet, isRunning, log, log, logError, logNote, logWarning, lookupOperator, makeDirty, makeDirtyOnUpdate, notifyRenaming, performAdditionalChecks, preAutoWire, processFinished, processStarts, producesOutput, propagateDirtyness, register, remove, removeAndKeepConnections, rename, resume, setBreakpoint, setCompatibilityLevel, setEnabled, setEnclosingProcess, setExpanded, setInput, setListParameter, setPairParameter, setParameter, setParameters, setUserDescription, shouldAutoConnect, shouldAutoConnect, shouldStopStandaloneExecution, toString, transformMetaData, unregisterOperator, updateExecutionOrder, walk, writeXML, writeXML
 
Methods inherited from class com.rapidminer.tools.AbstractObservable
addObserver, addObserverAsFirst, fireUpdate, removeObserver
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_LABEL_NAME

public static final java.lang.String PARAMETER_LABEL_NAME
Deprecated. 
The parameter name for "Name of the label attribute (if empty, the column defined by label_column will be used)"

See Also:
Constant Field Values

PARAMETER_LABEL_COLUMN

public static final java.lang.String PARAMETER_LABEL_COLUMN
Deprecated. 
The parameter name for "Column number of the label attribute (only used if label_name is empty; 0 = none; negative values are counted from the last column)"

See Also:
Constant Field Values

PARAMETER_ID_NAME

public static final java.lang.String PARAMETER_ID_NAME
Deprecated. 
The parameter name for "Name of the id attribute (if empty, the column defined by id_column will be used)"

See Also:
Constant Field Values

PARAMETER_ID_COLUMN

public static final java.lang.String PARAMETER_ID_COLUMN
Deprecated. 
The parameter name for "Column number of the id attribute (only used if id_name is empty; 0 = none; negative values are counted from the last column)"

See Also:
Constant Field Values

PARAMETER_WEIGHT_NAME

public static final java.lang.String PARAMETER_WEIGHT_NAME
Deprecated. 
The parameter name for "Name of the weight attribute (if empty, the column defined by weight_column will be used)"

See Also:
Constant Field Values

PARAMETER_WEIGHT_COLUMN

public static final java.lang.String PARAMETER_WEIGHT_COLUMN
Deprecated. 
The parameter name for "Column number of the weight attribute (only used if weight_name is empty; 0 = none, negative values are counted from the last column)"

See Also:
Constant Field Values
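
The label, id, and weight parameters above share one rule: a non-empty name takes precedence, otherwise the column number is used, where 0 means none and negative values count from the last column. The sketch below shows one way to read that rule. It is not the operator's code: the class SpecialColumnSketch and the method resolve are hypothetical, and it assumes that positive column numbers are 1-based and that -1 denotes the last column, which the parameter descriptions suggest but do not state.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; SpecialColumnSketch is not part of RapidMiner.
class SpecialColumnSketch {
    // Returns the zero-based index of the special column, or -1 for "none".
    // Assumption: positive column numbers are 1-based, -1 is the last column.
    static int resolve(List<String> columnNames, String name, int column) {
        if (name != null && !name.isEmpty()) {
            // A non-empty name wins over the column number.
            // Assumption: an unknown name is treated as "none" (indexOf gives -1);
            // the real operator may report an error instead.
            return columnNames.indexOf(name);
        }
        if (column == 0) {
            return -1; // 0 = none
        }
        int n = columnNames.size();
        int index = column > 0 ? column - 1 : n + column;
        if (index < 0 || index >= n) {
            throw new IndexOutOfBoundsException("column " + column + " is outside 1.." + n);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a1", "a2", "a3", "class");
        System.out.println(resolve(names, "", -1));  // last column
        System.out.println(resolve(names, "a2", 4)); // name takes precedence
    }
}
```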

PARAMETER_SAMPLE_RATIO

public static final java.lang.String PARAMETER_SAMPLE_RATIO
Deprecated. 
The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"

See Also:
Constant Field Values

PARAMETER_SAMPLE_SIZE

public static final java.lang.String PARAMETER_SAMPLE_SIZE
Deprecated. 
The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"

See Also:
Constant Field Values
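
The interaction between sample_size and sample_ratio described above can be sketched as follows. This is only one plausible reading, not the operator's implementation: the class SamplingSketch and the method sample are hypothetical, and the sketch assumes that a fixed sample size takes the first rows of the file and that the ratio is applied as an independent random choice per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; SamplingSketch is not part of RapidMiner.
class SamplingSketch {
    // sampleSize == -1: keep each row with probability sampleRatio (1 = all).
    // sampleSize != -1: keep the first sampleSize rows; sampleRatio is ignored.
    static List<String> sample(List<String> rows, int sampleSize, double sampleRatio, Random rng) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (sampleSize != -1) {
                if (kept.size() >= sampleSize) {
                    break; // exact number of rows reached
                }
                kept.add(row);
            } else if (sampleRatio >= 1.0 || rng.nextDouble() < sampleRatio) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(sample(rows, 2, 0.5, new Random(42)));  // size wins: first two rows
        System.out.println(sample(rows, -1, 1.0, new Random(42))); // ratio 1 keeps all rows
    }
}
```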

PARAMETER_FILENAME

public static final java.lang.String PARAMETER_FILENAME
Deprecated. 
See Also:
Constant Field Values

PARAMETER_USE_FIRST_ROW_AS_ATTRIBUTE_NAMES

public static final java.lang.String PARAMETER_USE_FIRST_ROW_AS_ATTRIBUTE_NAMES
Deprecated. 
See Also:
Constant Field Values

PARAMETER_TRIM_LINES

public static final java.lang.String PARAMETER_TRIM_LINES
Deprecated. 
See Also:
Constant Field Values

PARAMETER_SKIP_COMMENTS

public static final java.lang.String PARAMETER_SKIP_COMMENTS
Deprecated. 
See Also:
Constant Field Values

PARAMETER_COMMENT_CHARS

public static final java.lang.String PARAMETER_COMMENT_CHARS
Deprecated. 
See Also:
Constant Field Values

PARAMETER_USE_QUOTES

public static final java.lang.String PARAMETER_USE_QUOTES
Deprecated. 
See Also:
Constant Field Values

PARAMETER_QUOTES_CHARACTER

public static final java.lang.String PARAMETER_QUOTES_CHARACTER
Deprecated. 
See Also:
Constant Field Values

PARAMETER_COLUMN_SEPARATORS

public static final java.lang.String PARAMETER_COLUMN_SEPARATORS
Deprecated. 
See Also:
Constant Field Values
Constructor Detail

SimpleExampleSource

public SimpleExampleSource(OperatorDescription description)
Deprecated. 
Method Detail

createReader

public CSVFileReader createReader(java.io.File file)
                           throws UndefinedParameterError
Deprecated. 
Throws:
UndefinedParameterError

getGeneratedMetaData

public MetaData getGeneratedMetaData()
                              throws OperatorException
Deprecated. 
Overrides:
getGeneratedMetaData in class AbstractExampleSource
Throws:
OperatorException

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Deprecated. 
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by Operator.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

createExampleSet

public static ExampleSet createExampleSet(java.io.File file,
                                          boolean firstRowAsColumnNames,
                                          double sampleRatio,
                                          int maxLines,
                                          java.lang.String separatorRegExpr,
                                          char[] comments,
                                          int dataRowType,
                                          boolean useQuotes,
                                          boolean trimLines,
                                          boolean skipErrorLines,
                                          char decimalPointCharacter,
                                          java.nio.charset.Charset encoding,
                                          java.lang.String labelName,
                                          int labelColumn,
                                          java.lang.String idName,
                                          int idColumn,
                                          java.lang.String weightName,
                                          int weightColumn)
                                   throws java.io.IOException,
                                          UserError,
                                          java.lang.IndexOutOfBoundsException
Deprecated. 
Throws:
java.io.IOException
UserError
java.lang.IndexOutOfBoundsException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Deprecated. 
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed. ATTENTION! This method creates new ParameterType objects. To access already existing parameter types, use getParameters().getParameterTypes().

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class AbstractReader<ExampleSet>


Copyright © 2001-2009 by Rapid-I