com.rapidminer.operator.io
Class SparseFormatExampleSource

java.lang.Object
  extended by com.rapidminer.operator.Operator
      extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
          extended by com.rapidminer.operator.io.AbstractExampleSource
              extended by com.rapidminer.operator.io.SparseFormatExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ParameterHandler, LoggingHandler

public class SparseFormatExampleSource
extends AbstractExampleSource

Reads an example file in sparse format, i.e. lines have the form

 label index:value index:value index:value...
 

Index may be an integer (starting with 1) for the regular attributes or one of the prefixes specified by the parameter list prefix_map. Five possible formats are supported:
format_xy:
The label is the last token in each line
format_yx:
The label is the first token in each line
format_prefix:
The label is prefixed by 'l:'
format_separate_file:
The label is read from a separate file specified by label_file
no_label:
The example set is unlabeled.
A detailed introduction to the sparse file format is given in section First steps/File formats/Data files.
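
The line layout and label positions described above can be illustrated with a minimal sketch. This is not the operator's actual implementation: SparseLineSketch, ParsedLine, parseLine and toDense are hypothetical names introduced only for this example. The sketch assumes whitespace-separated tokens, '.' as the decimal point, and that indices missing from a line stand for 0.0. For format_separate_file and no_label it reads no label from the line.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of RapidMiner) that parses one sparse-format line. */
final class SparseLineSketch {

    /** Holds the label (null if the line carries none) and the index:value pairs. */
    static final class ParsedLine {
        final String label;
        final Map<String, Double> values;

        ParsedLine(String label, Map<String, Double> values) {
            this.label = label;
            this.values = values;
        }
    }

    /** Splits a line into label and index:value pairs according to the given format name. */
    static ParsedLine parseLine(String line, String format) {
        String[] tokens = line.trim().split("\\s+");
        int start = 0;
        int end = tokens.length;
        String label = null;
        switch (format) {
            case "format_yx": label = tokens[start++]; break; // label is the first token
            case "format_xy": label = tokens[--end]; break;   // label is the last token
            case "format_prefix":                              // label appears as l:value
            case "format_separate_file":                       // label comes from another file
            case "no_label":                                   // unlabeled data
                break;
            default:
                throw new IllegalArgumentException("Unknown format: " + format);
        }
        Map<String, Double> values = new LinkedHashMap<>();
        for (int i = start; i < end; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value but found: " + tokens[i]);
            }
            String index = tokens[i].substring(0, colon);
            String value = tokens[i].substring(colon + 1);
            if (format.equals("format_prefix") && index.equals("l")) {
                label = value;
            } else {
                // Assumption: values use '.' as the decimal point.
                values.put(index, Double.parseDouble(value));
            }
        }
        return new ParsedLine(label, values);
    }

    /** Expands the integer-indexed (1-based) entries into a dense array of the given dimension. */
    static double[] toDense(Map<String, Double> values, int dimension) {
        double[] dense = new double[dimension]; // assumption: absent indices mean 0.0
        for (Map.Entry<String, Double> entry : values.entrySet()) {
            int index;
            try {
                index = Integer.parseInt(entry.getKey());
            } catch (NumberFormatException e) {
                continue; // non-integer index: a prefix for a special attribute, skipped here
            }
            if (index < 1 || index > dimension) {
                throw new IllegalArgumentException("Index out of range: " + index);
            }
            dense[index - 1] = entry.getValue();
        }
        return dense;
    }

    public static void main(String[] args) {
        ParsedLine parsed = parseLine("1 1:0.5 3:2.0", "format_yx");
        System.out.println(parsed.label + " " + Arrays.toString(toDense(parsed.values, 4)));
    }
}
```

Keeping the index as a string until toDense lets the same map hold both integer indices and prefixes; how the real operator maps prefixes to special attributes via prefix_map is not shown here.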

Author:
Ingo Mierswa, Simon Fischer
See Also:
SparseFormatDataRowReader

Field Summary
static java.lang.String PARAMETER_ATTRIBUTE_DESCRIPTION_FILE
          The parameter name for "Name of the attribute description file."
static java.lang.String PARAMETER_DATA_FILE
          The parameter name for "Name of the data file."
static java.lang.String PARAMETER_DATAMANAGEMENT
          The parameter name for "Determines how the data is represented internally."
static java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
          The parameter name for "Character that is used as decimal point."
static java.lang.String PARAMETER_DIMENSION
          The parameter name for "Dimension of the example space."
static java.lang.String PARAMETER_FORMAT
          The parameter name for "Format of the sparse data file."
static java.lang.String PARAMETER_LABEL_FILE
          The parameter name for "Name of the data file containing the labels."
static java.lang.String PARAMETER_PREFIX_MAP
          The parameter name for "Maps prefixes to names of special attributes."
static java.lang.String PARAMETER_SAMPLE_SIZE
          The parameter name for "The maximum number of examples to read from the data files (-1 = all)"
 
Constructor Summary
SparseFormatExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
apply, getInputClasses, getOutputClasses
 
Methods inherited from class com.rapidminer.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkForStop, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getEncoding, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getIODescription, getLog, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getProcess, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isEnabled, isExpanded, isParallel, isParameterSet, log, logError, logNote, logWarning, performAdditionalChecks, processFinished, processStarts, register, registerOperator, remove, rename, resume, setApplyCount, setBreakpoint, setEnabled, setExpanded, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, unregisterOperator, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_FORMAT

public static final java.lang.String PARAMETER_FORMAT
The parameter name for "Format of the sparse data file."

See Also:
Constant Field Values

PARAMETER_ATTRIBUTE_DESCRIPTION_FILE

public static final java.lang.String PARAMETER_ATTRIBUTE_DESCRIPTION_FILE
The parameter name for "Name of the attribute description file."

See Also:
Constant Field Values

PARAMETER_DATA_FILE

public static final java.lang.String PARAMETER_DATA_FILE
The parameter name for "Name of the data file. Only necessary if not specified in the attribute description file."

See Also:
Constant Field Values

PARAMETER_LABEL_FILE

public static final java.lang.String PARAMETER_LABEL_FILE
The parameter name for "Name of the data file containing the labels. Only necessary if format is 'format_separate_file'."

See Also:
Constant Field Values

PARAMETER_DIMENSION

public static final java.lang.String PARAMETER_DIMENSION
The parameter name for "Dimension of the example space. Only necessary if parameter 'attribute_description_file' is not set."

See Also:
Constant Field Values

PARAMETER_SAMPLE_SIZE

public static final java.lang.String PARAMETER_SAMPLE_SIZE
The parameter name for "The maximum number of examples to read from the data files (-1 = all)"

See Also:
Constant Field Values

PARAMETER_DATAMANAGEMENT

public static final java.lang.String PARAMETER_DATAMANAGEMENT
The parameter name for "Determines how the data is represented internally."

See Also:
Constant Field Values

PARAMETER_DECIMAL_POINT_CHARACTER

public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
The parameter name for "Character that is used as decimal point."

See Also:
Constant Field Values

PARAMETER_PREFIX_MAP

public static final java.lang.String PARAMETER_PREFIX_MAP
The parameter name for "Maps prefixes to names of special attributes."

See Also:
Constant Field Values
Constructor Detail

SparseFormatExampleSource

public SparseFormatExampleSource(OperatorDescription description)
Method Detail

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed.

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator


Copyright © 2001-2009 by Rapid-I