com.rapidminer.operator.io
Class XrffExampleSource

java.lang.Object
  extended by com.rapidminer.operator.Operator
      extended by com.rapidminer.operator.io.AbstractReader<ExampleSet>
          extended by com.rapidminer.operator.io.AbstractExampleSource
              extended by com.rapidminer.operator.io.XrffExampleSource
All Implemented Interfaces:
ConfigurationListener, PreviewListener, ParameterHandler, LoggingHandler

public class XrffExampleSource
extends AbstractExampleSource

This operator reads XRFF files known from Weka. XRFF (eXtensible attribute-Relation File Format) is an XML-based extension of the ARFF format and is in some respects similar to the original RapidMiner file format for attribute description files (.aml).

The following small example shows the Iris data set represented as an XRFF file:

 <?xml version="1.0" encoding="utf-8"?>
 <dataset name="iris" version="3.5.3">
  <header>
     <attributes>
        <attribute name="sepallength" type="numeric"/>
        <attribute name="sepalwidth" type="numeric"/>
        <attribute name="petallength" type="numeric"/>
        <attribute name="petalwidth" type="numeric"/>
        <attribute class="yes" name="class" type="nominal">
           <labels>
              <label>Iris-setosa</label>
              <label>Iris-versicolor</label>
              <label>Iris-virginica</label>
           </labels>
        </attribute>
     </attributes>
  </header>

  <body>
     <instances>
        <instance>
           <value>5.1</value>
           <value>3.5</value>
           <value>1.4</value>
           <value>0.2</value>
           <value>Iris-setosa</value>
        </instance>
        <instance>
           <value>4.9</value>
           <value>3</value>
           <value>1.4</value>
           <value>0.2</value>
           <value>Iris-setosa</value>
        </instance>
        ...
     </instances>
  </body>
 </dataset>
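
As a rough illustration of the header structure shown above, the following sketch lists the declared attributes using only the JDK's DOM parser. The class and method names (XrffHeaderSketch, attributeSummaries) are hypothetical and are not part of RapidMiner; the operator's real parsing code may work quite differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
final class XrffHeaderSketch {

    // Returns one "name:type" entry per <attribute> element, in document order.
    static List<String> attributeSummaries(String xrffXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xrffXml.getBytes(StandardCharsets.UTF_8)));
            NodeList attributes = doc.getElementsByTagName("attribute");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Element attribute = (Element) attributes.item(i);
                result.add(attribute.getAttribute("name") + ":" + attribute.getAttribute("type"));
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers do not need checked exceptions.
            throw new IllegalArgumentException("Not a readable XRFF document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dataset name=\"iris\"><header><attributes>"
                + "<attribute name=\"sepallength\" type=\"numeric\"/>"
                + "<attribute class=\"yes\" name=\"class\" type=\"nominal\"/>"
                + "</attributes></header><body><instances/></body></dataset>";
        System.out.println(attributeSummaries(xml));
    }
}
```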
 

Please note that the sparse XRFF format is currently not supported. Use one of the other options for sparse data files provided by RapidMiner instead.

Because the data is wrapped in XML tags, the XML representation takes up considerably more space, so the data can also be compressed with gzip. RapidMiner automatically recognizes a file as gzip compressed if the file's extension is .xrff.gz instead of .xrff.
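
The extension-based detection described here could be sketched as follows with the JDK's GZIPInputStream. XrffStreamSketch, isGzipName and open are hypothetical names chosen for this example, and the case-sensitive suffix check is an assumption; the operator's actual detection logic is not shown in this documentation.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
final class XrffStreamSketch {

    // Assumption: the check is a plain, case-sensitive suffix match.
    static boolean isGzipName(String fileName) {
        return fileName.endsWith(".xrff.gz");
    }

    // Opens the file and transparently decompresses it when the name ends in .xrff.gz.
    static InputStream open(Path file) {
        try {
            InputStream raw = new BufferedInputStream(Files.newInputStream(file));
            if (!isGzipName(file.getFileName().toString())) {
                return raw;
            }
            try {
                return new GZIPInputStream(raw);
            } catch (IOException e) {
                raw.close(); // do not leak the file handle if the gzip header is invalid
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isGzipName("iris.xrff.gz"));
        System.out.println(isGzipName("iris.xrff"));
    }
}
```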

Similar to the native RapidMiner data definition via .aml and almost arbitrary data files, the XRFF format contains some additional features. The class="yes" attribute in the attribute specification in the header defines which attribute should be used as the prediction label attribute. Although the RapidMiner term for such classes is "label" instead of "class", the term "class" is supported in order not to break compatibility with original XRFF files.
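
To make the class="yes" convention concrete, the sketch below looks up the attribute marked as the label in an XRFF header. XrffLabelSketch and labelAttributeName are hypothetical names, and returning the first match is an assumption made for this example; it does not reproduce the operator's implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
final class XrffLabelSketch {

    // Returns the name of the first <attribute> carrying class="yes", if any.
    static Optional<String> labelAttributeName(String xrffXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xrffXml.getBytes(StandardCharsets.UTF_8)));
            NodeList attributes = doc.getElementsByTagName("attribute");
            for (int i = 0; i < attributes.getLength(); i++) {
                Element attribute = (Element) attributes.item(i);
                // getAttribute returns "" when the XML attribute is absent.
                if ("yes".equals(attribute.getAttribute("class"))) {
                    return Optional.of(attribute.getAttribute("name"));
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XRFF document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dataset name=\"iris\"><header><attributes>"
                + "<attribute name=\"petalwidth\" type=\"numeric\"/>"
                + "<attribute class=\"yes\" name=\"class\" type=\"nominal\"/>"
                + "</attributes></header></dataset>";
        System.out.println(labelAttributeName(xml));
    }
}
```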

Please note that loading attribute weights is currently not supported. Use the other RapidMiner operators for loading and writing attribute weights for this purpose.

Instance weights can be defined via a weight XML attribute in each instance tag. By default, the weight is 1. Here's an example:

 <instance weight="0.75">
  <value>5.1</value>
  <value>3.5</value>
  <value>1.4</value>
  <value>0.2</value>
  <value>Iris-setosa</value>
 </instance>
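
The default-to-1 rule for instance weights can be illustrated with a short sketch. XrffWeightSketch and instanceWeights are hypothetical names used only here; the code merely shows one way to read the weight XML attribute with the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
final class XrffWeightSketch {

    // Returns the weight of every <instance> element, using 1.0 when no weight is given.
    static double[] instanceWeights(String xrffXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xrffXml.getBytes(StandardCharsets.UTF_8)));
            // Tag names are matched exactly, so the enclosing <instances> element is not included.
            NodeList instances = doc.getElementsByTagName("instance");
            double[] weights = new double[instances.getLength()];
            for (int i = 0; i < weights.length; i++) {
                String raw = ((Element) instances.item(i)).getAttribute("weight");
                weights[i] = raw.isEmpty() ? 1.0 : Double.parseDouble(raw);
            }
            return weights;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XRFF document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dataset><body><instances>"
                + "<instance weight=\"0.75\"><value>5.1</value></instance>"
                + "<instance><value>4.9</value></instance>"
                + "</instances></body></dataset>";
        System.out.println(java.util.Arrays.toString(instanceWeights(xml)));
    }
}
```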
 

Since the XRFF format does not support id attributes, one has to use one of the RapidMiner operators to change one of the columns to the id column if desired. This has to be done after loading the data.

Author:
Ingo Mierswa
Keywords:
xrff

Field Summary
static java.lang.String PARAMETER_DATA_FILE
          The parameter name for "The path to the data file."
static java.lang.String PARAMETER_DATAMANAGEMENT
          The parameter name for "Determines, how the data is represented internally."
static java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
          The parameter name for "Character that is used as decimal point."
static java.lang.String PARAMETER_ID_ATTRIBUTE
          The parameter name for "The (case sensitive) name of the id attribute"
static java.lang.String PARAMETER_LOCAL_RANDOM_SEED
          The parameter name for "Use the given random seed instead of global random numbers (only for permutation, -1: use global)."
static java.lang.String PARAMETER_SAMPLE_RATIO
          The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"
static java.lang.String PARAMETER_SAMPLE_SIZE
          The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"
 
Constructor Summary
XrffExampleSource(OperatorDescription description)
           
 
Method Summary
 ExampleSet createExampleSet()
          Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().
 java.util.List<ParameterType> getParameterTypes()
          Returns a list of ParameterTypes describing the parameters of this operator.
 
Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource
read
 
Methods inherited from class com.rapidminer.operator.io.AbstractReader
apply, getInputClasses, getOutputClasses
 
Methods inherited from class com.rapidminer.operator.Operator
addError, addValue, addWarning, apply, checkDeprecations, checkForStop, checkIO, checkProperties, clearErrorList, cloneOperator, createExperimentTree, createExperimentTree, createFromXML, createMarkedExperimentTree, createMarkedProcessTree, createProcessTree, createProcessTree, getAddOnlyAdditionalOutput, getApplyCount, getDeliveredOutputClasses, getDeprecationInfo, getDesiredInputClasses, getEncoding, getErrorList, getExperiment, getInnerOperatorsXML, getInput, getInput, getInput, getInputDescription, getIOContainerForInApplyLoopBreakpoint, getIODescription, getLog, getName, getOperatorClassName, getOperatorDescription, getParameter, getParameterAsBoolean, getParameterAsColor, getParameterAsDouble, getParameterAsFile, getParameterAsFile, getParameterAsInputStream, getParameterAsInt, getParameterAsMatrix, getParameterAsString, getParameterList, getParameters, getParameterType, getParent, getProcess, getStartTime, getStatus, getUserDescription, getValue, getValues, getXML, hasBreakpoint, hasBreakpoint, hasInput, inApplyLoop, isDebugMode, isEnabled, isExpanded, isParallel, isParameterSet, log, logError, logNote, logWarning, performAdditionalChecks, processFinished, processStarts, register, registerOperator, remove, rename, resume, setApplyCount, setBreakpoint, setEnabled, setExpanded, setInput, setListParameter, setOperatorParameters, setParameter, setParameters, setParent, setUserDescription, toString, unregisterOperator, writeXML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PARAMETER_DATA_FILE

public static final java.lang.String PARAMETER_DATA_FILE
The parameter name for "The path to the data file."

See Also:
Constant Field Values

PARAMETER_ID_ATTRIBUTE

public static final java.lang.String PARAMETER_ID_ATTRIBUTE
The parameter name for "The (case sensitive) name of the id attribute"

See Also:
Constant Field Values

PARAMETER_DATAMANAGEMENT

public static final java.lang.String PARAMETER_DATAMANAGEMENT
The parameter name for "Determines, how the data is represented internally."

See Also:
Constant Field Values

PARAMETER_DECIMAL_POINT_CHARACTER

public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
The parameter name for "Character that is used as decimal point."

See Also:
Constant Field Values

PARAMETER_SAMPLE_RATIO

public static final java.lang.String PARAMETER_SAMPLE_RATIO
The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)"

See Also:
Constant Field Values

PARAMETER_SAMPLE_SIZE

public static final java.lang.String PARAMETER_SAMPLE_SIZE
The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)"

See Also:
Constant Field Values
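
The precedence between sample_size and sample_ratio stated above can be sketched as a small function. SampleParameterSketch and examplesToRead are hypothetical names; capping at the total and rounding the ratio result are assumptions of this example, and the operator itself may select examples differently (for instance by sampling each row randomly).

```java
// Hypothetical helper for illustration only; not part of the RapidMiner API.
final class SampleParameterSketch {

    // Resolves how many examples to read: sample_size wins unless it is -1.
    static long examplesToRead(long totalExamples, int sampleSize, double sampleRatio) {
        if (sampleSize != -1) {
            // Assumption: never report more examples than the file contains.
            return Math.min(sampleSize, totalExamples);
        }
        // Assumption: the ratio is applied to the total and rounded to the nearest integer.
        return Math.round(totalExamples * sampleRatio);
    }

    public static void main(String[] args) {
        System.out.println(examplesToRead(150, -1, 0.5));
        System.out.println(examplesToRead(150, 50, 0.5));
    }
}
```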

PARAMETER_LOCAL_RANDOM_SEED

public static final java.lang.String PARAMETER_LOCAL_RANDOM_SEED
The parameter name for "Use the given random seed instead of global random numbers (only for permutation, -1: use global)."

See Also:
Constant Field Values
Constructor Detail

XrffExampleSource

public XrffExampleSource(OperatorDescription description)
Method Detail

createExampleSet

public ExampleSet createExampleSet()
                            throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().

Specified by:
createExampleSet in class AbstractExampleSource
Throws:
OperatorException

getParameterTypes

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator. The default implementation returns an empty list if no input objects can be retained; otherwise it returns special parameters for those input objects that can be prevented from being consumed.

Specified by:
getParameterTypes in interface ParameterHandler
Overrides:
getParameterTypes in class Operator


Copyright © 2001-2009 by Rapid-I