java.lang.Object
  com.rapidminer.operator.Operator
    com.rapidminer.operator.io.AbstractReader<ExampleSet>
      com.rapidminer.operator.io.AbstractExampleSource
        com.rapidminer.operator.io.XrffExampleSource
public class XrffExampleSource
This operator can read XRFF files known from Weka. XRFF (eXtensible attribute-Relation File Format) is an XML-based extension of the ARFF format and is in some respects similar to the original RapidMiner file format for attribute description files (.aml).
Here is a small example of the Iris data set represented as an XRFF file:
<?xml version="1.0" encoding="utf-8"?>
<dataset name="iris" version="3.5.3">
<header>
<attributes>
<attribute name="sepallength" type="numeric"/>
<attribute name="sepalwidth" type="numeric"/>
<attribute name="petallength" type="numeric"/>
<attribute name="petalwidth" type="numeric"/>
<attribute class="yes" name="class" type="nominal">
<labels>
<label>Iris-setosa</label>
<label>Iris-versicolor</label>
<label>Iris-virginica</label>
</labels>
</attribute>
</attributes>
</header>
<body>
<instances>
<instance>
<value>5.1</value>
<value>3.5</value>
<value>1.4</value>
<value>0.2</value>
<value>Iris-setosa</value>
</instance>
<instance>
<value>4.9</value>
<value>3</value>
<value>1.4</value>
<value>0.2</value>
<value>Iris-setosa</value>
</instance>
...
</instances>
</body>
</dataset>
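To make the layout of the example above concrete, the following is a minimal sketch of how such a file could be read with the standard Java DOM parser. It is not the code used by XrffExampleSource; the class name XrffHeaderSketch and its methods are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of RapidMiner: reads the dense XRFF layout shown above.
class XrffHeaderSketch {

    // Parses an XRFF document held in a string into a DOM tree.
    static Document parse(String xrff) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xrff.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns "name:type" for every <attribute> element, in document order.
    static List<String> attributeSummaries(String xrff) throws Exception {
        NodeList attributes = parse(xrff).getElementsByTagName("attribute");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Element attribute = (Element) attributes.item(i);
            result.add(attribute.getAttribute("name") + ":" + attribute.getAttribute("type"));
        }
        return result;
    }

    // Returns one list of <value> texts per <instance> element.
    static List<List<String>> rows(String xrff) throws Exception {
        NodeList instances = parse(xrff).getElementsByTagName("instance");
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < instances.getLength(); i++) {
            NodeList values = ((Element) instances.item(i)).getElementsByTagName("value");
            List<String> row = new ArrayList<>();
            for (int j = 0; j < values.getLength(); j++) {
                row.add(values.item(j).getTextContent().trim());
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the Iris example: one numeric attribute and the class attribute.
        String xrff = "<dataset name=\"iris\" version=\"3.5.3\"><header><attributes>"
                + "<attribute name=\"sepallength\" type=\"numeric\"/>"
                + "<attribute class=\"yes\" name=\"class\" type=\"nominal\"/>"
                + "</attributes></header><body><instances>"
                + "<instance><value>5.1</value><value>Iris-setosa</value></instance>"
                + "</instances></body></dataset>";
        System.out.println(attributeSummaries(xrff));
        System.out.println(rows(xrff));
    }
}
```

The sketch keeps all values as strings; converting numeric columns or mapping nominal values to the declared labels is left out on purpose.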
Please note that the sparse XRFF format is currently not supported. Please use one of the other options for sparse data files provided by RapidMiner.
Because the data is wrapped in XML tags, the XML representation takes up considerably more space, so the data can also be compressed with gzip. RapidMiner automatically recognizes a gzip-compressed file if the file's extension is .xrff.gz instead of .xrff.
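The extension-based detection described above can be sketched with the standard java.util.zip classes. This is an illustration only, not the operator's actual implementation; the class XrffGzipSketch and its methods are hypothetical, and the case-sensitive suffix check is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of RapidMiner.
class XrffGzipSketch {

    // Assumption: the check is a plain, case-sensitive suffix test on the file name.
    static boolean isGzipName(String fileName) {
        return fileName.endsWith(".xrff.gz");
    }

    // Wraps the raw stream in a GZIPInputStream only for .xrff.gz names.
    static InputStream open(String fileName, InputStream raw) throws IOException {
        return isGzipName(fileName) ? new GZIPInputStream(raw) : raw;
    }

    // Reads a stream to the end and decodes it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Compresses text with gzip; used here only to build demo input.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<dataset name=\"iris\"/>";
        InputStream in = open("iris.xrff.gz", new ByteArrayInputStream(gzip(xml)));
        System.out.println(readAll(in));
    }
}
```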
Similar to the native RapidMiner data definition via .aml and almost arbitrary data files, the XRFF format contains some additional features. Via the class="yes" attribute in the attribute specification in the header, one can define which attribute should be used as the prediction label attribute. Although the RapidMiner term for such attributes is "label" rather than "class", the term "class" is supported so that compatibility with original XRFF files is not broken.
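As an illustration of the class="yes" convention, the sketch below looks up the name of the attribute marked as the class in an XRFF header. It uses only the standard DOM API; the class XrffLabelSketch is hypothetical and does not reflect how RapidMiner assigns the label role internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of RapidMiner.
class XrffLabelSketch {

    // Returns the name of the first <attribute> carrying class="yes", if there is one.
    static Optional<String> labelAttributeName(String xrff) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xrff.getBytes(StandardCharsets.UTF_8)));
        NodeList attributes = doc.getElementsByTagName("attribute");
        for (int i = 0; i < attributes.getLength(); i++) {
            Element attribute = (Element) attributes.item(i);
            // getAttribute returns an empty string when the XML attribute is absent.
            if ("yes".equals(attribute.getAttribute("class"))) {
                return Optional.of(attribute.getAttribute("name"));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        String header = "<dataset name=\"iris\"><header><attributes>"
                + "<attribute name=\"petalwidth\" type=\"numeric\"/>"
                + "<attribute class=\"yes\" name=\"class\" type=\"nominal\"/>"
                + "</attributes></header></dataset>";
        System.out.println(labelAttributeName(header).orElse("(no label attribute)"));
    }
}
```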
Please note that loading attribute weights is currently not supported. Please use the other RapidMiner operators for loading and writing attribute weights for this purpose.
Instance weights can be defined via a weight XML attribute in each instance tag. By default, the weight is 1. Here's an example:
<instance weight="0.75"> <value>5.1</value> <value>3.5</value> <value>1.4</value> <value>0.2</value> <value>Iris-setosa</value> </instance>
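The default-weight rule can be sketched as follows: an instance without a weight XML attribute is given weight 1, and any other instance uses the stated value. The class XrffWeightSketch is hypothetical and only illustrates the rule; it is not the operator's own parsing code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of RapidMiner.
class XrffWeightSketch {

    // Returns one weight per <instance> element, in document order.
    static double[] instanceWeights(String xrff) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xrff.getBytes(StandardCharsets.UTF_8)));
        NodeList instances = doc.getElementsByTagName("instance");
        double[] weights = new double[instances.getLength()];
        for (int i = 0; i < weights.length; i++) {
            // An absent weight attribute yields an empty string, which maps to the default of 1.
            String weight = ((Element) instances.item(i)).getAttribute("weight");
            weights[i] = weight.isEmpty() ? 1.0 : Double.parseDouble(weight);
        }
        return weights;
    }

    public static void main(String[] args) throws Exception {
        String body = "<instances>"
                + "<instance weight=\"0.75\"><value>5.1</value></instance>"
                + "<instance><value>4.9</value></instance>"
                + "</instances>";
        for (double weight : instanceWeights(body)) {
            System.out.println(weight);
        }
    }
}
```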
Since the XRFF format does not support id attributes, one has to use one of the RapidMiner operators to change one of the columns to the id column if desired. This has to be done after loading the data.
| Field Summary | |
|---|---|
| static java.lang.String | PARAMETER_DATA_FILE: The parameter name for "The path to the data file." |
| static java.lang.String | PARAMETER_DATAMANAGEMENT: The parameter name for "Determines, how the data is represented internally." |
| static java.lang.String | PARAMETER_DECIMAL_POINT_CHARACTER: The parameter name for "Character that is used as decimal point." |
| static java.lang.String | PARAMETER_ID_ATTRIBUTE: The parameter name for "The (case sensitive) name of the id attribute" |
| static java.lang.String | PARAMETER_LOCAL_RANDOM_SEED: The parameter name for "Use the given random seed instead of global random numbers (only for permutation, -1: use global)." |
| static java.lang.String | PARAMETER_SAMPLE_RATIO: The parameter name for "The fraction of the data set which should be read (1 = all; only used if sample_size = -1)" |
| static java.lang.String | PARAMETER_SAMPLE_SIZE: The parameter name for "The exact number of samples which should be read (-1 = use sample ratio; if not -1, sample_ratio will not have any effect)" |
| Constructor Summary |
|---|
| XrffExampleSource(OperatorDescription description) |
| Method Summary | |
|---|---|
| ExampleSet | createExampleSet(): Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply(). |
| java.util.List<ParameterType> | getParameterTypes(): Returns a list of ParameterTypes describing the parameters of this operator. |
| Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource |
|---|
| read |

| Methods inherited from class com.rapidminer.operator.io.AbstractReader |
|---|
| apply, getInputClasses, getOutputClasses |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String PARAMETER_DATA_FILE
public static final java.lang.String PARAMETER_ID_ATTRIBUTE
public static final java.lang.String PARAMETER_DATAMANAGEMENT
public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
public static final java.lang.String PARAMETER_SAMPLE_RATIO
public static final java.lang.String PARAMETER_SAMPLE_SIZE
public static final java.lang.String PARAMETER_LOCAL_RANDOM_SEED
| Constructor Detail |
|---|
public XrffExampleSource(OperatorDescription description)
| Method Detail |
|---|
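public ExampleSet createExampleSet() throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by AbstractReader.apply().
Specified by: createExampleSet in class AbstractExampleSource
Throws: OperatorException

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator.
Specified by: getParameterTypes in interface ParameterHandler
Overrides: getParameterTypes in class Operator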