java.lang.Object
com.rapidminer.tools.AbstractObservable<Operator>
com.rapidminer.operator.Operator
com.rapidminer.operator.io.AbstractReader<ExampleSet>
com.rapidminer.operator.io.AbstractExampleSource
com.rapidminer.operator.io.C45ExampleSource
public class C45ExampleSource
Loads data in C4.5 format, which consists of a names file and a data file. Both files must be in the same directory. You can specify either of the two C4.5 files (the data file or the names file) or only the filestem.
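Because the operator accepts any of three inputs, a reader has to derive both file paths from whatever was given. The following is a minimal sketch, assuming the input is a path that ends in `.data` or `.names` or is the bare filestem. The class `C45Files` and its `resolve` method are hypothetical and are not part of the RapidMiner API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not RapidMiner code): derives both C4.5 file paths
// from a path to the .names file, the .data file, or the bare filestem.
final class C45Files {
    final Path namesFile;
    final Path dataFile;

    private C45Files(Path namesFile, Path dataFile) {
        this.namesFile = namesFile;
        this.dataFile = dataFile;
    }

    static C45Files resolve(String location) {
        Path path = Paths.get(location);
        String fileName = path.getFileName().toString();
        // Strip a known C4.5 extension; anything else is treated as the filestem.
        String stem = fileName;
        if (fileName.endsWith(".names")) {
            stem = fileName.substring(0, fileName.length() - ".names".length());
        } else if (fileName.endsWith(".data")) {
            stem = fileName.substring(0, fileName.length() - ".data".length());
        }
        // Both files must be in the same directory, so resolve them as siblings.
        return new C45Files(path.resolveSibling(stem + ".names"),
                            path.resolveSibling(stem + ".data"));
    }
}
```

Resolving both files as siblings of the given path encodes the rule that they must share a directory; `C45Files.resolve("data/foo")` and `C45Files.resolve("data/foo.names")` yield the same pair of paths.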
For a dataset named "foo", you will have two files: foo.data and foo.names. The .names file describes the dataset, while the .data file contains the examples which make up the dataset.
The files contain a series of identifiers and numbers with some surrounding syntax. A | (vertical bar) means that the remainder of the line is ignored as a comment. Each identifier is a string of characters that does not include a comma, question mark, or colon. Embedded whitespace is permitted, but each run of whitespace is replaced by a single space.
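The lexical rules above can be sketched as a small tokenizer. This is an illustration under simplifying assumptions (one line at a time, commas as the only separator); `C45Lexer` is a hypothetical helper, not RapidMiner's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not RapidMiner code) illustrating the C4.5 lexical rules:
// '|' starts a comment, and whitespace runs inside identifiers collapse to one space.
final class C45Lexer {
    /** Drops everything from the first '|' to the end of the line. */
    static String stripComment(String line) {
        int bar = line.indexOf('|');
        return bar >= 0 ? line.substring(0, bar) : line;
    }

    /** Trims an identifier and replaces each whitespace run with a single space. */
    static String normalizeIdentifier(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    /** Splits a line into comma-separated, normalized identifiers, ignoring comments. */
    static List<String> splitIdentifiers(String line) {
        List<String> result = new ArrayList<>();
        for (String part : stripComment(line).split(",")) {
            String id = normalizeIdentifier(part);
            if (!id.isEmpty()) {
                result.add(id);
            }
        }
        return result;
    }
}
```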
The .names file contains a series of entries that describe the classes, attributes and values of the dataset. Each entry can be terminated with a period, but the period may be omitted if it would have been the last thing on a line. The first entry in the file lists the names of the classes, separated by commas. Each subsequent line then defines an attribute, in the order in which the attributes appear in the .data file, using the following format:
attribute-name : attribute-type
The attribute-name is an identifier as described above, followed by a colon and then the attribute type, which must be one of the following:
- continuous: the attribute has a continuous value.
- discrete [n]: the word 'discrete' followed by an integer that indicates how many values the attribute can take. This is not recommended; please use the method described below for defining nominal attributes.
- [list of identifiers]: a discrete, i.e. nominal, attribute whose values are enumerated. This is the preferred method for discrete attributes. The identifiers should be separated by commas.
- ignore: the attribute should be ignored and will not be used. This is not supported by RapidMiner; if you want to ignore attributes, use one of the attribute selection operators after loading to remove them from the loaded example set.

Here is an example .names file:
good, bad.
dur: continuous.
wage1: continuous.
wage2: continuous.
wage3: continuous.
cola: tc, none, tcf.
hours: continuous.
pension: empl_contr, ret_allw, none.
stby_pay: continuous.
shift_diff: continuous.
educ_allw: yes, no.
...
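The entries above follow a simple pattern: the first entry lists the classes, and each later entry has the form `attribute-name : attribute-type`. The sketch below classifies such entries under the simplifying assumption of one entry per line. `C45NamesSketch` is hypothetical; it strips only a single trailing period and does not cover every corner of the C4.5 grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified reader for .names entries (not RapidMiner code).
// Assumes one entry per line; only classifies the attribute type.
final class C45NamesSketch {
    enum Kind { CONTINUOUS, DISCRETE_COUNT, NOMINAL, IGNORE }

    static final class Attribute {
        final String name;
        final Kind kind;
        final List<String> values; // filled only for NOMINAL attributes

        Attribute(String name, Kind kind, List<String> values) {
            this.name = name;
            this.kind = kind;
            this.values = values;
        }
    }

    /** Removes a '|' comment and an optional terminating period. */
    static String cleanEntry(String line) {
        int bar = line.indexOf('|');
        String entry = (bar >= 0 ? line.substring(0, bar) : line).trim();
        return entry.endsWith(".") ? entry.substring(0, entry.length() - 1).trim() : entry;
    }

    /** The first entry lists the class names, separated by commas. */
    static List<String> parseClasses(String line) {
        return splitList(cleanEntry(line));
    }

    /** Parses an "attribute-name : attribute-type" entry. */
    static Attribute parseAttribute(String line) {
        String entry = cleanEntry(line);
        int colon = entry.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' in " + line);
        }
        String name = entry.substring(0, colon).trim();
        String type = entry.substring(colon + 1).trim();
        if (type.equals("continuous")) {
            return new Attribute(name, Kind.CONTINUOUS, List.of());
        }
        if (type.equals("ignore")) {
            return new Attribute(name, Kind.IGNORE, List.of());
        }
        if (type.matches("discrete\\s+\\d+")) {
            return new Attribute(name, Kind.DISCRETE_COUNT, List.of());
        }
        // Anything else is treated as an enumerated (nominal) value list.
        return new Attribute(name, Kind.NOMINAL, splitList(type));
    }

    private static List<String> splitList(String text) {
        List<String> items = new ArrayList<>();
        for (String part : text.split(",")) {
            String item = part.trim().replaceAll("\\s+", " ");
            if (!item.isEmpty()) {
                items.add(item);
            }
        }
        return items;
    }
}
```

Treating every unrecognized type as a value list mirrors the text above, where an enumerated list is the preferred way to declare a nominal attribute.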
foo.data contains the training examples in the following format: one example per line, attribute values separated by commas, the class last, and missing values represented by "?". For example:
2,5.0,4.0,?,none,37,?,?,5,no,11,below_average,yes,full,yes,full,good
3,2.0,2.5,?,?,35,none,?,?,?,10,average,?,?,yes,full,bad
3,4.5,4.5,5.0,none,40,?,?,?,no,11,average,?,half,?,?,good
3,3.0,2.0,2.5,tc,40,none,?,5,no,10,below_average,yes,half,yes,full,bad
...
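Reading one .data line then amounts to splitting on commas, taking the last value as the class, and mapping "?" to a missing value. A minimal sketch, assuming the number of attributes is already known from the .names file; `C45DataLine` is hypothetical, and `null` merely stands in for a missing value here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not RapidMiner code) of parsing one C4.5 .data line:
// comma-separated values, class label last, "?" marks a missing value (null here).
final class C45DataLine {
    final List<String> attributeValues; // null entries are missing values
    final String label;

    private C45DataLine(List<String> attributeValues, String label) {
        this.attributeValues = attributeValues;
        this.label = label;
    }

    static C45DataLine parse(String line, int attributeCount) {
        String[] fields = line.split(",", -1); // limit -1 keeps trailing empty fields
        if (fields.length != attributeCount + 1) {
            throw new IllegalArgumentException(
                "expected " + (attributeCount + 1) + " values but found " + fields.length);
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < attributeCount; i++) {
            values.add(toValue(fields[i]));
        }
        return new C45DataLine(values, toValue(fields[attributeCount]));
    }

    private static String toValue(String raw) {
        String trimmed = raw.trim();
        return trimmed.equals("?") ? null : trimmed;
    }
}
```

Checking the field count against the .names file catches rows with a missing or extra value early, instead of silently shifting the class label into an attribute column.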
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class com.rapidminer.operator.io.AbstractReader |
|---|
| AbstractReader.ReaderDescription |
| Field Summary | |
|---|---|
| static java.lang.String | PARAMETER_C45_FILESTEM: The parameter name for "The path to either the C4.5 names file, the data file, or the filestem (without extensions)." |
| static java.lang.String | PARAMETER_DATAMANAGEMENT: The parameter name for "Determines how the data is represented internally." |
| static java.lang.String | PARAMETER_DECIMAL_POINT_CHARACTER: The parameter name for "Character that is used as decimal point." |
| Constructor Summary |
|---|
| C45ExampleSource(OperatorDescription description) |
| Method Summary | |
|---|---|
| ExampleSet | createExampleSet(): Creates (or reads) the ExampleSet that will be returned by Operator.apply(). |
| java.util.List<ParameterType> | getParameterTypes(): Returns a list of ParameterTypes describing the parameters of this operator. |
| protected boolean | supportsEncoding() |
| Methods inherited from class com.rapidminer.operator.io.AbstractExampleSource |
|---|
| getGeneratedMetaData, read |

| Methods inherited from class com.rapidminer.operator.io.AbstractReader |
|---|
| addAnnotations, canMakeReaderFor, createReader, doWork, getFileParameterForOperator, isMetaDataCacheable, registerOperator, registerReaderDescription |

| Methods inherited from class com.rapidminer.tools.AbstractObservable |
|---|
| addObserver, addObserverAsFirst, fireUpdate, removeObserver |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String PARAMETER_C45_FILESTEM
public static final java.lang.String PARAMETER_DATAMANAGEMENT
public static final java.lang.String PARAMETER_DECIMAL_POINT_CHARACTER
| Constructor Detail |
|---|
public C45ExampleSource(OperatorDescription description)
| Method Detail |
|---|
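public ExampleSet createExampleSet()
throws OperatorException
Description copied from class: AbstractExampleSource
Creates (or reads) the ExampleSet that will be returned by Operator.apply().
Specified by: createExampleSet in class AbstractExampleSource
Throws: OperatorException

protected boolean supportsEncoding()
Overrides: supportsEncoding in class AbstractReader<ExampleSet>

public java.util.List<ParameterType> getParameterTypes()
Description copied from class: Operator
Returns a list of ParameterTypes describing the parameters of this operator.
Specified by: getParameterTypes in interface ParameterHandler
Overrides: getParameterTypes in class AbstractReader<ExampleSet>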