Rapid-I Blog
Tag >> Operator
Tags: RapidMiner, Operator | 5 Oct 2010
Extended Operations for Nominal Values by Ingo Mierswa

One of the next versions of RapidMiner (5.0.011 or the upcoming version 5.1) will extend the expression parser, which is used, for example, by the operator "Generate Attributes", with a set of operations for nominal values. These operations can be used directly within the expressions that define the new attributes.

The supported functions include

  • Number to String [str(x)],
  • String to Number [parse(text)],
  • Substring [cut(text, start, length)],
  • Concatenation [concat(text1, text2, text3...)],
  • Replace [replace(text, what, by)],
  • Replace All [replaceAll(text, what, by)],
  • To lower case [lower(text)],
  • To upper case [upper(text)],
  • First position of string in text [index(text, string)],
  • Length [length(text)],
  • Character at position pos in text [char(text, pos)],
  • Compare [compare(text1, text2)],
  • Contains string in text [contains(text, string)],
  • Equals [equals(text1, text2)],
  • Starts with string [starts(text, string)],
  • Ends with string [ends(text, string)],
  • Matches with regular expression exp [matches(text, exp)],
  • Suffix of length [suffix(text, length)],
  • Prefix of length [prefix(text, length)],
  • Trim (remove leading and trailing whitespace) [trim(text)].
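To give a feeling for what a few of these functions do, here is a minimal Java sketch of cut, prefix and suffix built on java.lang.String. It is only an illustration and not the code of the expression parser: the class name NominalOpsSketch, the zero-based positions and the clamping of out-of-range arguments are all assumptions made for this example.

```java
// Illustrative stand-ins for cut, prefix and suffix, built on java.lang.String.
// Assumptions: positions are zero-based and out-of-range arguments are clamped.
// The real expression parser may behave differently.
final class NominalOpsSketch {

    // cut(text, start, length): the part of text of the given length, beginning at start.
    static String cut(String text, int start, int length) {
        int from = Math.max(0, Math.min(start, text.length()));
        int to = Math.max(from, Math.min(text.length(), from + Math.max(0, length)));
        return text.substring(from, to);
    }

    // prefix(text, length): the first `length` characters of text.
    static String prefix(String text, int length) {
        return cut(text, 0, length);
    }

    // suffix(text, length): the last `length` characters of text.
    static String suffix(String text, int length) {
        int n = Math.max(0, Math.min(length, text.length()));
        return text.substring(text.length() - n);
    }

    public static void main(String[] args) {
        // Hypothetical nominal value: country, year and serial number in one string.
        String code = "DE-2010-0042";
        System.out.println(prefix(code, 2));  // country part
        System.out.println(cut(code, 3, 4));  // year part
        System.out.println(suffix(code, 4));  // serial part
    }
}
```

Splitting a compound nominal value into several new attributes like this is a typical use case for such functions.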

It is amazing how many new data transformations you can perform with this simple set of text operations. Until now, I often had to fall back on the operator "Execute Script" for this type of operation, which is no longer necessary.

I have also just uploaded a process to myExperiment, which can be downloaded directly with our Community Extension (but of course you will need the RapidMiner update first ;-) ). The process is named "Extended Operations for Nominal Values", just like this blog entry.

Tags: Script, RapidMiner, Operator | 21 Jul 2009
New Feature: Script Operator in RapidMiner 4.5 by Ingo Mierswa

We introduced a new operator in RapidMiner 4.5 called "Script". It is a really powerful tool for professional analysis process design in the (rare) cases where the built-in operators are not sufficient for the task at hand.

With the Script operator you can define arbitrary operations by writing Groovy scripts (plain Java is also OK if you are not familiar with Groovy). On top of the usual language syntax, we added some syntactic sugar to simplify the scripting experience. The result is a RapidMiner scripting language which gives you the power to perform any preprocessing or modeling you want.

Before we describe the details of the language extensions, here is a short example. We use the task "subtract the mean value from each attribute" discussed in the last blog entry. Of course, this is already possible with traditional RapidMiner operators, and I would usually recommend using such a process whenever possible. However, sometimes those processes become rather large, and sometimes one simply cannot find the right process but needs a solution right now. In those cases, the new Script operator really comes in handy.

The following picture shows the process for subtracting the mean value for each attribute:

Much easier than the previous process, eh? Of course, the main part is hidden in a parameter of the Script operator: the actual RapidMiner script which the operator executes. Here is the complete script:

 

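ExampleSet exampleSet = operator.getInput(ExampleSet.class);  // input delivered to the Script operator

exampleSet.recalculateAllAttributeStatistics();  // computes the statistics, including the average

// For each attribute, subtract its mean from the value of every example.
for (Attribute attribute : exampleSet.getAttributes()) {
    double mean = exampleSet.getStatistics(attribute, Statistics.AVERAGE);
    String name = attribute.getName();
    for (Example example : exampleSet) {
        // Script sugar: values are read and written via the attribute name.
        example[name] = example[name] - mean;
    }
}

// Deliver the modified example set as the result of the operator.
return exampleSet;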

 

This is also not too difficult once you get used to it. The first statement retrieves the input example set. Please note the word "operator" before the getInput method, indicating that the input is fetched from the Script operator. After this, the second statement calculates all attribute statistics.

The outer loop runs once for each attribute. It retrieves the mean value, and the inner loop subtracts this mean from each value. Please note the simplified way of reading and writing data via the attribute name alone.

Don't forget to deliver the result at the end with the return statement. That's all for now; more information can be found in the documentation of the Script operator. Have fun!
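If you want to try the logic of the two nested loops outside of RapidMiner, the following plain Java sketch does the same thing on a simple two-dimensional array. It is only an illustration under assumptions: the class MeanCenteringSketch and the array layout (rows are examples, columns are attributes) are made up for this example and are not part of the RapidMiner API.

```java
import java.util.Arrays;

// Plain Java sketch of "subtract the mean value from each attribute".
// Assumption: rows are examples and columns are attributes; this is not RapidMiner code.
final class MeanCenteringSketch {

    // Subtracts each column's mean from every value in that column, in place.
    static void centerColumns(double[][] rows) {
        if (rows.length == 0) {
            return;
        }
        int columns = rows[0].length;
        for (int c = 0; c < columns; c++) {
            // First pass over the column: compute the mean.
            double sum = 0.0;
            for (double[] row : rows) {
                sum += row[c];
            }
            double mean = sum / rows.length;
            // Second pass over the column: subtract the mean from each value.
            for (double[] row : rows) {
                row[c] -= mean;
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0} };
        centerColumns(data);
        System.out.println(Arrays.deepToString(data));
        // prints [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
    }
}
```

The outer loop corresponds to the loop over the attributes in the script, the second inner loop to the loop over the examples.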

Tags: RapidMiner, Preprocessing, Operator | 21 Jul 2009
Subtract Mean Value from each Attribute by Ingo Mierswa

A question which has been posted several times in the forum, and which is also asked often during our training courses, is the following:

"How can I calculate the mean value for each attribute and subtract it from the attribute values?"

Of course, one could use the Normalization operator with the normalization type set to "standardization". But in this case not only is the mean value subtracted; the value range is also rescaled so that the standard deviation equals 1. This is of course not always desired.
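The difference between the two transformations can be seen in a few lines of plain Java. This is just a sketch to illustrate the point and not what the Normalization operator does internally: the class CenterVsStandardizeSketch is made up, it works on a single column with at least two different values, and it assumes the population standard deviation (dividing by n).

```java
// Sketch: centering keeps the spread of the values, standardization rescales it to 1.
// Assumptions: one non-constant column, population standard deviation (divide by n).
final class CenterVsStandardizeSketch {

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    static double stdDev(double[] values) {
        double m = mean(values);
        double squared = 0.0;
        for (double v : values) {
            squared += (v - m) * (v - m);
        }
        return Math.sqrt(squared / values.length);
    }

    // Centering: subtract the mean only.
    static double[] center(double[] values) {
        double m = mean(values);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] - m;
        }
        return out;
    }

    // Standardization: subtract the mean and divide by the standard deviation.
    static double[] standardize(double[] values) {
        double m = mean(values);
        double s = stdDev(values);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - m) / s;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2, 4, 4, 4, 5, 5, 7, 9};  // mean 5, standard deviation 2
        System.out.println("centered:     stdDev = " + stdDev(center(column)));
        System.out.println("standardized: stdDev = " + stdDev(standardize(column)));
    }
}
```

Both results have a mean of 0, but only the centered column keeps its original standard deviation of 2.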

The following process shows how you can use the operator FeatureIterator in combination with a standard aggregation and a macro to achieve the desired goal. For each feature, the mean value is calculated with the operator Aggregation and stored in a macro. Then the operator AttributeConstruction creates a new feature in which the mean value is subtracted from each value.

After this has been done, the old features are removed and the new ones are renamed to the old names. That's it. Here is a picture of the process:

 

 

And here is the complete XML code:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function"    value="sum"/>
    </operator>
    <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
        <parameter key="work_on_input"    value="false"/>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="%{loop_feature}"    value="average"/>
            </list>
        </operator>
        <operator name="DataMacroDefinition" class="DataMacroDefinition">
            <parameter key="macro"    value="current_average"/>
            <parameter key="macro_type"    value="data_value"/>
            <parameter key="attribute_name"    value="average(%{loop_feature})"/>
            <parameter key="example_index"    value="1"/>
        </operator>
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object"    value="ExampleSet"/>
            <parameter key="deletion_type"    value="delete_one"/>
        </operator>
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
              <parameter key="norm_%{loop_feature}"    value="%{loop_feature} - %{current_average}"/>
            </list>
        </operator>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class"    value="attribute_name_filter"/>
        <parameter key="parameter_string"    value="norm_.*"/>
    </operator>
    <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
        <parameter key="replace_what"    value="norm_"/>
    </operator>
</operator>

Have fun!

Tags: RapidMiner, Preprocessing, Operator, Modeling | 29 Jun 2009
Grouping Models by Ingo Mierswa

In the last blog entry, we discussed how preprocessing models can be created with RapidMiner and applied to new data sets. In the setup described there, it was necessary to use the operator IOSelector twice in order to get the correct ordering of models for model application.

Since preprocessing models are an important feature of RapidMiner, we of course also provide a much easier way of handling the different models and applying them to new data sets. All models, preprocessing models as well as prediction models, can easily be grouped together with the operator ModelGrouper. So you no longer have to cope with several models but only with a single one, which can be applied to new data sets and performs the preprocessing as well as the prediction. This makes the previously posted process much cleaner and easier to understand. Just have a look at this picture of the process layout:

 

 

Here is the XML setup of the complete process:

<operator name="Root" class="Process" expanded="yes">
    <operator name="DirectMailingExampleSetGenerator (Training Set)" class="DirectMailingExampleSetGenerator">
        <parameter key="number_examples"    value="1000"/>
    </operator>
    <operator name="ChangeAttributeRole (Training Set)" class="ChangeAttributeRole">
        <parameter key="name"    value="name"/>
        <parameter key="target_role"    value="id"/>
    </operator>
    <operator name="Preprocessing Models" class="OperatorChain" expanded="yes">
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
            <parameter key="return_preprocessing_model"    value="true"/>
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
            <parameter key="return_preprocessing_model"    value="true"/>
        </operator>
    </operator>
    <operator name="Training" class="LinearRegression">
        <parameter key="feature_selection"    value="none"/>
    </operator>
    <operator name="ModelGrouper" class="ModelGrouper" breakpoints="after">
    </operator>
    <operator name="DirectMailingExampleSetGenerator (Test Set)" class="DirectMailingExampleSetGenerator">
        <parameter key="number_examples"    value="1000"/>
    </operator>
    <operator name="ChangeAttributeRole (Test Set)" class="ChangeAttributeRole">
        <parameter key="name"    value="name"/>
        <parameter key="target_role"    value="id"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>
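The idea behind grouping can also be sketched in a few lines of plain Java: a grouped model is nothing more than a list of models which are applied one after the other. Please note that this is only a conceptual sketch under assumptions. The interface Model, the method group and the two toy models are made up for this example and have nothing to do with the actual classes behind ModelGrouper and ModelApplier.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Conceptual sketch of model grouping; not the RapidMiner implementation.
final class GroupedModelSketch {

    // Hypothetical stand-in for a model: maps an input row to an output row.
    interface Model extends UnaryOperator<double[]> {}

    // Combines several models into one which applies them in the given order.
    static Model group(List<Model> models) {
        return row -> {
            double[] current = row;
            for (Model model : models) {
                current = model.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // Toy preprocessing model: divide every value by 10.
        Model scale = row -> {
            double[] out = row.clone();
            for (int i = 0; i < out.length; i++) {
                out[i] /= 10.0;
            }
            return out;
        };
        // Toy prediction model: y = 2 * x + 1 on the first value.
        Model predict = row -> new double[] { 2.0 * row[0] + 1.0 };

        // One grouped model performs the preprocessing and the prediction.
        Model grouped = group(List.of(scale, predict));
        System.out.println(grouped.apply(new double[] { 30.0 })[0]);  // prints 7.0
    }
}
```

The order inside the group matters: the preprocessing has to come first, so that the prediction model sees data in the same form as during training.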
 

 

 
