New Feature: Script Operator in RapidMiner 4.5
by Ingo Mierswa, 21 Jul 2009
Tags: Script, RapidMiner, Operator

We introduced a new operator in RapidMiner 4.5 called "Script". This is a really powerful tool for professional analysis process design in the (rare) cases where the built-in operators are not sufficient to achieve a desired task.

With the Script operator you can define arbitrary operations by writing Groovy scripts (plain Java is also ok if you are not familiar with Groovy). In addition to the usual language syntax, we decided to add some syntactic sugar to simplify the scripting experience. The result is a RapidMiner scripting language which gives you the power to perform any preprocessing or modeling you want.

Before we describe the details of the language extensions, here is a short example. We use the task "subtract the mean value from each attribute" discussed in the last blog entry. Of course, this was already possible with traditional RapidMiner operators, and I would usually recommend building such a process whenever possible. However, sometimes those processes become rather large, and sometimes one simply cannot find the correct process but needs a solution right now. In those cases, the new Script operator really comes in handy.

The following picture shows the process for subtracting the mean value from each attribute:

Much easier than the previous process, eh? Of course, the main part is hidden in a parameter of the Script operator: the actual RapidMiner script that the operator executes. Here is the complete script:

 

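// Retrieve the example set delivered to the Script operator.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Calculate the statistics (including the mean) for all attributes.
exampleSet.recalculateAllAttributeStatistics();

// Note: the subtraction below assumes that all regular attributes are numerical.
for (Attribute attribute : exampleSet.getAttributes()) {
    double mean = exampleSet.getStatistics(attribute, Statistics.AVERAGE);
    String name = attribute.getName();
    for (Example example : exampleSet) {
        // Simplified access: read and write a value via the attribute name.
        example[name] = example[name] - mean;
    }
}

// Deliver the modified example set as the result of the operator.
return exampleSet;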

 

This is not too difficult once you get used to it. The first line retrieves the input example set. Please note the word "operator" before the getInput method: it indicates that the input is taken from the Script operator itself. The second statement then calculates the statistics for all attributes.

The outer loop runs once for each attribute: it retrieves the attribute's mean value, and the inner loop then subtracts that mean from the attribute's value in every example. Please note the simplified way of reading and writing data via the attribute name alone.
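To see the two loops in isolation, here is a minimal plain-Java sketch of the same idea. It does not use the RapidMiner API at all: a double[][] table stands in for the example set (rows are examples, columns are attributes), and the class name MeanCentering and the method subtractColumnMeans are hypothetical names chosen for this illustration only.

```java
import java.util.Arrays;

// Standalone illustration only: plain arrays stand in for a RapidMiner example set.
// MeanCentering and subtractColumnMeans are hypothetical names, not RapidMiner API.
final class MeanCentering {

    /** Subtracts each column's mean from every value in that column, in place. */
    static void subtractColumnMeans(double[][] rows) {
        if (rows.length == 0) {
            return; // nothing to do for an empty table
        }
        int columns = rows[0].length;
        // Outer loop: one pass per attribute (column).
        for (int c = 0; c < columns; c++) {
            double sum = 0.0;
            for (double[] row : rows) {
                sum += row[c];
            }
            double mean = sum / rows.length;
            // Inner loop: visit every example (row) and subtract the mean.
            for (double[] row : rows) {
                row[c] -= mean;
            }
        }
    }

    public static void main(String[] args) {
        double[][] table = {
            {1.0, 10.0},
            {2.0, 20.0},
            {3.0, 30.0}
        };
        subtractColumnMeans(table);
        System.out.println(Arrays.deepToString(table));
        // prints [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
    }
}
```

In the actual script, the mean comes from the precomputed attribute statistics instead of being summed by hand, and the value access goes through the attribute name.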

Don't forget to deliver the result at the end with the return statement. That's all for now; more information can be found in the documentation of the Script operator. Have fun!
