Rapid-I Blog
Simple Example for R in RapidMiner
by Ingo Mierswa, 15 Dec 2010 (tags: R, Example)

We received a lot of positive feedback after the release of the R extension, which allows R scripts to be integrated directly into the analysis processes of RapidMiner. Many people really like this approach, and for exactly that reason I would like to ease the first steps for those of you who have less experience with programming in general and with R in particular.


The following example performs what is probably one of the simplest data transformations you can think of: we use R to add two columns of a data set and store the result in a new column called “Sum”.


Of course, it is even simpler to use a dedicated operator for this task, namely “Generate Attributes”. However, the process below is simple enough to demonstrate some of the essential R concepts for less experienced users. In a programming course, it would probably be called the “Hello World” example for R in RapidMiner.


You will, of course, need a correctly installed R extension to follow this short tutorial. If you run into problems during the installation, please refer to our forum. OK, let’s start. We assume a data set with four regular columns named a1 to a4 and one special attribute, the label. We load this input from our RapidMiner repository, which is the first step in the process below:

 


After loading the data with “Retrieve”, we add a new “Execute Script (R)” operator and connect the output port of Retrieve, which delivers the data set during execution, to the input port of the new operator. Next, we define the inputs of the script by clicking the parameter button “inputs”, which opens the following dialog:



We define the first (and only) input by giving it the name “data”. You can then reference the delivered data set in the script by this name.
The second definition is the R script itself. Click the parameter button “script” to open a dialog where you can enter an arbitrary R script. This dialog looks like this:

 


Here is what the script does:


Line 1: sum_column <- data[1] + data[2]
This line adds the first column of data (indicated by the 1 in brackets) to the second one and stores the sum as a new one-column data set named “sum_column”. Please note that we use the input name “data” defined above.
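Since the difference between single and double brackets trips up many R beginners, here is a small sketch you can run in a plain R session outside RapidMiner. The data frame is a made-up stand-in for the data set that RapidMiner hands to the script under the input name “data”; its values are purely illustrative.

```r
# Made-up stand-in for the input "data": four regular attributes a1..a4
# plus a label column (values are illustrative only).
data <- data.frame(a1 = c(1, 2, 3), a2 = c(10, 20, 30),
                   a3 = c(0.5, 0.5, 0.5), a4 = c(7, 8, 9),
                   label = c("yes", "no", "yes"))

# Single brackets keep the data frame structure: the sum is a
# one-column data frame that inherits the name of its left operand, "a1".
sum_column <- data[1] + data[2]
print(class(sum_column))
print(names(sum_column))

# Double brackets extract the plain numeric vectors instead.
sum_vector <- data[[1]] + data[[2]]
print(sum_vector)
```

Both hold the same numbers. The script in this post works with the data frame version, which is why the new column initially carries a name derived from the first column.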

Line 2: complete_data <- c(data, sum_column)
We now concatenate (command: c) the newly generated column “sum_column” with the given data set named “data” and store the result, a plain list of all columns, under the name “complete_data”.


Line 3: result <- as.data.frame(complete_data)
We now transform the result into a data frame. Data frames are R’s concept for data tables: similar to matrices, but their columns can have different types and carry names. They are quite similar to the Example Sets known from RapidMiner. Please note that you have to convert your results into data frames with the command “as.data.frame” if you want to deliver them back to RapidMiner as an Example Set (see below).
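To see what Lines 2 and 3 do to the data, the following sketch (again with a made-up input) inspects the intermediate objects. In base R, c() applied to data frames collects their columns into a plain list, and as.data.frame turns that list back into a table. Because the list contains two columns named “a1”, R makes the names unique during the conversion (by default the duplicate typically becomes “a1.1”), which is one reason why Line 4 renames the sixth column by its position.

```r
# Made-up stand-in for the input "data" (illustrative values only).
data <- data.frame(a1 = c(1, 2), a2 = c(3, 4), a3 = c(5, 6),
                   a4 = c(7, 8), label = c("x", "y"))
sum_column <- data[1] + data[2]

# Line 2: c() on data frames drops the data frame class and
# returns a plain list holding all six columns.
complete_data <- c(data, sum_column)
print(class(complete_data))
print(names(complete_data))

# Line 3: as.data.frame() turns the list back into a table;
# the duplicated name "a1" is made unique during the conversion.
result <- as.data.frame(complete_data)
print(is.data.frame(result))
print(names(result))
```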


Line 4: colnames(result)[6] = "Sum"
This last step is optional and simply renames the new (sixth) column to “Sum”. Of course, this could also be done afterwards with the operator “Rename”.
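Put together, the four lines form the complete script. To try it outside RapidMiner, for example in a plain R session, you can prepend a made-up data frame standing in for the input “data”; inside the “Execute Script (R)” operator that first assignment is not needed, because the operator supplies “data” itself. The second variant at the end is not part of the original process: it is a hypothetical alternative that adds the column by name, so it does not rely on the new column ending up in position 6.

```r
# Made-up test input; inside RapidMiner, "data" is provided by the
# "inputs" definition of the Execute Script (R) operator instead.
data <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 6), a3 = c(7, 8, 9),
                   a4 = c(10, 11, 12), label = c("a", "b", "a"))

# The script from this post, Lines 1 to 4
sum_column <- data[1] + data[2]
complete_data <- c(data, sum_column)
result <- as.data.frame(complete_data)
colnames(result)[6] = "Sum"
print(result)

# Hypothetical alternative (not from the original process): add the
# column by name, which works regardless of how many columns "data" has.
result_by_name <- data
result_by_name$Sum <- data[[1]] + data[[2]]
print(result_by_name)
```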

The final step is to define the results and how they are delivered back to RapidMiner. Click the parameter button “results” to open the following dialog:

 


Here you define which variables of the script should be delivered. In our case, this is only the variable “result”, which contains the resulting data set. If a variable holds a data frame (see above), it can be delivered directly as a RapidMiner Data Table / Example Set; otherwise, it can only be delivered as a generic R result.
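Because only data frames come back as Example Sets, it can be worth making sure that a delivered variable really is one. The sketch below is a hypothetical safeguard, not something the operator requires: it checks the variable with is.data.frame() and, if necessary, wraps it in a data frame. The column name “Sum” and the vector values are just examples.

```r
# Hypothetical example: suppose a script produced a plain vector
# instead of a data frame.
result <- c(5, 7, 9)

# Wrap it in a data frame only if needed, so that the variable can be
# delivered to RapidMiner as an Example Set rather than a generic R result.
if (!is.data.frame(result)) {
  result <- data.frame(Sum = result)
}
print(result)
```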

There you go: now you can run the process and add two columns with R directly within a RapidMiner process. Have fun trying out other data transformations!

I have also uploaded the process to myExperiment with our Community Extension. You can download it from there and try the scripting operator directly. The uploaded process also contains a parallel path that performs the same calculation with the native operator “Generate Attributes” instead.

RapidMiner and R for Trading Part II: Genetic Optimization
by Ingo Mierswa, 6 Dec 2010 (tags: Trading, R, Process, Example)
A couple of weeks ago, the author Neural Concepts posted a description of a very interesting application for RapidMiner and its new R Extension. In his blog A Physicist in Wall Street, he described a complete trading system based on this combination.

Our blog post about this financial data mining application quickly became one of the most-read articles on this blog, so I am sure many of you will be happy to see that Neural Concepts has improved his processes by means of genetic optimization schemes and now gets much better results. The following picture shows the return over time:

 

 

The goal is still to predict the next day's closing price in order to generate buy and sell signals, but the approach has been improved considerably and is described in detail in a nice video. Check out the original blog post for more details:

http://aphysicistinwallstreet.blogspot.com/2010/12/genetic-optimization-for-trading-using.html

 

And here is the video showing all steps in detail:

 
