Although we released the English manual for RapidMiner a few weeks ago and announced this in the forum and on the RapidMiner news page shown after starting RapidMiner (you see? you should really read that news!), people still ask us when the manual will be released.
Here is the more or less official announcement: it is released. The manual consists of about 140 pages and also contains an introductory chapter on the basic ideas of data mining (well, basically only of classification, but that does not matter) and the terms used within RapidMiner. We got a lot of nice comments, especially on this introductory chapter. So please let us know whether you like it or not, or if you find an error. We are happy to correct or improve the manual for future releases.
And by the way: the manual is free, just like the software itself. Here we mean "free" as in "free beer". Just download it, read it, and let us know what you think.
You can download the manual from the Rapid-I web page:
The link to the manual can be found in the documentation section of this page (scroll down), below the other download links. We hope you enjoy the manual and that, together with the video tutorials we provide, it helps you grasp the basic concepts of RapidMiner.
P.S.: The new manual replaces the old RapidMiner tutorial and GUI manual known from previous versions of RapidMiner.
So, this is one announcement. We are about to publish a brand-new RapidMiner Extension which allows the seamless integration of R into the RapidMiner user interface as well as into RapidMiner processes.
This is a huge step: both solutions are among the most widely used solutions for any type of data analysis and now users can get both tools integrated. We will present the new R-Extension at the RapidMiner Community Meeting and Conference (RCOMM 2010) for the first time to the general public.
However, you do not have to wait until the RCOMM 2010 to get a first idea of what can be done with the new extension. We have prepared a sneak peek video showing how easily R models and scripts can be integrated into RapidMiner analysis processes.
As you can see, RapidMiner offers a new R perspective consisting of the known R console together with the great plotting facilities of R. All variables as well as R scripts can be stored in the RapidMiner Repository and used from there which helps to organize the usually large number of scripts. Furthermore, widely used modeling methods are directly integrated as RapidMiner operators as usual.
I hope to see you at the RCOMM 2010 and that we get a chance to discuss the new R-Extension there. Please let us know what you think right here!
The slogan "We make free software affordable" was written on the first Cygnus Support T-shirt according to John Gilmore. John was one of the founders of Cygnus, probably the first company offering professional services around open source products.
I love the idea behind this slogan. And actually I am fully convinced that using RapidMiner or any other open source software
without the usually great ideas of the inventors,
without their product expertise, and
without any guarantees
is, in the end, much more expensive than connecting with us directly and working with the pros.
Hence, using the free Community Edition of RapidMiner not only increases your risk of failure, it turns out to be more expensive. The RapidMiner Enterprise Edition delivers critical benefits like stabilized software, direct access to product expertise, and committed response times to help you save time and money.
But why am I writing about this? Well, Rapid-I has just started a promotion campaign for RapidMiner. For example, the RapidMiner Standard Enterprise Edition for 5 analysts plus unlimited technical support, including even two days of consultative support, is available for only 5000 Euro.
In the short run, paying Rapid-I to maintain the software is cheaper than employing a specialist to do the work for you.
Read here for more information about this promotion offer.
Under the "schedule" menu item you will find all talks and workshops that will take place. In total we have
the amazing number of 16 talks from people from all over the world!
3 workshops including one about the new Extension for the integration of R which will be presented for the first time at the RCOMM 2010
6 training courses around the conference for visitors who want to deepen or refresh their knowledge
1 game show: "Who wants to be a Data Miner?" where you can watch the gurus creating processes against the clock and where you can even participate and battle for a prize yourself!
I am really excited about the program and the quality of the submissions, and I am looking forward to meeting you in Dortmund. If you have not registered yet: Visit
Today, we can make a great new announcement to our community members and all users of RapidMiner:
Rapid-I hosts the first RapidMiner Community Meeting and Conference (RCOMM 2010)!
As RapidMiner has once again proved to be the most-used open source data mining tool among the community of data analysts worldwide in a recent poll, it is now time to give a face to that community. Therefore, Rapid-I hosts the first RapidMiner Community Meeting and Conference (RCOMM 2010) and invites users and developers of RapidMiner to take part and share their RapidMiner experiences with other members of the community. The RCOMM 2010 aims to intensify community life and strengthen the RapidMiner network by bringing together users and developers of RapidMiner from all backgrounds, be they scientific or commercial, from the whole variety of applications and from all levels of knowledge. A lively exchange of ideas, application reports, and scientific results will help beginners to advance and will inspire those already advanced, leading them to professionalism. Users will profit from the in-depth knowledge of developers, who in turn will gain from picking up requirements and ideas for further development.
The RCOMM 2010 encompasses conference talks in which invited lecturers will discuss aspects of state-of-the-art data mining with RapidMiner. A Call for Papers will be issued for those who would like to present their work in that scope. Workshops will be held to give participants hands-on experience with several topics regarding RapidMiner usage. Additionally, attendees of the RCOMM 2010 will have the option to participate in several courses given by professional RapidMiner consultants in the days around the user meeting.
Call for Papers
We ask all researchers and practitioners to submit a paper in PDF format of up to six pages, for example about the design of data analysis processes with RapidMiner, text and web mining, sentiment analysis, data mining applications (production, finance...), novel algorithms, or new extensions. More information can be found on the RCOMM 2010 web site.
More Information and Registration
More information about the RCOMM 2010 as well as the possibility to register online can be found at
A few weeks ago we released a new Community Extension for RapidMiner which you can use to share your RapidMiner and RapidAnalytics processes with data miners all over the world.
Some of you may know the http://www.myexperiment.org/ portal. MyExperiment is a community website where people share workflows of various kinds. It is an active community, and the portal comes with all the nice social network features. The new Community Extension connects directly to myExperiment, which means that you can easily upload the current process with a single click. The extension also allows you to browse RapidMiner processes on myExperiment and download them to your local machine directly from within RapidMiner.
You should really consider sharing interesting data analysis and data transformation processes with others. Why? Well, the obvious answer is that you can discuss your data mining processes with others, exchange workflows with them, and meet data miners working on similar problems, which might give you some fresh ideas.
But there is a much more important reason: If you participate in myExperiment and share your RapidMiner processes with the new Community Extension, we will all finally build a "data analysis process Wiki". I imagine this as a place where processes for different kinds of problems just wait to be discovered. And just as with the original idea of Wikipedia, the whole thing will only work if people start to share their knowledge with others while hoping that somebody else's knowledge will help them in return some day.
So you should download the extension from our update and installation server via the Help menu of RapidMiner, activate the myExperiment view in the View menu, and start to upload and download processes. Happy sharing!
It has been quite a while since I last wrote something in our blog. I was rather busy during the last weeks of 2009, and the new year looks as promising as the last one ended.
RapidMiner 5 is a great success. We get a lot of feedback from users and customers, and it is overwhelmingly positive. Thanks for all of your comments and suggestions; we are doing our best to further improve RapidMiner.
Something which is currently still missing is the latest version of the RapidMiner documentation. For RapidMiner 4.x, the tutorial had more than 700 pages, mainly consisting of the operator reference and developer guides. For RapidMiner 5 and future versions, we are currently completely revising the documentation. The German version is about to be finished (yes, there will be German documentation for the first time!), and the English translation will follow soon.
However, we also made a set of video tutorials which are now available at
Each tutorial takes only three minutes, and after viewing all of them, less experienced users should be able to set up their first processes. In the future, we will add further tutorials from time to time, and they will complement the written documentation of RapidMiner.
Check out the new tutorials and let us know how you like them:
For the contributors and developers amongst you: The RapidMiner development branch and the stable 4.6 branch are finally back on SourceForge, accessible under their respective codenames:
This is probably the final episode of our "Approaching Vega" story: RapidMiner 5 Beta will be released during the next days and then you can try all of the cool new features yourself.
Over the last weeks we have shown you how RapidMiner 5 handles meta data and automatically transforms it at process design time. This is a key component of RapidMiner 5: the meta data transformation not only simplifies the graphical user interface by providing, for example, the names of the transformed attributes in interface components, it is also the foundation of the ongoing process checks, which show you possible problems as early as possible and assist you with hints on how to solve them (see the quick fix discussion below).
However, the meta data transformations are of course only possible if meta data exists in the first place. And here the new Repositories come into play: you can have several repositories, and you can use them to organize your analysis projects, your data, and your data mining processes.
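To make the idea of design-time meta data transformation a bit more concrete, here is a minimal Python sketch. It is not RapidMiner code: the names `rename_attribute`, `select_attributes`, and `check_process` are hypothetical, and the meta data is reduced to a plain mapping from attribute names to value types. Each step transforms only the meta data, so a problem such as a missing attribute can be reported, together with a hint, before any data is touched.

```python
from typing import Callable, Dict, List, Tuple

# Assumption for this sketch: meta data is just attribute name -> value type.
MetaData = Dict[str, str]
# A step maps meta data to (transformed meta data, list of problem messages).
Step = Callable[[MetaData], Tuple[MetaData, List[str]]]


def _missing_hint(name: str, meta: MetaData) -> str:
    """Build a problem message that also lists what is available."""
    return f"Attribute '{name}' does not exist. Available: {sorted(meta)}"


def rename_attribute(old: str, new: str) -> Step:
    """Hypothetical operator: rename one attribute in the meta data."""
    def apply(meta: MetaData) -> Tuple[MetaData, List[str]]:
        if old not in meta:
            return dict(meta), [_missing_hint(old, meta)]
        renamed = {(new if n == old else n): t for n, t in meta.items()}
        return renamed, []
    return apply


def select_attributes(names: List[str]) -> Step:
    """Hypothetical operator: keep only the listed attributes."""
    def apply(meta: MetaData) -> Tuple[MetaData, List[str]]:
        problems = [_missing_hint(n, meta) for n in names if n not in meta]
        kept = {n: meta[n] for n in names if n in meta}
        return kept, problems
    return apply


def check_process(meta: MetaData, steps: List[Step]) -> Tuple[MetaData, List[str]]:
    """Push meta data through all steps and collect problems along the way."""
    problems: List[str] = []
    for step in steps:
        meta, found = step(meta)
        problems.extend(found)
    return meta, problems


if __name__ == "__main__":
    source = {"age": "integer", "income": "real", "label": "nominal"}
    # The second step still refers to the old name "income" after the rename.
    steps = [rename_attribute("income", "salary"), select_attributes(["age", "income"])]
    final_meta, problems = check_process(source, steps)
    print(final_meta)
    for message in problems:
        print("Problem:", message)
```

In this toy process the second step refers to an attribute that the first step has already renamed, which is exactly the kind of mistake a design-time check can catch without running the process.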
Data can simply be imported to the repository by drag'n'drop. This makes data integration as easy as possible. Once imported, the data is stored together with its meta data which can hence be used during process design without having the data loaded at all.
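The following Python sketch illustrates why storing meta data next to the data is useful. It is a toy under stated assumptions, not the RapidMiner Repository format: the `ToyRepository` class, the file layout (`<name>.csv` plus `<name>.meta.json`), and the rough type guessing are all invented for illustration. Importing writes both files; reading the meta data later opens only the small JSON file and never the data itself.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def guess_type(values: List[str]) -> str:
    """Very rough type guess for a column: integer, real, or nominal."""
    for caster, type_name in ((int, "integer"), (float, "real")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "nominal"


class ToyRepository:
    """Hypothetical repository: one CSV file plus one meta data file per entry."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def import_data(self, name: str, header: List[str], rows: List[List[str]]) -> None:
        """Store the data and, next to it, the meta data derived from it."""
        with open(self.root / f"{name}.csv", "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        meta = {
            "rows": len(rows),
            "attributes": {
                column: guess_type([row[i] for row in rows])
                for i, column in enumerate(header)
            },
        }
        (self.root / f"{name}.meta.json").write_text(json.dumps(meta))

    def load_meta(self, name: str) -> Dict:
        """Read only the meta data file; the CSV file is not opened."""
        return json.loads((self.root / f"{name}.meta.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = ToyRepository(Path(tmp) / "repo")
        repo.import_data(
            "customers",
            ["age", "income", "city"],
            [["34", "51000.5", "Dortmund"], ["28", "42000", "Berlin"]],
        )
        print(repo.load_meta("customers"))
```

A design-time check like the ones described above would only need `load_meta`, which is why the process can be designed without having the data loaded.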
Flow Design, Meta Data Transformations, and Repositories are the three main components of RapidMiner 5. Together they simplify your analysis work a lot and extend the possibilities for your data analysis at the same time. Just check out the upcoming RapidMiner 5 Beta.
RapidMiner 5 comes with a docking framework that allows you to select and move around user interface components in order to design the interface according to your needs. Earlier versions of RapidMiner used to present process results in multiple tabs, simply displaying empty space when no results were generated yet. Since every result tab is a freely movable UI component in RM 5, there is no component which would fill up the free space when no result tabs are present - the UI would simply collapse and neighbouring components would take over the free space. This would clearly be ugly, so we started by adding an empty component serving as a place holder reserving space where new results would be added.
It quickly became clear that having the largest part of the result perspective filled with empty space is not much less ugly, so we decided to fill it with something useful. What would be more obvious than to give a new home to the result history? What do you mean, you don't know the result history? Everyone should know the result history. Well. Admittedly, the old result history did not make it into the top ten of RapidMiner's usability charts, but it has always been a nice feature that no one used.
For RM 5, we designed a completely new result history which looks like this:
As you see, the result history presents an entry for each process execution and lists all results, each presented as a thumbnail or textual representation. Thus, you can go back in time, look at the results produced by earlier versions of your process, possibly re-open them, compare performances, and restore a particular process version if you find it performing better. Having this history readily available provides terrific assistance for rapid process design.
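The idea of a history that keeps each run's results together with the process version that produced them can be sketched in a few lines of Python. Everything here is hypothetical: `ResultHistory`, `HistoryEntry`, the use of a plain string for the process definition, and plain numbers for the results are simplifications for illustration, not how RM 5 stores its history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoryEntry:
    """One process execution: the process as it was run, plus its results."""
    process_definition: str
    results: Dict[str, float]


@dataclass
class ResultHistory:
    """Hypothetical history: one entry per process execution, oldest first."""
    entries: List[HistoryEntry] = field(default_factory=list)

    def record(self, process_definition: str, results: Dict[str, float]) -> None:
        """Store a snapshot of the process together with the results it produced."""
        self.entries.append(HistoryEntry(process_definition, dict(results)))

    def best(self, metric: str) -> HistoryEntry:
        """Return the entry with the highest value for the given metric."""
        candidates = [e for e in self.entries if metric in e.results]
        if not candidates:
            raise ValueError(f"No entry has a result named '{metric}'")
        return max(candidates, key=lambda e: e.results[metric])


if __name__ == "__main__":
    history = ResultHistory()
    history.record("process version 1", {"accuracy": 0.84})
    history.record("process version 2", {"accuracy": 0.91})
    history.record("process version 3", {"accuracy": 0.88})
    # Restore the process version that performed best so far.
    restored = history.best("accuracy").process_definition
    print(restored)
```

Because every entry carries its own copy of the process definition, comparing performances and going back to an earlier version are both simple lookups.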
This way, what was originally intended to be a place holder became one of my favourite RM 5 features.