Rapid-I Blog
RCOMM 2010, RapidMiner, Conference 6 Jul 2010
First RapidMiner Community Meeting and Conference (RCOMM 2010) by Ingo Mierswa

Today, we can make a great new announcement to our community members and all users of RapidMiner:

Rapid-I hosts the first RapidMiner Community Meeting and Conference (RCOMM 2010)!

In a recent poll, RapidMiner once again proved to be the most-used open source data mining tool among data analysts world-wide, so it is now time to give that community a face. Therefore, Rapid-I hosts the first RapidMiner Community Meeting and Conference (RCOMM 2010) and invites users and developers of RapidMiner to take part and share their RapidMiner experiences with other members of the community. RCOMM 2010 is intended to intensify community life and strengthen the RapidMiner network by bringing together users and developers of RapidMiner from all backgrounds, scientific or commercial, from the whole variety of applications, and from all levels of knowledge. A lively exchange of ideas, application reports, and scientific results will help beginners to advance and will inspire advanced users on their way to professionalism. Users will profit from the in-depth knowledge of developers, who in turn will gain from picking up requirements and ideas for further development.

RCOMM 2010 encompasses conference talks in which invited lecturers will discuss aspects of state-of-the-art data mining with RapidMiner. A Call for Papers will be issued for those who would like to present their work in that scope. Workshops will give participants hands-on experience with several topics regarding RapidMiner usage. Additionally, attendees of RCOMM 2010 will have the option to participate in several courses given by professional RapidMiner consultants around the user meeting.

Call for Papers

We ask all researchers and practitioners to submit a paper of up to six pages in PDF format, for example about the design of data analysis processes with RapidMiner, text and web mining, sentiment analysis, data mining applications (production, finance...), novel algorithms, or new extensions. More information can be found on the RCOMM 2010 web site.

More Information and Registration

More information about RCOMM 2010, as well as the possibility to register online, can be found at

www.rcomm2010.org

Hope to see you there! Cheers,
Ingo

Extensions, Community 25 May 2010
Share your Processes! by Ingo Mierswa

A few weeks ago, we released a new Community Extension for RapidMiner which you can use to share your RapidMiner and RapidAnalytics processes with data miners all over the world.

Some of you may know the http://www.myexperiment.org/ portal. MyExperiment is a community website where people share workflows of various kinds. It is an active community, and the portal comes with all the nice social network features. The new Community Extension connects directly to myExperiment, which means that you can upload the current process with a single click. The extension also lets you browse RapidMiner processes on myExperiment and download them to your local machine directly from within RapidMiner.

 

 

You should really consider sharing interesting data analysis and data transformation processes with others. Why? Well, the obvious answer is that you can discuss your data mining processes with others, exchange workflows with them, and meet data miners working on similar problems, which might give you some fresh ideas.

But there is a much more important reason: If you participate in myExperiment and share your RapidMiner processes with the new Community Extension, we will all finally build a "data analysis process Wiki". I imagine this as a place where processes for different kinds of problems just wait to be discovered. And just as with the original idea of Wikipedia, the whole thing will only work if people start to share their knowledge with others while hoping that somebody else's knowledge will help them in return some day.

More information about how to use the Community Extension can be found at http://www.e-lico.eu/?q=node/226

So download the extension from our update and installation server via the Help menu of RapidMiner, activate the myExperiment view in the View menu, and start uploading and downloading processes. Happy sharing!

 

Video, RapidMiner 18 Jan 2010
RapidMiner Video Tutorials by Ingo Mierswa

Hey there,

It has been quite a while since I last wrote something in our blog. I was rather busy during the last weeks of 2009, and the new year looks as promising as the last one ended.

RapidMiner 5 is a great success. We get a lot of feedback from users and customers, and it is overwhelmingly positive. Thanks for all of your comments and suggestions; we try our best to further improve RapidMiner.

Something which is currently still missing is the latest version of the RapidMiner documentation. For RapidMiner 4.x, the tutorial had more than 700 pages, mainly consisting of the operator reference and developer guides. For RapidMiner 5 and future versions, we are currently completely revising the documentation. It is about to be finished in German (yes, there will be German documentation for the first time!) and still has to be translated into English, which will follow soon.

However, we also made a set of video tutorials which are now available at

 http://rapid-i.com/content/view/189/198/

Each tutorial takes only three minutes, and after viewing all of them, less experienced users should be able to set up their first processes. In the future, we will add further tutorials from time to time, and they will support the written documentation of RapidMiner.

 

 

Check out the new tutorials and let us know how you like them:

http://rapid-i.com/content/view/189/198

Vega, sourceforge, RapidMiner, development 6 Jan 2010
RapidMiner 5 branch back on sourceforge SVN by Simon Fischer

For the contributors and developers amongst you: The RapidMiner development branch and the stable 4.6 branch are finally back on sourceforge, accessible under their respective codenames:

https://yale.svn.sourceforge.net/svnroot/yale/Vega (5.x)

 https://yale.svn.sourceforge.net/svnroot/yale/Wasat (4.6)

 For performance reasons, these are not the live repositories. They are mirrored between 4:00 and 5:00 am CET.

All the best for 2010!

Simon

Vega, Repositories, RapidMiner 30 Sep 2009
Approaching Vega (Episode VI: Repositories) by Ingo Mierswa

This is probably the final episode of our "Approaching Vega" story: RapidMiner 5 Beta will be released in the next few days, and then you can try all of the cool new features yourself.

Over the last weeks, we have shown you how RapidMiner 5 handles meta data and automatically transforms it at process design time. This is a key component of RapidMiner 5: the meta data transformation not only simplifies the graphical user interface by providing, for example, the names of the transformed attributes in interface components. It is also the foundation of the ongoing process checks, which show you possible problems as early as possible and assist you with hints on how to solve them (see the quick fix discussion below).

However, the meta data transformations are of course only possible if meta data exists in the first place. This is where the new Repositories come into play: you can have several repositories, and you can use them to organize your analysis projects, your data, and your data mining processes.

 


 

Data can simply be imported to the repository by drag'n'drop. This makes data integration as easy as possible. Once imported, the data is stored together with its meta data which can hence be used during process design without having the data loaded at all.
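The idea of storing data together with its meta data can be sketched in a few lines of Python. This is only an illustration of the principle, not how RapidMiner's repository is implemented: the file layout (a CSV file plus a `.meta.json` sidecar), the function names `import_to_repository` and `load_meta`, and the sample table are all assumptions made up for this example.

```python
import csv
import json
import tempfile
from pathlib import Path


def infer_type(values):
    """Call a column numerical if every value is an int or float, else nominal."""
    return "numerical" if all(isinstance(v, (int, float)) for v in values) else "nominal"


def import_to_repository(repo, name, header, rows):
    """Store the data as CSV and its meta data in a separate sidecar file."""
    repo.mkdir(parents=True, exist_ok=True)
    with open(repo / f"{name}.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    columns = list(zip(*rows))
    meta = {
        "rows": len(rows),
        "attributes": [
            {"name": n, "type": infer_type(c)} for n, c in zip(header, columns)
        ],
    }
    (repo / f"{name}.meta.json").write_text(json.dumps(meta))


def load_meta(repo, name):
    """Read only the meta data; the CSV with the actual data is never opened."""
    return json.loads((repo / f"{name}.meta.json").read_text())


# Made-up sample table, imported into a temporary (hypothetical) repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repository"
    import_to_repository(
        repo, "sample", ["outlook", "temperature"], [["sunny", 85], ["rain", 70]]
    )
    meta = load_meta(repo, "sample")
```

Because `load_meta` touches only the small sidecar file, a design-time check can learn attribute names and types without loading the data itself.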

Flow Design, Meta Data Transformations, and Repositories are the three main components of RapidMiner 5. Together they simplify your analysis work a lot and extend the possibilities for your data analysis at the same time. Just check out the upcoming RapidMiner 5 Beta.

UI, RapidMiner 18 Sep 2009
Approaching Vega (Episode V: Making cool things from place holders) by Simon Fischer

RapidMiner 5 comes with a docking framework that allows you to select and move around user interface components in order to design the interface according to your needs. Earlier versions of RapidMiner used to present process results in multiple tabs, simply displaying empty space when no results were generated yet. Since every result tab is a freely movable UI component in RM 5, there is no component which would fill up the free space when no result tabs are present - the UI would simply collapse and neighbouring components would take over the free space. This would clearly be ugly, so we started by adding an empty component serving as a place holder reserving space where new results would be added.

It quickly became clear that having the largest part of the result perspective filled with empty space is not much less ugly, so we decided to fill it up with something useful. What would be more obvious than to give a new home to the result history? What do you mean, you don't know the result history? Everyone should know the result history. Well. Admittedly, the old result history did not make it into the top ten of RapidMiner's usability charts, but it has always been a nice feature that no one used.

For RM 5, we designed a completely new result history which looks like this:

As you see, the result history presents an entry for each process execution and lists all results, each presented as a thumbnail or textual representation. Thus, you can go back in time, look at the results produced by earlier versions of your process, possibly re-open them, compare performances, and restore a particular process version if you find that it performs better. Having this history readily available provides terrific assistance for rapid process design.
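As a rough mental model, such a history is an append-only list with one entry per process execution, each holding the process version and its results. The Python sketch below is purely illustrative: the class names `HistoryEntry` and `ResultHistory`, the string standing in for a process definition, and the accuracy figures are invented for this example and do not reflect RapidMiner's internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    run_number: int
    process_definition: str    # stand-in for the process version that was run
    results: Dict[str, float]  # e.g. performance values of that run


class ResultHistory:
    """Keeps one entry per process execution, oldest first."""

    def __init__(self) -> None:
        self._entries: List[HistoryEntry] = []

    def record(self, process_definition: str, results: Dict[str, float]) -> HistoryEntry:
        entry = HistoryEntry(len(self._entries) + 1, process_definition, dict(results))
        self._entries.append(entry)
        return entry

    def entries(self) -> List[HistoryEntry]:
        return list(self._entries)

    def best(self, metric: str) -> Optional[HistoryEntry]:
        """Return the run with the highest value for the metric, if any."""
        candidates = [e for e in self._entries if metric in e.results]
        return max(candidates, key=lambda e: e.results[metric], default=None)


# Two runs of a process with made-up accuracies; the earlier version wins.
history = ResultHistory()
history.record("process v1", {"accuracy": 0.81})
history.record("process v2", {"accuracy": 0.78})
best = history.best("accuracy")
restored_process = best.process_definition
```

Keeping the process definition next to its results is what makes "restore the better version" a simple lookup rather than a manual undo.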

This way, what was originally intended to be a place holder became one of my favourite RM 5 features.

Vega, RapidMiner, quick fix, meta data 16 Sep 2009
Approaching Vega (Episode IV: Quick Fixes) by Ingo Mierswa

The alpha test phase of RapidMiner 5 (internal name: Vega) is about to end and we are looking forward to the upcoming beta test. Today, I would like to describe another great feature of RapidMiner 5, namely the quick fixes. In RapidMiner 5, you will usually retrieve your data from a repository where the data itself together with the meta data is imported and then stored. We will discuss the new repository in one of our next blog entries. One of the main advantages is that we can use the meta data from the repository and let the operators transform it during the process design time.

That means that the process does not have to be executed in order to get a "picture" of the outcome of an operator or even of the whole process. You just have to move your mouse pointer over an output port of an operator and you will get a description of the expected data. This alone is a great feature and has already been mentioned by Simon in one of his posts.

Another nice side effect is that we are now able to better support our users by providing them with a collection of hotfixes (we call them "quick fixes") in cases where an operator already detects that it cannot be applied to the provided data. Let's think about a simple example: you are going to load the well-known Iris data set, which consists of numerical attributes only, from your repository. You might have decided that you want to model the data with the help of the ID3 decision tree learner. Unfortunately, this learning scheme cannot be applied to numerical attributes. In contrast to former RapidMiner versions, this is already detected at process design time, and the user gets a collection of applicable quick fixes, e.g. the user can simply transform the numerical attributes into nominal ones by means of discretization. Double-clicking in the quick fix region on the "Problems" tab in the lower part of the screen brings up the quick fix dialog. The quick fix is selected and then applied. That's it: fast and simple.
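The ID3 example can be sketched as a design-time check that works on meta data alone and attaches a repair action to each problem it finds. This Python sketch is an assumption-laden illustration, not RapidMiner code: `QuickFix`, `Problem`, `check_id3`, `discretize`, and the attribute names `a1` to `a4` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Meta data only: attribute name -> "numerical" or "nominal". No data is loaded.
Meta = Dict[str, str]


@dataclass
class QuickFix:
    description: str
    apply: Callable[[Meta], Meta]


@dataclass
class Problem:
    message: str
    quick_fixes: List[QuickFix] = field(default_factory=list)


def discretize(meta: Meta) -> Meta:
    """Meta data effect of discretization: numerical attributes become nominal."""
    return {name: "nominal" if kind == "numerical" else kind for name, kind in meta.items()}


def check_id3(meta: Meta) -> List[Problem]:
    """Report a problem, with a quick fix, if any attribute is numerical."""
    numerical = sorted(name for name, kind in meta.items() if kind == "numerical")
    if not numerical:
        return []
    return [
        Problem(
            message="ID3 cannot handle numerical attributes: " + ", ".join(numerical),
            quick_fixes=[QuickFix("Discretize numerical attributes", discretize)],
        )
    ]


# Iris-like meta data: four numerical attributes (names are placeholders).
iris_meta = {"a1": "numerical", "a2": "numerical", "a3": "numerical", "a4": "numerical"}
problems = check_id3(iris_meta)
fixed_meta = problems[0].quick_fixes[0].apply(iris_meta)
remaining = check_id3(fixed_meta)  # empty once the quick fix has been applied
```

The point of the design is that a problem carries its own repair options, so the interface only has to list them and apply the one the user picks.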

Vega, RapidMiner, flow layout 26 Aug 2009
Approaching Vega (Episode III: Flow vs. Tree) by Ingo Mierswa

Today I loaded an old process I once designed as an example for one of our customers. The process is not too complicated and only consists of a few operators. In order to test the import mechanism of the alpha version of Vega, I first loaded the process in RapidMiner 4.5 and checked the process setup and the results. Here is what the process looks like as an operator tree (the image was taken from RapidMiner 5):

This process seems to be pretty linear, right? Of course not, as all experienced RapidMiner users notice at once. The process setup as a tree only looks quite linear, but the internal result stack (read the entry Simon posted some days ago) and the two IO multipliers make things a bit more complicated.

The next thing I did was to import this process into RapidMiner 5 and have a look at it in the new flow view. Here is the result:

I only rearranged the locations of some operators and exported the picture above. After 8 years of being a hardliner in defending the operator tree + result stack idea for process design, I got the feeling (again ;-) that this flow layout with its explicit data flows might be much easier to understand. This is probably true in particular for non-computer-scientists, who are not used to concepts like stacks and trees.

Same process, same results. Although I still like the tree, and sometimes (as Simon has pointed out) it is still necessary in order to define the order of independent subprocesses, I am really impressed by the importing capabilities of RapidMiner 5 and the nice look of the graph, and I hope that this makes process design much easier - especially for less experienced users.

And what about efficiency in process design? How does the flow layout compare to the tree in this respect? Well, here the meta data transformation Simon has described is a big help. Unless you turn this feature off, all new operators are automatically wired according to fitting meta data descriptions of the connection ports. So in most cases, you still only have to drag the operator to the right position and RapidMiner makes the connections itself. So the effort is about the same as for the tree.

Clear design, explicit flows, same effort. It looks to me as if the new flow design will turn out to be the winner of the challenge "flow vs. tree".
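One simple way to picture such automatic wiring is a matcher that connects each input port of a newly added operator to the first still-unconnected output port delivering the same kind of object. The Python sketch below illustrates that idea only; the `Port` class, the `auto_wire` function, the first-match rule, and the port names are assumptions for this example, and RapidMiner's actual matching may well be more elaborate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Port:
    operator: str
    name: str
    data_kind: str  # what the port delivers or expects, e.g. "example set"


def auto_wire(open_outputs: List[Port], new_inputs: List[Port]) -> List[Tuple[Port, Port]]:
    """Connect each new input to the first free output of a matching kind."""
    connections = []
    free = list(open_outputs)
    for inp in new_inputs:
        for out in free:
            if out.data_kind == inp.data_kind:
                connections.append((out, inp))
                free.remove(out)  # an output feeds at most one input here
                break
    return connections


# Hypothetical process state: two unconnected outputs, then a new operator is dropped in.
open_outputs = [
    Port("Retrieve", "output", "example set"),
    Port("Learner", "model", "model"),
]
new_inputs = [
    Port("Apply Model", "model", "model"),
    Port("Apply Model", "unlabelled data", "example set"),
]
connections = auto_wire(open_outputs, new_inputs)
wired = [(out.operator, inp.name) for out, inp in connections]
```

With this rule, dropping the new operator into the process is enough to get both of its inputs connected, which is the "same effort as the tree" point made above.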

meta data, flow layout 14 Aug 2009
Approaching Vega (Episode II: Meta Data) by Simon Fischer

Those of you who have been using RapidMiner for some time have probably come across the "Validate process" button. Pressing this button triggers some sanity checks and a dry run of the process, which passes around dummy objects to see whether all operators receive the correct input. While this was a helpful feature for finding gross errors in the process setup, it is but a fraction of what RapidMiner 5.0 will offer.

From RapidMiner 5.0 on, the process validation will be much more powerful and detailed. First off, it is no longer necessary to jump to a breakpoint to see what data will arrive at a certain operator. Since we have an explicit data flow, we can easily check where the data assigned to the input of an operator comes from, and which operators it passed through on its way there.

More importantly, operators also provide much more detailed information about pre- and postconditions. Learners specify what kind of input data and label type they can handle (nominal vs. numeric), whether they can deal with missing values, etc. Preprocessing operators know how they transform the data and annotate their results accordingly.

For example, one common pitfall was to create a process for regression learning containing an SVM, but to forget to set the kernel type to a regression kernel.

 

The above screenshot shows what happens in Vega. The data is loaded from a repository, where it is stored together with meta information about attribute types, statistics, etc. The Normalization operator transforms this meta information: The range of the selected attribute is changed to the interval [0,1]. Finally, the SVM checks its input and reports that it cannot handle a numerical label for the C-SVC kernel type.

All this information is updated on a click. As soon as the kernel type is changed, or the problem is solved in a different way, e.g. by discretizing the data, the error vanishes. All these repair options are offered to the user as quick fixes.
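The chain described above (repository meta data, a normalization step that rewrites the value range, and a learner that checks its input) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than RapidMiner's implementation: the class and function names, the attribute names and ranges, and the `epsilon-SVR` value used as the regression setting are all chosen for this example. The parameter is called `svm_setting` here; the post refers to it as the kernel type.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AttributeMeta:
    name: str
    value_type: str  # "numerical" or "nominal"
    value_range: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class ExampleSetMeta:
    attributes: Tuple[AttributeMeta, ...]
    label: AttributeMeta


def normalize_meta(meta: ExampleSetMeta, attribute_name: str) -> ExampleSetMeta:
    """Meta data effect of normalization: the selected range becomes [0, 1]."""
    attributes = tuple(
        replace(a, value_range=(0.0, 1.0))
        if a.name == attribute_name and a.value_type == "numerical"
        else a
        for a in meta.attributes
    )
    return replace(meta, attributes=attributes)


def check_svm(meta: ExampleSetMeta, svm_setting: str) -> List[str]:
    """Design-time precondition check: C-SVC needs a non-numerical label."""
    if svm_setting == "C-SVC" and meta.label.value_type == "numerical":
        return ["C-SVC cannot handle a numerical label"]
    return []


# Made-up meta data as it might come from a repository; no data is involved.
source_meta = ExampleSetMeta(
    attributes=(AttributeMeta("a1", "numerical", (2.0, 9.0)),),
    label=AttributeMeta("label", "numerical", (0.0, 100.0)),
)
normalized = normalize_meta(source_meta, "a1")
errors_before = check_svm(normalized, "C-SVC")
errors_after = check_svm(normalized, "epsilon-SVR")  # error vanishes after the change
```

Each operator only needs to describe how it changes the meta data and what it requires of its input; the error list is then recomputed whenever a parameter changes.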

website, bugs 5 Aug 2009
New Bug Tracker by Simon Fischer

Until today, the Problems and Support forum has been the central place for filing bug reports. As of today, we have installed a new bug tracker at

  http://bugs.rapid-i.com/

The Bugzilla Web interface is easy to use and offers many helpful features for both users and developers.

You can also use the bug tracker to submit feature requests. In particular, the bug tracker contains a voting mechanism so we can see which features are desired by many users and should be implemented first.

For the 5.0 release (Vega), we have added an additional component to the bug tracker which will help the alpha testers report issues with the new version. Please read the description to see how best to file bug reports for this component. If you have not yet registered for the alpha test, you can do so here.
