Rapid-I Blog
Tag >> Vega
Tags: Vega, sourceforge, RapidMiner, development | 6 Jan 2010
RapidMiner 5 branch back on sourceforge SVN, by Simon Fischer

For the contributors and developers amongst you: the RapidMiner development branch and the stable 4.6 branch are finally back on sourceforge, accessible under their respective codenames:

https://yale.svn.sourceforge.net/svnroot/yale/Vega (5.x)

 https://yale.svn.sourceforge.net/svnroot/yale/Wasat (4.6)

 For performance reasons, these are not the live repositories. They are mirrored between 4:00 and 5:00 am CET.

All the best for 2010!

Simon

Tags: Vega, Repositories, RapidMiner | 30 Sep 2009
Approaching Vega (Episode VI: Repositories), by Ingo Mierswa

This is probably the final episode of our "Approaching Vega" story: RapidMiner 5 Beta will be released within the next few days, and then you can try all of the cool new features yourself.

Over the last few weeks, we have shown you how RapidMiner 5 handles meta data and automatically transforms it at process design time. This is a key component of RapidMiner 5. The meta data transformation simplifies the graphical user interface, for example by providing the names of the transformed attributes in interface components. It is also the foundation of the ongoing process checks, which show you possible problems as early as possible and assist you with hints on how to solve them (see the quick fix discussion below).

However, meta data transformations are of course only possible if meta data exists in the first place. This is where the new Repositories come into play: you can have several repositories and use them to organize your analysis projects, your data, and your data mining processes.

 


 

Data can simply be imported into the repository by drag'n'drop, which makes data integration as easy as possible. Once imported, the data is stored together with its meta data, which can hence be used during process design without loading the data at all.

Flow Design, Meta Data Transformations, and Repositories are the three main components of RapidMiner 5. Together they simplify your analysis work a lot and extend the possibilities for your data analysis at the same time. Just check out the upcoming RapidMiner 5 Beta.

Tags: Vega, RapidMiner, quick fix, meta data | 16 Sep 2009
Approaching Vega (Episode IV: Quick Fixes), by Ingo Mierswa

The alpha test phase of RapidMiner 5 (internal name: Vega) is about to end and we are looking forward to the upcoming beta test. Today, I would like to describe another great feature of RapidMiner 5, namely the quick fixes. In RapidMiner 5, you will usually retrieve your data from a repository where the data itself together with the meta data is imported and then stored. We will discuss the new repository in one of our next blog entries. One of the main advantages is that we can use the meta data from the repository and let the operators transform it during the process design time.

This means that the process does not have to be executed in order to get a "picture" of the outcome of an operator or even of the whole process. You just have to move your mouse pointer over an output port of an operator and you will get a description of the expected data. This alone is a great feature and has already been mentioned by Simon in one of his posts.

Another nice side effect is that we are now able to better support our users by providing them with a collection of hotfixes (we call them "quick fixes") in cases where an operator already detects that it cannot be applied to the provided data. Let's think about a simple example: you are going to load the well-known Iris data set, which consists of numerical attributes only, from your repository. You might have decided that you want to model the data with the help of the ID3 decision tree learner. Unfortunately, this learning scheme cannot be applied to numerical attributes. In contrast to former RapidMiner versions, this is already detected at process design time, and the user gets a collection of applicable quick fixes. For example, the user can simply transform the numerical attributes into nominal ones by means of discretization. Double-clicking in the quick fix region on the "Problems" tab in the lower part of the screen brings up the quick fix dialog. The quick fix is selected and then applied. That's it: fast and simple.
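
The idea of a design-time check with an attached quick fix can be illustrated with a small Python sketch. This is not RapidMiner code: the names `Problem`, `check_id3` and `apply_discretize_fix`, as well as the attribute names, are hypothetical and only serve to show how a check can work on meta data alone, without touching the data.

```python
from dataclasses import dataclass, field

NUMERICAL = "numerical"
NOMINAL = "nominal"


@dataclass
class Problem:
    """A design-time problem together with the quick fixes offered for it."""
    message: str
    quick_fixes: list = field(default_factory=list)


def check_id3(meta_data):
    """Inspect attribute types only (hypothetical check, no data is loaded)."""
    numerical = [name for name, kind in meta_data.items() if kind == NUMERICAL]
    if not numerical:
        return []
    return [Problem(
        message="ID3 cannot handle numerical attributes: " + ", ".join(numerical),
        quick_fixes=["Discretize numerical attributes"],
    )]


def apply_discretize_fix(meta_data):
    """Transform the meta data the way a discretization step would."""
    return {name: NOMINAL if kind == NUMERICAL else kind
            for name, kind in meta_data.items()}


# Meta data as it might be stored in a repository (attribute names are assumed).
iris_meta = {
    "sepal length": NUMERICAL,
    "sepal width": NUMERICAL,
    "petal length": NUMERICAL,
    "petal width": NUMERICAL,
    "label": NOMINAL,
}

problems = check_id3(iris_meta)              # one problem, one quick fix
fixed_meta = apply_discretize_fix(iris_meta)
remaining = check_id3(fixed_meta)            # empty after the fix
```

In this sketch the check is simply run again on the transformed meta data, which mirrors the point that no process execution is needed to see whether the fix resolves the problem.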

Tags: Vega, RapidMiner, flow layout | 26 Aug 2009
Approaching Vega (Episode III: Flow vs. Tree), by Ingo Mierswa

Today I loaded an old process I once designed as an example for one of our customers. The process is not too complicated and consists of only a few operators. In order to test the import mechanism of the alpha version of Vega, I first loaded the process in RapidMiner 4.5 and checked the process setup and the results. Here is what the process looks like as an operator tree (the image was taken from RapidMiner 5):

This process seems to be pretty linear, right? Of course not, as all experienced RapidMiner users will notice at once. As a tree, the process setup only looks linear: the internal result stack (read the entry Simon posted some days ago) and the two IO multipliers make things a bit more complicated.

The next thing I did was to import this process into RapidMiner 5 and have a look at it in the new flow view. Here is the result:

I only rearranged some of the operators before exporting the picture above. After 8 years as a hardliner defending the operator tree + result stack idea for process design, I got the feeling (again ;-) that this flow layout with its explicit data flows might be much easier to understand. This is probably true in particular for non-computer-scientists, who are not used to concepts like stacks and trees.

Same process, same results. I still like the tree, and sometimes (as Simon has pointed out) it is still necessary in order to define the order of independent subprocesses. Even so, I am really impressed by the importing capabilities of RapidMiner 5 and the nice look of the graph, and I hope that this makes process design much easier, especially for less experienced users.

And what about efficiency in process design? How does the flow layout compare to the tree in this respect? Well, here the meta data transformation Simon has described is a big help. Unless you turn this feature off, all new operators are automatically wired according to the fitting meta data descriptions of the connection ports. So in most cases, you still only have to drag the operator to the right position and RapidMiner makes the connection itself. The effort is therefore about the same as for the tree.
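
To make the automatic wiring a bit more concrete, here is a minimal Python sketch. It assumes that "fitting" simply means that the type of an open output port equals the type an input port expects, and that the most recently created fitting output wins; the real matching rules in RapidMiner 5 may well be richer. The function `auto_wire` and all operator and port names are hypothetical.

```python
def auto_wire(open_outputs, new_inputs):
    """Connect each input port of a new operator to the most recent open
    output port of a fitting type (simplified, assumed matching rule).

    open_outputs: list of (port name, type), oldest first.
    new_inputs:   list of (port name, expected type).
    Returns the new connections and the outputs that are still open.
    """
    remaining = list(open_outputs)
    connections = []
    for in_name, in_type in new_inputs:
        # Search from the most recently created output backwards.
        for i in range(len(remaining) - 1, -1, -1):
            out_name, out_type = remaining[i]
            if out_type == in_type:
                connections.append((out_name, in_name))
                del remaining[i]
                break
    return connections, remaining


# Output ports that are not connected to anything yet (hypothetical names).
open_outputs = [
    ("Retrieve Test Data.output", "ExampleSet"),
    ("Learner.model", "Model"),
    ("Learner.example set", "ExampleSet"),
]

# A newly dropped operator with two input ports.
new_inputs = [
    ("Apply Model.model", "Model"),
    ("Apply Model.unlabelled data", "ExampleSet"),
]

connections, still_open = auto_wire(open_outputs, new_inputs)
```

Note that the example set input is wired to the most recent example set, not to the test data. This is the same choice the old result stack would have made, and in the flow view it is visible and can be corrected by redrawing one connection.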

Clear design, explicit flows, same effort. It looks to me as if the new flow design will turn out to be the winner of the challenge "flow vs. tree".

Tags: Vega, RapidMiner, flow layout, alpha | 31 Jul 2009
Approaching Vega (Episode I: The flow layout), by Simon Fischer

As many of you may know by now, we are going to release a new major version of RapidMiner soon. Version 5.0, codename Vega, will come with a whole bunch of new features, some of which I am going to present in more detail in this blog during the next days and weeks.

The first feature I'm presenting here is the new flow layout which Vega will have in addition to the well-known Tree. (I'm not presenting a screenshot; an early version has been posted here.) For us, the Tree has for many years been a powerful companion, representing data mining processes in a neat and compact way. But in processes larger than what can be taken in at a glance, such as a cross validation on Iris, the non-expert was easily lost in the implicit data flows defined by the Tree. I would not be surprised if one or another experienced RapidMiner trouper was unaware of how this data flow was actually defined, so let's recap for a second.

In earlier RapidMiner versions, at any stage during the execution of a process, all data currently available is kept on a stack. An operator being executed grabs hold of the first object of the desired type (example set, model, etc.) and pushes its results back on top of the stack. Thus, if several example sets have been generated, the operator consuming an example set will always use the one generated by the most recently executed operator. Wherever this behaviour did not suit the needs of the process designer, operators had to be used to rearrange the ordering of the stack: IOSelector (bringing an object to the front of the stack), IOConsumer (deleting useless objects), IOMultiplier (making copies of objects), and combinations of IOStorer and IORetriever (sharing objects across far away places of the process). In complicated processes, this can easily clutter up the process making it hard to follow what is going on.
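
The stack behaviour described above can be mimicked in a few lines of Python. This is only an illustrative model, not the actual implementation: the class `IOStack` and its methods are hypothetical, and the sketch assumes that an operator removes the object it grabs from the stack.

```python
class IOStack:
    """Toy model of the implicit result stack (end of the list = top)."""

    def __init__(self):
        self._items = []

    def push(self, kind, name):
        """An operator delivers a result on top of the stack."""
        self._items.append((kind, name))

    def _index_of(self, kind, which=1):
        """Index of the which-th object of a kind, counted from the top."""
        seen = 0
        for i in range(len(self._items) - 1, -1, -1):
            if self._items[i][0] == kind:
                seen += 1
                if seen == which:
                    return i
        raise LookupError(f"no matching {kind} on the stack")

    def take(self, kind):
        """An operator grabs the topmost object of the desired type."""
        return self._items.pop(self._index_of(kind))

    def select(self, kind, which):
        """Like IOSelector: bring the which-th object of a kind to the top."""
        self._items.append(self._items.pop(self._index_of(kind, which)))


# Two example sets are produced; a learner then asks for an example set.
stack = IOStack()
stack.push("ExampleSet", "training")
stack.push("ExampleSet", "test")
implicit_choice = stack.take("ExampleSet")   # the most recent one: "test"

# To train on the older set, the stack has to be rearranged first.
stack = IOStack()
stack.push("ExampleSet", "training")
stack.push("ExampleSet", "test")
stack.select("ExampleSet", 2)
selected_choice = stack.take("ExampleSet")   # now "training"
```

The second half shows why the rearrangement operators were needed: without the explicit `select` step, the learner silently receives whatever example set happens to be on top.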

In Vega we will have a flow layout view which makes this data flow explicit: operators will have input ports and output ports which can be connected to each other in an arbitrary fashion (as long as one doesn't construct cycles). Does this mean we have to say goodbye to the good old Tree? No. One of our design goals was to keep the possibility to use the implicit data flow defined by the Tree allowing for rapid process design wherever this is possible. At the same time, we provide the possibility to use the explicit flow layout whenever the tree would require fumbling with stack rearrangement operators.
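
As a rough illustration of explicit ports with the "no cycles" restriction, consider the following Python sketch. The class `Flow`, its methods, and the operator and port names are all hypothetical; the sketch only shows one straightforward way to reject a connection that would close a cycle.

```python
class Flow:
    """Toy model of an explicit data flow between operator ports."""

    def __init__(self):
        self._downstream = {}   # operator -> set of directly connected operators
        self.connections = []   # (source op, source port, target op, target port)

    def _reaches(self, start, target):
        """Depth-first search: can target be reached from start?"""
        todo, seen = [start], set()
        while todo:
            node = todo.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            todo.extend(self._downstream.get(node, ()))
        return False

    def connect(self, src_op, src_port, dst_op, dst_port):
        """Add a connection unless it would create a cycle."""
        if self._reaches(dst_op, src_op):
            raise ValueError(f"{src_op} -> {dst_op} would create a cycle")
        self._downstream.setdefault(src_op, set()).add(dst_op)
        self.connections.append((src_op, src_port, dst_op, dst_port))


flow = Flow()
flow.connect("Retrieve", "output", "Multiply", "input")
flow.connect("Multiply", "output 1", "Learner", "training set")
flow.connect("Multiply", "output 2", "Apply Model", "unlabelled data")
flow.connect("Learner", "model", "Apply Model", "model")

try:
    # Feeding the result back into the learner would close a loop.
    flow.connect("Apply Model", "labelled data", "Learner", "training set")
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

Here every object has a visible source and destination, so no stack rearrangement is needed to send the same example set to two different operators.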

The default behaviour of RapidMiner 5.0 will be the following: whenever an operator is added to the process, the data flow will be constructed so as to simulate the one that would have been implicit in pre-5.0 versions. Also, in case one ever gets confused, all connections can be deleted and restored in the old-fashioned way at the click of a button. This behaviour can be customized in various ways. From our experience so far, we can say that the new flow layout significantly eases process design: inserting operators at the beginning will not mess up the objects at later stages, and at every stage it is perfectly clear which object ends up where. This is particularly true in combination with the new meta data transformation, which I am going to present in one of the next posts.

For us, one of the challenges that come with this new feature is importing old processes into RapidMiner 5.0. In these process files, the data flow is not explicitly stated, and it cannot easily be simulated since it may, for a few operators, depend on the actual input data and exist only at execution time. We therefore take the following approach: first, the operators are loaded and left unconnected. Then, a data flow is automatically constructed as described above. Here, the stack rearrangement operators (IOConsumer and IOSelector) are taken into account for the construction of the flow, but they are not connected themselves and are deleted afterwards. This imports a large fraction of the existing processes correctly. One problem left open is the small set of operators for which the number of outputs generated depends on the input data, e.g. iterating operator chains and the ValueIterator. For these, we introduce a new collection I/O object type. Imported processes that use these operators may be problematic, but they are easy to fix manually.
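
The import approach described above can be sketched in Python as follows. This is a simplified model under several assumptions, not the actual importer: the function names, the dictionary layout of an operator, and the convention of naming ports after their type are all hypothetical, and operators with a data-dependent number of outputs are left out entirely.

```python
def _find(stack, kind, which=1):
    """Index of the which-th object of a kind, counted from the top."""
    seen = 0
    for i in range(len(stack) - 1, -1, -1):
        if stack[i][0] == kind:
            seen += 1
            if seen == which:
                return i
    raise LookupError(f"no matching {kind} on the stack")


def import_process(operators):
    """Replay the implicit stack over an old operator list (execution order)
    and record explicit connections instead of moving real data."""
    stack = []         # entries: (type, "Operator.port"), end = top
    connections = []   # (source port, target port)
    kept = []          # operators that survive the import
    for op in operators:
        if op["kind"] == "IOSelector":
            # Only influences which output gets connected next.
            stack.append(stack.pop(_find(stack, op["type"], op["which"])))
            continue
        if op["kind"] == "IOConsumer":
            stack.pop(_find(stack, op["type"], op["which"]))
            continue
        kept.append(op["name"])
        for in_type in op["inputs"]:
            _, source = stack.pop(_find(stack, in_type))
            connections.append((source, f'{op["name"]}.{in_type}'))
        for out_type in op["outputs"]:
            stack.append((out_type, f'{op["name"]}.{out_type}'))
    return kept, connections


# A small old-style process (all names are made up for this example).
old_process = [
    {"name": "LoadTraining", "kind": "ExampleSource",
     "inputs": [], "outputs": ["ExampleSet"]},
    {"name": "LoadTest", "kind": "ExampleSource",
     "inputs": [], "outputs": ["ExampleSet"]},
    {"name": "SelectTraining", "kind": "IOSelector",
     "type": "ExampleSet", "which": 2},
    {"name": "Learner", "kind": "Learner",
     "inputs": ["ExampleSet"], "outputs": ["Model"]},
    {"name": "Applier", "kind": "ModelApplier",
     "inputs": ["Model", "ExampleSet"], "outputs": ["ExampleSet"]},
]

kept, connections = import_process(old_process)
```

In this sketch the IOSelector never appears in `kept` or in any connection, yet it determines that the learner is wired to the training data rather than to the test data.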

In fact, the import of processes from older RapidMiner versions is one of the main reasons why we will have an alpha test phase starting in mid-August. We are asking the community for feedback on which of your old processes are imported correctly and which aren't. You can register for the alpha test here:

   http://rapid-i.com/content/view/150/177/lang,en/

Any feedback is greatly appreciated. If you ever wanted to contribute to how RapidMiner is evolving, now is the time to do so.
