|Vega, RapidMiner, flow layout||26 Aug 2009|
|Approaching Vega (Episode III: Flow vs. Tree) by Ingo Mierswa||
Today I loaded an old process I once designed as an example for one of our customers. The process is not too complicated and consists of only a few operators. To test the import mechanism of the alpha version of Vega, I first loaded the process in RapidMiner 4.5 and checked the process setup and the results. Here is what the process looks like as an operator tree (the image was taken from RapidMiner 5):
This process seems to be pretty linear, right? Of course not, as all experienced RapidMiner users will notice at once. The process setup as a tree only looks linear; the internal result stack (see the entry Simon posted a few days ago) and the two IO multipliers make things a bit more complicated.
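To make the point about the hidden data flow concrete, here is a toy sketch in Python. It is not RapidMiner's actual implementation: the operator names, the `run_on_stack` helper, and the string "results" are all hypothetical and only mimic the idea of operators popping inputs from a shared result stack and pushing outputs back. The second half wires the same operators explicitly, the way a flow view would show them.

```python
# Toy model only: NOT RapidMiner's real execution engine.
# Operator names and behaviour are hypothetical illustrations.

def run_on_stack(operators):
    """Run operators in listed order. Each one pops its inputs from a
    shared stack and pushes its outputs back, so the data flow is implicit."""
    stack = []
    for _name, n_inputs, fn in operators:
        inputs = [stack.pop() for _ in range(n_inputs)]
        stack.extend(fn(*inputs))
    return stack

# (name, number of inputs, function returning a list of outputs)
load = ("load", 0, lambda: ["data"])
multiply = ("io_multiplier", 1, lambda x: [x, x])  # duplicates one object
learn = ("learn", 1, lambda d: [f"model({d})"])
apply_model = ("apply", 2, lambda m, d: [f"predictions({m}, {d})"])

# Looks like a straight line of four operators...
stack_result = run_on_stack([load, multiply, learn, apply_model])

# ...but written with explicit connections, the branch becomes visible:
(data,) = load[2]()
copy_a, copy_b = multiply[2](data)
(model,) = learn[2](copy_a)
flow_result = apply_model[2](model, copy_b)

assert stack_result == flow_result
```

In the stack version, nothing in the operator list says that the second copy of the data skips the learner and goes straight to the last operator; you have to simulate the stack in your head to see it. In the explicit version, `copy_b` makes that connection obvious.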
The next thing I did was to import this process into RapidMiner 5 and have a look at it in the new flow view. Here is the result:
I only rearranged the positions of some operators and exported the picture above. After 8 years of being a hardliner in defending the operator tree + result stack idea for process design, I got the feeling (again ;-) that this flow layout with its explicit data flows might be much easier to understand. This is probably especially true for non-computer-scientists who are not used to concepts like stacks and trees.
Same process, same results. I still like the tree, and sometimes (as Simon has pointed out) it is still necessary in order to define the order of independent subprocesses. Still, I am really impressed by the importing capabilities of RapidMiner 5 and the nice look of the graph, and I hope that this makes process design much easier, especially for less experienced users.
And what about efficiency in process design? How does the flow layout compare to the tree in this respect? Well, here the meta data transformation Simon has described is a big help. Unless you turn this feature off, all new operators are automatically wired according to fitting meta data descriptions of the connection ports. So in most cases, you still only have to drag the operator to the right position and RapidMiner makes the connections itself. The effort is therefore about the same as for the tree.
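As a rough illustration of what such automatic wiring could look like, here is a much simplified Python sketch. It is not how RapidMiner actually does it: the `Operator` and `Process` classes, the operator names, and the matching rule (connect each input port to the most recently added free output port with an identical meta data type) are assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: classes, names and the matching rule are
# assumptions for illustration, not RapidMiner's real wiring logic.

@dataclass
class Operator:
    name: str
    inputs: list   # meta data types expected on the input ports
    outputs: list  # meta data types delivered on the output ports

@dataclass
class Process:
    operators: list = field(default_factory=list)
    connections: list = field(default_factory=list)  # (src, out_idx, dst, in_idx)
    auto_wire: bool = True  # the feature can be turned off

    def _free_outputs(self):
        """Yield unconnected output ports, newest operator first."""
        used = {(src, idx) for src, idx, _, _ in self.connections}
        for op in reversed(self.operators):
            for idx, kind in enumerate(op.outputs):
                if (op.name, idx) not in used:
                    yield op.name, idx, kind

    def add(self, op):
        """Add an operator and, if enabled, wire its inputs by meta data type."""
        if self.auto_wire:
            for in_idx, wanted in enumerate(op.inputs):
                for src, out_idx, kind in self._free_outputs():
                    if kind == wanted:
                        self.connections.append((src, out_idx, op.name, in_idx))
                        break
        self.operators.append(op)

process = Process()
process.add(Operator("load", [], ["ExampleSet"]))
process.add(Operator("multiply", ["ExampleSet"], ["ExampleSet", "ExampleSet"]))
process.add(Operator("learn", ["ExampleSet"], ["Model"]))
process.add(Operator("apply", ["Model", "ExampleSet"], ["ExampleSet"]))

for connection in process.connections:
    print(connection)
```

With this rule, dropping the four operators in order is enough to get the branch from the multiplier to the last operator wired without any manual connection, which is the kind of effort saving described above. A real implementation would have to deal with ambiguous matches and richer meta data than a single type name.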
Clear design, explicit flows, same effort. It looks to me as if the new flow design will turn out to be the winner of the challenge "flow vs. tree".