Open source software for big data analytics. No programming required.
Rapid-I Blog
Tag: myExperiment
Solving Sudokus with RapidMiner
by Simon Fischer, 5 Sep 2012 (tags: RCOMM, myExperiment, challenge)

This year's RCOMM live data mining challenge, "Who wants to be a data miner?", was a tricky yet fun task for the competitors: they had to (partially) solve a Sudoku puzzle with RapidMiner. The solution shows that you can achieve virtually any data analysis task you can think of with nothing but standard RapidMiner processes.

The input data set consisted of examples of the form (x,y,v), where x and y give the column and row of a cell in the puzzle and v is the number predefined in that cell. The path to the solution was split into three subtasks:

1. Subtask 1 was to generate the space of all combinations of cells and numbers that would be possible if there were no predefined numbers in the Sudoku. This can be solved by starting with a simple data set containing only the numbers one to nine and applying two Cartesian Product operators to generate all combinations of these numbers for x, y, and v. In addition, we generate a new attribute z with a Generate Attributes operator; z indicates the 3x3 sub-table in which the cell (x,y) lies.
2. The additional attribute z is useful for Subtask 2, which was to eliminate all combinations that are impossible given a single predefined cell value from the input data set. This can be done with a combination of a Generate Attributes operator and a Filter Examples operator that identify those combinations (from the set of all combinations generated in Subtask 1) where v equals the predefined number and at least one of x, y, or z matches the column, row, or sub-table of the predefined cell.
3. The process from Subtask 2 can be re-used in Subtask 3 to eliminate all combinations whose impossibility can be inferred from all predefined cell values in the input data set. This is achieved by using a Loop Examples operator to iterate over all predefined cells and a nested Execute Process operator to re-use the process from Subtask 2.
Finally, by looking at those cells where only one possible number remains, we can identify new values that can safely be inserted into the Sudoku. These cells can be found with an Aggregate operator that groups the remaining possibilities by x, y, and z and keeps the groups with a count of exactly one.
4. As a bonus, we can repeat the process from Subtask 3, iteratively appending the inferred numbers to the predefined ones with the Append operator. The data set thus grows until it contains 81 examples, at which point the 9x9 Sudoku is complete. Finally, we use the Pivot operator and do some polishing to turn the result into the familiar 9x9 grid of a completely solved Sudoku.
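
The same set-based reasoning can be sketched outside RapidMiner with plain Python sets. This is not the process from the myExperiment pack, only a minimal sketch under a few assumptions: the helper names (parse, box, eliminate, infer, solve, to_grid) are hypothetical, the sub-table index z is numbered row by row from 1 to 9, and the sample puzzle is a widely used example grid rather than the challenge's input data. In addition to the rule described for Subtask 2, the sketch also removes every other value at a predefined cell and ignores already filled cells when collecting single-candidate cells, so known cells are never reported as new.

```python
# Hypothetical helpers: a sketch of the set-based approach, not the RapidMiner processes.
from collections import defaultdict
from itertools import product

DIGITS = range(1, 10)

# Sample puzzle for illustration only (a widely used example grid, not the RCOMM data);
# "." marks an empty cell.
PUZZLE = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]


def parse(rows):
    """Turn grid rows into the (x, y, v) input examples, with x = column and y = row."""
    return {(x, y, int(ch))
            for y, row in enumerate(rows, start=1)
            for x, ch in enumerate(row, start=1)
            if ch != "."}


def box(x, y):
    """Sub-table index z in 1..9, numbered row by row (assumed convention)."""
    return (y - 1) // 3 * 3 + (x - 1) // 3 + 1


def all_candidates():
    """Subtask 1: every (x, y, z, v) combination, i.e. the 9*9*9 cross product plus z."""
    return {(x, y, box(x, y), v) for x, y, v in product(DIGITS, repeat=3)}


def eliminate(candidates, given):
    """Subtask 2: drop the combinations ruled out by one predefined cell."""
    gx, gy, gv = given
    gz = box(gx, gy)
    kept = set()
    for x, y, z, v in candidates:
        same_cell = (x, y) == (gx, gy)
        if same_cell and v != gv:
            continue  # the predefined cell holds gv and nothing else
        if not same_cell and v == gv and (x == gx or y == gy or z == gz):
            continue  # gv cannot appear again in this row, column, or sub-table
        kept.add((x, y, z, v))
    return kept


def infer(known):
    """Subtask 3: eliminate with every known cell, then keep cells with one value left."""
    candidates = all_candidates()
    for given in known:
        candidates = eliminate(candidates, given)
    values_by_cell = defaultdict(list)  # plays the role of the Aggregate operator
    for x, y, _z, v in candidates:
        values_by_cell[(x, y)].append(v)
    filled = {(x, y) for x, y, _v in known}
    return {(x, y, values[0])
            for (x, y), values in values_by_cell.items()
            if len(values) == 1 and (x, y) not in filled}


def solve(givens):
    """Bonus: append the inferred cells and repeat until all 81 cells are known."""
    known = set(givens)
    while len(known) < 81:
        new_cells = infer(known)
        if not new_cells:
            break  # single-candidate cells alone cannot finish this puzzle
        known |= new_cells
    return known


def to_grid(cells):
    """Pivot the (x, y, v) examples back into 9 rows; 0 marks a still unknown cell."""
    grid = [[0] * 9 for _ in range(9)]
    for x, y, v in cells:
        grid[y - 1][x - 1] = v
    return grid


if __name__ == "__main__":
    for row in to_grid(solve(parse(PUZZLE))):
        print(" ".join(str(v) if v else "." for v in row))
```

The loop stops early when a round finds no new cell, because reasoning about single remaining candidates does not solve every Sudoku; harder puzzles need further inference rules.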

All the processes mentioned above are available on myExperiment as a pack, which you can use from within RapidMiner via the Community Extension. The processes can be opened directly in RapidMiner, but when saving them, make sure to keep the names they have on myExperiment, since an Execute Process operator in a later process may expect a process under that name. Process 0 in this pack downloads the initial data from the Web.

Video on RapidMiner Community Extension (myExperiment)
by Simon Fischer, 11 Jan 2011 (tags: RapidMiner, myExperiment, Extensions, Community)

We have already blogged about the RapidMiner Community Extension in two earlier posts. The Community Extension enables you to share your RapidMiner workflows with a large community of data miners all over the world on the community platform myExperiment.org.

This can be a great benefit: you can learn about (and from) other people's work, make your own work more visible, get new ideas, and make friends with other data miners. I just made a short video showing how it works.

RCOMM Challenge Processes and Extensions
by Simon Fischer, 20 Sep 2010 (tags: research, RCOMM, RapidMiner, myExperiment, Extensions, challenge)

At the RCOMM, we held a challenge in which data miners had to design RapidMiner processes solving unusual tasks. The three tasks were to design a process that creates the lyrics of "99 bottles of beer", to apply a model to a data set in which an entire column was lost, and to create a process that computes the Fibonacci numbers. All winning solutions, challenge descriptions, and necessary data preparation processes are now available on myExperiment.

I think they are worth a look, since they use some quite clever tricks.

Furthermore, we saw a lot of interesting, brand-new RapidMiner Extensions at the conference. One of them, made by the DFKI, assists data miners in choosing an appropriate learner for their data set and saves them from trying out many different learners manually. The extension is available from our update server.

Try it out!

50 Processes on myExperiment
by Ingo Mierswa, 9 Sep 2010 (tags: myExperiment, Extension, Community)

Good news for the users of the RapidMiner Community Extension: so far, 50 RapidMiner processes have been uploaded to the myExperiment portal, and they can be browsed and downloaded directly from within RapidMiner.

myExperiment is a community website where people share workflows of various kinds. It is an active community, and the portal comes with all the usual social networking features.

In a previous blog post, we described the Community Extension for RapidMiner in detail. The Community Extension connects directly to myExperiment, which means that you can upload the process you are currently working on with a single click. The extension also lets you browse RapidMiner processes on myExperiment and download them to your local machine directly from within RapidMiner.

I really like the idea of a data mining process wiki that can serve as a common knowledge source for data analysts worldwide. And I am happy that so many people have already shared their knowledge with others, for example with a nice process that replaces missing values with the values of other attributes.

More information about how to use the Community Extension can be found at http://www.e-lico.eu/?q=node/226

So download the extension from our update and installation server via the Help menu of RapidMiner, activate the myExperiment view in the View menu, and start uploading and downloading processes. Happy sharing!