RCOMM 14 Sep 2010
RCOMM 2010 - Day 1 by Ingo Mierswa

For the first paragraph I'll make it short: I love it!

After the introductory training sessions yesterday, we started the actual conference today and went straight into a lot of really great talks. Katharina Morik opened with her invited talk about data mining under constrained resources. This was of course not the first time I heard her give a talk, but as always it was inspiring just to listen to Katharina and her vision of what can be expected of data mining with current and upcoming applications in mind.

In the next session, Kyle Goslin from the ITB in Dublin presented a cool tutorial wizard tool which makes it easy to design new RapidMiner tutorials that can, for example, be used in lectures. This is probably a great extension that will help teachers a lot in using RapidMiner in their data mining courses - so please stay tuned, since we will add the Extension soon or integrate it directly into RapidMiner.

The next talk was given by Christian Kofler from the DFKI in Kaiserslautern and covered a nice integration of landmarking features for meta learning, which were presented by Sarah Abdelmessih later in the afternoon. At first I couldn't believe it, but I have now seen the first fully integrated (and working!) system for meta learning in my life. The PaREN extension efficiently calculates a set of only four landmarking features and predicts the accuracy of seven learning schemes based on those meta features. Even if the accuracy predictions are not 100% correct by themselves, this is incredibly helpful since the ranking was almost perfect. If the accuracy predictions indicate that Naive Bayes is the best way to go, it very likely is the best scheme for the data at hand. Don't try different model types yourself: just ask the PaREN extension in the future. Cool stuff.
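To give a rough idea of what a landmarking feature is: the general technique describes a data set by how well a few very cheap learners perform on it. The sketch below is not the PaREN implementation. The post does not say which four landmarkers PaREN uses, so the two shown here (majority-class accuracy and leave-one-out 1-nearest-neighbour accuracy), the function names, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Landmarker (assumed): accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def one_nn_loo_accuracy(rows, labels):
    """Landmarker (assumed): leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, row in enumerate(rows):
        # Nearest other example by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def landmarking_features(rows, labels):
    """Collect the cheap landmarkers into one meta-feature vector for the data set."""
    return {
        "majority_class": majority_class_accuracy(labels),
        "one_nn_loo": one_nn_loo_accuracy(rows, labels),
    }


# Toy data, invented purely for illustration.
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.0), (1.2, 0.8)]
labels = ["a", "a", "a", "b", "b", "b"]
features = landmarking_features(rows, labels)
```

In a meta learning setup, feature vectors like this one would be computed for many data sets and used as input to a meta model that predicts how well each full learning scheme is likely to perform.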

Then Floarea Serban from our e-LICO partner (University of Zurich) presented the workflow planner for intelligent discovery assistance. The goal is quite similar to that of the PaREN Extension, but it covers the whole data mining process instead of only the selection of the optimal learning scheme. Again, this was a fantastic demonstration of user support: you simply define the data set and the goal you want to achieve ("discretize all features") and it generates all processes that solve this task. Great for beginners but also for RapidMiner experts who want to quickly solve routine tasks.

I unfortunately missed the talk by Zoltan Prekopcsak, "Cross-Validation: the illusion of reliable performance estimation", so I cannot say much about it, sorry. But I heard that it was also a really interesting talk which sparked a lot of discussion among the participants afterwards - and this is almost always a good sign.

Milan Vukiecevic gave a talk about WhiBo, which is like having a mini-RapidMiner within RapidMiner. They divided well-known algorithms like decision trees into their components, which can now be combined almost arbitrarily. This makes it easy to rebuild already known algorithms (like the many different decision tree or k-means variants) but also simplifies the discovery of new ones. I would love to see a genetic programming approach that combines these components automatically for a given data mining task in order to construct the optimal modeling scheme.
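The component idea can be sketched in a few lines of plain Python. This is not WhiBo code, and the post does not describe WhiBo's actual component interfaces. The sketch only assumes that the split evaluation measure of a decision tree is one exchangeable part, so the same split search can be run with Gini impurity or with entropy. All function names are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Split-evaluation component: Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Alternative split-evaluation component: Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels, impurity):
    """Find the threshold on one numeric feature with the lowest weighted impurity.

    `impurity` is the pluggable component; returns (threshold, score),
    or None if the feature has only one distinct value.
    """
    best = None
    # The largest value is skipped so that the right branch is never empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Toy data, invented for illustration: swapping the component changes the
# "algorithm" without touching the split search itself.
values = [1, 2, 3, 4]
labels = ["a", "a", "b", "b"]
split_with_gini = best_threshold(values, labels, gini)
split_with_entropy = best_threshold(values, labels, entropy)
```

An automatic search over such combinations, genetic or otherwise, would treat the choice of each component as one gene and the resulting model's performance as the fitness.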

Tobias Malbrecht then showed some small processes for creating reports within RapidMiner. There were unfortunately some technical issues with the browser, file locking, and the wireless network, but I think the participants were able to see the process-based reporting style together with a new Portal report generator which will be part of the next release of the Reporting Extension.

The final session ended with a game show, "Who wants to be a data miner?", which was hosted by Simon Fischer and Sebastian Land. They challenged the contestants with three well-designed tasks: creating the text of "99 bottles of beer" as an example set, imputing the values of a missing column, and calculating the Fibonacci numbers with a RapidMiner process. Two very experienced RapidMiner consultants were not able to solve the third challenge in time - but Matko did great, although he had been using RapidMiner for only a couple of months. Congrats to you, Matko!

Ok, I was really bad at the game show myself, but you can't imagine how hard it is to design a recursive process with 50 people watching you while Simon makes funny comments and game show music is playing in the background. However, the great discussions during our dinner tonight helped me to overcome the shame ;-)
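For comparison, here is what the third challenge looks like outside RapidMiner. This is a plain Python sketch, not a RapidMiner process and not any contestant's solution. The function name and the (index, value) row layout, loosely modelled on an example set with two attributes, are assumptions.

```python
def fibonacci_rows(n):
    """Return the first n Fibonacci numbers as (index, value) rows."""
    rows = []
    a, b = 0, 1
    for i in range(n):
        rows.append((i, a))
        # Each new value is the sum of the previous two.
        a, b = b, a + b
    return rows


first_ten = fibonacci_rows(10)
```

The loop carries the two previous values along explicitly, which is the part that is awkward to express in a process built from operators.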

I am looking forward to the second day and I expect many more fantastic talks for tomorrow. See you there!

 
