Rapid-I Blog
RCOMMRapidMinerHadoop 8 Jul 2011
Big data analytics made easy: Radoop by Ingo Mierswa Comment (0)

Those of you who visited RCOMM 2011 already know about Radoop, the powerful combination of RapidMiner with Hadoop. This makes big data analytics easier than ever. I missed the talk myself (shame on me!) but we had a lot of fruitful discussions afterwards, and from my point of view this will become the next RapidMiner revolution. Below you will find some information about the project.

What is Hadoop?

Hadoop is a software framework that supports data-intensive distributed applications. It is based on Google's now well-known map & reduce paradigm, which makes it an excellent tool for analyzing large data sets. In principle, Hadoop is able to work with thousands of computing nodes on petabytes of data.
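To give a rough feeling for the map & reduce paradigm mentioned above, here is a minimal sketch in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration. The idea is that a map step emits key/value pairs, the framework groups them by key, and a reduce step combines the values for each key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (done by the framework between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data analytics", "big data made easy"]))
```

In a real cluster, the map and reduce steps would run in parallel on many nodes, each working on its own part of the data, which is what allows the approach to scale.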

 

 

What about Hive and Mahout?

Hive is a data warehouse infrastructure built on top of Hadoop, i.e. it uses the distributed file system of Hadoop and the efficient access technologies. Hive was initially developed by Facebook and is now used and developed by many other companies for their distributed data warehouse.

Mahout is a machine learning library that already offers many scalable machine learning algorithms, likewise implemented on top of Hadoop and its map & reduce paradigm. Hence, Mahout is one of the first distributed data analytics frameworks making use of the power of Hadoop.

You will see below that both frameworks will be tightly integrated with RapidMiner.

What can RapidMiner bring into the game?

Hadoop is great for large-scale analytics, but it lacks an easy-to-use graphical interface. RapidMiner is an excellent tool for data analytics, but unless the analyst performs some nasty tricks, the data size is limited by the memory available. On the RapidMiner side we have the algorithms, the support for analytical process design, the user interface, and of course the community with a demand for large-scale analytics.

RapidMiner + Hadoop = Radoop

Radoop combines the strengths of RapidMiner and Hadoop. The result is a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop. The developers have closely integrated the highly optimized data analytics capabilities of Hive and Mahout, and the user-friendly interface of RapidMiner to form a powerful and easy-to-use data analytics solution for Hadoop.

Here is the presentation that Zoltán Prekopcsák gave at RCOMM 2011:

 

 

Right now, a restricted beta phase has started and you can apply for it at http://radoop.eu/ . More information about Radoop can be found at http://blog.radoop.eu/.

RCOMM 28 Jun 2011
RCOMM 2011 Review - Day 2 by Ingo Mierswa Comment (0)

The second day started with another invited talk, namely Matthias Reif of the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) talking about Towards Next-Generation Data Mining. Matthias presented very interesting insights about new trends in data analysis, including the data-driven recommendation of classifiers and the prediction of classifier accuracy and resource consumption. He depicted the integration of those techniques into server-based solutions like RapidAnalytics, which will be the next step towards collaborative data analysis in the cloud. Very fascinating! Matej Mertik of the Faculty of Information Studies in Novo mesto then presented an application of RapidMiner in the medical domain. I must admit that I did not fully get the connection between feature selection and the presented game of life approach, but I am sure that we will get a chance to sort these things out later on. The session was concluded by Andrew Chisholm of the ITB with a talk about possibilities of cluster evaluation. This was really a great talk – within 30 minutes Andrew perfectly explained his route through the pitfalls around unsupervised data analysis on a real-world problem. Andrew is an experienced speaker and told a great story with many nice ideas behind it – it was really a pleasure to listen to him.

The second session on this day covered new Extensions for RapidMiner and RapidAnalytics. In the first talk, Radim Burget of the Brno University of Technology discussed their new Image Mining Extension, which is already available on our marketplace (see below). It looks great and I will certainly give it a try soon! Afterwards, Milos Jovanovic of the University of Belgrade presented a combination of their WhiBo toolkit, presented last year, with a genetic programming approach. The result is an optimized decision tree composed of the single steps and sub-algorithms known from different decision trees and their implementations. This is pretty close to some ideas of my master's and PhD theses, so I very much liked this idea (go guys and make it multi-objective next!) ;-)

 

Simon Fischer presented new Extensions and the Rapid-I Marketplace.

 

Simon Fischer of Rapid-I concluded this session with an overview of upcoming RapidMiner Extensions and new features which will be released during the next weeks and months, including the new operator recommender. Simon also presented the new marketplace (http://marketplace.rapid-i.com), which serves as a central store for RapidMiner Extensions and analytical algorithms. Simon then showed some of the business analytics features of RapidAnalytics, namely the pixel-precise report designer and the integration of analytical results into interactive web-based reports.

The last session on the second day covered text and web mining. Felix Jungermann of the TU Dortmund presented new techniques for handling tree structures in RapidMiner. He showcased these extensions for information extraction and relation detection. Bruno Ohana of the ITB in Dublin then presented a hot topic right now: sentiment analysis and opinion mining – of course done with RapidMiner. This was an interesting talk, and a comparison to other approaches demonstrated the very high quality of the results. The last talk of the conference, by Clemens Forster of the Vienna University of Economics and Business, also covered sentiment analysis, this time in customer feedback.

 

Live music in the Temple Bar.

We made a trip to the Temple Bar district afterwards and visited some of the most famous pubs in Dublin. I tasted strawberry beer (well, interesting…) and listened to good music. It was almost a miracle that more than 20 participants managed to take the certification exam on Friday after this evening ;-)

Photos on the Rapid-I page on Facebook: http://www.facebook.com/media/set/?set=a.197594186959171.63782.120786031306654

Thanks again for the great evening and also for the conference – I hope that we will all see each other next year at the latest!

RCOMM 27 Jun 2011
RCOMM 2011 Review - Day 1 by Ingo Mierswa Comment (0)

Directly after the RCOMM, I was on vacation and therefore did not get the time to write something about RCOMM 2011 until now, sorry. Here is a review for those who visited the conference or who want to learn more about what happened in Dublin a couple of weeks ago.

RCOMM 2011 was again a huge success! It was great to meet many users of RapidMiner and RapidAnalytics again after the first community meeting in Dortmund last year. Many visitors from RCOMM 2010 also found their way to Dublin and started to build the core community around RapidMiner. So first of all: Thanks to all who attended and especially to those who contributed to the conference by giving a talk about their analysis work or new RapidMiner Extensions.

 

 A couple of participants enjoying the 2011 version of our now-famous game show "Who wants to be a Data Miner?"

Monday was a public holiday in Ireland, and so we started on Tuesday with two parallel half-day training courses, one for beginners and one for more experienced analysts. A second set of parallel training sessions took place on Friday morning directly before the exam.

 

Day 1

The actual conference started on Wednesday morning with an invited talk by Prof. Dr. Fionn Murtagh, who is the director of the Science Foundation Ireland and an experienced data analyst. He pointed out the usefulness of ultrametrics for clustering and exemplified this through a wide range of case studies. These included the Colombian social violence between 1990 and 2004 as well as some very interesting insights into optimal movie plots. Fionn is a great scientist and I enjoyed his talk very much. Matko Bošnjak and Nino Antulov-Fantulin of the Ruđer Bošković Institute then described analysis processes for recommendation systems in RapidMiner. They presented ready-to-go workflows which can simply be used or easily adapted to your own situation, and I am sure that many users will find those really useful. The templates are available on myExperiment with the RapidMiner Community Extension. They also pointed out the currently running data analysis challenge on TunedIT for recommender systems. Still 11 days left – you should consider participating! Besides the 5500 Euro prize money, Rapid-I is also sponsoring a free trip to next year’s RCOMM 2012 for the best RapidMiner process!

I unfortunately missed the next session since I had a business event in parallel. Benjamin Schowe of the TU Dortmund presented his work about feature selection methods in RapidMiner and, afterwards, Marcin Blachnik of the University of Bielsko-Biala presented a new Extension for instance selection. I was also really interested in the next talk about Radoop, a combination of RapidMiner with the map & reduce framework Hadoop, but for the second time already I missed the talk of Zoltan Prekopcsak of the Budapest University of Technology and Economics. Nothing personal, Zoltan, and I am sure that we will collaborate in future anyway!

The next session covered various applications of RapidMiner and RapidAnalytics. I was particularly excited to see a first contribution which already used the new RapidMiner server RapidAnalytics: Gábor Nagy of the Budapest University of Technology and Economics introduced a stock price prediction system based on RapidAnalytics. Afterwards, Simon Jupp of the University of Manchester presented a combination of RapidMiner with Taverna, a web service based workflow system offering lots of functionality for bioinformatics. The final talk in this session was given by Milan Vukicevic of the University of Belgrade about the classification of electricity customers with WhiBo decision trees.

The next session was divided into two parts: first, I gave a workshop about some basics of loop & macro usage. This was sort of a preparation for this year's game show "Who wants to be a Data Miner?". We got three tasks (hobbit genealogy, drawing a spiral, and distinguishing between vodka and presidents). I won the first one myself (yeah!), the second task was solved by Benjamin and Matko defended his title from 2010 in the third one. Thanks to all participants and congratulations to Benjamin and Matko!

 

Tomorrow I will add another post describing the second day of the conference and the certification exam. Stay tuned!

 

RapidMinerMarketplaceExtensions 30 May 2011
Rapid-I Marketplace Launched by Simon Fischer Comment (0)

Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner - just download and place them in the plugins folder - the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand, are distributed by the update server, making them searchable and installable directly inside RapidMiner.

We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only this, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment on, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won't have any firewall problems.)

As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.

Looking forward to seeing many new extensions there soon!

RapidMinerPoll 26 May 2011
RapidMiner and R most popular solutions - Thanks by Ingo Mierswa Comment (2)

RapidMiner and R were again the most popular tools - open source but also overall - followed by SAS.

In the annual poll on the leading data analysis portal KDnuggets.com, RapidMiner was again selected as the most widely used solution for data analytics in the world. The question was:

Which data mining/analytic tools you used in the past 12 months for a real project?

About 30% of all analysts selected RapidMiner as the solution for their analytical tasks in 2011. RapidMiner was already the most popular tool in 2010 and defended the title in 2011.

The poll had a record participation with more than 1,100 voters. Among them, 43% used only commercial software, 32% only free software, and 25% both. The average number of tools per user was 2.2. Interestingly, the survey of KDnuggets readers also revealed that, in terms of using open source software, Western Europe is up among the forerunners again.


Thank you for your votes!

This success was only possible thanks to the huge efforts of our community. I would really like to thank all supporters - I am sure that more people will become aware of RapidMiner now!

Read the full story here: http://www.kdnuggets.com/2011/05/tools-used-analytics-data-mining.html

Kind regards,
Ingo

 

 

RapidMiner 12 May 2011
Please support RapidMiner at the KDNuggets poll 2011 by Ingo Mierswa Comment (3)
We really like our work and do our best to provide you with a feature-rich data mining platform. And as you of course all know, the Community Edition of RapidMiner is completely free of charge. Isn't that nice?

But from time to time, we will need something back and this is the reason why I ask for your support here:

On his really great data mining web site KDnuggets, Gregory asks his visitors once a year which data mining tools they have used within the last months. And here is where you come into this game: please vote for RapidMiner in the annual poll of KDnuggets and help us to become more widely known among analysts and researchers worldwide. This, in the end, will of course help to further improve RapidMiner, and so you will actually get something back for only a small amount of your time.

Direct Link to the Poll at KDNuggets: http://www.kdnuggets.com/

Things are incredibly simple:

  • Visit the web site KDnuggets: http://www.kdnuggets.com
  • Select RapidMiner in the poll box on the bottom right
  • Click on "Submit Vote".


That's it! It's really easy and costs only a second...

 

 ...but a giant leap for RapidMiner!

 

Let me end this post and request with a big thank you for participating in this poll as well as for the many comments and feature requests we got during the last years. Things like that help us to improve RapidMiner. So help to spread the word so that we will get more comments in future and further improve it.

Cheers,
Ingo

reportingRapidAnalytics Video Tutorialdesign 11 May 2011
RapidAnalytics 8: Using Style Bundles by Simon Fischer Comment (0)

Throughout our RapidAnalytics video series we have designed a report with various components, but we have always left the settings that determine the look of the report at their default values. In fact, there are very many of these settings, and it would be tedious to define them for every report and every component individually.

In this post I will show how you can use style bundles to define the look of your reports once and apply it to your reports with a single click.

RapidMinerEvents 2 May 2011
Suggestion System for TV Programs: MythMiner by Ingo Mierswa Comment (0)

Balázs Bárány released a great new application (called MythMiner) a couple of months ago. The goal of this application is to suggest TV programs based on the programs recorded by the user so far. This is similar to Tivo, but it is based on the open source video recorder software MythTV. With MythMiner, the users of MythTV can now receive daily suggestions for new TV programs they might be interested in. MythMiner can even automatically schedule recordings of interesting programs if the user likes. Here is the link to the system:

http://tud.at/programm/mythminer/

 

 MythTV Logo

 

MythMiner makes use of RapidMiner or RapidAnalytics and is basically a pretty complex analysis process which performs a classification for "interestingness" based on the previously recorded programs. More information can be found in this forum post.

Balázs Bárány is presenting MythMiner in a talk at the Linuxwochen in Vienna on Saturday, May 7th, 2011. So if you are coming from Austria, I am sure that visiting his talk will be an interesting and inspiring experience.

Schedule: http://www.linuxwochen.at/index.php?option=com_content&view=article&id=216&Itemid=40

SurveyRapidMiner 28 Apr 2011
Please Vote and Support RapidMiner by Ingo Mierswa Comment (0)

Hi,

I just wanted to point out that there is a new round of the well-known survey of Rexer Analytics. The insights delivered by this survey are of great value for us since they allow us to see where we can further improve RapidMiner and our other products.

And besides that: if many of you state that RapidMiner is really a true masterpiece of software, more people will become aware of it, which also helps a lot :-)

So please take part in the survey (it only takes about 10 minutes) and point out that you use RapidMiner and why you like it. Thanks for that!

Link to the survey: http://www.rexeranalytics.com/Data-Miner-Survey-2011-Intro2.html

 Access Code: 5R6RF

 Thanks for supporting RapidMiner and for spreading the word!

 Please vote for RapidMiner

Cheers,

Ingo

RapidMinerchallenge 27 Apr 2011
New Contest: Euro 5500 Prize Money and Free Trip to RCOMM 2012 by Ingo Mierswa Comment (2)

A new data mining challenge has been launched within the e-LICO project.

The challenge is called ECML/PKDD Discovery Challenge 2011: VideoLectures.Net recommender system. The contest consists of two recommender system tasks and a side contest for workflows, in which Rapid-I sponsors the best RapidMiner workflow with free admission to RCOMM 2012 including flight and hotel.

 

 ECML/PKDD Challenge


The total award of the challenge amounts to Euro 5500 - plus the free trip to RCOMM 2012!


All details on the challenge can be found at

http://www.ecmlpkdd2011.org/challenge.php.

We are looking forward to your RapidMiner workflows.

Good luck!
