Rapid-I Blog
RapidMiner 18 Jun 2013
RapidMiner and R by Ingo Mierswa Comment (0)

RapidMiner brings the power of predictive analytics to the business user, with no programming required. 

However, analysts sometimes want to add custom code to their analytical processes.  This brief presentation reviews how to add custom R scripts to the enterprise-ready environment of RapidMiner, using an R extension.

 

 

RapidMiner and R are both widely used.   Our R integration offers the best of both worlds in enterprise environments. Here you can also see the R extension in action:

 

 

Have fun trying it.

RapidMiner, KDnuggets 6 Jun 2013
RapidMiner Voted Most Used Analytics Software in KDnuggets Poll by Ingo Mierswa Comment (0)

Rapid-I and its users today are celebrating the top ranking of RapidMiner in KDnuggets' 14th annual poll of Predictive Analytics, Big Data, Data Mining and Data Science software use. Our community has spoken and we are thrilled to be recognized as the top solution. Even more exciting is that so many users rely solely on Rapid-I solutions to get the job done (30.9 percent solely use our free solutions, while 52.4 percent solely use the commercial versions).

 

Usage percentages for popular solutions

 During this process, it was also rewarding to hear from many users who have had positive experiences with Rapid-I solutions and trusted their business and research needs to our platform. For more information about the poll, read our news release here: http://rapid-i.com/content/view/404/1/

Social Networks, RapidMiner, Rapid-I, Radoop, Process, Modeling, mahout, machine learning, Hadoop, hackathon, hack/reduce, Event, dating, Clustering, boston, Blog, Big Data, Analysis 27 Nov 2012
Radoop Team wins hack/reduce hackathon in Boston by Giuseppe Taibi Comment (0)

hack/reduce brands itself as Boston's Big Data hacking space. Backed by a who's who of Boston tech powerhouses, ranging from Harvard and MIT to Google and Microsoft, to the State of Massachusetts and top-tier VCs, hack/reduce is located in the historic Kendall Boiler and Tank building that gives the name to the vibrant Kendall Square technology district, brimming with startup excitement.

True to its mission of "helping Boston create the talent and the technologies that will shape our future in a big data-driven economy," hack/reduce organized its first hackathon on Nov. 17. We at Rapid-I love Big Data, so this was a terrific opportunity to mingle with the Boston Big Data community. RapidMiner, Rapid-I's popular open source visual environment for data analysis, can easily work on Big Data via Radoop, a RapidMiner extension that adds all the necessary operators to the standard set, so working on Big Data is as easy as drag-and-drop, with no coding required. In addition to supporting Map/Reduce, Radoop includes a number of Machine Learning operators based on the Mahout open source library. Mahout is known for being powerful, yet hard to use. Thanks to Radoop, working with Mahout is a breeze.

The day began with a tutorial on Hadoop by Greg Lu, a Software Engineer at hopper who is also the Technical Director of hack/reduce. Then teams were formed. The response to our "Big Data hacking without coding" pitch was terrific and our team quickly grew from four to over 20 members. We used Skype to keep everybody on the same page and to troubleshoot. That worked great, especially since we had the original developers of Radoop online from Budapest, Hungary. We turned on the video chat and the remote team really felt as if they were in Cambridge with us.

The hackathon was great. At some point we had 25 people on Skype. The Radoop team from Hungary supported us during the entire 10 hours of the hackathon. At first, using a visual environment for a hackathon may sound counterintuitive, but in reality our teammates were really happy to be able to work at a higher conceptual level without having to wrestle with capricious code statements. In fact, our Radoop team was limited only by the power of the Hadoop cluster that we were working on. Because Radoop is so easy to use, everybody was able to experiment with the data sets and the Hadoop cluster. As a result, the cluster was under stress and slowed down while trying to keep up with the number of job requests. The hackathon also helped the Radoop development team uncover a bug that slowed down processing of a clustering algorithm. (The bug is now fixed.)

Our team worked on a 25GB dating profiles database provided by Mate1.com. Other available databases included carbon dioxide measurements, the Amazon.com product database, stock market prices, Wikipedia and more (a full list of data sets is available on the hackathon wiki). We were interested in performing cluster analysis to explore the similarities among user profiles. The Mate1 user profile attributes included age, gender, eye color, smoking habits, dating preferences, astrological signs, physical fitness, political views and many others.

For this task, we applied a K-Means clustering operator to the dataset, then used RapidMiner to create a scatter matrix plot to explore how the profile attributes were related to each other. We found that most of the members only filled out the minimum number of fields on the profile. Also, for whatever reason, people with the same eye color also identify with the same body type. In almost every comparison we noticed that many people chose not to specify a value for an attribute. People definitely tend to enter the minimum information necessary to create a profile and start browsing other people's profiles. One of the frustrations was that the data set was normalized, so we did not really know the exact meaning of a certain attribute value. Towards the end we started to reverse engineer this by creating our own profile on the Mate1.com website, but then we ran out of time.
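
To make the clustering step more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the Radoop or Mahout operator used at the hackathon. The `kmeans` function, the `profiles` rows, and their three numeric columns are hypothetical stand-ins for the normalized Mate1 attributes.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k rows as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical, already-encoded profiles: (age, smoker flag, fitness level).
profiles = [
    (22, 0, 3), (24, 0, 3), (23, 1, 2),
    (45, 0, 1), (47, 1, 1), (44, 0, 2),
]
centroids, labels = kmeans(profiles, k=2)
print(labels)  # one cluster label per profile
```

With data this small the two age groups separate cleanly. On real profiles with many empty fields, the handling of missing values would probably matter more than the clustering itself.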

We also conducted an analysis to verify the "Half Your Age Plus 7 Rule," which refers to the age difference between partners that is considered socially acceptable. More specifically, we mined the dating database to answer the question "What is the oldest / youngest person that you are willing to date?" In a very entertaining presentation, one team member exposed the harsh fact that for Gender "2," the rule generally holds true, while for Gender "3," there is a big difference: members in their 20s and 30s are willing to date partners much older than the rule allows. The database provided did not specify a text label for the gender, only a number, so feel free to guess which is which.
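
For readers who have not met the rule before, the arithmetic can be sketched in a few lines of Python. The function names and the idea of checking a stated preference range against the rule are illustrative assumptions, not the process we ran on the Mate1 data.

```python
def acceptable_age_range(age):
    """Return (youngest, oldest) partner age under the half-your-age-plus-7 rule."""
    youngest = age / 2 + 7
    # The upper bound inverts the rule: an older partner must satisfy
    # older / 2 + 7 <= age, i.e. older <= 2 * (age - 7).
    oldest = 2 * (age - 7)
    return youngest, oldest

def within_rule(age, youngest_pref, oldest_pref):
    """True if a member's stated preference range stays inside the rule."""
    youngest, oldest = acceptable_age_range(age)
    return youngest_pref >= youngest and oldest_pref <= oldest

print(acceptable_age_range(30))  # (22.0, 46)
print(within_rule(30, 25, 40))   # True
print(within_rule(30, 25, 60))   # False
```

A 30-year-old who is willing to date someone aged 60 falls outside the rule on the upper bound, which is the kind of deviation described above.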

The main sponsor of the hackathon was hopper, a startup focused on redefining travel using Big Data, which is among the founders of hack/reduce.

Other teams also presented interesting work, ranging from a cool iPad app made by Praveen Aravamudham, with a spinning earth globe mapping the CO2 emissions around the world, to the analysis of the most used words in Wikipedia ("United States" is the most used term).

Right after the teams' final presentations, all hackathon participants were given the opportunity to vote for the team that they thought produced the most interesting work. The Radoop team was off to a great start in the polls and led the race all the way until Andree Coude, VP Technology at hopper, declared the voting process over and the Radoop team the winner.

Now we are figuring out how to make the best use of the award of $1,000/month of computing power at SoftLayer. Stay tuned.

The video of the final presentation is available at: http://www.ustream.tv/recorded/27101415

Boston Team Members:

Sheamus McGovern - CTO, Capital Market Exchange and machine learning blogger

Todd Cioffi - Director, Technical Training, Navis Learning

Joe Rothermich - Data Scientist and Co-Founder, PeopleHedge

Dan Gerlanc - Predictive Analytics and Visualization Consultant and Founder, Enplus Advisors

Daniel Colonnese - WebSphere Managing Consultant, Lighthouse Computer Services

Sridhar Alla - CTO, eIQnetworks

Kleber Gallardo - CEO, Alivia Technology

Giuseppe Taibi - CEO, Rapid-I North America

Budapest Team Members:

Zoltán Prekopcsák - CEO and Co-Founder, Radoop

Péter Hellinger - Senior Software Engineer and Co-Founder, Radoop

Gabor Makrai - Chief developer and Co-Founder, Radoop

 

Photo Gallery

hack/reduce Radoop Team

Team Radoop hacking away

hack/reduce hackathon

RapidMiner process

Radoop Process Using Mahout K-Means Clustering Operator

hack/reduce Hackathon Voting Results

Rapid-I Team

Radoop Team - hack/reduce Hackathon

RCOMM, Books 1 Oct 2012
Books on Data Mining by Marius Helf Comment (0)
The Rapid-I team keeps on mining and we excavated two great books for our users. The first one, Data Mining for the Masses by Matthew North, is a very practical book for beginners and intermediate data miners, whereas The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman provides a deep insight into the mathematical models driving the heart of every data analysis. It is not really hot off the press, but has not lost its glamour since the release of the first edition a couple of years ago.

The Elements of Statistical Learning is targeted at readers with a statistical, mathematical or informatics background who want to understand not only how to use an operator in RapidMiner, but also why it works. The reader should not be afraid of mathematical formulas, but will be rewarded with a solid understanding of many methods implemented in RapidMiner and of the connections and inter-relationships between different learning algorithms: what do decision trees and rule learning algorithms have in common? Should you try an SVM if k-NN fails on your data?

The Elements of Statistical Learning can be considered a standard text, used in many data mining lectures around the world. This may be attributed to the fact that it not only contains all the detailed information but also presents it with relatively simple explanations, keeping in mind, of course, that understanding complex topics will always require a whole lot of effort. The book can be downloaded from the authors' website.

Data Mining for the Masses, on the other hand, takes a practical approach and, as the name implies, aims at a broader range of readers. Those of you who visited this year's RCOMM already had the opportunity to follow the presentation of the book by the author himself. Matt's comprehensive book gives a detailed and thorough introduction to data mining. All major concepts of data mining are covered in a well-structured manner using real-life examples, most of which are solved completely with RapidMiner.

In fact, the book begins one step before the data analysis and explains the meaning of data mining itself, and it does not leave out the ethical concerns that a responsible data miner should keep in mind. Because of the easy style of writing and the good examples, this book is suited not only for IT professionals and college students who want to take a deeper look into data mining, but for anyone who wants to learn how to get the most out of their data.


RCOMM, myExperiment, challenge 5 Sep 2012
Solving Sudokus with RapidMiner by Simon Fischer Comment (0)

This year's RCOMM live data mining challenge, "Who wants to be a data miner?", was a tricky yet fun task for the competitors. The task was to (partially) solve a Sudoku puzzle with RapidMiner. The solution shows that you can achieve virtually any data analysis task you can think of using only standard RapidMiner processes.

The input data set consisted of examples of the form (x,y,v) where x and y indicated the column and row inside the puzzle and v was the number predefined at this cell. The path to the solution was split into three subtasks:

  1. Task one was to generate the space of all combinations of cells and numbers that were possible if there were no predefined numbers in the Sudoku. This task could be solved by starting with a simple data set only containing the numbers one to nine and applying two Cartesian Product operators to generate all combinations of these numbers for x, y, and v. In addition to that, we also generate a new attribute z, which indicates the 3x3-sub-table in which the cell (x,y) lies, using a Generate Attributes operator.
  2. This additional attribute z is useful for Subtask 2 which was to eliminate all combinations which are impossible, given a single predefined cell value from the input data set. This was possible by using a combination of a Generate Attributes operator and a Filter Examples operator to identify those combinations (from the set of all combinations generated in Subtask 1) where v and at least one of x, y, or z match a number defined in the input.
  3. The resulting process of Subtask 2 could be re-used in Subtask 3 to eliminate all combinations whose impossibility could be inferred from looking at all predefined cell values in the input data set. This could be achieved by using a Loop Examples operator to iterate over all cells and using a nested Execute Process to re-use the process generated in Subtask 2.
    Finally, by looking at those cells where only one possible number remains, we can identify a new value that can be inserted into the Sudoku for sure. These cells can be identified by using an Aggregate operator to group the remaining possibilities by x, y, and z and find those groups with a count of exactly one.
  4. As a bonus process, we can repeat the process from Subtask 3, iteratively Appending the inferred numbers to the predefined ones. Thus, the data set grows up to a size of 81 which means the 9x9 Sudoku is complete. Finally, we use the Pivot operator and do some polishing to make the result look like this, a completely solved Sudoku:

Solution of the Sudoku puzzle in RapidMiner
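
For readers who prefer code to operators, the subtasks above can be sketched in plain Python. This is only an illustration of the same idea, not an export of the RapidMiner processes. All function names are made up, the test puzzle is a hypothetical pattern grid with its diagonal blanked out, and the sketch assumes that an already-filled cell keeps no candidates at all.

```python
from itertools import product

def box(x, y):
    """Index (1..9) of the 3x3 sub-table containing column x, row y."""
    return 3 * ((y - 1) // 3) + (x - 1) // 3 + 1

def eliminate(candidates, given):
    """Subtask 2: drop candidates made impossible by one predefined cell."""
    gx, gy, gv = given
    gz = box(gx, gy)
    return {
        (x, y, z, v)
        for (x, y, z, v) in candidates
        # Assumption of this sketch: a filled cell needs no candidates.
        if (x, y) != (gx, gy)
        # The same value in the same column, row, or sub-table is impossible.
        and not (v == gv and (x == gx or y == gy or z == gz))
    }

def infer_singles(givens):
    """Subtask 3: cells for which exactly one value remains possible."""
    digits = range(1, 10)
    # Subtask 1: the full space of (x, y, z, v) combinations.
    candidates = {(x, y, box(x, y), v) for x, y, v in product(digits, repeat=3)}
    for given in givens:
        candidates = eliminate(candidates, given)
    by_cell = {}
    for x, y, z, v in candidates:
        by_cell.setdefault((x, y), []).append(v)
    return {(x, y, vs[0]) for (x, y), vs in by_cell.items() if len(vs) == 1}

def solve(givens):
    """Bonus step: append inferred values until nothing new can be inferred."""
    known = set(givens)
    while len(known) < 81:
        new = infer_singles(known)
        if not new:
            break  # this simple rule alone cannot finish harder puzzles
        known |= new
    return known

# Hypothetical test puzzle: a valid pattern grid with the diagonal removed.
full = {(x, y, (3 * ((y - 1) % 3) + (y - 1) // 3 + (x - 1)) % 9 + 1)
        for x, y in product(range(1, 10), repeat=2)}
puzzle = {(x, y, v) for (x, y, v) in full if x != y}
solved = solve(puzzle)
print(solved == full)  # True
```

Like the process pack, this only fills cells where a single value remains, so a harder puzzle may stop before all 81 cells are known.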

All the processes mentioned above are accessible from myExperiment as a pack which you can use from within RapidMiner by using the Community Extension. The processes can be directly opened in RapidMiner, but when saving make sure to name them as they are named in myExperiment since an Execute Process operator in a later process may expect it under this name. Process 0 in this pack downloads the initial data from the Web.

RCOMM, RapidMiner 9 Aug 2012
Last chance to register for third RapidMiner conference RCOMM 2012 by Ingo Mierswa Comment (0)

The third RapidMiner Community Meeting and Conference (RCOMM 2012) is quickly approaching and we are very excited about a great program full of talks, success stories, and demonstrations. RCOMM 2012 will be held at the Budapest University of Technology and Economics (BME), Budapest, Hungary, from August 28 through 31, 2012.

The normal registration rate ends on August 13th, so we recommend registering now to take advantage of the discounts!

 

 

What to expect?

RCOMM 2012 offers more than 20 presentations, a social program, and our famous game show "Who wants to be a data miner?" The presentations include:

  • Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
  • Handling Big Data (Andras Benczur, MTA SZTAKI)
  • Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
  • among many others.

Check the full program...

About RCOMM

Presentations are aimed at researchers and practitioners using or extending RapidMiner for commercial or scientific use. Topics include analysis processes, use cases, success stories, best practice recommendations, or descriptions of software packages building upon or extending RapidMiner and RapidAnalytics.

Another important highlight of the conference will be the presentation of the new book "Data Mining for the Masses" by Matthew North of Washington & Jefferson College, which makes use of RapidMiner.

Learn more about the full program...

 

Registration

RCOMM 2012
RapidMiner Community Meeting and Conference (RCOMM 2012)
August 28 - 31, 2012
BME, Budapest, Hungary

Register now - last chance for discounted prices!
The RCOMM 2012 will take place at the Budapest University of Technology and Economics (BME), Budapest, Hungary.

Looking forward to meeting you all in Budapest!

RapidMiner 17 May 2012
Please vote for RapidMiner at KDNuggets 2012 by Ingo Mierswa Comment (0)

We at Rapid-I really like our work and do our best to provide you with a feature-rich data mining platform. And as you of course all know, the Community Edition of RapidMiner is completely free of charge. Isn't that nice?

But today, we will need YOUR support!

On his really great data mining web site KDnuggets, Gregory asks his visitors once a year which data mining tools they have used within the last months. And here is where you come into the game: please vote for RapidMiner in the annual poll of KDnuggets and help us to get more widely known among analysts and researchers worldwide. This, in the end, will of course help to further improve RapidMiner, so you will actually get something back for only a small amount of your time.

Direct Link to the Poll at KDNuggets: http://www.kdnuggets.com/2012/05/new-poll-analytics-data-mining-software-used.html

Things are incredibly simple:

  1. Visit the web site KDnuggets: http://www.kdnuggets.com/2012/05/new-poll-analytics-data-mining-software-used.html
  2. Select RapidMiner and / or RapidAnalytics (in the poll box on the bottom right)
  3. Click on "Submit Vote" and confirm via mail

That's it! It's really easy and takes only a second... And please don't worry: Gregory will not use your mail address for any other purpose than for this confirmation.

Please vote for RapidMiner at the KDNuggets Poll 2012

 

Let me end this post and request with a big thank you for participating in this poll as well as for the many comments and feature requests we got during the last years. Things like that help us to improve RapidMiner. So help to spread the word so that we will get more comments in future and further improve it.

Cheers,
Ingo

RapidMiner, Intro 21 Mar 2012
Step by Step Introduction to RapidMiner by Ingo Mierswa Comment (3)

Hi folks,

I have just stumbled upon a very nice step-by-step introduction to RapidMiner written by Dr. Scott Turner, which has been published as a guest post on the blog The Number Crunching Life. Dr. Scott Turner won the Machine March Madness prediction contest last year and was the co-winner of the Sweet 16 contest two years ago. Check out his great blog all about algorithmic prediction of NCAA basketball.

So if you are learning to work with RapidMiner right now or know somebody who has just started, this post might definitely be interesting to you:

http://blog.smellthedata.com/2012/03/using-rapidminer-to-predict-march.html

Have fun reading this introduction!

 

 


 

 

RCOMM 8 Mar 2012
RCOMM 2012 in Budapest by Ingo Mierswa Comment (0)
We are happy to present the third RapidMiner Community Meeting and Conference (RCOMM 2012). RCOMM 2012 will be held at the Budapest University of Technology and Economics in Budapest, Hungary, from August 28 through 31, 2012.




Previous years' RCOMMs were a great success with lots of participants, many great talks, and workshops surrounding the conference. RCOMM 2012 intends to intensify community life again and strengthen the RapidMiner network by bringing together users and developers of RapidMiner from all backgrounds. Presentations can be about applications (processes, use cases, best practice recommendations) or descriptions of software packages building upon or extending RapidMiner.

More information at http://www.rcomm2012.com

Call for Papers

Presentations aim at researchers and practitioners using or extending RapidMiner for scientific or commercial use. Topics include analysis processes, use cases, success stories, best practice recommendations or descriptions of software packages building upon or extending RapidMiner. Learn more about how to submit a paper to RCOMM 2012 at

http://www.rcomm2012.com

Hope to see you in Budapest and I am looking forward to your contributions.
Radoop, Big Data 23 Feb 2012
Big Data? Big Analytics! by Ingo Mierswa Comment (0)

This is probably the most exciting announcement of the last months: Radoop and RapidMiner are partners now! Read more below about this disruptive technology for Big Data Analytics.

 

Big Data Analytics with RapidMiner and Radoop


You want Hadoop? You will love Radoop!
Hadoop has become a de facto standard for working with Big Data. The Hadoop framework supports data-intensive distributed applications, which makes it an excellent tool for analyzing large data sets. In principle, Hadoop is able to work with thousands of computing nodes on petabytes of data. The problem is that creating those data transformation and analysis jobs means scripting, coding, hacking, which is a real pain in terms of maintenance and integration.

Don't bother with this 90s style of coding any longer! Radoop (learn more about Radoop in a previous blog post) offers the best of all worlds: the powerful yet flexible graphical user interface of RapidMiner together with the power of Hadoop. Radoop closely integrates the highly optimized data analytics capabilities of Hadoop clusters, the distributed data warehouse Hive, and Mahout into the user-friendly interface of RapidMiner. This results in a powerful and easy-to-use data analytics solution for Hadoop.

Everybody talks about Big Data now.
While others talk about big data and how to overcome the related issues, we are happy to already announce the solution for the easy creation of data transformations and analytical processes based on Hadoop. This makes RapidMiner + Radoop the first enterprise-ready solution for Big Data Analytics based on Hadoop worldwide.

Partnership
Radoop is a disruptive technology. And it is the result of the hard work of two experienced teams around RapidMiner / RapidAnalytics (Rapid-I, Germany) and the Radoop extension (Radoop, Hungary). Radoop's active engagement has been one of the key factors for this revolution in Big Data Analytics. The people of Radoop are highly skilled and committed professionals; this is reflected in the amazing quality of the extension. As a consequence, we are really happy about this partnership and looking forward to even more exciting developments around Big Data Analytics during the next months.

More information in our official press release at http://rapid-i.com/content/view/358/1/
