Rapid-I Blog
Tag >> RapidMiner
Tags: Social Networks, RapidMiner, Rapid-I, Radoop, Process, Modeling, mahout, machine learning, Hadoop, hackathon, hack/reduce, Event, dating, Clustering, boston, Blog, Big Data, Analysis
27 Nov 2012
Radoop Team wins hack/reduce hackathon in Boston by Giuseppe Taibi

hack/reduce brands itself as Boston's Big Data hacking space. Backed by a who's who of Boston tech powerhouses, ranging from Harvard and MIT to Google and Microsoft, to the State of Massachusetts and top-tier VCs, hack/reduce is located in the historic Kendall Boiler and Tank building, which gives its name to the vibrant Kendall Square technology district, brimming with startup excitement.

True to its mission of "helping Boston create the talent and the technologies that will shape our future in a big data-driven economy," hack/reduce organized its first hackathon on Nov. 17. We at Rapid-I love Big Data, so this was a terrific opportunity to mingle with the Boston Big Data community. RapidMiner, Rapid-I's popular open source visual environment for data analysis, can work on Big Data via Radoop, a RapidMiner extension that adds the necessary operators to the standard set, so working on Big Data is as easy as drag-and-drop, with no coding required. In addition to supporting Map/Reduce, Radoop includes a number of machine learning operators based on the Mahout open source library. Mahout is known for being powerful yet hard to use; thanks to Radoop, working with Mahout is a breeze.

The day began with a tutorial on Hadoop by Greg Lu, a Software Engineer at hopper who is also the Technical Director of hack/reduce. Then teams were formed. The response to our "Big Data hacking without coding" pitch was terrific, and our team quickly grew from four to over 20 members. We used Skype to keep everybody on the same page and to troubleshoot. That worked great, especially since we had the original developers of Radoop online from Budapest, Hungary. We turned on the video chat, and the remote team really felt as if they were in Cambridge.

The hackathon was great. At one point we had 25 people on Skype. The Radoop team from Hungary supported us during the entire 10 hours of the hackathon. At first, using a visual environment for a hackathon may sound counterintuitive, but in reality our teammates were really happy to be able to work at a higher conceptual level without having to wrestle with capricious code statements. In fact, our Radoop team was limited only by the power of the Hadoop cluster that we were working on. Because Radoop is so easy to use, everybody was able to experiment with the data sets and the Hadoop cluster. As a result, the cluster was under stress and slowed down while trying to keep up with the number of job requests. The hackathon also helped the Radoop development team uncover a bug that slowed down processing of a clustering algorithm. (The bug is now fixed.)

Our team worked on a 25 GB database of dating profiles provided by Mate1.com. Other available datasets included carbon dioxide measurements, the Amazon.com product database, stock market prices, Wikipedia and more (the full list of datasets is available on the hackathon wiki). We were interested in performing cluster analysis to explore the similarities among user profiles. The Mate1 user profile attributes included age, gender, eye color, smoking habits, dating preferences, astrological signs, physical fitness, political views and many others.

For this task, we applied a K-Means clustering operator to the dataset, then used RapidMiner to create a scatter matrix plot to explore how the profile attributes were related to each other. We found that most members filled out only the minimum number of fields on their profile: in almost every comparison we noticed that many people had chosen not to specify a value for an attribute. People definitely tend to enter the minimum information necessary to create a profile and start browsing other people's profiles. Also, for whatever reason, people with the same eye color also identify with the same body type. One of the frustrations was that the data set was normalized, so we did not really know the exact meaning of a given attribute value. Towards the end we started to reverse engineer this by creating our own profile on the Mate1.com website, but then we ran out of time.
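To make the clustering step a bit more concrete, here is a minimal sketch of the K-Means idea in plain Python. It is not the Mahout implementation that Radoop wraps, and the tiny "profiles" list is made up for illustration (it is not Mate1 data); the function name and parameters are our own assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical, made-up profiles: (age, numerically encoded body type).
profiles = [(22, 1), (24, 1), (23, 2), (45, 3), (47, 3), (44, 4)]
centroids, labels = kmeans(profiles, k=2)
print(labels)
```

On real profile data the attributes would first have to be brought to comparable scales, and missing values (which were very common in our data set) would need to be handled before clustering.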

We also conducted an analysis to verify the "Half Your Age Plus 7" rule, which refers to the age difference between partners that is considered socially acceptable. More specifically, we mined the dating database to answer the question "What is the oldest / youngest person that you are willing to date?" In a very entertaining presentation, one team member exposed the harsh fact that for Gender "2" the rule generally holds true, while for Gender "3" there is a big difference: members in their 20s and 30s are willing to date partners much older than the rule allows. The database provided did not specify a text label for the gender, only a number, so feel free to guess which is which.
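For readers who have not met the rule before, it can be written down in a few lines. This is only a sketch of the rule itself, not of the analysis we ran in Radoop; the function names are our own.

```python
def youngest_acceptable(age):
    """Youngest partner age the rule allows: half your age plus seven."""
    return age / 2 + 7


def oldest_acceptable(age):
    """Oldest partner age: the inverse of the rule, seen from the older side."""
    return (age - 7) * 2


def within_rule(age, partner_age):
    """True if the younger person is at least half the older person's age plus 7."""
    younger, older = sorted((age, partner_age))
    return younger >= older / 2 + 7


print(youngest_acceptable(30), oldest_acceptable(30))
print(within_rule(30, 50))
```

A profile whose stated oldest acceptable partner age is above oldest_acceptable(age) is one that goes beyond the rule, which is the pattern described above for Gender "3".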

The main sponsor of the hackathon was hopper, a startup focused on redefining travel using Big Data, which is among the founders of hack/reduce.

Other teams also presented interesting work, ranging from a cool iPad app made by Praveen Aravamudham, with a spinning earth globe mapping the CO2 emissions around the world, to an analysis of the most used words in Wikipedia (United States is the most used word).

Right after the final presentations, all hackathon participants were given the opportunity to vote for the team that they thought had produced the most interesting work. The Radoop team was off to a great start in the polls and led the race all the way until Andree Coude, VP Technology at hopper, declared the voting over and the Radoop team the winner.

Now we are figuring out how to make the best use of the award of $1,000/month of computing power at SoftLayer. Stay tuned.

The video of the final presentation is available at: http://www.ustream.tv/recorded/27101415

Boston Team Members:

Sheamus McGovern - CTO, Capital Market Exchange and machine learning blogger

Todd Cioffi - Director, Technical Training, Navis Learning

Joe Rothermich - Data Scientist and Co-Founder, PeopleHedge

Dan Gerlanc - Predictive Analytics and Visualization Consultant and Founder, Enplus Advisors

Daniel Colonnese - WebSphere Managing Consultant, Lighthouse Computer Services

Sridhar Alla - CTO, eIQnetworks

Kleber Gallardo - CEO, Alivia Technology

Giuseppe Taibi - CEO, Rapid-I North America

Budapest Team Members:

Zoltán Prekopcsák - CEO and Co-Founder, Radoop

Péter Hellinger - Senior Software Engineer and Co-Founder, Radoop

Gabor Makrai - Chief developer and Co-Founder, Radoop

 

Photo Gallery

hack/reduce Radoop Team

Team Radoop hacking away

hack/reduce hackathon

RapidMiner process

Radoop Process Using Mahout K-Means Clustering Operator

hack/reduce Hackathon Voting Results

Rapid-I Team

Radoop Team - hack/reduce Hackathon

Tags: RCOMM, RapidMiner
9 Aug 2012
Last chance to register for third RapidMiner conference RCOMM 2012 by Ingo Mierswa

The third RapidMiner Community Meeting and Conference (RCOMM 2012) is quickly approaching, and we are very excited about a great program full of talks, success stories, and demonstrations. RCOMM 2012 will be held at the Budapest University of Technology and Economics (BME), Budapest, Hungary, from August 28 through 31, 2012.

The normal registration rate ends on August 13th, so we recommend registering now to take advantage of the discounts!

 

 

What to expect?

RCOMM 2012 offers more than 20 presentations, a social program, and our famous game show "Who wants to be a data miner?" The presentations include:

  • Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
  • Handling Big Data (Andras Benczur, MTA SZTAKI)
  • Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
  • among many others.

Check the full program...

About RCOMM

Presentations are aimed at practitioners using or extending RapidMiner for commercial or scientific use. Topics include analysis processes, use cases, success stories, best practice recommendations, and descriptions of software packages building upon or extending RapidMiner and RapidAnalytics.

Another important highlight of the conference will be the presentation of the new book "Data Mining for the Masses" by Matthew North from Washington & Jefferson College, which makes use of RapidMiner.

Learn more about the full program...

 

Registration

RCOMM 2012
RapidMiner Community Meeting and Conference (RCOMM 2012)
August 28 - 31, 2012
BME, Budapest, Hungary

Register now - last chance for discounted prices!
The RCOMM 2012 will take place at the Budapest University of Technology and Economics (BME), Budapest, Hungary.

Looking forward to meeting you all in Budapest!

Tags: RapidMiner
17 May 2012
Please vote for RapidMiner at KDnuggets 2012 by Ingo Mierswa

We at Rapid-I really like our work and do our best to provide you with a feature-rich data mining platform. And as you of course all know, the Community Edition of RapidMiner is completely free of charge. Isn't that nice?

But today, we will need YOUR support!

Once a year, Gregory asks the visitors of his really great data mining web site KDnuggets which data mining tools they have used within the last months. And this is where you come into the game: please vote for RapidMiner in the annual KDnuggets poll and help us become more widely known among analysts and researchers worldwide. In the end, this will of course help to further improve RapidMiner, so you will actually get something back for only a small amount of your time.

Direct link to the poll at KDnuggets: http://www.kdnuggets.com/2012/05/new-poll-analytics-data-mining-software-used.html

Things are incredibly simple:

  1. Visit the web site KDnuggets: http://www.kdnuggets.com/2012/05/new-poll-analytics-data-mining-software-used.html
  2. Select RapidMiner and / or RapidAnalytics (in the poll box on the bottom right)
  3. Click on "Submit Vote" and confirm via mail

That's it! It's really easy and takes only a second... And please don't worry: Gregory will not use your mail address for any purpose other than this confirmation.

Please vote for RapidMiner at the KDnuggets Poll 2012

 

Let me end this post and request with a big thank you for participating in this poll, as well as for the many comments and feature requests we have received over the last years. Things like that help us to improve RapidMiner. So please help to spread the word, so that we get even more comments in the future and can improve it further.

Cheers,
Ingo

Tags: RapidMiner, Intro
21 Mar 2012
Step by Step Introduction to RapidMiner by Ingo Mierswa

Hi folks,

I have just stumbled upon a very nice step-by-step introduction to RapidMiner written by Dr. Scott Turner, which has been published as a guest post on the blog The Number Crunching Life. Dr. Scott Turner won the Machine March Madness prediction contest last year and was the co-winner of the Sweet 16 contest two years ago. Check out his great blog, which is all about algorithmic prediction of NCAA basketball.

So if you are learning to work with RapidMiner right now, or know somebody who has just started, this post might definitely be interesting to you:

http://blog.smellthedata.com/2012/03/using-rapidminer-to-predict-march.html

Have fun reading this introduction!

 

 


 

 

Tags: Release, RapidMiner, Plotter
23 Jan 2012
New Plotters for RapidMiner by Marius Helf

After quite some time of hard development, the Rapid-I team is proud to announce the birth of its latest baby: a brand new plot component offering you a shiny, powerful and flexible visualization of your data and process results.

The new plotters support bar charts, area charts, scatter and series plots with a single configuration. Instead of preselecting a diagram type from a list of templates, the new plotters allow you to freely choose the visualization type of each attribute. You can plot more than one attribute at a time, create additional y-axes, combine aggregated bar charts with scatter plots and add a number of error indicators if you feel the need for it. Enough talking: this is what the new plotters can do for you (of course with your all-time favourite data set):

 

What do we see in this plot? As you might recognize, the points form a scatter plot of two attributes of the Iris dataset, namely sepal length versus sepal width, where sepal length is placed on the domain axis (x-axis) and sepal width on the left range axis (y-axis). The colors and shapes of the points are chosen according to the label of each data point. This is also represented in the legend on the right.

Speaking of the legend, you might want to take a closer look at it. The upper part lists the plots in this diagram. The first entry, labelled sepal length (cm) with the circle in front of it, shows us that the plot consists of single data points, i.e. it is the scatter plot we just talked about. The missing color and rather undefined shape tell us to look at the bottom part of the legend to get the semantics of colors and shapes: there we discover that each unique color and shape represents one of the label values iris setosa, iris virginica and iris versicolor.

All that is left to explain is the bar chart, which is also easily spotted in the legend: it is a histogram of Iris, grouped by label, over the sepal length. Note that the heights of the bars refer to a second range axis on the right.

The attentive reader will have noted that the bars are slightly transparent: this shows another feature of our new plotters - everything is formattable and customizable, starting at customizable presets and gradients for the plot colors, different shapes for each data series, plot and legend background up to the fonts of the title and the axes. What else do you desire? Bars oriented from left to right instead of vertical ones? No problem, two clicks and you are done. Aggregate your data to calculate averages and plot the standard deviation of each data point? No problem, everything is possible :)

The true plotter experts will even be able to beam good old Iris to New York and celebrate the arrival of the new plot engine with fireworks never seen before in RapidMiner:

Oh yes, this truly is the Iris dataset. Can you guess from the legend what you are seeing?

We hope that we have sparked your interest in this new feature. It will be part of RapidMiner 5.2 beta, which is expected to ship at the end of this week. As usual, you will be notified of its availability via RapidMiner's auto update, or you can just download it from our website.

Tags: Training, RapidMiner
19 Jan 2012
Practical Data Mining Lectures from Simafore by Ingo Mierswa

From time to time, we post articles about how specific analysis methods work and how those methods and approaches can be done with RapidMiner.
Our colleagues from Simafore, a US-based consultancy for advanced analytics, also follow this approach and describe many applications of data mining in real-world scenarios, together with practical examples done with RapidMiner.

So I thought their blog might be interesting to you, especially for those of you not already familiar with the deepest aspects of data modeling.  For most of their blog posts, there is also a white paper explaining more details about the method application and how to perform this with RapidMiner.

Here is a small selection of topics:

A Simple Explanation of Decision Tree Modeling based on Entropies

Link: http://www.simafore.com/blog/bid/94454/A-simple-explanation-of-how-entropy-fuels-a-decision-tree-model

Description of some of the basics of decision trees. Simple and hardly any math, I like the plots explaining the basic idea of the entropy as splitting criterion (although we actually calculate gain ratio differently than explained...)

White Paper: www.simafore.com/Download-ebook-Decision-Tree-Articles-Digest/
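As a small companion to the article, here is a sketch of entropy as a splitting criterion in plain Python. It uses the common textbook (C4.5-style) definitions of information gain and gain ratio; as noted above, RapidMiner's own gain ratio calculation differs, so please do not read this as RapidMiner's implementation. The function names and the toy labels are our own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


def gain_ratio(labels, groups):
    """Textbook gain ratio: information gain divided by the split information."""
    total = len(labels)
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups if g)
    return information_gain(labels, groups) / split_info if split_info else 0.0


# Toy example: four labels, and a candidate split that separates them perfectly.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
print(entropy(labels), information_gain(labels, perfect_split))
```

A decision tree learner evaluates such a score for every candidate split and picks the attribute with the best value.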

 Data Distribution

 

Logistic Regression for Business Analytics using RapidMiner

Link: http://www.simafore.com/blog/bid/57924/Logistic-regression-for-business-analytics-using-RapidMiner-Part-2

Same as above, but this time for modeling with logistic regression. Easy to read and covering all basic ideas together with some examples. If you are not familiar with the topic yet, part 1 (see below) might help.

White Paper: http://www.simafore.com/download-ebook-Logistic-regression-articles-digest/

Part 1 (Basics): http://www.simafore.com/blog/bid/57801/Logistic-regression-for-business-analytics-using-RapidMiner-Part-1

Deploy Model: http://www.simafore.com/blog/bid/82024/How-to-deploy-a-logistic-regression-model-using-RapidMiner

Advanced Information: http://www.simafore.com/blog/bid/99443/Understand-3-critical-steps-in-developing-logistic-regression-models

 

Logistic Regression in RapidMiner

 

Feature Selection and Linear Regression

There are also two articles about feature selection and linear regression:

http://www.simafore.com/blog/bid/80639/Feature-Selection-for-predictive-analytics-using-mutual-information

http://www.simafore.com/blog/bid/81836/2-ways-to-select-predictors-for-regression-models-using-RapidMiner

White Paper: http://www.simafore.com/Download-ebook-Predicting-Sales-using-linear-regression/


And I am sure there is more to come. Please visit Simafore's blog at

http://www.simafore.com/blog/

Tags: research, RapidMiner, Data Mining
9 Jan 2012
The Intelligent Discovery Assistant by Simon Fischer

Imagine that all you had to do to create a data mining process was to select a data set and specify what you want to do with the data, e.g. predictive modelling. Wouldn't that save a lot of work?

Within the research project "e-LICO", funded by the EU within the 7th Framework Programme, the Intelligent Discovery Assistant (IDA) was developed, and it does precisely that. It comes with its own perspective (marked with the silhouette of a friendly butler) that contains all you need: the repository and the assistant itself. To use it, follow three simple steps:

  1. Drag a data set into one of the slots. It will be automatically detected as training data, test data or apply data, depending on whether it has a label or not.
  2. Select a goal. The most frequent one is probably "Predictive Modelling". All goals have comments, so you see what they can be used for.
  3. Select "Fetch plans" and wait a bit to get a list of processes that solve your problem. Once the planning completes, select one of the processes (you can see a preview at the right) and run it. Alternatively, select multiple (selecting none means selecting all) and evaluate them on your data in a batch.

The assistant strives to generate processes that are compatible with your data. To do so, it performs a lot of clever operations, e.g., it automatically replaces missing values if missing values exist and this is required by the learning algorithm or performs a normalization when using a distance-based learner.
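The kind of preprocessing decision described above can be sketched in a few lines of Python. This only illustrates the idea; the IDA itself plans workflows in Prolog, and the function names and the two boolean flags below are hypothetical.

```python
def replace_missing_with_mean(column):
    """Fill None entries with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def prepare(column, learner_needs_complete_data, learner_is_distance_based):
    """Insert preprocessing steps only when the data and the learner call for them."""
    if learner_needs_complete_data and any(v is None for v in column):
        column = replace_missing_with_mean(column)
    if learner_is_distance_based:
        column = min_max_normalize(column)
    return column


ages = [20, None, 40]  # made-up attribute with one missing value
prepared = prepare(ages, learner_needs_complete_data=True, learner_is_distance_based=True)
print(prepared)
```

The point is that the steps are conditional: a learner that tolerates missing values and does not rely on distances would get the column back unchanged.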

You can install the extension directly by using the Rapid-I Marketplace instead of the old update server. Just go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL. Alternatively, just download it directly and place it in RapidMiner's lib\plugins folder.

Since the workflow planning happens in Prolog, this extension automatically installs a Prolog engine (XSB Prolog plus Flora 2). It will do so when it first starts. These can only be installed into a specific directory, so you must run RapidMiner as administrator when using the extension for the first time. (On Windows, right-click and select "Run as administrator".)

If you try out the extension, we ask you to participate in the user survey so we can keep improving the extension. You can easily open the survey by installing the extension and clicking on the third button in the toolbar (the one with the letter box).

The IDA was developed as a collaboration mainly between the University of Zurich (Jörg-Uwe Kietz and Floarea Serban) and Rapid-I.

Tags: RapidMiner, Book
7 Dec 2011
Call for Chapters for a RapidMiner Book on Use Cases by Ingo Mierswa

Great news for those of you who are waiting for an official RapidMiner book: we recently made some progress on the long-lost manual, and below you will find something new as well: more information and a call for chapters for the upcoming book about how to use RapidMiner in different application areas.

Editors:
Dr. Markus Hofmann, Institute of Technology Blanchardstown, Ireland
Ralf Klinkenberg, Chief Business Development Officer, Rapid-I, Germany

RapidMiner Book

More information about the book and a call for chapters can be found below or at
http://www.rapidminerbook.com.

Introduction

RapidMiner has, without a doubt, had a serious impact on software choice when it comes to data mining and predictive analytics. Thanks to its open source license model, RapidMiner spread quickly and is now deployed by hundreds of thousands of users in more than 60 countries worldwide. It is often referenced as a true competitor to proprietary commercial solutions. However, as with many other open source solutions, a lack of application-oriented documentation is often a barrier to using the software. The proposed book aims to address this issue and lower this barrier by demonstrating how to apply RapidMiner in many relevant areas.

The proposed book will be an introductory book to RapidMiner, focusing on use cases to explain the functionality and the most frequently used operators. The aim is not to produce another data mining book, and a certain knowledge of data analysis concepts and techniques can be expected when drafting chapter proposals.

More info can be found here: http://www.rapidminerbook.com.

Overall Objectives

The book will provide high-quality practical articles on use cases that showcase RapidMiner as a leading data mining software. Each use case has to be accompanied by a dataset. While reading the chapter, the learner can follow and implement the use case in RapidMiner 5.

Recommended Topics and Themes

Original papers on all aspects of data analysis that RapidMiner caters for are invited. Submissions must not duplicate work that any of the authors has published elsewhere or submitted in parallel to any other books, conferences or workshops with proceedings. In addition, it is not always necessary to produce the best possible mining process on the data. Instead, the aim is to use the data to explain a set of operators in a practical manner (step by step process).

Possible topics covering all aspects of data mining may include (but are not limited to):

  • Data Exploration and Visualisation
  • Introduction to Data Mining (Theory Chapter)
  • RapidMiner GUI Intro
  • Classification Basic
  • Text Mining
  • Classification Advanced / Direct Mailing
  • Predictive Maintenance / Machine Failure Prevention / Quality Assurance
  • Customer/Credit Scoring
  • Financial Forecasting
  • Marketing Channel Analysis
  • Web-Content Mining (Sentiment Analysis)
  • Educational Data Mining
  • Customer Segmentation
  • Image Mining
  • Automated Reporting
  • RapidAnalytics

Submission Procedure

Researchers and practitioners are invited to submit on or before December 31, 2011, a 2 to 3 page manuscript proposal clearly explaining the use case of the proposed chapter and the operators that will be introduced. Authors of accepted proposals will be notified by January 31, 2012. The following should be kept in mind:

  • The proposed project should include a sample mining process.
  • You need to submit your Curriculum Vitae with the chapter proposal.
  • The data needs to be publicly available so that future readers of the book can reproduce the use cases.
  • The aim is not to produce the perfect process but to use and explain an appropriate number of operators.
  • Chapter proposals can be submitted as MS Word or PDF file.
  • Chapters need to be submitted using the LaTeX template available on http://www.rapidminerbook.com

Full chapters are expected to be submitted by May 31, 2012. All submitted chapters will be reviewed by at least two reviewers. Various publishing strategies and publishers are currently considered.

Important Dates

Manuscript proposal for book chapter (2-3 pages): December 31, 2011
Notification to authors of submitted chapters: January 31, 2012
First Draft of the chapters from authors: May 31, 2012
Reviews back to authors: June 30, 2012
Revised Chapters back from authors: July 31, 2012
Final notification to the authors: August 31, 2012
Final camera-ready chapters from authors: September 30, 2012

Please e-mail all inquiries and proposal submissions to markus.hofmann@itb.ie.

Contact

Dr. Markus Hofmann

Department of Informatics, School of Engineering and Informatics
Institute of Technology Blanchardstown (ITB)
Blanchardstown Road North
Dublin 15
Ireland

Tags: Videos, RapidMiner, ETL
15 Sep 2011
Video Series about ETL with RapidMiner by Ingo Mierswa

He did it again!

Here is the first video; please find the rest in Neil's blog (see links above):

 

 

Please visit the

We are sure that we speak for the many users out there when we thank you, Neil, for putting these efforts into producing those videos - they are certainly helping a lot!
