Rapid-I Blog
Release, RapidMiner, Plotter 23 Jan 2012
New Plotters for RapidMiner by Marius Helf Comment (0)
After quite some time of hard development, the Rapid-I team is proud to announce the birth of its latest baby: a brand new plot component that gives you a shiny, powerful, and flexible visualization of your data and process results.

The new plotters support bar charts, area charts, scatter and series plots with a single configuration. Instead of preselecting a diagram type from a list of templates the new plotters allow you to freely choose the visualization type of each attribute. You can plot more than one attribute at a time, create additional y-axes, combine aggregated bar charts with scatter plots and add a number of error indicators if you feel the need for it. Enough talking, this is what the new plotters can do for you (of course with your all-time favourite data set):

 

What do we see in this plot? As you might recognize, the points depict a scatter plot of two attributes of the Iris dataset, namely sepal length versus sepal width, where sepal length is placed on the domain axis (x-axis) and sepal width on the left range axis (y-axis). The colors and shapes of the points are chosen according to the label of each data point. This is also represented in the legend on the right.

Talking about the legend, you might want to have a closer look at it. The upper part lists the plots in this diagram. The first entry, labelled sepal length (cm) with the circle in front of it, shows us that the plot consists of single data points, i.e. it is the scatter plot we just talked about. The missing color and rather undefined shape tell us to look at the bottom part of the legend for the semantics of colors and shapes: there we discover that each unique color and shape represents one of the label values iris setosa, iris virginica and iris versicolor.

Now all that is left to explain is the bar chart, which is also easily spotted in the legend: it is a histogram of Iris, grouped by label, over the sepal length. Note that the heights of the bars refer to a second range axis on the right.
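For readers who want to see what such a grouped histogram computes, here is a minimal Python sketch. It is not how the plotter is implemented; the function name `grouped_histogram`, the bin edges and the handful of sample values are assumptions made up for illustration.

```python
from collections import defaultdict


def grouped_histogram(values, labels, bin_edges):
    """Count values per bin, separately for each label.

    bin_edges must be sorted. Bin i covers [bin_edges[i], bin_edges[i + 1]),
    except the last bin, which also includes its right edge.
    """
    n_bins = len(bin_edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for value, label in zip(values, labels):
        if value < bin_edges[0] or value > bin_edges[-1]:
            continue  # value lies outside the plotted range
        index = n_bins - 1  # default: last bin (closed on the right)
        for i in range(n_bins):
            if value < bin_edges[i + 1]:
                index = i
                break
        counts[label][index] += 1
    return dict(counts)


# Hypothetical sepal lengths (cm) and labels, for illustration only.
sepal_length = [5.1, 4.9, 5.0, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1]
label = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

histogram = grouped_histogram(sepal_length, label, [4.0, 5.0, 6.0, 7.0, 8.0])
for name, bin_counts in histogram.items():
    print(name, bin_counts)
```

Each list holds the bar heights for one label value, which is what the second range axis on the right measures.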

The attentive reader will have noted that the bars are slightly transparent: this shows another feature of our new plotters - everything is formattable and customizable, starting at customizable presets and gradients for the plot colors, different shapes for each data series, plot and legend background up to the fonts of the title and the axes. What else do you desire? Bars oriented from left to right instead of vertical ones? No problem, two clicks and you are done. Aggregate your data to calculate averages and plot the standard deviation of each data point? No problem, everything is possible :)

The true plotter experts will even be able to beam good old Iris to New York and celebrate the arrival of the new plot engine with fireworks never seen before in RapidMiner:

Oh yes, this truly is the Iris dataset. Can you guess from the legend what you are seeing?

We hope that we have sparked your interest in this new feature. It will be part of RapidMiner 5.2 beta, which is expected to ship at the end of this week. As usual, you will be notified of its availability via RapidMiner's auto update, or you can just download it from our website.

Training, RapidMiner 19 Jan 2012
Practical Data Mining Lectures from Simafore by Ingo Mierswa Comment (0)

From time to time, we post articles about how specific analysis methods work and how those methods and approaches can be implemented with RapidMiner.
Our colleagues from Simafore, a US-based consultancy company for advanced analytics, follow the same approach and describe many applications of data mining in real-world scenarios together with practical examples done with RapidMiner.

So I thought their blog might be interesting to you, especially for those of you not already familiar with the deepest aspects of data modeling. For most of their blog posts, there is also a white paper explaining the application of the method in more detail and how to perform it with RapidMiner.

Here is a small selection of topics:

A Simple Explanation of Decision Tree Modeling based on Entropies

Link: http://www.simafore.com/blog/bid/94454/A-simple-explanation-of-how-entropy-fuels-a-decision-tree-model

A description of some of the basics of decision trees. It is simple and uses hardly any math. I like the plots explaining the basic idea of entropy as a splitting criterion (although we actually calculate gain ratio differently than explained...)
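To make the idea of entropy as a splitting criterion concrete, here is a small Python sketch of the textbook definitions of entropy, information gain and gain ratio. Please read it as an illustration only: as noted above, RapidMiner calculates gain ratio differently, so this is neither RapidMiner's implementation nor necessarily the exact variant used in the linked article. All function names are made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - remainder


def gain_ratio(labels, groups):
    """Information gain divided by the split information (textbook form)."""
    total = len(labels)
    split_info = -sum(
        len(group) / total * math.log2(len(group) / total)
        for group in groups
        if group
    )
    if split_info == 0:
        return 0.0  # all examples ended up in a single branch
    return information_gain(labels, groups) / split_info


labels = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
weak_split = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]

print(information_gain(labels, perfect_split))
print(information_gain(labels, weak_split))
```

A split that separates the classes perfectly removes all uncertainty, while the weaker split only removes part of it, so the tree learner would prefer the first one.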

White Paper: www.simafore.com/Download-ebook-Decision-Tree-Articles-Digest/

[Image: Data Distribution]

 

Logistic Regression for Business Analytics using RapidMiner

Link: http://www.simafore.com/blog/bid/57924/Logistic-regression-for-business-analytics-using-RapidMiner-Part-2

Same as above, but this time for modeling with logistic regression. Easy to read and covering all basic ideas together with some examples. If you are not familiar with the topic yet, part 1 (see below) might help.

White Paper: http://www.simafore.com/download-ebook-Logistic-regression-articles-digest/

Part 1 (Basics): http://www.simafore.com/blog/bid/57801/Logistic-regression-for-business-analytics-using-RapidMiner-Part-1

Deploy Model: http://www.simafore.com/blog/bid/82024/How-to-deploy-a-logistic-regression-model-using-RapidMiner

Advanced Information: http://www.simafore.com/blog/bid/99443/Understand-3-critical-steps-in-developing-logistic-regression-models

 

[Image: Logistic Regression in RapidMiner]

 

Feature Selection and Linear Regression

There are also two articles about feature selection and linear regression:

http://www.simafore.com/blog/bid/80639/Feature-Selection-for-predictive-analytics-using-mutual-information

http://www.simafore.com/blog/bid/81836/2-ways-to-select-predictors-for-regression-models-using-RapidMiner

White Paper: http://www.simafore.com/Download-ebook-Predicting-Sales-using-linear-regression/


And I am sure there is more to come. Please visit Simafore's blog at

http://www.simafore.com/blog/

research, RapidMiner, Data Mining 9 Jan 2012
The Intelligent Discovery Assistant by Simon Fischer Comment (0)

Imagine that all you had to do to create a data mining process was to select a data set and specify what you want to do with the data, e.g. predictive modelling. Wouldn't that save a lot of work?

Within the research project "e-LICO", funded by the EU within the 7th Framework Programme, the Intelligent Discovery Assistant (IDA) was developed, and it does precisely that. It comes with its own perspective (marked with the silhouette of a friendly butler) that contains all you need: the repository and the assistant itself. To use it, follow three simple steps:

  1. Drag a data set into one of the slots. It will be automatically detected as training data, test data or apply data, depending on whether it has a label or not.
  2. Select a goal. The most frequent one is probably "Predictive Modelling". All goals have comments, so you see what they can be used for.
  3. Select "Fetch plans" and wait a bit to get a list of processes that solve your problem. Once the planning completes, select one of the processes (you can see a preview at the right) and run it. Alternatively, select multiple (selecting none means selecting all) and evaluate them on your data in a batch.

The assistant strives to generate processes that are compatible with your data. To do so, it performs a lot of clever operations, e.g., it automatically replaces missing values if missing values exist and this is required by the learning algorithm or performs a normalization when using a distance-based learner.
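The two rules mentioned above can be sketched in a few lines of Python. This is only an illustration of the reasoning: the real assistant plans its workflows in Prolog, and the classes `DataSummary` and `Learner`, the function `plan_steps` and the step names used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSummary:
    """What the planner knows about the data set (hypothetical)."""
    has_missing_values: bool


@dataclass(frozen=True)
class Learner:
    """Requirements of a learning algorithm (hypothetical)."""
    name: str
    handles_missing_values: bool
    distance_based: bool


def plan_steps(data, learner):
    """Return the preprocessing steps needed before the learner can run."""
    steps = []
    # Rule 1: replace missing values only if they exist AND the learner
    # cannot cope with them.
    if data.has_missing_values and not learner.handles_missing_values:
        steps.append("Replace Missing Values")
    # Rule 2: distance-based learners need comparable attribute scales.
    if learner.distance_based:
        steps.append("Normalize")
    steps.append(learner.name)
    return steps


knn = Learner("k-NN", handles_missing_values=False, distance_based=True)
tree = Learner("Decision Tree", handles_missing_values=True, distance_based=False)

print(plan_steps(DataSummary(has_missing_values=True), knn))
print(plan_steps(DataSummary(has_missing_values=True), tree))
```

The same data set thus leads to different processes depending on the learner, which is why the assistant returns a whole list of plans rather than a single one.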

You can install the extension directly by using the Rapid-I Marketplace instead of the old update server. Just go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL. Alternatively, just download it directly and place it in RapidMiner's lib\plugins folder.

Since the workflow planning happens in Prolog, this extension automatically installs a Prolog engine (XSB Prolog plus Flora 2) when it first starts. These can only be installed into a specific directory, so you must run RapidMiner as administrator when using the extension for the first time. (On Windows, right-click and select "Run as administrator".)

If you try out the extension, we ask you to participate in the user survey so we can keep improving the extension. You can easily open the survey by installing the extension and clicking on the third button in the toolbar (the one with the letter box).

The IDA was developed as a collaboration mainly between the University of Zurich (Jörg-Uwe Kietz and Floarea Serban) and Rapid-I.

RapidMiner, Book 7 Dec 2011
Call for Chapters for a RapidMiner Book on Use Cases by Ingo Mierswa Comment (0)

Great news for those of you who are waiting for an official RapidMiner book: we recently made some progress on the long lost manual, and below you can even find something new: more information and a call for chapters for the upcoming book about how to use RapidMiner in different application areas.

Editors:
Dr. Markus Hofmann, Institute of Technology Blanchardstown, Ireland
Ralf Klinkenberg, Chief Business Development Officer, Rapid-I, Germany

RapidMiner Book

More information about the book and a call for chapters can be found below or at
http://www.rapidminerbook.com.

Introduction

RapidMiner has, without a doubt, serious impact in relation to software choice when it comes to data mining and predictive analytics. Thanks to its open source license model, RapidMiner spread quickly and is now deployed by hundreds of thousands of users in more than 60 countries world-wide. It is often referenced as a true competitor when compared to proprietary commercial solutions. However, as with many other open source solutions, a lack of application-oriented documentation is often a barrier to using the software. The proposed book wants to address this issue and lower this barrier by demonstrating how to apply RapidMiner in many relevant areas.

The proposed book will be an introductory book to RapidMiner focusing on use cases to explain the functionality and most frequently used operators. The aim is not to produce another data mining book and certain knowledge of data analysis concepts and techniques can be expected when drafting chapter proposals.

More info can be found here: http://www.rapidminerbook.com.

Overall Objectives

The book will provide high-quality practical articles in relation to use cases that showcase RapidMiner as a leading data mining software. Each use case has to be accompanied by a dataset. While reading the chapter, the learner can follow and implement the use case in RapidMiner 5.

Recommended Topics and Themes

Original papers on all aspects of data analysis that RapidMiner caters for are invited. Submissions must not duplicate work that any of the authors has published elsewhere or submitted in parallel to any other books, conferences or workshops with proceedings. In addition, it is not always necessary to produce the best possible mining process on the data. Instead, the aim is to use the data to explain a set of operators in a practical manner (step by step process).

Possible topics covering all aspects of data mining may include (but are not limited to):

  • Data Exploration and Visualisation
  • Introduction to Data Mining (Theory Chapter)
  • RapidMiner GUI Intro
  • Classification Basic
  • Text Mining
  • Classification Advanced / Direct Mailing
  • Predictive Maintenance / Machine Failure Prevention / Quality Assurance
  • Customer/Credit Scoring
  • Financial Forecasting
  • Marketing Channel Analysis
  • Web-Content Mining (Sentiment Analysis)
  • Educational Data Mining
  • Customer Segmentation
  • Image Mining
  • Automated Reporting
  • RapidAnalytics

Submission Procedure

Researchers and practitioners are invited to submit on or before December 31, 2011, a 2 to 3 page manuscript proposal clearly explaining the use case of the proposed chapter and the operators that will be introduced. Authors of accepted proposals will be notified by January 31, 2012. The following should be kept in mind:

  • The proposed project should include a sample mining process.
  • You need to submit your Curriculum Vitae with the chapter proposal.
  • The data needs to be publicly available so that future readers of the book can reproduce the use cases.
  • The aim is not to produce the perfect process but to use and explain an appropriate number of operators.
  • Chapter proposals can be submitted as MS Word or PDF file.
  • Chapters need to be submitted using the LaTeX template available on http://www.rapidminerbook.com

Full chapters are expected to be submitted by May 31, 2012. All submitted chapters will be reviewed by at least two reviewers. Various publishing strategies and publishers are currently considered.

Important Dates

Manuscript proposal for book chapter (2-3 pages): December 31, 2011
Notification to authors of submitted chapters: January 31, 2012
First Draft of the chapters from authors: May 31, 2012
Reviews back to authors: June 30, 2012
Revised Chapters back from authors: July 31, 2012
Final notification to the authors: August 31, 2012
Final camera-ready chapters from authors: September 30, 2012

Please e-mail all inquiries and proposal submissions to markus.hofmann@itb.ie.

Contact

Dr. Markus Hofmann

Department of Informatics, School of Engineering and Informatics
Institute of Technology Blanchardstown (ITB)
Blanchardstown Road North
Dublin 15
Ireland
Web service, RapidAnalytics, Video Tutorial, Applications 21 Nov 2011
File Objects and POSTing to Web Services by Simon Fischer Comment (0)

As of RapidMiner 5.1.11, we have introduced a new kind of I/O-Object in RapidMiner: File Objects. File Objects are generated by opening local files, URLs, looping over directories or ZIP files, etc. They are then parsed by Read CSV or Read Excel and converted to an example set. All this was possible before, partly by using macros, but it is now much simpler and more flexible.

Foremost, however, it offers a new way of sending input data to a process exposed as a Web service in RapidAnalytics: The body of the HTTP POST request is transformed into such a File Object and can then be parsed as a part of the process. This makes the definition of the input format of a Web service very flexible and provides a simple means to create Web services that classify data tables.
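From the client side, calling such a service boils down to sending the data table as the body of an HTTP POST request. The following Python sketch uses only the standard library; the service URL, the column names and the assumption that the process reads the body with Read CSV are placeholders for illustration and not taken from an actual RapidAnalytics installation.

```python
import urllib.request


def build_post_request(service_url, csv_text):
    """Build a POST request whose body is the CSV table to be classified."""
    body = csv_text.encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


# Hypothetical URL of a process exposed as a Web service.
SERVICE_URL = "http://rapidanalytics.example.com/process/classify"

# Hypothetical input table; on the server side the request body would
# arrive as a File Object and could be parsed by Read CSV.
csv_text = "a1,a2,a3,a4\n5.1,3.5,1.4,0.2\n"

request = build_post_request(SERVICE_URL, csv_text)
print(request.get_method(), request.full_url)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because the body is just a file from the point of view of the process, the same service could accept other formats simply by swapping the reading operator.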

Watch this video for more details:

 

RCOMM, Events, Data Mining, Contest 2 Nov 2011
Who Wants to be a Data Miner? (RCOMM 2011) by Simon Fischer Comment (0)

One of the most fun events at the annual RapidMiner Community Meeting and Conference (RCOMM) is the live data mining process design competition "Who Wants to be a Data Miner?" In this competition, participants must design RapidMiner processes for a given goal within a few minutes. The tasks are related to data mining and data analysis, but are rather uncommon. In fact, most of the challenges ask for things RapidMiner was never supposed to do.

In 2010, we posted the winning processes immediately after the conference. This year we did not do so because the processes depend on input files which could not easily be attached to these processes on myExperiment. As of RapidMiner 5.1.11 we have a new way of handling files, which makes it easier to link RapidMiner processes against data files on the Web (more on this coming soon in this blog). Therefore, all data files are now uploaded to the Rapid-I webspace, and the processes are also on myExperiment, bundled in a pack.

The 2011 challenges were quite fun and dealt with Hobbits, Vodka, and our latest, brand new product: RapidDraw. The processes are quite instructive and are worth playing around with. With the RapidMiner Community Extension you can download the processes directly from myExperiment into RapidMiner (just search for RCOMM). Alternatively, view the pack description on myExperiment.

Plotter 22 Sep 2011
The RapidMiner Plotters 16: Bars by Ingo Mierswa Comment (2)

Overview "Bars"

  • Summary: Perform simple aggregations on your data (like sums, min or max) and show those values with respect to defined groups
  • Number of Dimensions: 2, one for the grouping and one for the (aggregated) values
  • Data Types: Numerical, Nominal

This is the next post of a series describing all RapidMiner plotters in detail. A list of the plotters discussed so far, including links to them, can be found at the end of this article. Since many options and controls of these plotters are also relevant for the one discussed here - as well as for many other plotters - I recommend checking out the first parts of this series before reading this one.

Before we start our discussion of the Bars plotter, we will again first have a look:

 

As you can see, the bar plotter consists of several bars representing values (on the y-axis) for selected groups (on the x-axis). In principle, the bar plotter is very similar to the plotters Pie, Pie 3D, and Ring, which we have discussed in a previous blog post. The basic idea of this type of chart is to present a number of numerical values where each value represents a group. There are two typical application areas for this:

  • You have a data set with two columns, one column with a set of (un-)ordered nominal values and a second one containing a numerical value for each group;
  • You again have a data set with a nominal and a numerical column, but now you have each nominal value several times in your table. The goal then is to aggregate the numerical values for each group defined by each of the nominal labels.

It is important to see that in the first case, each nominal value only occurs once and hence there is no need for any calculation on the numerical values. In the second case, you usually would like to perform simple aggregations on your numerical data (like sums, min or max) or at least to calculate the count of your nominal values for each group. Hence, you would like to show those calculated / aggregated values with respect to the defined groups.

Each (calculated) number will be presented by a bar where the height of the bar corresponds to the absolute value. This differs from the Pie charts, where each slice represents the share the number contributes to the total sum. Look at the example above, where we used the famous Iris data set and where you can see the different average values for attribute "a3" with respect to the three groups defined by the labels / classes.

As always, you can find a list of settings on the left. The first setting is the Group-By Column. This will typically be a nominal-valued column from your data set which defines the groups into which the data set will be divided and presented by the elements of the chart. The setting Legend Column changes the labels at the bars to the values of the selected column. Since the only useful options are None or the grouping column, it can be ignored in most cases and will probably be removed in one of the next versions anyway.

The next important setting is the Value Column. Here you can select the (usually numerical) column which is used for value calculation. If you only have one row for each nominal value in the grouping column, you most often already have aggregated values ready for displaying. In other cases, you will have to define a matching Aggregation function, for example the sum or average of the values in each group. There are two additional settings which can be used to further fine-tune the plotting: Absolute Values means that only absolute values of the value column are used as input for the aggregation function. The setting Use Only Distinct means that each value is used exactly once in the aggregation, i.e. additional equal values are ignored.
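The interplay of these settings can be sketched in plain Python. This is not the plotter's code; the function `aggregate_by_group`, its parameter names and the sample rows are invented for illustration, and the sketch assumes that Absolute Values is applied before Use Only Distinct, an order the description above does not specify.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions offered by this sketch (names are assumptions).
AGGREGATIONS = {"sum": sum, "average": mean, "min": min, "max": max, "count": len}


def aggregate_by_group(rows, group_by, value_column, aggregation="sum",
                       absolute_values=False, use_only_distinct=False):
    """Return one aggregated value per group, i.e. one bar per group."""
    grouped = defaultdict(list)
    for row in rows:
        value = row[value_column]
        if absolute_values:
            value = abs(value)  # assumption: applied before the distinct filter
        grouped[row[group_by]].append(value)

    result = {}
    for group, values in grouped.items():
        if use_only_distinct:
            # Keep the first occurrence of each value, drop equal repeats.
            values = list(dict.fromkeys(values))
        result[group] = AGGREGATIONS[aggregation](values)
    return result


# Made-up example table with a nominal and a numerical column.
rows = [
    {"label": "A", "value": 2.0},
    {"label": "A", "value": -2.0},
    {"label": "A", "value": 3.0},
    {"label": "B", "value": 1.0},
    {"label": "B", "value": 1.0},
]

print(aggregate_by_group(rows, "label", "value"))
print(aggregate_by_group(rows, "label", "value", absolute_values=True))
print(aggregate_by_group(rows, "label", "value", use_only_distinct=True))
```

Each entry of the returned dictionary corresponds to one bar: the key is the group on the x-axis and the value is the bar height.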

The next setting allows you to rotate the labels on the x-axis by 90 degrees, which makes longer labels readable and prevents label overlap when there are many groups. Finally, you can define the orientation of the bar plotter, i.e. whether the bars should be displayed vertically (default) or horizontally.

Other parts of the plotter series:

Videos, RapidMiner, ETL 15 Sep 2011
Video Series about ETL with RapidMiner by Ingo Mierswa Comment (0)

He did it again!

Here is the first video, please find the rest in Neil's blog (see links above):

 

 

Please visit the

We are sure that we speak for the many users out there when we thank you, Neil, for putting these efforts into producing those videos - they are certainly helping a lot!

Plotter 8 Sep 2011
The RapidMiner Plotters 15: Pie, Pie 3D, and Ring by Ingo Mierswa Comment (0)

Overview "Pie", "Pie 3D", and "Ring"

  • Summary: Perform simple aggregations on your data (like sums, min or max) and show those values with respect to defined groups
  • Number of Dimensions: 2, one for the grouping and one for the (aggregated) values
  • Data Types: Numerical, Nominal

This is the next post of a series describing all RapidMiner plotters in detail. A list of the plotters discussed so far, including links to them, can be found at the end of this article. Since many options and controls of these plotters are also relevant for the one discussed here - as well as for many other plotters - I recommend checking out the first parts of this series before reading this one.

Before we start our discussion about the plotters Pie, Pie 3D, and Ring, we will again first have a look:

 

 

 

The three plotters Pie, Pie 3D, and Ring are very similar to each other. We will demonstrate all plotter functions with the Pie chart and show screenshots for the other two plotters later on. The basic idea of this type of chart - which also includes Bar charts, to be discussed in the next part of the series - is to present a number of numerical values where each value represents a group. There are two typical application areas for this:

  • You have a data set with two columns, one column with a set of (un-)ordered nominal values and a second one containing a numerical value for each group;
  • You again have a data set with a nominal and a numerical column, but now you have each nominal value several times in your table. The goal then is to aggregate the numerical values for each group defined by each of the nominal labels.

It is important to see that in the first case, each nominal value only occurs once and hence there is no need for any calculation on the numerical values. In the second case, you usually would like to perform simple aggregations on your numerical data (like sums, min or max) or at least to calculate the count of your nominal values for each group. Hence, you would like to show those calculated / aggregated values with respect to the defined groups.

The charts Pie, Pie 3D, and Ring are different from almost all other types of charts: there is no background and there are no scales involved. Instead, each (calculated) number will be presented by a slice of the pie where the area of the slice corresponds to the share the number contributes to the total sum. Look at the example above, where we used the famous Iris data set and where you can see the different average values for attribute "a3" with respect to the three groups defined by the labels / classes.
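In other words, a pie chart shows shares instead of absolute values. A minimal Python sketch of that calculation follows; the function names and the three group values are made up for illustration, and the sketch assumes non-negative values with a positive total, since a share of the whole is otherwise not meaningful.

```python
def slice_shares(group_values):
    """Return each group's share of the total sum (shares add up to 1)."""
    total = sum(group_values.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {group: value / total for group, value in group_values.items()}


def slice_angles(group_values):
    """Return the angle in degrees each slice covers in the full circle."""
    return {
        group: share * 360.0
        for group, share in slice_shares(group_values).items()
    }


# Hypothetical aggregated values for three groups.
values = {"A": 1.0, "B": 3.0, "C": 4.0}

print(slice_shares(values))
print(slice_angles(values))
```

A bar chart of the same numbers would draw bars of height 1, 3 and 4, whereas the pie only conveys that the groups make up one eighth, three eighths and one half of the total.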

 

 

As always, you can find a list of settings on the left. The first setting is the Group-By Column. This will typically be a nominal-valued column from your data set which defines the groups into which the data set will be divided and presented by the elements of the chart. The setting Legend Column changes the labels at the slices to the values of the selected column. Since the only useful options are None or the grouping column, it can be ignored in most cases and will probably be removed in one of the next versions anyway.

The next important setting is the Value Column. Here you can select the (usually numerical) column which is used for value calculation. If you only have one row for each nominal value in the grouping column, you most often already have aggregated values ready for displaying. In other cases, you will have to define a matching Aggregation function, for example the sum or average of the values in each group. There are two additional settings which can be used to further fine-tune the plotting: Absolute Values means that only absolute values of the value column are used as input for the aggregation function. The setting Use Only Distinct means that each value is used exactly once in the aggregation, i.e. additional equal values are ignored.

 

 

The last possible setting, which is only available for Pie and Ring but not for Pie 3D, is the definition of so-called Explosion Groups. Here you can select one or several of the possible groups and move them away from the rest with the slider Explosion Amount. This can help to highlight selected groups as shown in the first picture above.

Other parts of the plotter series:

Untagged  11 Aug 2011
Interview on DecisionStats by Ingo Mierswa Comment (0)

It was quite calm in the Rapid-I blog during the last weeks, sorry for that... It's vacation time and those of us who have to stay are quite busy these days.

In the meantime, you might be interested in an interview given by Simon and myself to Ajay Ohri of DecisionStats. We talk about the new Rapid-I marketplace and new extensions, big data analytics, Hadoop, and mobile computing for business analytics.

 
