Rapid-I Blog
RapidMiner, Preprocessing, Modeling 29 Jun 2009
Preprocessing Models by Ingo Mierswa

A really nice feature of RapidMiner is the possibility to create preprocessing models, i.e. models which are not used for predictions but for transformations of the data set.

Most preprocessing operators can generate a preprocessing model if you simply activate the parameter "return_preprocessing_model". The following process, for example, generates two preprocessing models: one for the transformation of nominal attributes into binominal attributes (each resulting attribute has only two different values) and a second one for transforming all binominal attributes into numerical ones (with the values 0 and 1). This is a very common preprocessing chain for turning data sets containing nominal attributes into data sets consisting of numerical attributes only.

But what if this transformation should be performed not only during the model training phase of your data mining project but also during the model application phase (scoring)? In that case, the preprocessing models come in really handy, since they can be applied to new data sets just like ordinary prediction models. Check out the process below to see the details!

<operator name="Root" class="Process" expanded="yes">
    <operator name="DirectMailingExampleSetGenerator (Training Set)" class="DirectMailingExampleSetGenerator">
        <parameter key="number_examples"	value="1000"/>
    </operator>
    <operator name="ChangeAttributeRole (Training Set)" class="ChangeAttributeRole">
        <parameter key="name"	value="name"/>
        <parameter key="target_role"	value="id"/>
    </operator>
    <operator name="Preprocessing Models" class="OperatorChain" expanded="yes">
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
            <parameter key="return_preprocessing_model"	value="true"/>
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
            <parameter key="return_preprocessing_model"	value="true"/>
        </operator>
    </operator>
    <operator name="Training" class="LinearRegression" breakpoints="after">
        <parameter key="feature_selection"	value="none"/>
    </operator>
    <operator name="DirectMailingExampleSetGenerator (Test Set)" class="DirectMailingExampleSetGenerator">
        <parameter key="number_examples"	value="1000"/>
    </operator>
    <operator name="ChangeAttributeRole (Test Set)" class="ChangeAttributeRole">
        <parameter key="name"	value="name"/>
        <parameter key="target_role"	value="id"/>
    </operator>
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object"	value="Model"/>
        <parameter key="select_which"	value="3"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
    <operator name="IOSelector (2)" class="IOSelector">
        <parameter key="io_object"	value="Model"/>
        <parameter key="select_which"	value="2"/>
    </operator>
    <operator name="ModelApplier (2)" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
    <operator name="ModelApplier (3)" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>
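
If you prefer to see the idea outside of RapidMiner, here is a minimal Python sketch of what a preprocessing model does: it learns the transformation on the training set and can then be applied unchanged to new data. Everything in it is an assumption made for illustration only: the class name `NominalToBinaryModel`, the attribute names `name` and `sports`, the naming of the generated columns, and the handling of unseen values (all zeros). It also merges the two steps of the process above (nominal to binominal, binominal to numerical) into one, so it is not how RapidMiner implements its operators.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


class NominalToBinaryModel:
    """Hypothetical stand-in for a preprocessing model.

    It remembers which nominal values were seen during training, so the
    same 0/1 columns can be produced for any later data set.
    """

    def __init__(self, columns: Dict[str, List[str]]) -> None:
        # attribute name -> sorted list of values seen in the training data
        self.columns = columns

    @classmethod
    def fit(cls, rows: List[Row], attributes: List[str]) -> "NominalToBinaryModel":
        # "Training" the preprocessing model: collect the known values.
        columns = {a: sorted({str(r[a]) for r in rows}) for a in attributes}
        return cls(columns)

    def apply(self, rows: List[Row]) -> List[Row]:
        # Applying the model: replace each nominal attribute by one 0/1
        # column per value known from training.
        result = []
        for row in rows:
            new_row: Row = {k: v for k, v in row.items() if k not in self.columns}
            for attribute, values in self.columns.items():
                for value in values:
                    new_row[f"{attribute} = {value}"] = 1 if row[attribute] == value else 0
            result.append(new_row)
        return result


# Made-up training and scoring data (attribute names are assumptions).
train = [{"name": "a", "sports": "soccer"}, {"name": "b", "sports": "golf"}]
test = [{"name": "c", "sports": "tennis"}]

model = NominalToBinaryModel.fit(train, ["sports"])
train_num = model.apply(train)  # used for model training
test_num = model.apply(test)    # same columns during scoring
```

The point of keeping the model around is that `test_num` ends up with exactly the same columns as `train_num`, even though the test data contains a value that never occurred during training. A prediction model trained on `train_num` can therefore be applied to `test_num` directly.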
RapidMiner, Rapid-I 24 Jun 2009
RapidMiner 5.0 by Ingo Mierswa

For all of you who are eager to get news about the next major release of RapidMiner, we have a little surprise: a first screenshot of the current state of the new graph-based process design of RapidMiner 5.0.

In the next version of RapidMiner, it will be possible to use a data flow design like the one known from Clementine as well as the operator-tree-based design known from previous versions of RapidMiner. Here is a small glimpse:

Please note that this is not the final look of the data flow visualization, but you should at least get a feeling for what will be possible with RapidMiner 5.0. And as you can see, keeping both options (the tree as well as the flow) bridges the gap between the efficiency of the operator tree (drawing all those arrows takes soooo much time ;-) ) and the ease of a flow / graph-based layout when it comes to visualizing and understanding the data flows.

From time to time, we will reveal some details about RapidMiner 5, so please stay tuned and let us know what you think.

Cheers,
Ingo

Blog 22 Jun 2009
Welcome to our Data Mining & RapidMiner Blog by Ingo Mierswa

Hello,

Greetings to all readers who found this blog about data mining in general and about doing data mining with RapidMiner in particular. From time to time, we will post

  • interesting facts from the world of data mining,
  • hints for optimizing your work with RapidMiner,
  • processes which show advanced data transformation and analysis solutions, and
  • hot news directly out of the RapidMiner development kitchen.

So please stay tuned and come back from time to time. And please let us know what you think and post a comment.

Cheers,

Ingo
