Topic: Standardization as a preprocessor
bcourtney
Guest
« on: June 19, 2008, 10:08:20 AM »

In addition to normalization as a preprocessing operator, a standardization preprocessing operator would be very handy.
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #1 on: June 19, 2008, 10:10:33 AM »

Hi,

What exactly do you mean by standardization?

Regards,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
bcourtney
Guest
« Reply #2 on: June 19, 2008, 10:25:06 AM »

Normalize -
A collection of numeric data is normalized by subtracting the minimum value from all values and dividing by the range of the data. This yields data with a similarly shaped histogram but with all values between 0 and 1.

Standardize -
A collection of numeric data is standardized by subtracting a measure of central location (such as the mean or median) and by dividing by some measure of spread (such as the standard deviation, interquartile range or range). This yields data with a similarly shaped histogram with values centered around 0.

-- It would be useful to be able to pick the measure of spread from a dropdown: standard deviation, interquartile range, 5-95% range, etc. Does this make sense? Also, the option you call z-transform appears to be mutually exclusive with scaling to a range. I can't see a reason for this, and it means that doing both requires two operators.
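
To make the two definitions above concrete, here is a minimal Python sketch (not RapidMiner code). The names `normalize` and `standardize`, and the `center`/`spread` option strings standing in for the proposed dropdown, are hypothetical. Using the sample standard deviation and Python's default quantile method is an assumption, since the post does not say which variants are meant.

```python
import statistics


def normalize(values):
    """Min-max normalization: subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range is zero; constant data cannot be normalized")
    return [(v - lo) / (hi - lo) for v in values]


def _iqr(values):
    # Interquartile range from the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def _p5_p95(values):
    # Distance between the 5% and 95% cut points (19 cut points for n=20).
    cuts = statistics.quantiles(values, n=20)
    return cuts[18] - cuts[0]


# Hypothetical "dropdown" choices for location and spread.
CENTERS = {"mean": statistics.mean, "median": statistics.median}
SPREADS = {
    "stdev": statistics.stdev,  # assumption: sample standard deviation
    "iqr": _iqr,
    "p5_p95": _p5_p95,
    "range": lambda vs: max(vs) - min(vs),
}


def standardize(values, center="mean", spread="stdev"):
    """Subtract a measure of location, then divide by a measure of spread."""
    c = CENTERS[center](values)
    s = SPREADS[spread](values)
    if s == 0:
        raise ValueError("spread is zero; cannot standardize")
    return [(v - c) / s for v in values]
```

For example, `normalize([2, 4, 6])` gives `[0.0, 0.5, 1.0]`, while `standardize([2, 4, 6])` gives `[-1.0, 0.0, 1.0]` and `standardize([2, 4, 6], center="median", spread="range")` gives `[-0.5, 0.0, 0.5]`.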
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #3 on: June 19, 2008, 10:49:47 AM »

Hi,

OK, I get the point. Both tasks are in fact covered by the Normalization operator: the normalization task is executed if the z-transform parameter is disabled. If it is enabled, the mean is subtracted from all values and the difference is divided by the standard deviation of the values. We will think about extending the operator to support other scaling strategies as well for the next release.

Regards,
Tobias
Stefan_E
Jr. Member
**
Posts: 54


« Reply #4 on: October 01, 2008, 10:56:48 PM »

Tobias,

On a related note, I'm looking for a method that just subtracts the mean without then dividing by the standard deviation.

This is useful, e.g., for NearestNeighbor learning in the case of measurement offsets: you don't want a z-normalization there, as the rescaling would effectively change the influence of some parameters on the classification.
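
One way to read this point is the following small Python sketch (an illustration only, with made-up data and hypothetical helper names such as `center`, `z_transform` and `nearest`). Subtracting the mean shifts every example by the same amount, so Euclidean distances and therefore nearest neighbours stay the same, whereas the z-transform rescales each attribute and can change which example is nearest. The population standard deviation is used here as an assumption.

```python
import math
import statistics


def center(column):
    """Subtract the column mean only."""
    m = statistics.mean(column)
    return [v - m for v in column]


def z_transform(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    m = statistics.mean(column)
    s = statistics.pstdev(column)
    return [(v - m) / s for v in column]


def transform_columns(rows, fn):
    """Apply fn to every attribute (column) of a row-oriented data set."""
    columns = [fn(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def nearest(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index]."""
    q = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(q, rows[i]))


# Made-up data: attribute 2 has a much larger spread than attribute 1.
rows = [
    [0.0, 100.0],  # query example
    [1.0, 130.0],  # close to the query in attribute 1
    [5.0, 101.0],  # close to the query in attribute 2
    [9.0, 160.0],
]

print("raw:        ", nearest(rows, 0))
print("centered:   ", nearest(transform_columns(rows, center), 0))
print("z-transform:", nearest(transform_columns(rows, z_transform), 0))
```

On this data the raw and the mean-centered versions both pick row 2 as the nearest neighbour of row 0, while the z-transformed version picks row 1, because attribute 2 loses most of its weight once it is divided by its large standard deviation.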

Regards,
Stefan
Ingo Mierswa
Administrator
Hero Member
*****
Posts: 1226



« Reply #5 on: October 09, 2008, 07:12:25 PM »

Hi,

Quote
On a related note, I'm looking for a method that just subtracts the mean without then dividing by the standard deviation.

That's possible with the latest CVS version (and the following nice process):

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_examples" value="200"/>
        <parameter key="target_function" value="sum classification"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="label"/>
    </operator>
    <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
    </operator>
    <operator name="AttributeAggregation" class="AttributeAggregation">
        <parameter key="aggregation_attributes" value="att_.*"/>
        <parameter key="aggregation_function" value="average"/>
        <parameter key="attribute_name" value="mean"/>
    </operator>
    <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
        <parameter key="filter" value="att_.*"/>
        <parameter key="work_on_input" value="false"/>
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
              <parameter key="mean_%{loop_feature}" value="%{loop_feature}-mean"/>
            </list>
        </operator>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="mean_.*"/>
    </operator>
    <operator name="ExampleSetTranspose (2)" class="ExampleSetTranspose">
    </operator>
    <operator name="AttributeFilter (2)" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="invert_filter" value="true"/>
        <parameter key="parameter_string" value="id"/>
    </operator>
</operator>
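
As far as the process above can be read, it transposes the data so that each original attribute becomes one row, computes the average of each row as a new `mean` attribute, subtracts it from every value, and transposes back. A rough Python equivalent of that idea, as a sketch only (the function name `mean_center_via_transpose` is hypothetical and the operator semantics are an assumption based on the parameter names):

```python
import statistics


def mean_center_via_transpose(rows):
    """Subtract each attribute's mean, mirroring the transpose-based process."""
    # Transpose: one list per original attribute.
    transposed = [list(col) for col in zip(*rows)]
    centered = []
    for values in transposed:
        mean = statistics.mean(values)  # plays the role of the "mean" attribute
        centered.append([v - mean for v in values])
    # Transpose back to one list per example.
    return [list(row) for row in zip(*centered)]
```

For instance, `mean_center_via_transpose([[1.0, 10.0], [3.0, 30.0]])` returns `[[-1.0, -10.0], [1.0, 10.0]]`.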

Cheers,
Ingo