Subtract Mean Value from each Attribute
by Ingo Mierswa, 21 Jul 2009 (RapidMiner, Preprocessing, Operator)
A question that has been posted several times in the forum, and that is also often asked during our training courses, is the following:
"How can I calculate the mean value for each attribute and subtract it from the attribute values?"
Of course, one could use the Normalization operator with the normalization type set to "standardization". In that case, however, the mean value is not only subtracted: the values are also rescaled so that the standard deviation equals 1. This is not always desired.
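To make the difference concrete, here is a small Python sketch (standard library only, not RapidMiner code) that applies both transformations to one hypothetical attribute. The function names `center` and `standardize` are illustrative only. The sketch assumes the sample standard deviation (`statistics.stdev`); whether RapidMiner's standardization uses the sample or the population version is not stated here.

```python
import statistics

def center(values):
    """Subtract the mean only; the spread of the values is unchanged."""
    mean = statistics.mean(values)
    return [v - mean for v in values]

def standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

values = [2.0, 4.0, 6.0, 8.0]        # hypothetical attribute values
centered = center(values)            # [-3.0, -1.0, 1.0, 3.0]
standardized = standardize(values)

# Centering keeps the standard deviation; standardization forces it to 1.
print(statistics.stdev(values), statistics.stdev(centered))
print(statistics.stdev(standardized))
```

Both results have mean 0, but only the centered values keep their original scale, which is exactly what the question above asks for.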
The following process shows how you can use the FeatureIterator operator in combination with a standard aggregation and a macro to achieve this. For each feature, the mean value is calculated with the Aggregation operator and stored in a macro by the DataMacroDefinition operator. Then the AttributeConstruction operator creates a new attribute for the feature in which this mean value has been subtracted from every value.
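After this has been done for all features, the old features are removed (AttributeFilter keeps only the attributes whose names start with "norm_") and the new ones are renamed to the old names (ChangeAttributeNamesReplace removes the "norm_" prefix). That's it. Here is a picture of the process: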

And here is the complete XML code:
<operator name="Root" class="Process" expanded="yes">
    <!-- generate some sample data -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="sum"/>
    </operator>
    <!-- loop over the features; the current name is available as %{loop_feature} -->
    <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
        <parameter key="work_on_input" value="false"/>
        <!-- compute the average of the current feature -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
                <parameter key="%{loop_feature}" value="average"/>
            </list>
        </operator>
        <!-- store this average (first example of the aggregated data) in the macro current_average -->
        <operator name="DataMacroDefinition" class="DataMacroDefinition">
            <parameter key="macro" value="current_average"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="average(%{loop_feature})"/>
            <parameter key="example_index" value="1"/>
        </operator>
        <!-- discard the aggregated example set, which is no longer needed -->
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="deletion_type" value="delete_one"/>
        </operator>
        <!-- add norm_<feature> = <feature> minus its average -->
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
                <parameter key="norm_%{loop_feature}" value="%{loop_feature} - %{current_average}"/>
            </list>
        </operator>
    </operator>
    <!-- keep only the new norm_* attributes -->
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="norm_.*"/>
    </operator>
    <!-- remove the norm_ prefix so that the original names are restored -->
    <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
        <parameter key="replace_what" value="norm_"/>
    </operator>
</operator>
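If you want to check the logic outside RapidMiner, the steps of the process can be mirrored in a short Python sketch. It assumes the data is a plain dict mapping attribute names to lists of numbers, a hypothetical stand-in for the regular attributes of an ExampleSet. The function name `subtract_means` is illustrative only. Each loop iteration plays the role of one FeatureIterator pass, with the local variable `current_average` standing in for the macro of the same name.

```python
def subtract_means(data):
    """Return a new dict in which each attribute has its mean subtracted.

    `data` maps attribute names to equally long lists of numbers
    (a hypothetical stand-in for the regular attributes of an ExampleSet).
    """
    table = dict(data)  # shallow copy so the input dict is not modified

    # FeatureIterator: visit every original attribute once.
    for name in list(data):
        values = data[name]
        # Aggregation + DataMacroDefinition: compute the average and keep it.
        current_average = sum(values) / len(values)
        # AttributeConstruction: norm_<name> = <name> - current_average
        table["norm_" + name] = [v - current_average for v in values]

    # AttributeFilter: keep only attributes whose names start with norm_
    kept = {k: v for k, v in table.items() if k.startswith("norm_")}
    # ChangeAttributeNamesReplace: strip the norm_ prefix again.
    return {k[len("norm_"):]: v for k, v in kept.items()}


example = {"att1": [1.0, 2.0, 3.0], "att2": [10.0, 20.0, 60.0]}
result = subtract_means(example)
print(result)  # {'att1': [-1.0, 0.0, 1.0], 'att2': [-20.0, -10.0, 30.0]}
```

The temporary "norm_" prefix is what lets the filter tell old and new attributes apart. In this sketch, an original attribute whose name already starts with "norm_" would also survive the filter, so the prefix should be one that does not clash with existing names.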
Have fun!

