Author Topic: weighted nearest neighbor + cross-validation
Ulli
Guest
« on: May 28, 2008, 01:35:47 PM »

Hi, I cannot figure out how to integrate learning feature weights into a nearest
neighbor algorithm using 10-fold cross-validation.
Nearest neighbor and cross-validation alone are no problem, but the use of
weights complicates this a lot. The weights should be learned on the training data
of each fold and then, inside the cross-validation operator, applied to the evaluation data.
Is it possible to do this with the GUI, or do I have to write the cross-validation
myself without employing the cross-validation operator?

Thank you for any help.
Ingo Mierswa
Administrator
« Reply #1 on: May 28, 2008, 02:12:25 PM »

Hi Ulli,

no coding is necessary for this (actually, problems like these were the reason for the modular operator concept of RapidMiner). This can be done with nested cross-validations, i.e. an outer cross-validation where the learner is embedded into a feature weighting scheme like EvolutionaryWeighting, which in turn contains an inner cross-validation for optimizing the weights. However, it is even more convenient to use the operator "WrapperXValidation" as the outer cross-validation for this task. From the operator info dialog (F1) of this operator:

Quote
This operator evaluates the performance of feature weighting and selection algorithms. The first inner operator is the algorithm to be evaluated itself. It must return an attribute weights vector which is applied on the test data. This fold is used to create a new model using the second inner operator and retrieve a performance vector using the third inner operator. This performance vector serves as a performance indicator for the actual algorithm. This implementation of a MethodValidationChain works similar to the XValidation.

And here are the inner conditions (also from the operator info dialog):

Quote
  • Operator 1 (Wrapper) must be able to handle [ExampleSet] and must deliver [AttributeWeights].
  • Operator 2 (Training) must be able to handle [ExampleSet] and must deliver [Model].
  • Operator 3 (Testing) must be able to handle [ExampleSet, Model] and must deliver [PerformanceVector].

So this is how you could set up a process for nearest neighbors together with evolutionary attribute weighting:

Code:
<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a synthetic data set: 3 informative attributes plus 3 random noise attributes, then normalize -->
    <operator name="DataGeneration" class="OperatorChain" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples" value="200"/>
            <parameter key="number_of_attributes" value="3"/>
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="NoiseGenerator" class="NoiseGenerator">
            <parameter key="label_noise" value="0.0"/>
            <list key="noise">
            </list>
            <parameter key="random_attributes" value="3"/>
        </operator>
        <operator name="Normalization" class="Normalization">
            <parameter key="z_transform" value="false"/>
        </operator>
    </operator>
    <!-- Outer validation: evaluates the weighting scheme itself -->
    <operator name="WrapperXValidation" class="WrapperXValidation" expanded="yes">
        <parameter key="number_of_validations" value="5"/>
        <!-- Inner operator 1 (Wrapper): learns the attribute weights -->
        <operator name="EvolutionaryWeighting" class="EvolutionaryWeighting" expanded="yes">
            <parameter key="maximum_number_of_generations" value="20"/>
            <parameter key="p_crossover" value="0.5"/>
            <parameter key="population_size" value="2"/>
            <!-- Inner cross-validation estimating the fitness of each candidate weight vector -->
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="number_of_validations" value="5"/>
                <operator name="WeightLearner" class="NearestNeighbors">
                    <parameter key="k" value="5"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Performance" class="Performance">
                    </operator>
                </operator>
            </operator>
        </operator>
        <!-- Inner operator 2 (Training): learns the final model on the weighted training data -->
        <operator name="WeightedModelLearner" class="NearestNeighbors">
            <parameter key="k" value="5"/>
        </operator>
        <!-- Inner operator 3 (Testing): applies the model and measures performance on the weighted test data -->
        <operator name="WeightedApplierChain" class="OperatorChain" expanded="yes">
            <operator name="WeightedModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="WeightedPerformance" class="Performance">
            </operator>
        </operator>
    </operator>
</operator>


The process will run for several minutes. After the process has finished, the performance is delivered together with a weight vector averaged over all runs. This vector could, for example, be saved and applied to new data sets later on. In the example above, the learned weights should look something like this:

Code:
att1          0.9181578856281039
att3          0.8079093341177875
att2          0.5669022824248217
random1       0.4395652799419607
random2       0.25727249709958755
random        0.047672333763268744
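
To reuse the averaged weight vector on new data, you could append an AttributeWeightsWriter after the WrapperXValidation to store the weights in a file, and then load and apply them in a second process. Here is a minimal sketch of such an application process. Please note that the operator names (AttributeWeightsWriter, AttributeWeightsLoader, AttributeWeightsApplier) and the file names ("mydata.aml", "weights.wgt") are written from memory as an illustration, so you might have to adapt them to your version:

Code:
<operator name="ApplyWeights" class="Process" expanded="yes">
    <!-- load the new data set (hypothetical file name) -->
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="mydata.aml"/>
    </operator>
    <!-- load the weight vector written by the training process (hypothetical file name) -->
    <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
        <parameter key="attribute_weights_file" value="weights.wgt"/>
    </operator>
    <!-- re-scale the attributes of the new data according to the learned weights -->
    <operator name="AttributeWeightsApplier" class="AttributeWeightsApplier">
    </operator>
    <!-- learn a model on the weighted data, e.g. nearest neighbors again -->
    <operator name="NearestNeighbors" class="NearestNeighbors">
        <parameter key="k" value="5"/>
    </operator>
</operator>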

Cheers,
Ingo
