Author Topic: Help with workaround for Tools.handleAverages
fig (Newbie, Posts: 4)
« on: February 11, 2009, 05:45:47 AM »

Hi,

It seems that IteratingPerformanceAverage does not handle nested averages properly, as demonstrated by the following process (a toy example of 2x2 cross-validation):
Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_of_attributes" value="20"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <parameter key="iterations" value="2"/>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="number_of_validations" value="2"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="LinearRegression" class="LinearRegression">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                </operator>
                <operator name="RegressionPerformance" class="RegressionPerformance">
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="main_criterion" value="absolute_error"/>
                </operator>
                <operator name="ProcessLog" class="ProcessLog">
                    <list key="log">
                      <parameter key="run" value="operator.XValidation.value.applycount"/>
                      <parameter key="fold" value="operator.XValidation.value.iteration"/>
                      <parameter key="error" value="operator.RegressionPerformance.value.absolute_error"/>
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

After running the experiment the process log shows:

run   fold   error
1     0      0.249
1     1      0.278
2     0      0.359
2     1      0.278


The average of the first run (first two folds) is 0.264, the average of the second run (last two folds) is 0.319, and the overall average is 0.291. However, the performance vector returned by IteratingPerformanceAverage reports the value as 0.282.

This is because in Tools.handleAverages (the outer call, from IteratingPerformanceAverage.apply) the first average vector is the average from the first run, with a value of 0.264 and an average count of 2. However, when the second average vector (from the second run, with value 0.319) is folded in by the call to Averagable.buildAverage, it is treated as having an average count of only 1, whereas it should really carry the same weight as the first average vector. The weighted average (2*0.264 + 1*0.319)/3 therefore gives the incorrectly reported value of 0.282.
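
To make the arithmetic concrete, here is a minimal, self-contained Java sketch that reproduces both numbers from the logged fold errors above. The class and method names are made up for illustration and are not the actual RapidMiner classes:
Code:
// Toy reproduction of the averaging arithmetic; not the RapidMiner API.
public class AverageCountDemo {

    // Merge an incoming value into a running weighted average: the existing
    // average carries weight 'count', the incoming value carries weight 1.
    static double mergeWeighted(double average, int count, double incoming) {
        return (count * average + incoming) / (count + 1);
    }

    public static void main(String[] args) {
        double[] run1 = {0.249, 0.278};   // fold errors of the first XValidation run
        double[] run2 = {0.359, 0.278};   // fold errors of the second run

        double avgRun1 = (run1[0] + run1[1]) / 2;   // ~0.264, average count = 2
        double avgRun2 = (run2[0] + run2[1]) / 2;   // ~0.319, average count = 2

        // Behaviour described above: the second run's vector is folded in
        // as if its average count were 1, so it gets only half the weight.
        double buggy = mergeWeighted(avgRun1, 2, avgRun2);   // (2*0.264 + 0.319) / 3

        // Expected behaviour: both runs carry equal weight.
        double expected = (avgRun1 + avgRun2) / 2;

        System.out.printf("buggy = %.3f, expected = %.3f%n", buggy, expected);
    }
}
Running this prints buggy = 0.282 and expected = 0.291, matching the values above.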

Can anyone suggest how to work around this?

I am thinking that in Tools.handleAverages, when the first average vector is inserted, its average count should be set to 1.
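
For illustration only, here is a toy sketch of that idea (hypothetical names, not the actual Tools.handleAverages code): if the accumulated vector starts with an average count of 1, the second run is merged with equal weight and the result comes out as 0.291:
Code:
// Toy sketch of the suggested workaround; not the RapidMiner implementation.
public class EqualWeightMergeDemo {
    public static void main(String[] args) {
        // Per-run averages computed from the logged fold errors.
        double accAvg = (0.249 + 0.278) / 2;    // first run, ~0.264; count forced to 1
        int accCount = 1;

        double incoming = (0.359 + 0.278) / 2;  // second run, ~0.319

        // Same weighted merge as before, but now both runs weigh equally.
        accAvg = (accCount * accAvg + incoming) / (accCount + 1);
        accCount++;

        System.out.printf("merged average = %.3f%n", accAvg);  // prints 0.291
    }
}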

Any help will be greatly appreciated.
« Last Edit: February 11, 2009, 06:45:11 AM by fig »
fig (Newbie, Posts: 4)
« Reply #1 on: February 11, 2009, 05:52:14 AM »

Forgot to mention...
This is a follow-up to an earlier post: http://rapid-i.com/rapidforum/index.php/topic,554.0.html.

Cheers,
A
Sebastian Land (Administrator, Hero Member, Posts: 2426)
« Reply #2 on: February 17, 2009, 12:50:58 PM »

Hi,
thanks for your hint. Thanks to your detailed description, I was able to find the bug relatively quickly. If you check out a version from CVS, it is already fixed.

Greetings,
  Sebastian
fig (Newbie, Posts: 4)
« Reply #3 on: February 18, 2009, 05:58:45 AM »

Yes, it works now.  Thank you so much!