Topic: Remove correlated attributes delivers strange result
qwertz
« on: August 07, 2012, 04:11:37 PM »

Hi there,

I am (unfortunately) not an expert in correlation calculations, but the result of this sample process seems strange to me.

First I run the code as is. --> The result includes all attributes.
Then I change the parameter "filter relation" to "less". --> The result still includes att1.

To my understanding, att1 can have a correlation either greater than 0.9 OR less than 0.9, but it cannot appear in both results...

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
    <process expanded="true" height="512" width="640">
      <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
      <operator activated="true" class="remove_correlated_attributes" compatibility="5.2.003" expanded="true" height="76" name="Remove Correlated Attributes" width="90" x="179" y="30">
        <parameter key="correlation" value="0.9"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Remove Correlated Attributes" to_port="example set input"/>
      <connect from_op="Remove Correlated Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>



Best regards
Sachs
Logged
haddock
« Reply #1 on: August 08, 2012, 07:32:39 AM »

Quote
First I run the code as is. --> Result includes all attributes

Really? Returns nothing on my machine.
Logged

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
qwertz
« Reply #2 on: August 08, 2012, 07:46:44 AM »


I just double-checked on another machine. Works perfectly fine... except that I have no clue what's happening in the background (as described above)...
« Last Edit: August 08, 2012, 07:49:43 AM by qwertz » Logged
haddock
« Reply #3 on: August 08, 2012, 08:19:49 AM »

Hi,

Odd, I ran it twice before posting, but now it works as you say; you'd expect random data to get cleaned out, but it doesn't. The reason for this is noted in the help...

Quote
Please note that this operator might fail in some cases when the attributes should be filtered out, e.g. it might not be able to remove for example all negative correlated features. The reason for this behaviour seems to be that for the complete m x m - matrix of correlations (for m attributes) the correlations will not be recalculated and hence not checked if one of the attributes of the current pair was already marked for removal. That means for three attributes a1, a2, and a3 that it might be that a2 was already ruled out by the negative correlation with a1 and is now not able to rule out a3 any longer.

Err, yes, well ;)
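
For what it's worth, here is a minimal sketch of one way to read that help text, written in Python rather than taken from RapidMiner. Everything in it is an assumption for illustration: the function names are made up, it uses the absolute Pearson correlation, and it assumes that attributes are visited in order, that the later attribute of a matching pair is dropped, and that a dropped attribute is never used again to rule out others. It is not the operator's actual implementation.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def remove_correlated(columns, threshold=0.9, relation="greater"):
    """Return the names of the attributes that survive the filter.

    Assumed behaviour (hypothetical, modelled on the help text):
    attributes are visited in order, the later attribute of a matching
    pair is dropped, and a dropped attribute is skipped afterwards.
    """
    names = list(columns)
    removed = set()
    for i, first in enumerate(names):
        if first in removed:
            continue  # already ruled out, so it cannot rule out anything
        for second in names[i + 1:]:
            if second in removed:
                continue
            r = abs(pearson(columns[first], columns[second]))
            matches = r > threshold if relation == "greater" else r < threshold
            if matches:
                removed.add(second)
    return [name for name in names if name not in removed]


random.seed(0)

# Five independent random attributes, standing in for the Generate Data output.
data = {f"att{i}": [random.random() for _ in range(200)] for i in range(1, 6)}
kept_greater = remove_correlated(data, 0.9, "greater")
kept_less = remove_correlated(data, 0.9, "less")
print(kept_greater)
print(kept_less)

# a2 is correlated with both a1 and a3, while a1 and a3 are independent.
x = [random.gauss(0, 1) for _ in range(500)]
y = [random.gauss(0, 1) for _ in range(500)]
mixed = [a + b for a, b in zip(x, y)]
kept_a1_first = remove_correlated({"a1": x, "a2": mixed, "a3": y}, 0.5)
kept_a2_first = remove_correlated({"a2": mixed, "a1": x, "a3": y}, 0.5)
print(kept_a1_first)
print(kept_a2_first)
```

Under these assumptions, "greater" keeps all five random attributes because no pair reaches 0.9, while "less" keeps only att1: the first attribute is never the later one in a pair, so it can rule out the others but is never ruled out itself. That would explain why att1 shows up in both results. The second part shows the order dependence from the help text: with a1 first, a2 is dropped and can no longer rule out a3; with a2 first, both a1 and a3 are dropped.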
Logged

qwertz
« Reply #4 on: August 08, 2012, 09:18:27 AM »



Being able to read helps a lot... stupid me...

Though, I have to admit that I don't fully understand the content of the explanation.

Quote
The reason for this behaviour seems to be that for the complete m x m - matrix of correlations (for m attributes) the correlations will not be recalculated and hence not checked if one of the attributes of the current pair was already marked for removal. That means for three attributes a1, a2, and a3 that it might be that a2 was already ruled out by the negative correlation with a1 and is now not able to rule out a3 any longer.

So in the end I am not able to use this operator, as I don't know in which cases attributes are removed correctly?
I wonder, then, what the intended use scenario is.


Anyway, thanks a lot :)
Sachs
Logged
haddock
« Reply #5 on: August 08, 2012, 09:30:56 AM »

Hi,

Agreed, the explanation is a bit obscure, but at least there is one. On the other hand, bear in mind:

1. It doesn't remove anything falsely; it just may not remove completely, so some correlated attributes may remain.
2. The use scenario may be different from mining random data for 90% correlation!
3. There are alternative dimension reducers.
4. Some learners, like SVMs, handle high dimensionality rather well.

Best

H
Logged