Author Topic: Losing my ID field when using Principal Component Generator  (Read 2014 times)
Keithr
Newbie
*
Posts: 10


« on: October 27, 2008, 01:55:12 PM »

Hi,

I'm using the principal component generator to combine some highly correlated variables in a classification problem. That works fine, but it drops my ID attribute, so there is no way to tie the classification result back to the actual customer ID.

The process I'm using is as follows:
ExampleSource (label, ID, 82 variables) -> AttributeFilter (label, ID, 3 variables) -> PrincipalComponentGenerator (label, 1 variable).

What am I doing wrong?

Thanks in advance for your help.

Keith
Logged
Sebastian Land
Administrator
Hero Member
*****
Posts: 2425


« Reply #1 on: October 27, 2008, 02:30:16 PM »

Hi Keith,
You probably forgot to mark the ID attribute as a special attribute. All non-special (regular) attributes are used by the PCA and removed afterwards. You can define the ID attribute in the operator that loads the data, if it provides an "id_attribute" parameter; otherwise, you can change the role later on with the ChangeAttributeRole operator:
Code:
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="ID"/>
        <parameter key="target_role" value="id"/>
    </operator>

Greetings,
  Sebastian
Logged
Keithr
Newbie
*
Posts: 10


« Reply #2 on: October 27, 2008, 03:38:29 PM »

Hi Sebastian,

I think I'm setting the ID up correctly in the aml file, and it does show up as an ID in the meta data view before I run the PCA:

label    dropped          binominal  mode = Y (624)                           Y (624), N (624)                     0.0
id       hhId             integer    avg = 106,258,671.246 +/- 3,587,967.584  [100,731,555.000 ; 111,749,338.000]  0.0
regular  DELTA_1_SALES    real       avg = -33.475 +/- 43.687                 [-100.000 ; 103.170]                 0.0
regular  DELTA_1_TRIPS    real       avg = -22.330 +/- 48.379                 [-100.000 ; 161.110]                 0.0
regular  DELTA_1_CAT_PEN  real       avg = -30.810 +/- 38.007                 [-100.000 ; 69.230]                  0.0

But after the PCA runs, the ID disappears and all I have left is the label and the PCA variable. It also renames the label from "dropped" to "label".

label    label  nominal  mode = Y (624)            N (624), Y (624)      0.0
regular  pc_1   real     avg = -50.189 +/- 70.917  [-173.156 ; 160.344]  0.0

aml file:
<attributeset default_source="8051_D_75_C3_200810_Hurdle25_50pctDec_Train_5050.psv">

  <id
    name      = "hhId"
    sourcecol = "1"
    valuetype = "integer"
  />

  <label
    name      = "dropped"
    sourcecol = "2"
    valuetype = "binominal">
    <value>Y</value>
    <value>N</value>
  </label>

  <attribute
    name      = "baselineSales"
    sourcecol = "3"
    valuetype = "integer"
  />
Logged
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #3 on: October 27, 2008, 04:09:58 PM »

Hi Keith,

The PrincipalComponentGenerator operator is outdated; please use the PCA operator instead. It outputs a model, which can then be applied to the data with the ModelApplier. This way, all special attributes (label and id) should be kept.
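
A minimal sketch of such a process chain might look like the following. It is only an illustration under stated assumptions: the file name "train.aml" is a placeholder, and the parameter keys "attributes" and "keep_example_set" are assumed rather than confirmed anywhere in this thread, so please check them against the operator documentation of your version. An AttributeFilter step like the one in the original process would go between ExampleSource and PCA.
Code:
    <operator name="Root" class="Process">
        <!-- Load the data; the .aml file declares hhId as id and dropped as label.
             "train.aml" is a placeholder file name. -->
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="train.aml"/>
        </operator>
        <!-- Build the PCA model from the regular attributes.
             ASSUMPTION: keep_example_set passes the example set on,
             so that ModelApplier receives both the model and the data. -->
        <operator name="PCA" class="PCA">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <!-- Apply the PCA model to the data; the special attributes
             (id and label) should remain in the resulting example set. -->
        <operator name="ModelApplier" class="ModelApplier"/>
    </operator>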

Regards,
Tobias
Logged

Tobias Malbrecht
Director Product Marketing
RapidMiner GmbH
Keithr
Newbie
*
Posts: 10


« Reply #4 on: October 27, 2008, 05:45:25 PM »

Hi Tobias,

That did the trick.

Thanks a lot for your help!

Keith
Logged