Topic: An error in aggregation operator + kmeans
Shubha
« on: March 19, 2009, 10:19:22 AM »

Hi,

This is my code:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_examples" value="25"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="att.*"/>
    </operator>
    <operator name="KMeans" class="KMeans">
        <parameter key="k" value="3"/>
    </operator>
    <operator name="Aggregation" class="Aggregation">
        <list key="aggregation_attributes">
            <parameter key="att1" value="average"/>
        </list>
        <parameter key="group_by_attributes" value="cluster"/>
    </operator>
</operator>


I get an error saying that the attribute 'cluster' does not exist. But running KMeans creates a new attribute 'cluster' in the example set, so why does this error occur? Is Aggregation reading the initial input example set? How do I tell RM to read the data that was generated by applying KMeans?

Thanks, Shubha
Tobias Malbrecht
« Reply #1 on: March 19, 2009, 10:39:31 AM »

Hi Shubha,

This error happens because the Aggregation operator only searches through the regular attributes when matching attribute names. We will add a parameter work_on_special in the near future. Until then, you have to change the type of the cluster attribute to regular before applying Aggregation.

Kind regards,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
Shubha
« Reply #2 on: March 19, 2009, 11:10:12 AM »

Thanks Tobias! That did the job exactly...

One more question: can I specify all the variables (att1, att2, att3, att4, att5) in the aggregate function? (In the code I posted above, only att1 is used.) I tried using the regular expression att.*, but there is an error: "The attribute 'att.*' doesn't exist". I am sure that I am missing something... What could it be?

Thanks again,
Shubha
Tobias Malbrecht
« Reply #3 on: March 19, 2009, 11:13:30 AM »

Hi Shubha,

You can specify more than one. Just extend the list with one entry per aggregation attribute and function. Please see the code below:

Code:
<operator name="Aggregation" class="Aggregation">
    <list key="aggregation_attributes">
        <parameter key="att1" value="average"/>
        <parameter key="att2" value="average"/>
    </list>
    <parameter key="group_by_attributes" value="cluster"/>
</operator>

Regards,
Tobias
Shubha
« Reply #4 on: March 19, 2009, 11:20:38 AM »

Thank you very much...
Shubha
« Reply #5 on: March 20, 2009, 01:00:26 PM »

I was just wondering whether I could avoid adding each variable for aggregation. Instead, can I specify all the variables that need to be aggregated with a regular expression or something similar?

Thanks, Shubha
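
Since the replies so far only show one list entry per attribute, one possible workaround is to generate those entries with a small script outside RapidMiner. The sketch below is plain Python, not a RapidMiner feature. It reuses only the operator and parameter names that appear in this thread; the function name `build_aggregation_operator` and the sample attribute names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def build_aggregation_operator(attribute_names, pattern, function="average",
                               group_by="cluster"):
    """Build an Aggregation operator element with one list entry
    for every attribute name that fully matches the pattern."""
    operator = ET.Element("operator", {"name": "Aggregation", "class": "Aggregation"})
    agg_list = ET.SubElement(operator, "list", {"key": "aggregation_attributes"})
    regex = re.compile(pattern)
    for name in attribute_names:
        if regex.fullmatch(name):
            ET.SubElement(agg_list, "parameter", {"key": name, "value": function})
    ET.SubElement(operator, "parameter",
                  {"key": "group_by_attributes", "value": group_by})
    return operator


# Hypothetical attribute names; "label" should not be picked up by the pattern.
names = ["att1", "att2", "att3", "att4", "att5", "label"]
operator = build_aggregation_operator(names, r"att.*")
xml_text = ET.tostring(operator, encoding="unicode")
print(xml_text)
```

The printed element comes out on a single line and would still need to be pasted into the process by hand.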
haddock
« Reply #6 on: March 20, 2009, 03:40:39 PM »

Sure

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_examples" value="25"/>
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="att.*"/>
    </operator>
    <operator name="KMeans" class="KMeans">
        <parameter key="k" value="3"/>
    </operator>
    <operator name="AttributeAggregation" class="AttributeAggregation">
        <parameter key="aggregation_attributes" value="at.*"/>
        <parameter key="attribute_name" value="nu"/>
    </operator>
</operator>

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
Shubha
« Reply #7 on: March 20, 2009, 04:44:12 PM »

Thanks,

This does something different, I think, but it will surely answer another question of mine. AttributeAggregation is new to me. Thanks.

What I need is, for each group of the nominal cluster attribute, the average of all the 'att' attributes, without specifying each variable individually. (The above computes averages row-wise, but I need them column-wise.)

Secondly, unlike 'Aggregation', the operator 'AttributeAggregation' will not perform the operation group-wise.

Thirdly, if my attributes have different names, unlike att1, att2, ..., I can't use regular expressions either...


Thanking you,
Shubha
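
To make the request above concrete, here is a minimal sketch in plain Python (not RapidMiner) of a column-wise average per cluster over all attributes whose names match a regular expression. The function name `group_averages`, the cluster labels and the sample values are made up for illustration, and the sketch assumes there are no missing values.

```python
import re
from collections import defaultdict


def group_averages(rows, pattern, group_by="cluster"):
    """Average every attribute matching the pattern, separately per group."""
    regex = re.compile(pattern)
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        group = row[group_by]
        counts[group] += 1
        for name, value in row.items():
            # Skip the grouping attribute and anything the pattern does not match.
            if name != group_by and regex.fullmatch(name):
                sums[group][name] += value
    return {
        group: {name: total / counts[group] for name, total in columns.items()}
        for group, columns in sums.items()
    }


# Made-up sample data: two examples in one cluster, one in another.
rows = [
    {"att1": 1.0, "att2": 10.0, "cluster": "cluster_0"},
    {"att1": 3.0, "att2": 20.0, "cluster": "cluster_0"},
    {"att1": 5.0, "att2": 40.0, "cluster": "cluster_1"},
]
averages = group_averages(rows, r"att.*")
print(averages)
```

For attributes without a common prefix, an alternation such as `height|weight` (hypothetical names) would select them in the same way.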
haddock
« Reply #8 on: March 24, 2009, 12:30:31 PM »

Oh dear, so wrong on so many levels, as the following shows.

Code:
<process version="4.3">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
          <parameter key="number_examples" value="200"/>
          <parameter key="number_of_attributes" value="4"/>
          <parameter key="target_function" value="random"/>
      </operator>
      <operator name="ExampleSetTranspose" class="ExampleSetTranspose">
      </operator>
      <operator name="AttributeAggregation" class="AttributeAggregation">
          <parameter key="aggregation_attributes" value="at.*"/>
          <parameter key="aggregation_function" value="average"/>
          <parameter key="attribute_name" value="Mmm"/>
      </operator>
      <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="after">
          <parameter key="skip_features_with_name" value="at.*"/>
      </operator>
      <operator name="ExampleSetTranspose (2)" class="ExampleSetTranspose">
      </operator>
  </operator>

</process>
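
The idea behind the process above, as far as it can be read from the XML, is that a row-wise average on the transposed table equals a column-wise average on the original one. A minimal sketch of that idea in plain Python (not RapidMiner), with made-up values and hypothetical helper names:

```python
def transpose(table):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*table)]


def append_row_average(table):
    """Append the average of each row as a new last column."""
    return [row + [sum(row) / len(row)] for row in table]


# Made-up data: three examples (rows) with two attributes (columns).
table = [
    [1.0, 10.0],
    [3.0, 20.0],
    [5.0, 60.0],
]

transposed = transpose(table)                        # each attribute is now a row
with_average = append_row_average(transposed)        # row-wise average per attribute
averages_only = [[row[-1]] for row in with_average]  # keep only the new column
column_averages = transpose(averages_only)           # back to one row of averages
print(column_averages)
```

Note that this yields one overall average per attribute. Nothing in the sketch groups by cluster, so a per-cluster result would need an extra grouping step.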

Shubha
« Reply #9 on: March 24, 2009, 01:01:53 PM »

Thank you very much... Transpose was exactly what I had in mind too. I was just wondering whether this method will still be OK if there are very many examples/observations, especially since it transposes twice. I wanted to confirm with you experts whether I am missing some useful operators made specifically for this purpose...

I can also see the application of feature here... Many thanks for clearing up all my queries...


