Author Topic: Aggregate samples by cluster name and create "average" sample per cluster  (Read 422 times)
smarie
Newbie
*
Posts: 2


« on: January 18, 2011, 04:30:52 PM »

Hello Rapidminers,

I was wondering how to display the results of a clustering in a simple way. In particular, I would love to see the average sample of each cluster and display all of them in the same window. I have found several ways to do that, but none is satisfactory:

- a) Use the "Aggregate" operator with GroupBy="cluster" and enter all 100 attributes, one by one, in the "aggregation attributes" parameter. I can't really do this by hand, and I haven't found any wildcard there to say that I want all real-valued attributes to be averaged.

- b) Use the "Multiply" operator as many times as needed (one branch per cluster). In each branch, filter on the "cluster" attribute so that the branch contains only the subset of the sample set belonging to "cluster_0", "cluster_1", and so on. Finally, transpose the sample set and use the "Generate Aggregation" operator to create a new attribute that is the average of all the others. Because the sample set has been transposed, this new attribute is actually the new sample (the average of the samples in that cluster).
Issue: I now have x different sample sets (one for each cluster), and there seems to be no operator to put all of them together into a new sample set.


Is there an easy way to solve this problem, which is really just an average of all rows belonging to each cluster group? Maybe with the R plugin?
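To make the target computation concrete: it is a plain group-by mean. The following is a minimal sketch in standalone Java that does not use the RapidMiner API at all. The class name `ClusterAverages`, the method `averagePerCluster`, and the row/label representation are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterAverages {

    /**
     * Computes the mean row per cluster label.
     * rows.get(i) holds the real-valued attributes of sample i,
     * labels.get(i) holds its cluster name (for example "cluster_0").
     * Hypothetical helper, not part of RapidMiner.
     */
    public static Map<String, double[]> averagePerCluster(List<double[]> rows, List<String> labels) {
        Map<String, double[]> sums = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();

        // Accumulate one sum vector and one counter per cluster.
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            String label = labels.get(i);
            double[] sum = sums.computeIfAbsent(label, k -> new double[row.length]);
            for (int j = 0; j < row.length; j++) {
                sum[j] += row[j];
            }
            counts.merge(label, 1, Integer::sum);
        }

        // Divide each sum vector by its cluster size to obtain the mean.
        for (Map.Entry<String, double[]> entry : sums.entrySet()) {
            int n = counts.get(entry.getKey());
            double[] sum = entry.getValue();
            for (int j = 0; j < sum.length; j++) {
                sum[j] /= n;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 4.0},
                new double[] {10.0, 20.0});
        List<String> labels = Arrays.asList("cluster_0", "cluster_0", "cluster_1");

        // One averaged row per cluster, all in a single result.
        for (Map.Entry<String, double[]> entry : averagePerCluster(rows, labels).entrySet()) {
            System.out.println(entry.getKey() + " -> " + Arrays.toString(entry.getValue()));
        }
    }
}
```

All cluster averages end up in one map, which corresponds to the single result table that approach b) fails to produce.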
Any help would be very much appreciated

Cheers

Sylvain
smarie
Newbie
*
Posts: 2


« Reply #1 on: January 18, 2011, 05:45:23 PM »

Hi,

I have thought of another way: maybe I can use a "Script" operator to generate the correct input (the list of all real-valued attributes) and then pass it to the "Aggregate" operator. I came up with the following Java code for the "Script" operator, but I still have to dig through the Java sources to understand how to pass the parameters correctly and trigger the "Aggregate" operator from there.

Do you think this has a chance of working?

Code:
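import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.rapidminer.example.Attribute;
import com.rapidminer.example.Attributes;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.tools.Ontology;

// The example set delivered to the "Script" operator's input port.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Intended values for the "Aggregate" operator (not used yet, see TODO below):
// getParameterAsBoolean(PARAMETER_ONLY_DISTINCT)
boolean onlyDistinctValues = false;
// getParameterAsBoolean(PARAMETER_IGNORE_MISSINGS)
boolean ignoreMissings = false;

/*
 * Create a list of tuples (attribute_name, "average") for all
 * real-valued attributes.
 */
List<String[]> parameterList = new ArrayList<String[]>();
Attributes attributes = exampleSet.getAttributes();
Iterator<Attribute> r = attributes.allAttributes();
while (r.hasNext()) {
    Attribute attribute = r.next();
    if (Ontology.ATTRIBUTE_VALUE_TYPE.isA(attribute.getValueType(), Ontology.REAL)) {
        parameterList.add(new String[] { attribute.getName(), "average" });
    }
}

// TODO: still open, how to create the "Aggregate" operator, pass it
// parameterList, and apply it to exampleSet:
// Operator aggOperator = new AggregationOperator (TODO....)
// return aggOperator.apply();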
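The attribute-selection step of the script can be tried outside RapidMiner. The sketch below is standalone Java under stated assumptions: the `ValueType` enum is a hypothetical stand-in for RapidMiner's `Ontology` value types, and the schema is a plain map from attribute name to type. It only builds the list of (attribute name, "average") pairs and does not configure or run any operator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregationParameterList {

    /** Hypothetical stand-in for RapidMiner's attribute value types. */
    public enum ValueType { REAL, INTEGER, NOMINAL }

    /**
     * Builds one (attribute_name, "average") pair per real-valued attribute,
     * in the iteration order of the given schema.
     */
    public static List<String[]> buildAverageParameters(Map<String, ValueType> schema) {
        List<String[]> parameterList = new ArrayList<>();
        for (Map.Entry<String, ValueType> entry : schema.entrySet()) {
            if (entry.getValue() == ValueType.REAL) {
                parameterList.add(new String[] { entry.getKey(), "average" });
            }
        }
        return parameterList;
    }

    public static void main(String[] args) {
        // Illustrative schema: two real-valued attributes plus the nominal cluster attribute.
        Map<String, ValueType> schema = new LinkedHashMap<>();
        schema.put("att1", ValueType.REAL);
        schema.put("cluster", ValueType.NOMINAL);
        schema.put("att2", ValueType.REAL);

        for (String[] pair : buildAverageParameters(schema)) {
            System.out.println(pair[0] + " : " + pair[1]);
        }
    }
}
```

The nominal "cluster" attribute is skipped by the type check, so it remains available as the group-by attribute.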

 

Thanks in advance for your help
Best regards

Sylvain
Sebastian Land
Administrator
Hero Member
*****
Posts: 2421


« Reply #2 on: January 20, 2011, 12:32:30 PM »

Hi,
If it's a centroid-based clustering, this is already shown in the result screen.

Otherwise, I must admit, you have a problem. I wasn't aware that this isn't possible. Please file a feature request for it in the bug tracker.

Greetings,
 Sebastian

chikaya
Newbie
*
Posts: 7


« Reply #3 on: July 26, 2012, 10:01:37 AM »

I know it's an old thread, but anyway, I would vote for the feature too. As Sebastian says, it is already implemented in the centroid-based operators. I work with spectra and would like to see the "centroids" when I use other clustering operators!