Topic: Applying an operation to a large example set
mikeb (Newbie, Posts: 5)
« on: September 24, 2013, 05:00:09 PM »

Hi,
I have an example set with 10,000 examples and 3,800 attributes.  These are document file names and the TF-IDF values for 3,800 terms in those documents.  I want to raise each TF-IDF value to the power of 0.75.  Is there a simple, fast way to do this?

What I have tried is to loop over the attributes, generate for each one a new attribute holding the TF-IDF value raised to the power of 0.75, and then loop over the resulting collection, using the Recall, Join, and Remember operators to join each example set in the collection to the previous ones.  The problem is that this slows down and eventually stalls or crashes as the iterations increase and the joined example set grows.  So I am wondering whether there is a more efficient way to do the (seemingly) simple thing of applying one operation like this to every value in the example set.

I should also mention that I looked at the Generate Function Set operator.  This looks like what I want, except that the specific operation I want to do is not included as one of the choices in that operator.

Thanks in advance for your help.
awchisholm (Sr. Member, Posts: 395)
« Reply #1 on: September 24, 2013, 10:18:41 PM »

Hello mikeb

Groovy is the answer. Use the Script operator with this code.

Code:
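ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Iterates regular attributes only; special attributes (id, label) are skipped.
for (Attribute attribute : exampleSet.getAttributes()) {
    // Skip non-numeric columns, e.g. a nominal file-name attribute.
    if (!attribute.isNumerical()) continue;
    for (Example example : exampleSet) {
        // Overwrite each value in place; no new attributes and no joins.
        example.setValue(attribute, Math.pow(example.getValue(attribute), 0.75));
    }
}

return exampleSet;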

I did an experiment with 10,000 examples by 3,800 attributes and it took 2 minutes on my laptop. Obviously others' results may vary :)
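
For anyone who wants to check the arithmetic outside RapidMiner, here is a minimal standalone Java sketch of the same idea: one pass that overwrites every value in place, with nothing joined or copied. The class name PowerTransform, the method powInPlace, and the sample values are made up for illustration; a plain double[][] stands in for the example set, so this is not RapidMiner API code.

```java
import java.util.Arrays;

// Hypothetical stand-in for the example set: rows are documents, columns are terms.
class PowerTransform {

    // Raises every entry of the matrix to the given exponent, overwriting in place.
    // A single pass over rows x columns; nothing is copied or joined.
    static void powInPlace(double[][] values, double exponent) {
        for (double[] row : values) {
            for (int j = 0; j < row.length; j++) {
                // Negative inputs with a fractional exponent become NaN;
                // TF-IDF weights are non-negative, so that is not expected here.
                row[j] = Math.pow(row[j], exponent);
            }
        }
    }

    public static void main(String[] args) {
        double[][] tfidf = { {16.0, 0.0}, {1.0, 81.0} }; // made-up sample values
        powInPlace(tfidf, 0.75);
        System.out.println(Arrays.deepToString(tfidf));
    }
}
```

At 10,000 by 3,800 that is 38 million calls to Math.pow, and the work grows linearly with the number of values rather than getting slower as a joined result builds up.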

regards

Andrew

mikeb (Newbie, Posts: 5)
« Reply #2 on: September 25, 2013, 03:47:59 PM »

Hi awchisholm,
Thanks!  I think that will work for me.
mikeb