Topic: Filtering the Results
Benjamin
Guest
« on: October 06, 2008, 10:50:29 AM »

Hey guys,

I want to compare RapidMiner with IBM OmniFind for a university project. For that, I'd like to run the same scenario in both applications. Don't worry, it's a really simple one. I'll give you the description first and then explain my problem.

Scenario:
I use the NHTSA database, which contains a great many problem reports on cars in America. I split every report into a separate file. Now I want to compare the problem reports in a Correlation Matrix and filter it for the keyword "fire". What I can see then is that there is a strong correlation between a car brand and a car part.

How to do this in RapidMiner:

I split the main file so that I have 1000 files, each containing one problem report. Then I load the files via:
Textinput->StringTokenizer->English Stopwordfilter->TokenLengthFilter->Porterstemmer.

After that I use the Correlation Matrix. The problem is that I get too much data. I want to filter the results so that only the files containing my keyword are used. In my case that is "fire". Is that possible? At the moment I get a very large Correlation Matrix that I can't really use. Plotting the results is not possible because there is too much data.

I hope that you can help me.

Cheers
Benjamin
Benjamin
Guest
« Reply #1 on: October 07, 2008, 10:59:36 AM »

OK, let me specify my wish. I'd like to filter my dataset for some keywords and then compute a CorrelationMatrix, so that when I filter for my keyword "Fire" I can see, for example, a strong correlation between "Ford" and "door". Maybe I have to use AttributeWeightSelection.

Please help.
Ingo Mierswa
Administrator
Hero Member
« Reply #2 on: October 09, 2008, 06:17:31 PM »

Hi,

The solution is quite simple: just use the operator "ExampleFilter" before applying the correlation matrix and filter out all examples where the TFIDF value for the keyword (here: Fire) or its corresponding word stem is 0. After that, you should apply a "RemoveUselessAttributes" operator to filter out all attributes that are now constant. Then apply the correlation matrix.

Cheers,
Ingo

Benjamin
Guest
« Reply #3 on: October 13, 2008, 11:50:45 AM »

About the ExampleFilter: I set parameter_string to "fire", but I don't really know how to set condition_class. Can you tell me what I need to set here? Whenever I set parameter_string, every configuration I try tells me that it doesn't work with a parameter.
Tobias Malbrecht
Global Moderator
Sr. Member
« Reply #4 on: October 13, 2008, 12:13:22 PM »

Hi,

you have to use the attribute_value_filter option of the condition_class parameter. As parameter_string you have to specify a condition. Whenever an example does not fulfill the condition, it is filtered from the example set. The following code should work for your example.

Code:
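    <operator name="ExampleFilter" class="ExampleFilter">
        <!-- keep only examples whose value for the attribute "Fire" is non-zero -->
        <parameter key="condition_class" value="attribute_value_filter"/>
        <!-- "<>" must be escaped as &lt;&gt; inside an XML attribute value -->
        <parameter key="parameter_string" value="Fire&lt;&gt;0"/>
    </operator>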
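
Putting the three steps from Ingo's reply together, the relevant part of the process might look roughly like the sketch below. It assumes that the operator class names match the operator names used in this thread ("RemoveUselessAttributes", "CorrelationMatrix"), leaves all of their parameters at the defaults, and assumes the generated attribute for the keyword is really called "Fire". After stemming or case conversion it may be named differently, so check the attribute names in the example set first.

```xml
<!-- Sketch only: these operators go after the text input and preprocessing chain. -->
<operator name="ExampleFilter" class="ExampleFilter">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <!-- assumed attribute name "Fire"; "<>" is escaped for XML -->
    <parameter key="parameter_string" value="Fire&lt;&gt;0"/>
</operator>
<!-- class names below are assumed from the operator names in this thread -->
<operator name="RemoveUselessAttributes" class="RemoveUselessAttributes"/>
<operator name="CorrelationMatrix" class="CorrelationMatrix"/>
```

The order matters: filtering first leaves many word attributes constant (mostly 0) in the remaining examples, and removing those before the correlation step keeps the matrix small enough to inspect.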

Hope that helps,
Tobias

Tobias Malbrecht
Rapid-I GmbH