Author Topic: removing columns from a dataset/analysis  (Read 2019 times)
RapidMinerUser
Newbie
*
Posts: 5


« on: March 27, 2009, 12:15:31 PM »

Hi all,
I'm new to RapidMiner, so this is a basic question I'd appreciate your help with.

I'm running a decision tree (ID3Numerical).

How do I:
1) remove columns (variables) from a dataset before processing
2) alternatively, list the subset of the dataset's variables that I want to put through the tree?

On a related note, how do I store and view a dataset that has been read in, without having to rerun the whole ETL process and pause it after the import?

Thanks in advance for all your help.

Richie
haddock
Hero Member
*****
Posts: 853



« Reply #1 on: March 27, 2009, 01:35:08 PM »

Hi,

Welcome to the whacky world of RM! Here's an answer to your questions....

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>
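
For anyone unsure how the two filter parameters interact, here is a minimal Python sketch of the skip/except logic. It is only a model of the behaviour described above, not RapidMiner code: the function name `filter_feature_names`, the list of column names, and the assumption that each pattern must match the whole attribute name are all illustrative guesses.

```python
import re


def filter_feature_names(names, skip_pattern, except_pattern=None):
    """Return the names that survive a skip/except name filter.

    A name is dropped when it fully matches skip_pattern, unless it also
    fully matches except_pattern (assumed whole-name matching).
    """
    kept = []
    for name in names:
        skipped = re.fullmatch(skip_pattern, name) is not None
        rescued = (
            except_pattern is not None
            and re.fullmatch(except_pattern, name) is not None
        )
        if not skipped or rescued:
            kept.append(name)
    return kept


# Hypothetical column names, chosen to resemble the generated example set.
columns = ["att1", "att2", "att3", "att4", "att5", "label"]
remaining = filter_feature_names(columns, skip_pattern=r"a.*", except_pattern=r"att5")
print(remaining)  # ['att5', 'label']
```

Under these assumptions everything starting with "a" is removed except att5, and names that never matched the skip pattern pass through untouched.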

Some say that reading the manuals and working through the tutorial and examples helps, others that it takes all the fun out of guessing.



Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
RapidMinerUser
Newbie
*
Posts: 5


« Reply #2 on: March 27, 2009, 02:39:30 PM »

Thanks for addressing the first of my questions and for producing an easy-to-follow example. I used a different operator, AttributeFilter, with the same result.

How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
- view the first dataset before filtering
- view the last dataset after the FeatureNameFilter

I ask because the input and data preparation may be computationally expensive and I don't want to have to rerun them again.

Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.

Thanks,
R
haddock
Hero Member
*****
Posts: 853



« Reply #3 on: March 27, 2009, 03:36:15 PM »

Quote
How do I now view the datasets created so I can see the results of my operators? For example, in the example you give, how do I:
- view the first dataset before filtering
- view the last dataset after the FeatureNameFilter

By using breakpoints, like this....

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="before">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>


The process halts before the attributes are removed. To see the data at that point, check the data view of the data table. Then continue and do the same to see what got written.

Quote
Where is the reference in the documentation that you mention by the way? I always search the documentation first but found nothing on this basic ETL stuff.

Curious: if I open rapidminer-4.2-tutorial.pdf and search for "remove attribute", the first hit is the section on the FeatureNameFilter operator. Equally, if you work through the tutorial (Help->Rapidminer Tutorial) you'll come across examples which use breakpoints, the first in example four...

That being said, I've griped before about the documentation, but believe you me, RM is much better and easier to use than the documentation suggests. Being a halfwit myself, perhaps I should offer up an idiot's guide to the data underworld...

Good weekend ;)
« Last Edit: March 27, 2009, 03:45:49 PM by haddock »

RapidMinerUser
Newbie
*
Posts: 5


« Reply #4 on: March 27, 2009, 03:46:57 PM »

Thanks for the help haddock.

I ran Help\Rapidminer Tutorial, but it has no search feature, and when I closed the tutorial dialog my whole process tree was lost. So I gave up on that pretty quickly! I'll check the PDF you mention.

I think once these folks sort out their documentation they'll have a really excellent product that people will use. For an industry user wanting to get up and running quickly it's pretty lacking alright.

BTW, I have used breakpoints. Once you move on through a breakpoint, however, there's no way to go back and view the datasets. I'm coming from a SAS background, where this is easy to do. Also, it's important to be able to start a process at any stage, since it's pointless to have to keep reading in the data before running any analytics.
Any pointers on where I can find out about that?

Thanks and have a good weekend yourself.

R

haddock
Hero Member
*****
Posts: 853



« Reply #5 on: March 27, 2009, 04:09:45 PM »

Nay probs, just copy the example set, like this....

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="200"/>
    </operator>
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="a.*"/>
        <parameter key="except_features_with_name" value="att5"/>
    </operator>
    <operator name="ExampleSetWriter" class="ExampleSetWriter">
        <parameter key="example_set_file" value="bla"/>
    </operator>
</operator>

I realise that you may not wish to clog up memory, so this is not perfect in the sense of a good debugger; still, for what it is worth, there is your answer.
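
If memory is the worry, the general alternative is to write the prepared data to disk once and reload it on later runs, which is roughly the role the ExampleSetWriter step plays in the process above. Here is a minimal Python sketch of that idea, outside RapidMiner and purely illustrative: `load_or_build`, `expensive_etl`, and the cache file name are hypothetical names, not part of any RapidMiner API.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the cached object if cache_path exists; otherwise build, cache, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    data = build()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle)
    return data


calls = []  # records how often the expensive step actually runs


def expensive_etl():
    """Stand-in for a slow import and preparation step (hypothetical data)."""
    calls.append(1)
    return [{"att5": 0.1 * i, "label": i % 2} for i in range(200)]


cache_file = os.path.join(tempfile.mkdtemp(), "prepared_examples.pkl")
first = load_or_build(cache_file, expensive_etl)   # runs the ETL and writes the cache
second = load_or_build(cache_file, expensive_etl)  # reads the cache, ETL is skipped
print(len(calls))  # 1
```

The trade-off is the usual one: disk space and a possibly stale cache in exchange for not repeating the slow step, so the cache file has to be deleted whenever the source data changes.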

For serious users I'd really recommend a course up at RM; in two days I learnt more than in the preceding two months of grappling with the guesswork. Besides which, Ralf is a very approachable tutor and genial lunch host 8)
RapidMinerUser
Newbie
*
Posts: 5


« Reply #6 on: March 27, 2009, 04:31:38 PM »

Well, if there's lunch...!
Thanks, that's exactly what we wanted. We're not short of storage here, but we can't afford the time to repeatedly run through a long ETL process.

Thanks,
Richie