Author Topic: Usage of Excel, CSV and SPSS Data-Files  (Read 3218 times)
misskeynes
Newbie
*
Posts: 4


« on: June 24, 2008, 11:21:43 AM »

Hello,
I am just getting started with this powerful tool, and after following the tutorial I am thrilled by the many possibilities RapidMiner offers despite - or maybe because of ;-) - being open source software. Unfortunately I am having some problems using my data, which is in SPSS, Excel, or CSV format. When I use the wizard, my SPSS files are not recognized: the data looks totally chaotic in the preview. My CSV files are recognized, but there are problems with the columns: the wizard tells me that different numbers of columns were detected in different rows. I could not find a solution to my problems in the tutorial or in the community discussions. Can somebody please help me?
I'm working with MS Windows XP. Is that a problem? I have installed all the latest Java updates etc.

Thanks a lot in advance,

Greetings from Munich
Vera
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #1 on: June 24, 2008, 12:58:30 PM »

Hi Vera,

Unfortunately, the wizard does not support Excel or SPSS files directly. To read these files, simply place (drag and drop) an ExcelExampleSource or SPSSExampleSource operator from the New Operator tab, group IO.Examples, into the operator tree and specify the file you want to read in the file parameter. For CSV files the appropriate operator is named CSVExampleSource. I hope that solves your problem.

Regards,
Tobias
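
As a minimal sketch of the advice in this reply, a process that reads a CSV file might look roughly like the following. The operator class CSVExampleSource is the one named in the post. The parameter key filename and the path my_data.csv are assumptions for illustration, so check the operator's parameter list in your RapidMiner version.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Reads the data file into an example set. -->
    <!-- Assumption: the file parameter uses the key "filename". -->
    <!-- "my_data.csv" is a placeholder path, replace it with your own file. -->
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="my_data.csv"/>
    </operator>
</operator>
```

For SPSS or Excel data the same structure should apply with SPSSExampleSource or ExcelExampleSource in place of the CSV operator, though the key of the file parameter may differ between operators.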

Tobias Malbrecht
Director of Product Marketing
RapidMiner
_misskeynes
Guest
« Reply #2 on: June 24, 2008, 01:13:12 PM »

Hi Tobias,

Thanks for your quick reply.

It seems to be a case of Murphy's law: now I'm able to import the SPSS data. Great! But when dragging the SPSSExampleSource to the left, I receive this error message:

G Jun 24, 2008 2:01:27 PM: [Error] Parameter 'filename' is not set and has no default value.


Then, when trying to run a decision tree (or anything else) over the data, I receive this error message:

Error in: DecisionTree (DecisionTree) Input example set does not have a label attribute Many operators like classification and regression methods or the PerformancEvaluator require the input example sets to have a label or class attribute. If this not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a 'label' tag in the attribute description file.


How can I define these attributes? It seemed so easy using the wizard...

Sorry for the stupid questions. Is there anything further, like a tutorial or similar, that helps me get comfortable with the software? I'm very eager to get to know it better, since it seems to be the missing link for solving most of my analytical problems.

Thanks in advance,
Greetings from Munich
Vera
Tobias Malbrecht
« Reply #3 on: June 24, 2008, 01:31:49 PM »

Hi Vera,

Well, it might of course be a little complicated for novice users, but I think once you have understood the concepts or - as you said - become comfortable with it, the functionality of RapidMiner is nearly overwhelming. A good general starting point for learning RapidMiner is the built-in tutorial, which gives a kind of guided tour by presenting some example processes. The first processes mainly show how to load data, then how to define special attributes (see below), and then how to learn a model on the data. You might only need to look at the first few steps of the tutorial to get an understanding of how to do some simple analyses with RapidMiner.

But anyway, you are not far from successfully setting up an analysis. First, you have already dragged an SPSSExampleSource operator into the operator tree. The error shown then is simply because you have not specified a filename yet. Click on the operator in the operator tree, and its parameters will show up on the right side. There, specify a file in the parameter filename. Then you have to specify a label, or target variable (as statisticians call it). This can be done with the operator ChangeAttributeRole, which can again be found in the operator lists on the right side, in the group Preprocessing.Attributes.Filter. In the parameters of the ChangeAttributeRole operator, specify the name of the attribute that should be your label in the parameter name, and choose label as target_role. After that you can use the DecisionTree.

I hope that clarifies the procedure a little. Otherwise, here is a demo process that should suit your needs exactly. You just have to set the filename in the input operator:

Code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="SPSSExampleSource" class="SPSSExampleSource">
        <parameter key="filename" value="sample\data\spss_data.sav"/>
    </operator>
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="DEGREE"/>
        <parameter key="target_role" value="label"/>
    </operator>
    <operator name="DecisionTree" class="DecisionTree">
    </operator>
</operator>

If you have more questions, please feel free to ask.
Regards from Dortmund,
Tobias
« Last Edit: June 24, 2008, 01:52:02 PM by Tobias Malbrecht »

misskeynes


« Reply #4 on: June 24, 2008, 02:45:53 PM »

Hi Tobias,

Thank you so much for your reply :D
A light has appeared at the end of the tunnel now...


But I still receive this error message:

Error in: ChangeAttributeRole (ChangeAttributeRole) The attribute 'DEGREE' does not exist. The example set does not contain an attribute with the given name.

I also tried to fill in a target variable that exists in my dataset instead of DEGREE, but it didn't work either.

Have you got any further advice for me?
Sorry for bugging you... but I just can't wait to have a first experience of success, because I too believe that the possibilities RapidMiner offers are overwhelming, as you said.

Thanks in advance,
Vera
Tobias Malbrecht
« Reply #5 on: June 24, 2008, 02:52:09 PM »

Hi Vera,

well ....

Error in: ChangeAttributeRole (ChangeAttributeRole) The attribute 'DEGREE' does not exist. The example set does not contain an attribute with the given name.

I also tried to fill in a target variable that exists in my dataset instead of DEGREE, but it didn't work either.

... the attribute DEGREE presumably does not exist in your data set. Hence, you have to fill in the name of a variable that does exist in your data set. I think the matching is case sensitive, so you have to be exact when specifying the name. You might want to set a breakpoint after the input operator (by double-clicking on it) and then look at the meta data of your data set to find the name of a suitable attribute, which you can then fill in as the parameter of the ChangeAttributeRole operator.

Hope that helps,
regards,
Tobias
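
As a rough sketch of the breakpoint suggestion: a breakpoint set in the GUI is usually stored in the process XML as an attribute on the operator. The attribute breakpoints="after" shown here is an assumption about the file format of this RapidMiner version, and MY_TARGET is a hypothetical attribute name standing in for a variable from your own data.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Assumed attribute: breakpoints="after" pauses the run once the data is loaded, -->
    <!-- so the attribute names can be checked in the meta data view. -->
    <operator name="SPSSExampleSource" class="SPSSExampleSource" breakpoints="after">
        <parameter key="filename" value="sample\data\spss_data.sav"/>
    </operator>
    <!-- MY_TARGET is a placeholder. Use the exact name, including case, shown in the meta data. -->
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="MY_TARGET"/>
        <parameter key="target_role" value="label"/>
    </operator>
    <operator name="DecisionTree" class="DecisionTree">
    </operator>
</operator>
```

If the attribute is not recognized in your version, setting the breakpoint through the GUI as described in the post and then saving the process should show how it is actually stored.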
misskeynes


« Reply #6 on: June 24, 2008, 03:05:16 PM »

Hi Tobias,

I was not sure whether DEGREE was a technical term or a variable name from your data set, so I tried both: DEGREE and my target variable name. My mistake was writing the variable name in lowercase letters instead of capital letters. Now my variable seems to be accepted, but I receive a new error:

Error in: DecisionTree (DecisionTree) This learning scheme does not have sufficient capabilities for the given data set: numerical label not supported Each learning scheme has particular capabilities for data set handling. For example, some learners can only handle numerical attributes and can not learn from nominal attributes. Please perform a preprocessing step to transform your data set or use an alternative learning scheme. In case of a polynominal label attribute, i.e. a classification task with more than two classes, you can use a learning scheme capable only for binominal classes by wrapping a Binary2MultiClassLearner around the learning operator.

But I think I just have to clean my data and get rid of all the variables I do not need for the decision tree. No kitchen-sink approach seems possible with this tool ;-) SPSS is more generous, but I forgot that there is no possibility to drop variables from the analysis during the process.

So thanks a lot, I'm on my way into a brighter future with RapidMiner!

Have a nice day and thank you for your patience,
Vera
Tobias Malbrecht
« Reply #7 on: June 24, 2008, 03:16:52 PM »

Hi Vera,

Error in: DecisionTree (DecisionTree) This learning scheme does not have sufficient capabilities for the given data set: numerical label not supported Each learning scheme has particular capabilities for data set handling. For example, some learners can only handle numerical attributes and can not learn from nominal attributes. Please perform a preprocessing step to transform your data set or use an alternative learning scheme. In case of a polynominal label attribute, i.e. a classification task with more than two classes, you can use a learning scheme capable only for binominal classes by wrapping a Binary2MultiClassLearner around the learning operator.

This error message appears because the variable you chose as the label (target variable) seems to be numerical, but the decision tree learner can only build a model for nominal labels. As I assume your target variable should be nominal (as it has only discrete values), there are several ways to solve this.

First: if your SPSS file has value labels, you can switch on the parameter use_value_labels in the SPSSExampleSource operator. If the variable you want to use as label has value labels (and is marked as nominal by SPSS), no further step should be needed.

Second: use a Numerical2Nominal operator in combination with an AttributeSubsetPreprocessing operator. To do so, drag an AttributeSubsetPreprocessing operator into the operator tree above the ChangeAttributeRole operator and specify the target variable in its parameter attribute_name_regex. Then drag a Numerical2Nominal operator onto the AttributeSubsetPreprocessing operator. This should solve your problem.

Hope that enlightens your future with RapidMiner even more .. ;-)
Regards,
Tobias
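
The second option described in this reply could be sketched as the process below, extending the demo process from earlier in the thread. Operator and parameter names are taken from the posts. MY_TARGET is a hypothetical attribute name, and the value "true" for use_value_labels is an assumption, so treat this as a starting point rather than a verified process.

```xml
<operator name="Root" class="Process" expanded="yes">
    <operator name="SPSSExampleSource" class="SPSSExampleSource">
        <parameter key="filename" value="sample\data\spss_data.sav"/>
        <!-- Option 1 (alternative): read SPSS value labels. The value "true" is assumed. -->
        <!-- <parameter key="use_value_labels" value="true"/> -->
    </operator>
    <!-- Option 2: convert only the target variable from numerical to nominal. -->
    <!-- MY_TARGET is a placeholder for the name of your own target variable. -->
    <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
        <parameter key="attribute_name_regex" value="MY_TARGET"/>
        <operator name="Numerical2Nominal" class="Numerical2Nominal">
        </operator>
    </operator>
    <!-- The converted attribute is then declared as the label. -->
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="MY_TARGET"/>
        <parameter key="target_role" value="label"/>
    </operator>
    <operator name="DecisionTree" class="DecisionTree">
    </operator>
</operator>
```

Because attribute_name_regex is a regular expression, a plain name matches only attributes with that name. Names containing special characters such as dots or brackets may need escaping.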

misskeynes


« Reply #8 on: June 24, 2008, 03:39:59 PM »

Hi Tobias,

Thank you. I just figured out that "numerical" does not refer to the variable type but to whether a variable is discrete or metric (do you call it that in English? I must brush mine up a little...). Now I have changed the level to nominal and everything is fine :D

Now I'll try my best to derive some reasonable models from my data.

Thank you so much,
Best regards
Vera