Topic: [SOLVED] How to convert nominal to numeric value? Urgent..Please have a look
Anuj_Gupta@us.crawco.com
Newbie

Posts: 3

 « on: June 02, 2011, 10:40:00 AM »

Hey Folks,

I have been struggling with one problem for a long time. I have nominal data that I want to convert into numerical data. From RapidMiner I learned that there are two ways to do this:

1. Using nominal to numeric operator.
2. Creating dummy variables (0 or 1), using nominal to binominal operator.

For the first method, however, my boss says that the nominal to numeric operator is not a correct method, and he is not at all happy with it. Can anybody tell me whether it is correct or not, so that I can convince my mentor?

I tried the second operator, but the number of variables becomes too high (because there are many categories), which leads to a memory error.

So can anybody suggest some other alternatives for converting nominal values to numeric ones?

Thanks for your time; I look forward to your valuable suggestions.
 « Last Edit: September 20, 2013, 12:21:17 PM by Marius »
Ingo Mierswa
Hero Member

Posts: 1238

 « Reply #1 on: June 03, 2011, 03:16:13 PM »

Hi,

Quote
1. Using nominal to numeric operator.
2. Creating dummy variables (0 or 1), using nominal to binominal operator.

Well, there is also a third one: you could first map the values to the numbers you want to use instead (operator: "Map") and then parse them (operator: "Parse Numbers").
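
To make the two steps concrete, here is a minimal sketch in plain Python rather than in RapidMiner. The dictionary `LEVEL_TO_TEXT` and the helper names `map_values` and `parse_numbers` are made up for illustration; they only mimic what the "Map" and "Parse Numbers" operators are described as doing above.

```python
# Hypothetical mapping chosen by the analyst; RapidMiner does not produce it.
LEVEL_TO_TEXT = {"low": "1", "medium": "2", "high": "3"}

def map_values(values, mapping):
    """Step 1 (like "Map"): replace each nominal value by its mapped text."""
    return [mapping[v] for v in values]

def parse_numbers(texts):
    """Step 2 (like "Parse Numbers"): turn numeric-looking text into numbers."""
    return [float(t) for t in texts]

column = ["medium", "low", "high", "low"]
numeric = parse_numbers(map_values(column, LEVEL_TO_TEXT))
print(numeric)  # [2.0, 1.0, 3.0, 1.0]
```

The point of doing it this way is that you choose the numbers yourself, so they do not depend on any internal ordering.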

Whether using the nominal to numeric operator is a problem depends a bit on what you are doing and on which data. The operator simply produces numbers based on the internal mapping used by RapidMiner. If you produce those numbers for two data sets with different mappings, the numbers will differ as well. You can deal with this by ensuring that the same internal mapping is used for all data sets.
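
The mapping problem can be illustrated with a small Python sketch. It assumes, purely for illustration, that an index is assigned to each value in order of first appearance; how RapidMiner actually builds its internal mapping may differ, and `build_mapping` and `encode` are hypothetical helpers.

```python
def build_mapping(values):
    """Assign 0, 1, 2, ... to values in order of first appearance (assumed)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping

def encode(values, mapping):
    """Replace each nominal value by its index in the given mapping."""
    return [mapping[v] for v in values]

train = ["medium", "low", "high"]
test = ["high", "medium", "low"]

train_map = build_mapping(train)  # {'medium': 0, 'low': 1, 'high': 2}
test_map = build_mapping(test)    # {'high': 0, 'medium': 1, 'low': 2}

# Separate mappings: "high" is 2 in train but 0 in test.
independent = encode(test, test_map)   # [0, 1, 2]
# Shared mapping: every value gets the same number in both data sets.
shared = encode(test, train_map)       # [2, 0, 1]
```

Reusing `train_map` for the second data set is the plain-Python analogue of making sure both data sets share one mapping.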

But even then, the internal mappings don't have any real meaning. For example, if you have the three nominal values "low", "medium", and "high", you would probably not like to end up with the numbers "2", "1", and "3" but would prefer at least something like "1", "2", and "3" instead. But even this might become problematic: is the step from "medium" to "high" really exactly the same size as the step from "low" to "medium"? Who knows?

For both reasons (especially the second one, since the first can be dealt with if you are careful) I would agree with your boss that method 2 should usually be preferred. If memory is getting low, you could try to create a view instead, which calculates the values on the fly rather than calculating and storing them up front. If this still does not work, you could use method 3, which I introduced above, so that at least both problems discussed above become smaller.
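
As a rough analogy for computing dummy variables on the fly, the following Python sketch yields one 0/1 row at a time instead of building the whole matrix in memory. It is not how a RapidMiner view is implemented; `dummy_rows` is a hypothetical helper used only to show the idea.

```python
def dummy_rows(values, categories):
    """Yield one 0/1 row per value without storing the full matrix."""
    for v in values:
        yield [1 if v == c else 0 for c in categories]

column = ["low", "high", "medium", "low"]
categories = sorted(set(column))  # ['high', 'low', 'medium']

# Each row has one entry per category, so width grows with the category count.
rows = list(dummy_rows(column[:2], categories))
print(rows)  # [[0, 1, 0], [1, 0, 0]]
```

Only the rows that are actually consumed get computed, which is why this kind of lazy evaluation can help when the number of categories makes the full dummy table too large to store.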

Cheers,
Ingo