Topic: importing data with null values (Read 3201 times)
b2
Guest
« on: July 19, 2008, 10:24:04 PM »

Is there a way to replace null values, or at least reject lines with nulls, during import? 

I am trying to import a file with scattered missing values and I can only import up to the first omission. 

The example for dealing with missing data that I found in the tutorial uses '?' in the data file for missing values. My data has nothing in those positions. Here is an example of my data: the first and third lines are complete, and the second line is missing the first and last columns:
N282WN,WN,978,91,91,1525,1630,65,308,2,-1
,WN,1114,91,91,1850,1955,65,308,2,
N207WN,WN,1182,91,91,1405,1510,65,308,2,-1

This is the error I get:
 [Error] Data format error in line 393: the line does not provide the expected number of columns (was: 10, expected: 11)! Stop reading...


Thanks much!!
steffen
Sr. Member
Posts: 376
« Reply #1 on: July 20, 2008, 12:59:10 PM »

Hello b2

I copied your data into a simple text file and loaded it using the operator "SimpleExampleSource" with its default settings in RapidMiner 4.2. I had no problems; the operator recognized all missing values.

One idea: maybe line 393 of your data is corrupted, e.g. a comma is missing.
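A quick way to test this idea is to count the fields on every line and report the lines whose count differs from the expected one. Below is a minimal sketch using Python's standard csv module (not a RapidMiner feature); the function name `find_bad_lines` is hypothetical, and the third sample line is a made-up corrupted line with one comma removed.

```python
import csv
import io

def find_bad_lines(text, expected_columns, delimiter=","):
    """Return (line_number, field_count) for each row with the wrong width."""
    bad = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        if len(row) != expected_columns:
            # line_num is the number of physical lines read so far.
            bad.append((reader.line_num, len(row)))
    return bad

sample = (
    "N282WN,WN,978,91,91,1525,1630,65,308,2,-1\n"
    ",WN,1114,91,91,1850,1955,65,308,2,\n"         # empty fields still count
    "N207WN,WN,1182,91,91,1405,1510,65,308,2\n"    # hypothetical: last comma removed
)
print(find_bad_lines(sample, expected_columns=11))  # [(3, 10)]
```

Note that the second line is not reported: an empty first and last field still gives 11 fields. For a real file, read it with `open(path, newline="")` and pass the text in; `reader.line_num` can run ahead of the row count if a quoted field spans several lines.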

Hope this was helpful.

Steffen

"I want to make computers do what I mean instead of what I say"
Read The Fantastic Manual
b2
Guest
« Reply #2 on: July 22, 2008, 12:16:56 AM »

Steffen,

Thank you for your help.

There are no missing commas.  Could it have to do with the fact that one of the missing fields is at the end or beginning of the line?  Is there an option I need to set?

I am using the Community version 4.1.

I tried duplicating what you did.  I switched from ExampleSource to SimpleExampleSource and copied the input data back off this post into a new file.  I got a similar error.  This is the error:
Error in: SimpleExampleSource (SimpleExampleSource) Could not read file  ...\twig.txt': Number of columns in line 1 was unexpected, was: 10, expected: 11

steffen
Sr. Member
Posts: 376
« Reply #3 on: July 22, 2008, 07:20:55 AM »

Hello b2

Maybe it depends on the version. I remember something like this, but I am not sure...
Is there any specific reason you cannot switch to 4.2?

greetings

Steffen
jean-charles
Guest
« Reply #4 on: July 22, 2008, 09:41:13 AM »

Hi Steffen,

There is the whole toolbox of missing-value replenishment: replacing "unknown" values in the metadata by a constant (typically zero) or by the attribute's mean. There are also more sophisticated approaches, where a learner trained on the complete values is used to predict the missing ones, but I have never been able to understand how that operator works and is organized. You can also use the "Sparse array management" option in your (file/database) ExampleSource if needed.
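The two simple strategies (a constant, or the attribute's mean) can be sketched outside RapidMiner as follows. This is a minimal Python illustration, not the behavior of any RapidMiner operator; `parse_numeric_column` and `replace_missing` are hypothetical helpers, empty fields are assumed to mean "missing", and the learner-based approach is not shown.

```python
import csv
import io
from statistics import mean

def parse_numeric_column(rows, index):
    # Empty (or whitespace-only) fields become None, i.e. "missing".
    return [float(row[index]) if row[index].strip() else None for row in rows]

def replace_missing(values, strategy="mean", constant=0.0):
    """Replace None entries by the mean of the known values or by a constant."""
    known = [v for v in values if v is not None]
    if strategy == "mean":
        # Fall back to the constant if the whole column is missing.
        fill = mean(known) if known else constant
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

data = (
    "N282WN,WN,978,91,91,1525,1630,65,308,2,-1\n"
    ",WN,1114,91,91,1850,1955,65,308,2,\n"
    "N207WN,WN,1182,91,91,1405,1510,65,308,2,-1\n"
)
rows = list(csv.reader(io.StringIO(data)))
last = parse_numeric_column(rows, 10)
print(last)                                        # [-1.0, None, -1.0]
print(replace_missing(last))                       # [-1.0, -1.0, -1.0]
print(replace_missing(last, strategy="constant"))  # [-1.0, 0.0, -1.0]
print(replace_missing([1.0, None, 3.0]))           # [1.0, 2.0, 3.0]
```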

This item could make a good wiki article under "data formats".

Cheers,
  Jean-Charles.
steffen
Sr. Member
Posts: 376
« Reply #5 on: July 22, 2008, 01:09:58 PM »

Hello Jean-Charles

Quote
There is the whole toolbox of missing-value replenishment: replacing "unknown" values in the metadata by a constant (typically zero) or by the attribute's mean. There are also more sophisticated approaches, where a learner trained on the complete values is used to predict the missing ones, but I have never been able to understand how that operator works and is organized.
Yes, but not during import.

Quote
You can also use the "Sparse array management" option in your (file/database) ExampleSource if needed.
Why? As far as I can see, the sparse data format is meant for data with a lot of missing values or only a small number of distinct values (for efficient storage).

Quote
This item could make a good wiki article under "data formats".
True, true...

greetings

Steffen
Ingo Mierswa
Administrator
Hero Member
Posts: 1226
« Reply #6 on: July 22, 2008, 04:28:00 PM »

Hi all,

Actually, there was a bug in versions prior to 4.2 when reading CSV-like data with missing values at the end of lines. The new version 4.2, which is now available on our web site, no longer contains this bug, and everything should work fine, as Steffen pointed out. So I would suggest upgrading to RM 4.2.
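Why a missing value at the end of a line can change the column count: a tokenizer that discards trailing empty fields turns the 11-field line `,WN,1114,...,2,` into 10 fields, which matches the "was: 10, expected: 11" messages reported earlier. The Python sketch below contrasts the two behaviours; the helper names are hypothetical. Java's `String.split(String)` is one real example of a tokenizer that drops trailing empty strings, but the thread does not say that this is what caused the bug in versions before 4.2.

```python
def split_keep_empty(line, sep=","):
    """Split on the separator, keeping empty fields at both ends (ignores quoting)."""
    return line.split(sep)

def split_drop_trailing_empty(line, sep=","):
    """Split, then discard empty fields at the end of the line."""
    fields = line.split(sep)
    while fields and fields[-1] == "":
        fields.pop()
    return fields

line = ",WN,1114,91,91,1850,1955,65,308,2,"
print(len(split_keep_empty(line)))           # 11: leading and trailing '' kept
print(len(split_drop_trailing_empty(line)))  # 10: trailing '' lost
```

A missing value at the start of the line survives both versions, since only trailing empty fields are dropped.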

Cheers,
Ingo

Did you try our new Marketplace? Upload or download new Extensions, add comments, and organize your operators. Have a look at  http://marketplace.rapid-i.com
b2
Guest
« Reply #7 on: July 23, 2008, 08:40:09 AM »

Thank you all very much for your help.

I have upgraded to 4.2 and the same error occurs.  I have found that it happens when I have missing integer-type data, but not when I have missing nominal-type data.  I am beginning to think this may be a follow-on to the bug in version 4.1.

Is there a way to have the import skip incomplete lines?
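One way to reject incomplete lines is to pre-filter the file before importing it. A minimal Python sketch, assuming that an empty or whitespace-only field counts as missing; `read_complete_rows` is a hypothetical helper, not a RapidMiner option.

```python
import csv
import io

def read_complete_rows(text, expected_columns, delimiter=","):
    """Return the complete rows and the line numbers of the rejected ones."""
    complete, skipped = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        # Reject a row if its width is wrong or any field is empty/whitespace.
        if len(row) == expected_columns and all(field.strip() for field in row):
            complete.append(row)
        else:
            skipped.append(reader.line_num)
    return complete, skipped

data = (
    "N282WN,WN,978,91,91,1525,1630,65,308,2,-1\n"
    ",WN,1114,91,91,1850,1955,65,308,2,\n"
    "N207WN,WN,1182,91,91,1405,1510,65,308,2,-1\n"
)
complete, skipped = read_complete_rows(data, expected_columns=11)
print(len(complete), skipped)  # 2 [2]
```

The rows in `complete` could then be written to a new file with `csv.writer` and imported from there.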

thank you.
b2
Guest
« Reply #8 on: July 23, 2008, 08:48:05 AM »

ExampleSource was giving me trouble.

CSVExampleSource works fine.
Ingo Mierswa
Administrator
Hero Member
Posts: 1226
« Reply #9 on: July 24, 2008, 12:31:27 PM »

Hi again,

Maybe it would have worked with the ExampleSource operator, too (both operators are basically the same, just with different parameter settings), so the difference might come from the quoting, line trimming, or column separator parameters. Anyway, good to hear it works now!
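To illustrate how quoting, the column separator, and whitespace trimming change what a reader sees, here is a small sketch with Python's standard csv module. The parameter names (`delimiter`, `skipinitialspace`) are Python's, not RapidMiner's, and the sample lines, including the quoted value "WN, Inc.", are hypothetical.

```python
import csv
import io

def count_fields(line, **fmtparams):
    """Count the fields the csv module finds in a single line."""
    return len(next(csv.reader(io.StringIO(line), **fmtparams)))

quoted = 'N282WN,"WN, Inc.",978'   # hypothetical line with a comma inside quotes
print(len(quoted.split(",")))      # naive split ignores quotes: 4
print(count_fields(quoted))        # quote-aware reader: 3

semicolons = "N282WN;WN;978"
print(count_fields(semicolons))                 # wrong separator: 1
print(count_fields(semicolons, delimiter=";"))  # right separator: 3

spaced = "N282WN, WN"
print(next(csv.reader(io.StringIO(spaced))))                         # ['N282WN', ' WN']
print(next(csv.reader(io.StringIO(spaced), skipinitialspace=True)))  # ['N282WN', 'WN']
```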

Cheers,
Ingo