Topic: Missing rows with ExampleSource
MuehliMan
« on: July 22, 2008, 03:25:52 PM »

I am trying to import a large dataset into RM. The source is a CSV file with about 200 rows and approx. 250 columns.
(ExampleCSVSource gives an error complaining that there are different columns in line...)

Using the ExampleSource and the ExampleSource Wizard, I can see in the lower part of the window that there are 189 rows and 251 columns to import, so I click the Finish button.
When I click the Edit... button to view my dataset, I get a table with all 251 columns but only 19 examples.

Where are my missing rows? Any help is welcome!
BTW: I am still using version 4.1
Ingo Mierswa
« Reply #1 on: July 22, 2008, 04:34:07 PM »

Hello,

In the AttributeEditor, you can define which rows are shown and then press the Update button in the panel on the left. Of course, you could also simply load the data and check whether everything is there: just run the process and look at the meta data view and the data view.


Quote
(ExampleCSVSource gives an error complaining that there are different columns in line...)

If you have missing values at the end of lines in this data set, I would suggest upgrading to RapidMiner 4.2, since earlier versions had a bug that ignored missing values at the end of lines in CSV files.
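
To check outside RapidMiner whether a file has such trailing missing values, a minimal sketch along the following lines could help. It uses Python's standard csv module as a stand-in (it is not RapidMiner's parser), and the function name and the default delimiter are assumptions made for illustration only.

```python
import csv


def lines_with_trailing_missing(lines, delimiter=","):
    """Return the line numbers of records whose last field is empty.

    `lines` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper: this only mimics the situation described above
    and says nothing about how RapidMiner itself parses the file.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    hits = []
    for row in reader:
        # A record such as "1,2," is parsed as ['1', '2', ''].
        if row and row[-1] == "":
            hits.append(reader.line_num)
    return hits


if __name__ == "__main__":
    # Small in-memory example; replace with open("your_file.csv", newline="")
    sample = ["a,b,c", "1,2,", "3,4,5", "6,,"]
    print(lines_with_trailing_missing(sample))
```

If the returned list is not empty, the affected lines are the ones that an older version would probably read with too few columns.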

Cheers,
Ingo
MuehliMan
« Reply #2 on: July 23, 2008, 09:33:20 AM »

Hi Ingo,

Quote
In the AttributeEditor, you can define which rows are shown and then press the Update button in the panel on the left. Of course, you could also simply load the data and check whether everything is there: just run the process and look at the meta data view and the data view.

The number of examples in the AttributeEditor is given as 19 (20 rows) maximum. If I open the dat file in a text editor, I find all of the entries there. Could it be that there is an option preventing RM from showing all entries?

Another funny thing is that if I import my data into OpenOffice, export it as an XLS file, and load that file with ExcelExampleSource, I get all columns and rows.

Quote
If you have missing values at the end of lines in this data set, I would suggest upgrading to RapidMiner 4.2, since earlier versions had a bug that ignored missing values at the end of lines in CSV files.

I have read some threads about that bug in other posts and I have switched to version 4.2.

Thanks for your help,
Markus
Ingo Mierswa
« Reply #3 on: July 24, 2008, 12:36:05 PM »

Hello,

The attribute editor stops reading if anything goes wrong, so I assume that there is something unusual about line 19 or 20. Most likely there is a problem with quoting, or the definition of the column separators does not match your data format. If you like, and the data is not too sensitive, you could post an excerpt of your data and I could have a look at what the problem might be.
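
As a rough way to locate such a line yourself, the following sketch counts the fields in every record and reports the ones that deviate from the most common count. It relies on Python's standard csv module rather than on RapidMiner, and the function name, the delimiter, and the quote character are assumptions that have to be adapted to the actual file.

```python
import csv
from collections import Counter


def find_irregular_rows(lines, delimiter=",", quotechar='"'):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    `lines` is any iterable of text lines, e.g. an open file object.
    The "expected" count is simply the most frequent field count, which
    is an assumption: it is only meaningful if most lines are well formed.
    """
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar)
    # reader.line_num is the physical line on which the current record ended.
    records = [(reader.line_num, len(row)) for row in reader]
    if not records:
        return 0, []
    expected = Counter(n for _, n in records).most_common(1)[0][0]
    irregular = [(line, n) for line, n in records if n != expected]
    return expected, irregular


if __name__ == "__main__":
    # Small in-memory example; replace with open("your_file.csv", newline="")
    sample = ["a;b;c", "1;2;3", "4;5", "6;7;8"]
    expected, irregular = find_irregular_rows(sample, delimiter=";")
    print("expected fields:", expected)
    for line, n in irregular:
        print("line", line, "has", n, "fields")
```

An unbalanced quote character typically shows up here as one record that swallows several physical lines, so a reported line number that is far from the first suspicious line is itself a hint that quoting is the problem.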

Cheers,
Ingo
MuehliMan
« Reply #4 on: July 25, 2008, 01:26:06 PM »

Hi,

I am pretty sure that your assumption is right. I'll go through the CSV file with a text editor and check the commas and the columns.

Greets,
Markus