Pages: [1]
  Print  
Author Topic: Read Excel changes integer to integer.0  (Read 1122 times)
CharlieFirpo
Newbie
*
Posts: 49


« on: August 06, 2013, 12:35:50 PM »

Dear All!

A have an excel table with a column that has text and number values as well. The first row of this column is the name (it is a text) of the RapidMiner attribute. The second row in excel is also a text. The third, fourth... are numbers, integers, like 1 2 3. These numbers are not 1.0 2.0 3.0. But when I use the Read Excel operator, RapidMiner changes these 1 2 3 numbers into 1.0 2.0 3.0. Why? How can I change them back? Using Real to Integer operator does not work, because this attribute has text value as well.

If I use Read CSV (reading a .csv with the same content), then during the read of the csv (using the Import Configuration Wizard), these 1 2 3 numbers are not change into 1.0 2.0 3.0. So reading csv is correct. I could use this operator, but it has a disadvantage. I can not set the Imported cell range. In Read Excel operator, there is a parameter to set the imported cell range.

So what is the solution? Read CSV or Read Excel operators do not read the tables automaticully. If I create a new (next) column in the csv or xls using not the RapidMiner, and after that I run my RapidMiner process, it will not see my new column. Why? Why the Read operators do not read the entire tables at every run of the process? They read only the actual values of the cell range set at first use of the Read operator.

Using Read Excel, I can set a large imported cell range, so if I create a new column or row in the table, the RapidMiner process will read the new column or row. But I'm not able to set the imported cell range at Read CSV.

But using Read Excel, my 1 2 3 numbers change into 1.0 2.0 3.0.

Is there any solution for my problem? I want a huge (not known) imported cell range, and I want my number in the format 1 2 3, not 1.0 2.0 3.0.

Thank you reading this and trying to help me!
« Last Edit: August 06, 2013, 12:43:49 PM by CharlieFirpo » Logged
CharlieFirpo
Newbie
*
Posts: 49


« Reply #1 on: August 06, 2013, 01:15:28 PM »

If I create a new row in my .xls or .csv table, then the Read CSV operator will automatically read the new row. Of course at Read Excel, it is needed to set a huge imported cell range to read the new row.
But the read operators do not read the new columns. Neither Read Excel if one set a huge imported cell range. It does not work. There are the 'data set meta data information' at both read operators, and here needed to add the new columns. But to add a huge number of columns does not work, because if an added column does not exist in the table, then the read operators will throw a failure message.

So if one does not know the exact size of a table (the exact number of the columns), and the number of the columns can change by the time, it is necessary to use the Import Configuration Wizard if the number of the columns change or set the correct 'data set meta data information' manually.

Sad

Nice day!
Logged
JEdward
Full Member
***
Posts: 145


« Reply #2 on: August 09, 2013, 03:24:13 PM »

Hi Charlie,

It is possible, but unfortunately I can't recall exactly how. 

I would have suggested using the Read Excel With Format operator & the using a regular expression to remove the formatting information, but it seems this is not working on my test. 

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.012">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel_format" compatibility="5.3.012" expanded="true" height="60" name="Read Excel with Format" width="90" x="45" y="30">
        <parameter key="excel_file" value="C:\Users\john.heath\Desktop\Test.xls"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.012" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value="(format)"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <connect from_op="Read Excel with Format" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

I think it is something like this that should work though. 
Good luck!
Logged
Pages: [1]
  Print  
 
Jump to: