Author Topic: How do I stop RapidMiner from converting large integers to scientific notation?  (Read 1628 times)
Keithr
Newbie
*
Posts: 10


« on: September 03, 2008, 06:50:21 PM »

Hi,

I'm using the K-means algorithm to cluster some data for my company, and the ID I'm using, which comes directly from our database, goes up to 9 digits (e.g. 107,204,426). The problem is that RM is converting this to scientific notation (e.g. 1.07204426E8). I realize that I can recover the original number by expanding the exponent (1.07204426 x 10^8 = 107204426), but I would prefer to have RM leave the number as is so that I can easily insert this data back into our database.

The number is way below the max for an int (2.1 billion) so RM should be able to handle it easily.
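As a quick sanity check on the claim that the ID can be recovered exactly, here is a minimal Python sketch. It assumes the value is held as a double-precision number somewhere along the way (an assumption, nothing in this thread confirms how RM stores it). A double represents every integer up to 2^53 exactly, so a 9-digit ID loses nothing. The function name is_exact_as_double is made up for illustration.

```python
# Largest range in which every integer is exactly representable as a double.
MAX_EXACT_INT = 2 ** 53


def is_exact_as_double(n: int) -> bool:
    """Return True if the integer n survives a round trip through a double."""
    return abs(n) <= MAX_EXACT_INT and int(float(n)) == n


original_id = 107204426

# The ID fits comfortably, so the double holds it without rounding.
print(is_exact_as_double(original_id))

# Parsing the scientific-notation text gives back the same value.
print(int(float("1.07204426E8")) == original_id)
```

Both checks print True, which suggests the E8 form is only a display or formatting choice and not a loss of precision.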

Is there a way to stop RM from converting large numbers to scientific notation?

Thanks in advance.

Keith Robinson
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #1 on: September 06, 2008, 08:48:06 PM »

Hi Keith,

There is no need to post on multiple boards. Normally we look through all the boards for new questions! ;)

Concerning your question, do you mean the number representation you observe in RapidMiner or when the data is written back to the database? I just checked the number representation in RapidMiner with the value you mentioned and it works correctly. Maybe the behaviour is due to the database you are using?

Cheers,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
Keithr
Newbie
*
Posts: 10


« Reply #2 on: September 10, 2008, 02:47:26 PM »

Hi Tobias,

Sorry about posting my question twice. After posting it the first time I realized that it was in the wrong forum, so I reposted it in the correct forum and tried to delete it from the wrong one but could not.

We use Sybase IQ, which is a database specifically for data warehousing. The one downside to it is that loading records using an insert SQL statement is agonizingly slow. However, from a flat file and using their proprietary SQL I can load over a million records in seconds. So my plan is to use RapidMiner to create a CSV that I'll then automatically load into the db using Perl or ksh. However, when I create a CSV file, RM converts the customer ID to scientific notation. Since no significant digits are lost I can easily convert it back to the actual number, but I was wondering if there is a way to have RM keep the actual values.

Here are a couple of lines from the CSV file I created using a decision tree algorithm. Notice that the second-to-last column, which I'm passing in as an integer ID (107204426), now ends in "E8" (1.07204426E8). How do I stop RM from doing this?

Thanks

Keith

"zsButcherSales","zsMreSales","zsPremiumSales","zsStandardSales","zsTobaccoSales","zsFrozenFoodsSales","zsGrocerySales","hhId","cluster"
-0.754741,-0.024232,-1.277233,-2.890393,-0.294866,-1.042057,-1.814267,1.07204426E8,"0"
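
Until there is an answer on the RM side, the conversion back to plain integers can be done as a post-processing step before the bulk load. The following is a rough Python sketch, not an RM feature: the function name restore_integer_column is hypothetical, the column name hhId is taken from the header above, and it assumes every value in that column is a whole number. Note that Python's csv writer only quotes fields when needed, so the quotes around the header names and the cluster value are dropped.

```python
import csv
import io
from decimal import Decimal


def restore_integer_column(src, dst, column="hhId"):
    """Copy CSV rows from src to dst, rewriting `column` as a plain integer.

    Decimal parses text such as '1.07204426E8' exactly, so no digits
    are lost on the way back to an integer.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = str(int(Decimal(row[column])))
        writer.writerow(row)


# Small in-memory example using two of the columns from the file above.
sample = '"hhId","cluster"\n1.07204426E8,"0"\n'
out = io.StringIO()
restore_integer_column(io.StringIO(sample), out)
print(out.getvalue())
```

For a real file, open the input and output with open(path, newline="") and pass the file objects in place of the StringIO buffers.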