Author Topic: Question Mark on p-Value  (Read 1881 times)
Posts: 3

« on: January 16, 2014, 06:05:35 PM »

Hello Guys,

I'm doing a linear regression on a data set that has about 500k rows and 66 attributes. I'm running RapidMiner on Windows; RapidMiner alone is using 8 GB of memory, and the processor is a 2.4 GHz Xeon. These are my problems:

First: the process takes about 40 minutes to finish, which seems like a lot of time compared with other tools I've used.

Second, and more important: for the p-values, the standard errors, and some other metrics I get a "?" (question mark). I don't know what that means, and I'm starting to think that something is wrong with RapidMiner. I'm including a picture with the results.

Thank you very much!!!
Marius Helf
Hero Member
Posts: 1811

« Reply #1 on: January 17, 2014, 10:15:11 AM »


RapidMiner's Linear Regression does more than the regression itself: it also eliminates collinear features, performs a feature selection, and so on. This can take quite some time and often improves the model quality, but you can try switching it off to see how the runtime is affected. Out of curiosity, which other tools are you using, and how long do they need for your dataset?
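
To give a rough idea of why such a preprocessing step costs time, here is a minimal Python sketch of one simple way to drop collinear features: compare every column against the columns already kept and discard it if the absolute Pearson correlation is too high. This is only an illustration and not RapidMiner's actual implementation; the function names `pearson` and `drop_collinear`, the 0.95 threshold, and the toy data are all assumptions made for the example. With 66 attributes and 500k rows, the pairwise comparisons alone touch each row many times.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated in this sketch
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(columns, threshold=0.95):
    """columns: dict mapping attribute name -> list of values.

    Returns the names that are kept. A column is dropped when its absolute
    correlation with any already-kept column reaches the (assumed) threshold.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data (hypothetical): "b" is exactly 2 * "a", so it is redundant.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(drop_collinear(data))  # prints ['a', 'c']
```

In this toy case "b" is removed because it duplicates the information in "a", while "c" (correlation -0.8 with "a") stays below the threshold and is kept.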

The issue regarding the missing values has been forwarded to our development team.

Best regards,
