Author Topic: What PC to buy specially for RM / Datamining  (Read 2271 times)
wessel
Sr. Member
****
Posts: 366


« on: November 11, 2009, 10:16:27 PM »

Hello,

I wish to buy a new PC for big data mining projects.
What should I buy?
What is important?
What OS should I use?

Regards,

Wessel
keith
Full Member
***
Posts: 159


« Reply #1 on: November 11, 2009, 11:38:30 PM »

As with many things, the answers depend on a lot of factors (How big is a big data mining project? What's your budget? Are you comfortable running Linux, or will you stick with Windows?). But as general rules of thumb:

1. Memory (RAM) is critical to handling large data sets.  But getting more than 4 GB is a waste if you're running a 32-bit operating system (see next point).

2. A 64-bit operating system (either Windows or Linux) will enable you to use memory more effectively than its 32-bit counterpart. You'll be able to deal with bigger data sets, run more complex processes, and spend less time dealing with "out of memory" system errors.

3. If you're going to be using the free, community version of RM, then go for a faster CPU speed rather than more CPU cores. Additional processing cores on the CPU won't help unless you're running the commercial version of RapidMiner. If you can upgrade to the commercial version, there are parallelizable operators for cross validation, evolutionary algorithms, parameter iteration, etc. that will take advantage of multiple cores nicely and speed up your processes dramatically.

4. Linux probably outperforms Windows, although I run mostly on Windows and have not found its performance inadequate, so I consider this less important than the first three recommendations. If your source data is stored in a Microsoft SQL Server database, it may be beneficial to stick with Windows. Maybe someone else can comment on the relative performance of RM on Linux vs. Windows.

5. Hard drive space will depend more on the size of your data than on RM's own requirements. There may be marginal benefits to getting disks with faster spindles (e.g. 15k rpm), but this probably isn't where I'd spend my money.
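
To put a number on the 4 GB figure in points 1 and 2: a 32-bit pointer can distinguish 2^32 byte addresses, which is exactly 4 GiB. The sketch below is only an illustration of that arithmetic. The function names are made up for this example, and it ignores OS-specific details such as per-process limits or memory extensions, so the usable amount on a real 32-bit system can differ.

```python
import struct


def addressable_gib(pointer_bits: int) -> float:
    """Size of a flat address space with the given pointer width, in GiB."""
    return 2 ** pointer_bits / 2 ** 30


def interpreter_pointer_bits() -> int:
    """Pointer width of the running Python interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a native pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("32-bit address space:", addressable_gib(32), "GiB")
    print("64-bit address space:", addressable_gib(64), "GiB")
    print("This interpreter uses", interpreter_pointer_bits(), "bit pointers")
```

Note that this checks the Python interpreter, not RapidMiner. RM runs on Java, so what matters there is whether both the OS and the installed Java runtime are 64-bit.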
andk
Newbie
*
Posts: 23


« Reply #2 on: March 22, 2011, 05:29:48 PM »

Quote from keith (Reply #1): "3. If you're going to be using the free, community version of RM, then go for a faster CPU speed rather than more CPU cores. Additional processing cores on the CPU won't help unless you're running the commercial version of RapidMiner. If you can upgrade to the commercial version, there are parallelizable operators for cross validation, evolutionary algorithms, parameter iteration, etc. that will take advantage of multiple cores nicely and speed up your processes dramatically."

Could you explain this in more detail? Why doesn't the free version of RM take advantage of multiple cores? I thought the only difference between the commercial and free versions was the support. Like the original poster, I am also thinking about buying a new PC, in my case for text mining. How much is a commercial single-user license of RM, by the way?
keith
Full Member
***
Posts: 159


« Reply #3 on: March 22, 2011, 09:26:01 PM »

At the time I wrote it (November 2009), that statement was accurate: the free version didn't support multithreading or parallel processing, but the commercial version did.

That's changed in recent versions, and it's no longer the case.  As of today, there is no difference in the software product between the community/free edition and the commercial version.
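
For anyone wondering why extra cores help with something like cross validation: the folds are independent of each other, so each one can be evaluated on a separate core. Here is a rough sketch of that idea in Python. It is not how RapidMiner implements it (RM is a Java application); the function names and the toy "predict the training mean" model are made up purely for illustration.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
import os


def make_folds(n_rows: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split row indices 0..n_rows-1 into k (train, test) index pairs."""
    indices = list(range(n_rows))
    folds = []
    for i in range(k):
        test = indices[i::k]  # every k-th row, starting at row i
        test_set = set(test)
        train = [j for j in indices if j not in test_set]
        folds.append((train, test))
    return folds


def evaluate_fold(job):
    """Toy model: predict the training mean, return mean absolute error on the test rows."""
    values, train, test = job
    mean = sum(values[j] for j in train) / len(train)
    return sum(abs(values[j] - mean) for j in test) / len(test)


def cross_validate(values, k: int, executor: Executor) -> float:
    """Evaluate the k folds on the executor's workers and average the errors."""
    jobs = [(values, train, test) for train, test in make_folds(len(values), k)]
    errors = list(executor.map(evaluate_fold, jobs))
    return sum(errors) / k


if __name__ == "__main__":
    data = [float(x % 7) for x in range(10_000)]
    # One worker process per core; each worker evaluates whole folds.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        print("mean absolute error:", cross_validate(data, k=10, executor=pool))
```

With 10 folds and 4 cores, up to 4 folds run at the same time. The speedup is usually less than 4x because the data has to be copied to each worker process and the folds don't divide evenly across the cores.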

Keith
wessel
Sr. Member
****
Posts: 366


« Reply #4 on: March 22, 2011, 10:20:49 PM »

I bought a PC with a quad-core i7 CPU anyway :P
andk
Newbie
*
Posts: 23


« Reply #5 on: March 22, 2011, 10:29:45 PM »

Ah OK, then I really have to think about getting myself a new workstation... Text processing on my 2 GHz, 4 GB RAM MacBook is a pain in the a... sometimes. Thanks for the info.

andre