Author Topic: What algorithm does the Decision Tree use in RapidMiner?  (Read 1723 times)
johnny5550822
Newbie
Posts: 12


« on: March 04, 2014, 11:49:48 PM »

Hi all,

What kind of decision tree algorithm does RapidMiner use? Does it handle imbalanced data?

Thanks!
Johnny
fras
Global Moderator
Jr. Member
Posts: 86


« Reply #1 on: March 05, 2014, 02:08:45 PM »

If you have strongly imbalanced data, do not use a decision tree.
In general, exploring your data with a decision tree is a good idea; applying the resulting model
to unseen data is not always.
You may preprocess your data with the "Sample (Bootstrapping)" operator,
but you should switch off that preprocessing in the testing step.
For further details, please refer to the documentation of the Decision Tree operator.
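The "Sample (Bootstrapping)" idea — drawing rows with replacement so the minority class is better represented before training — can be sketched in plain Python. This is a hypothetical helper for illustration, not RapidMiner's implementation:

```python
import random
from collections import Counter

def bootstrap_balance(rows, label_index, seed=0):
    """Oversample each minority class by drawing rows with replacement
    until every class matches the majority class count."""
    rng = random.Random(seed)
    counts = Counter(row[label_index] for row in rows)
    target = max(counts.values())
    balanced = list(rows)
    for cls, n in counts.items():
        pool = [row for row in rows if row[label_index] == cls]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced

# Toy data set: 2 positive rows vs. 8 negative rows.
data = [("a", 1, "pos"), ("b", 0, "pos")] + [("c", 0, "neg")] * 8
balanced = bootstrap_balance(data, label_index=2)
```

Note the caveat above: resample only for training, and evaluate on the untouched, original distribution.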
johnny5550822
Newbie
Posts: 12


« Reply #2 on: March 11, 2014, 02:11:45 AM »

Thanks for your reply. I know there are algorithms out there that address the imbalance problem for decision trees, but I am not sure which algorithm the Decision Tree operator in RapidMiner uses. Is it C4.5 or something else?
Marius
Administrator
Hero Member
Posts: 1794



« Reply #3 on: March 11, 2014, 02:01:27 PM »

Hi,

I am not sure which implementation the RapidMiner decision tree uses; I suppose it is something similar to C4.5. If you want to be sure you are using C4.5, you can use W-J48 from the Weka Extension. That operator is a free implementation of C4.5.

Best regards,
Marius

Please add [SOLVED] to the topic title when your problem has been solved! (do so by editing the first post in the thread and modifying the title)
Please click here before posting.
johnny5550822
Newbie
Posts: 12


« Reply #4 on: March 11, 2014, 11:40:29 PM »

Great, thanks a lot!
fmon
Newbie
Posts: 7


« Reply #5 on: June 23, 2014, 08:42:38 AM »

I suppose that, depending on the criterion you choose in the parameter settings of the Decision Tree operator, RapidMiner produces a different tree using a different algorithm, such as C4.5.
Am I right?
If anyone has any information, please share it here.
Thanks
Marius
Administrator
Hero Member
Posts: 1794



« Reply #6 on: June 23, 2014, 09:16:42 AM »

Hi,

The algorithm stays the same no matter which criterion you choose. Only the method used to select the "best" splitting attribute in each node differs, depending on the parameter setting.
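To illustrate (a hedged sketch, not RapidMiner code): the tree-growing loop stays fixed, and only the impurity function used to score a candidate split is swapped. The helper names below are made up for this example:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def criterion_score(parent, children, impurity):
    """Impurity reduction of a split: parent impurity minus the
    weighted impurity of the child partitions."""
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)

# The same candidate split scored under two different criteria.
parent = ["pos"] * 5 + ["neg"] * 5
children = [["pos"] * 4 + ["neg"], ["pos"] + ["neg"] * 4]
gain_entropy = criterion_score(parent, children, entropy)
gain_gini = criterion_score(parent, children, gini)
```

Both criteria prefer purer partitions; they just rank ties and near-ties slightly differently, so the grown trees can differ without the algorithm itself changing.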

Best regards,
Marius

fmon
Newbie
Posts: 7


« Reply #7 on: June 23, 2014, 12:34:58 PM »

Hi,
Thank you for your helpful answer.
So, does anyone know which algorithm the "Decision Tree" operator uses to produce a decision tree?
Marius
Administrator
Hero Member
Posts: 1794



« Reply #8 on: June 23, 2014, 12:39:50 PM »

Well, as I said, it's similar to C4.5. In each node the split attribute is chosen by iterating over all attributes, finding the best split for each attribute with respect to the splitting criterion, and then using the attribute that maximizes the chosen criterion.

For nominal attributes, one branch is created for each value. For numerical/date attributes, a binary split is always performed; to find the best split value, all values occurring in the training data are tried.

The procedure is repeated until the leaves are pure or one of the pre-pruning conditions is met. Then, optionally, some post-pruning is applied.
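The split selection described above can be sketched roughly as follows. This is an illustrative Python sketch using information gain as the criterion; the function names (`choose_attribute`, `best_numeric_split`, `nominal_split`) are hypothetical, not RapidMiner internals:

```python
import math
from collections import Counter

def info_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    def h(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
    n = len(parent)
    return h(parent) - sum(len(ch) / n * h(ch) for ch in children)

def best_numeric_split(values, labels):
    """Binary split: try every value occurring in the data as the threshold."""
    best_gain, best_t = float("-inf"), None
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = info_gain(labels, [left, right])
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

def nominal_split(values, labels):
    """Nominal attribute: one branch per distinct value."""
    groups = {}
    for v, l in zip(values, labels):
        groups.setdefault(v, []).append(l)
    return info_gain(labels, list(groups.values()))

def choose_attribute(table, labels):
    """Iterate all attributes and keep the one maximizing the criterion."""
    best_gain, best_name = float("-inf"), None
    for name, (kind, values) in table.items():
        if kind == "numeric":
            gain, _ = best_numeric_split(values, labels)
        else:
            gain = nominal_split(values, labels)
        if gain > best_gain:
            best_gain, best_name = gain, name
    return best_gain, best_name

# Toy data: "temperature" separates the labels perfectly, "outlook" does not.
table = {
    "temperature": ("numeric", [1, 2, 8, 9]),
    "outlook": ("nominal", ["sun", "rain", "sun", "rain"]),
}
labels = ["yes", "yes", "no", "no"]
gain, attribute = choose_attribute(table, labels)
```

A full tree grower would recurse on each partition until the leaves are pure or a pre-pruning condition stops it, as described above.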

Best regards,
Marius

fmon
Newbie
Posts: 7


« Reply #9 on: June 23, 2014, 02:49:39 PM »

Thanks,
I just wanted to make sure!