Topic: Balanced sampling decision trees

Posted on: May 30, 2013, 11:34:50 PM

Hi everyone,

I'm just starting to use RapidMiner and I have a problem with decision trees. I am working with a fairly large dataset (approximately 500,000 cases) and am trying to use decision trees to predict whether or not users will buy a product. The problem is that the buying rate is very low (0.5%). When I use stratified sampling with a ratio of 50% in the "Sample" operator, as suggested in a similar thread on the forum, my tree is always biased towards the majority class, so the results are useless. Is there any way to balance the outcome variable to a 50/50 ratio, do the modeling, and then rebalance the samples to their original rates? I have searched the forum, but neither the answers I tried nor the many operators I looked at in RapidMiner gave me any results.
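One common way to handle the "rebalance to the original rates" step is to train on the balanced sample and then correct the model's predicted probabilities back to the true base rate with Bayes' rule (a prior-shift correction). The sketch below is plain Python, not RapidMiner; the function name `correct_for_prior` and its parameters are hypothetical. It assumes the balancing changed only the class priors (0.5 in training versus 0.005 in the population), not the distribution of the attributes within each class.

```python
def correct_for_prior(p_balanced, train_rate=0.5, true_rate=0.005):
    """Hypothetical helper: map a positive-class probability from a model
    trained on a sample with positive rate `train_rate` back to a population
    whose positive rate is `true_rate`.

    Assumes sampling changed only the class priors, not P(x | class).
    """
    if not 0.0 <= p_balanced <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # Reweight each class by (population prior / training prior) ...
    pos = p_balanced * (true_rate / train_rate)
    neg = (1.0 - p_balanced) * ((1.0 - true_rate) / (1.0 - train_rate))
    # ... and renormalise so the two class probabilities sum to 1.
    return pos / (pos + neg)
```

For example, a tree leaf that predicts 50% buyers on the balanced sample corresponds to about 0.5% in the original population, and a leaf predicting 90% corresponds to roughly 4.3%. The ranking of cases is unchanged, so the balanced tree can still be used to pick the most likely buyers.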

Thanks a lot in advance!
Marius Helf

Reply #1 on: June 11, 2013, 11:13:24 AM

The Sample operator can be used to downsample the majority class (i.e. discard some of its examples) if you enable the balance_data option. You can then specify how many examples of each class you want to use for learning.
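To illustrate the idea outside RapidMiner, here is a minimal Python sketch of per-class random undersampling: keep at most a given number of examples per class, chosen at random. The helper `balance_by_undersampling` and its `per_class` argument are hypothetical; they only mimic the effect described for balance_data and are not RapidMiner code.

```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, labels, per_class, seed=0):
    """Hypothetical helper: keep at most per_class[label] randomly chosen
    examples of each class. Classes missing from per_class are kept whole."""
    rng = random.Random(seed)
    # Group examples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)
    sampled = []
    for y, xs in by_class.items():
        # Never ask for more examples than the class actually has.
        k = min(per_class.get(y, len(xs)), len(xs))
        sampled.extend((x, y) for x in rng.sample(xs, k))
    # Shuffle so the classes are interleaved in the result.
    rng.shuffle(sampled)
    return sampled
```

With 500,000 cases at a 0.5% buying rate there are about 2,500 buyers, so `per_class={"buy": 2500, "no": 2500}` would keep every buyer and a random 2,500 of the roughly 497,500 non-buyers, giving a 50/50 training set.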

Is that sufficient for you?

Best regards,
