Author Topic: Strange result from Naive Bayes classifier  (Read 1668 times)
pupu
Newbie
*
Posts: 7


« on: June 08, 2009, 06:47:34 PM »

Hello,

First of all, thank you so much for contributing this great DM tool. You guys are great.
I'm new to DM and am trying out RM. I'm trying to use Naive Bayes to predict whether a new customer with a particular profile will or will not buy the product. I have set up the model like this:
Quote
<operator name="Root" class="Process" expanded="yes">
    <operator name="TrainingSet" class="DatabaseExampleSource">
        <parameter key="database_url"   value="jdbc:mysql://localhost:3306/insurance"/>
        <parameter key="username"   value="xxx"/>
        <parameter key="password"   value="xxx"/>
        <parameter key="query"   value="select * from customer;"/>
        <parameter key="label_attribute"   value="CARAVAN"/>
        <parameter key="classes"   value="buy not_buy"/>
    </operator>
    <operator name="NaiveBayes" class="NaiveBayes">
    </operator>
    <operator name="TestSet" class="DatabaseExampleSource">
        <parameter key="database_url"   value="jdbc:mysql://localhost:3306/insurance"/>
        <parameter key="username"   value="xxx"/>
        <parameter key="password"   value="xxx"/>
        <parameter key="query"   value="select * from customer_eval;"/>
        <parameter key="label_attribute"   value="CARAVAN"/>
        <parameter key="classes"   value="buy not_buy"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>
It runs without error, but in the data view the fields confidence(buy) and confidence(not_buy) show '?' for every data record.

Can anybody give me any clues to my error?

Thank you so much
Pupu.

And here is haddock's reply:

Quote
Hi there,

Firstly, welcome to the dataminers' asylum! On your prob: what happens if you apply the model to the training set? Do you still get a row of ?'s in the prediction columns? Just disable your second database call to check it out. Make sure to tick "keep example set" in the learner.

? usually represents a missing value, so I'm pondering what got learnt.. The setup looks fine so something murky is going on. I take it you've checked the training set and such.

Logged
pupu
Newbie
*
Posts: 7


« Reply #1 on: June 08, 2009, 06:52:38 PM »

To haddock,

I applied the model to the training set: all values in the prediction field are 'not_buy', and confidence(buy)/confidence(not_buy) are '?' for all records.
I have checked the training set. There are no missing values, but it is unbalanced: about 6% is buy and 94% is not_buy. Is the imbalance relevant to my problem?

Thank you very much
Pupu.
Logged
haddock
Hero Member
*****
Posts: 853



WWW
« Reply #2 on: June 08, 2009, 07:05:08 PM »

Hi there,

I think that is probably the cause of your problem; try balancing it up so it is more even. Why not get 50 buy records and 50 not_buy records and do a merge? Hope you get better results. Get back if that doesn't do the trick.

Onward, full ahead through the fog...

Logged

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
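
The balancing idea suggested above (take the same number of records from each class and merge them) can be sketched outside RapidMiner as simple undersampling. This is only an illustration: the function name `balanced_sample`, the record layout (a list of dicts), and the use of a `CARAVAN` key with `buy`/`not_buy` values are assumptions based on the process shown in the first post, not anything RapidMiner itself provides.

```python
import random


def balanced_sample(records, label_key, per_class, seed=0):
    """Return `per_class` randomly chosen records for every label value.

    `records` is assumed to be a list of dicts; `label_key` names the
    label column (hypothetically "CARAVAN" for the data in this thread).
    """
    rng = random.Random(seed)

    # Group the records by their label value.
    by_class = {}
    for record in records:
        by_class.setdefault(record[label_key], []).append(record)

    sample = []
    for label, rows in by_class.items():
        if len(rows) < per_class:
            raise ValueError(f"class {label!r} has only {len(rows)} records")
        # Draw without replacement so no record is duplicated.
        sample.extend(rng.sample(rows, per_class))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sample)
    return sample


# Made-up data with roughly the 6% / 94% split described in the thread.
records = [{"id": i, "CARAVAN": "buy" if i < 6 else "not_buy"} for i in range(100)]
balanced = balanced_sample(records, "CARAVAN", per_class=5)
```

Note that undersampling throws away most of the majority class, so a model trained this way should still be evaluated on data with the original class distribution.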
Sebastian Land
Administrator
Hero Member
*****
Posts: 2426


« Reply #3 on: June 09, 2009, 07:43:09 AM »

Hi,
NaiveBayes does indeed have problems with unbalanced example sets, but this should not result in unknown confidence values. A more critical question on that issue: how many attributes does your example set contain?

Greetings,
  Sebastian
Logged
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



WWW
« Reply #4 on: June 09, 2009, 08:42:39 AM »

Hi,

Well, Sebastian's question is indeed essential here. Unfortunately, Naive Bayes did produce unknown confidence values for data sets with a high number of attributes. We have made Naive Bayes more robust in that respect, but only after the release of version 4.4 of the Community Edition. The most recent automatically delivered RapidMiner Enterprise Edition update already contains that bugfix. It will of course also be part of the next Community Edition release, which will probably come in a couple of weeks.

If you like (and there are no privacy issues), you can send us a data sample and we can check whether it works on the most recent developer version. If there are privacy issues and you need a solution very urgently, we could also build you a one-off custom version. Just drop us a note.

Kind regards,
Tobias

Logged

Tobias Malbrecht
Director of Product Marketing
RapidMiner
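
The thread does not say why a high number of attributes led to unknown confidences. One plausible mechanism (an assumption, not something confirmed here by the RapidMiner team) is floating-point underflow: Naive Bayes multiplies one likelihood per attribute, and with many small factors the product can underflow to 0.0 for every class, so normalising gives 0/0. The sketch below contrasts that with the usual log-space computation. The priors follow the 6%/94% split mentioned earlier; the per-attribute likelihoods are made-up numbers chosen only to trigger the underflow.

```python
import math


def confidences_product(priors, likelihoods):
    """Multiply probabilities directly; can underflow to 0.0 for all classes."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for p in likelihoods[label]:
            score *= p
        scores[label] = score
    total = sum(scores.values())
    # If every score underflowed, the normalised value is undefined (NaN).
    return {label: (s / total if total > 0.0 else float("nan"))
            for label, s in scores.items()}


def confidences_log(priors, likelihoods):
    """Same model, but summing logarithms and normalising with log-sum-exp."""
    log_scores = {
        label: math.log(prior) + sum(math.log(p) for p in likelihoods[label])
        for label, prior in priors.items()
    }
    top = max(log_scores.values())
    log_total = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    return {label: math.exp(s - log_total) for label, s in log_scores.items()}


# Hypothetical numbers: 85 attributes, each with a very small likelihood.
priors = {"buy": 0.06, "not_buy": 0.94}
likelihoods = {"buy": [1e-5] * 85, "not_buy": [2e-5] * 85}

product_result = confidences_product(priors, likelihoods)  # NaN for both classes
log_result = confidences_log(priors, likelihoods)          # valid, sums to 1
```

Under this assumption, reducing the number of attributes would make the symptom disappear simply because the product has fewer factors, without changing anything about the class imbalance.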
pupu
Newbie
*
Posts: 7


« Reply #5 on: June 09, 2009, 12:00:42 PM »

Hi all,

Thank you so much for your replies.

To haddock,
I tried what you suggested: I split the data set into 50 'buy' and 50 'not_buy' records. Naive Bayes still produces '?' for the confidence values, and the prediction result is 50% correct.

To Land,
The data set has 85 attributes. Should I try feature/attribute subset selection before applying Naive Bayes?

To Tobias Malbrecht,
There is no privacy issue here. Actually it is a data set from KDD cup '98. How can I send you the dataset?

Best regards,
Pupu.
Logged
pupu
Newbie
*
Posts: 7


« Reply #6 on: June 09, 2009, 12:04:41 PM »

Hi again,

I just forgot to say that your examples are very useful to me.

Best regards,
Pupu.
Logged
pupu
Newbie
*
Posts: 7


« Reply #7 on: June 09, 2009, 06:23:58 PM »

Hello,

Following your point about the number of attributes, I now run "select <some fields> from table",
and the confidence values are shown now.
I'm still looking for a way to deal with the unbalanced data (cheering myself on).

Thank you so much everyone.

Pupu.
Logged