Author Topic: Optimize Parameters  (Read 454 times)
maccten
« on: April 29, 2013, 05:34:28 PM »

Hi All

I have a data set in which I'm trying to find the natural clusters. I am using KNN to group the clusters and a Bayesian model to classify the cluster labels.
The aim is that the higher the accuracy of the Bayesian model, the better the clusters are. (There will obviously be some manual checking done afterwards.)

I embedded all of this into an Optimize Parameters operator, together with a Log operator that records the performance of each iteration, i.e. the performance of the model for every value of K.
I got the results below from the log file:

K Value   Performance
2         0.957987839
3         0.999203892
4         0.997700133
5         0.99876161
6         0.999380805
7         0.998142415
8         0.998407784
9         0.970278638
10        0.996196373

It can be seen from this that the optimal values are 3 or 6. It's possible that Optimize Parameters ignored these because of overfitting. It recommended k = 2, with the results below:

accuracy: 99.92% +/- 0.02% (mikro: 99.92%)


                 true cluster_0   true cluster_1   class precision
pred.cluster_0   34792            0                100.00%
pred.cluster_1   75               55576            99.87%
class recall     99.78%           100.00%
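
For reference, the percentages in this table can be re-derived from the four counts. This is only a sketch of the arithmetic in Python; the variable names are mine and nothing here is RapidMiner output.

```python
# Counts copied from the confusion matrix above.
# Rows are predictions, columns are true labels.
pred0_true0 = 34792
pred0_true1 = 0
pred1_true0 = 75
pred1_true1 = 55576

total = pred0_true0 + pred0_true1 + pred1_true0 + pred1_true1

# Accuracy: correctly labelled examples divided by all examples.
accuracy = (pred0_true0 + pred1_true1) / total

# Precision is computed per prediction row, recall per true-label column.
precision_cluster_0 = pred0_true0 / (pred0_true0 + pred0_true1)
precision_cluster_1 = pred1_true1 / (pred1_true0 + pred1_true1)
recall_cluster_0 = pred0_true0 / (pred0_true0 + pred1_true0)
recall_cluster_1 = pred1_true1 / (pred0_true1 + pred1_true1)

print(f"accuracy:            {accuracy:.2%}")
print(f"precision cluster_0: {precision_cluster_0:.2%}")
print(f"precision cluster_1: {precision_cluster_1:.2%}")
print(f"recall cluster_0:    {recall_cluster_0:.2%}")
print(f"recall cluster_1:    {recall_cluster_1:.2%}")
```

The 75 examples of cluster_0 that were predicted as cluster_1 are the only errors, so they account for everything below 100% in the table.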


The accuracy reported by Optimize Parameters is the same as the value logged for k = 3 (about 99.92%).
In the Log operator output for k = 2, however, the accuracy is just over 95%.

I was wondering if anyone can help me understand why this may be the case?

Thanks

Marius
« Reply #1 on: May 08, 2013, 01:16:28 PM »

As always, the XML of your process would be helpful. Since I don't have it, I can only guess: which performance are you logging? Your Log operator should be configured to log the performance of the X-Validation operator. That gives you the accuracy estimated over the complete cross-validation, i.e. across all of its iterations. If your Log operator points to the Performance operator instead, you will only see the performance of the very last iteration of the X-Validation, i.e. the 10th step. That is not what you want to see and may explain the differences.
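
To illustrate the difference between the two values, here is a minimal sketch in plain Python. It is not RapidMiner code: the synthetic data, the toy threshold classifier and all function names are made up for the example, and the only point is that the last fold's accuracy is a single noisy sample, while the cross-validated estimate averages over all folds.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Hypothetical 1-D data: two overlapping Gaussian classes."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n // 2)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n // 2)]
    rng.shuffle(data)
    return data


def fit_threshold(train):
    """Toy classifier: a threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, test):
    """Fraction of test rows where 'x above threshold' matches label 1."""
    hits = sum((x > threshold) == bool(y) for x, y in test)
    return hits / len(test)


def cross_validate(data, folds=10):
    """Return one accuracy per fold; fold i tests on every folds-th row."""
    fold_scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        fold_scores.append(accuracy(fit_threshold(train), test))
    return fold_scores


scores = cross_validate(make_data())
cv_estimate = statistics.mean(scores)  # analogous to logging the validation result
last_fold = scores[-1]                 # analogous to logging the inner performance

print(f"cross-validated accuracy: {cv_estimate:.4f} +/- {statistics.stdev(scores):.4f}")
print(f"last fold only:           {last_fold:.4f}")
```

If the two printed numbers differ, that is the same kind of gap you can get between the logged value and the result reported for the whole validation.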

Just as a side note, your accuracies are *very* high, and very high accuracies are always suspicious (unless you have really well-separated clusters). You may want to have another look at your process setup to check that everything is set up correctly.

Best regards,
Marius