Author Topic: ROC true positive rate remains at 0 for some time before going up. Unusual.  (Read 1227 times)
x2xx7x
Newbie
*
Posts: 6


« on: October 24, 2013, 12:44:53 AM »

Hi everyone,
I'm pretty new to data mining and RapidMiner, so take it easy on me :)
I'm dealing with a binary classification problem where I'm trying to identify people at high risk for a certain condition. The class label is coded 1 = yes, 2 = no.
I'm using data sets of various sizes, averaging around 160,000 observations. Each data set contains 22 attributes (nominal/polynominal/numerical) plus the binominal class label described above. I'm comparing different classification algorithms for this problem; they are listed in the table below. All experiments used 5-fold cross-validation with a binominal classification performance operator to get the results.

THE PROBLEM
The J48 decision tree from the WEKA extension gives promising results, as the table below shows. However, its AUC does not seem correct. In the ROC plot, starting from the bottom-left corner of the chart, the true positive rate stays at 0 for a while as the false positive rate increases along the x-axis. At about 0.5 along the x-axis the true positive rate finally increases and eventually rises above the y = x line. This is clearly why the AUC suffers, but I do not know why it happens, and it does not occur with any other algorithm. (All data has been preprocessed to remove missing values, and under-sampling has been applied along with some additional steps.)

If anyone knows why this could be occurring your help would be greatly appreciated, thank you.

Classifier                        AUC     Sensitivity   Specificity   F-Measure   Accuracy
Logistic Regression (WEKA LR)     0.715   65.70%        65.28%        65.56%      65.49%
C4.5 Decision Tree (WEKA J48)     0.678   67.99%        63.58%        66.52%      65.78%
Random Forest (WEKA RF)           0.704   63.89%        65.17%        64.30%      64.53%
Support Vector Machine            0.710   70.49%        59.87%        66.94%      65.18%
Neural Network                    0.713   72.25%        57.14%        66.81%      64.70%
Radial Basis Function Network     0.654   62.96%        59.07%        61.67%      61.01%
K-NN                              0.500   52.27%        52.71%        52.38%      52.49%
Naïve Bayes                       0.689   59.14%        68.41%        62.01%      63.77%

x2xx7x
Newbie
*
Posts: 6


« Reply #1 on: October 24, 2013, 01:02:52 AM »

Correction: the true positive rate stays at 0 until 0.05 and then begins to climb.
dan_
Full Member
***
Posts: 114


« Reply #2 on: December 23, 2013, 06:11:35 PM »

Hi,

Normally this happens when a model assigns its highest positive-class probabilities to some examples that actually belong to the negative class.
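To make that concrete, here is a small Python sketch (not RapidMiner's implementation; the function names and the toy scores and labels are made up for illustration). It builds the ROC points by sorting examples by predicted positive-class score and sweeping the threshold downward, assuming no tied scores. Because the top-ranked example is a negative, the curve first moves right along the x-axis with the true positive rate stuck at 0, and only then starts to climb.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from a high-to-low threshold sweep.

    labels use 1 = positive, 0 = negative (an assumption of this sketch).
    Tied scores are not handled specially.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Rank examples from the highest positive-class score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1   # a true positive moves the curve up
        else:
            fp += 1   # a false positive moves the curve right
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: the highest-scored example is actually negative.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [0, 1, 1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print("AUC:", round(auc(points), 3))
```

In this toy case the first step away from the origin is horizontal (FPR goes up while TPR stays at 0), and the curve still ends up above the diagonal afterwards, which is the same shape described in the question.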

I say "normally" because this is what you would see with a correct implementation of ROC curves and ROC analysis. However, in my experience, ROC analysis is unreliable in RapidMiner, including in the latest non-free professional version 6. So I would not use anything related to RapidMiner's ROC curves in my analyses, even if I were paying $2999+ per year for this software.

See this thread for some related discussion:
http://rapid-i.com/rapidforum/index.php/topic,7502.0.html

Dan

 
« Last Edit: December 23, 2013, 09:23:56 PM by dan_ »