it reflects very badly on RM that nobody else flags up the 1-P fallacies
Haddock is right that there are such fallacies concerning p-values, and some of them appear even in RapidMiner. For instance, look at the documentation of the T-test operator:
This operator uses a simple (pairwise) t-test to determine the probability that the null hypothesis is wrong
BTW, this type of fallacy may discourage many R users who do data mining in R from also using RapidMiner. Why? Because statistical fallacies in a software's documentation come across as unprofessional. R users usually have a good level of statistical literacy and easily spot these fallacies. I suggested that this aspect (the incorrect use of some statistical language and concepts in RapidMiner) be corrected: http://rapid-i.com/rapidforum/index.php/topic,5823.0.html
The probability computed by the t-test mentioned above (the so-called p-value) is not the probability that the null hypothesis is right or wrong.
According to [Ross, Introductory Statistics, Academic Press, 2010] or any other statistics book, the p-value is the probability, assuming the null hypothesis is true, that the test statistic is at least as extreme as the value computed from the data sample. When the test statistic falls in the critical region (or equivalently, the p-value is below a threshold called the significance level), the null hypothesis is rejected because it is judged to be inconsistent with the data sample.
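To make the definition concrete, here is a minimal sketch in Python. Note the assumptions: it does not use the t distribution or RapidMiner's T-test operator, but an exact sign-flip (permutation) test on paired differences, chosen only because it computes the p-value literally as "the share of outcomes at least as extreme as the observed one, assuming the null hypothesis is true". The function name and the accuracy figures are made up for illustration.

```python
import itertools
import statistics


def paired_sign_flip_p_value(a, b):
    """Two-sided exact sign-flip p-value for paired samples a and b.

    Under the null hypothesis (no systematic difference between a and b),
    each paired difference is equally likely to be positive or negative,
    so every assignment of signs to the differences is equally likely.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.mean(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Count outcomes at least as extreme as the observed statistic
        # (small tolerance guards against floating-point rounding).
        if abs(statistics.mean(flipped)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical accuracies of two models on the same 8 folds (invented data).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.85, 0.78]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.81, 0.77]

p_value = paired_sign_flip_p_value(model_a, model_b)
alpha = 0.05  # significance level, chosen before looking at the data
reject_null = p_value < alpha
print(p_value, reject_null)
```

The resulting number says how surprising the data would be if the null hypothesis were true. It is a statement about the data given the hypothesis, not about the probability of the hypothesis given the data.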
There is another fallacy in which the expression 1-p appears, where p is a p-value. Since the expression 1-p also appears in my posts, Haddock made a wrong/superficial connection between this fallacy and the idea that I had presented in this topic. It's wrong to apply the label "fallacy" whenever one sees 1-p (just because there exists some fallacy about 1-p). Such confusions are possible when statistical inference and tests are not sufficiently understood (although statistical tests are used in data mining, for instance in decision tree algorithms like CHAID and QUEST). My background in Computer Science and Mathematical Statistics helps me avoid such fallacies: I did not state in my posts that 1-p is the probability that the alternative hypothesis is true, as Haddock suggested. This kind of error is certainly not made by statisticians.
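For readers who want to see why 1-p cannot be read as the probability that the alternative hypothesis is true, here is a small simulation sketch in Python. The assumptions are mine: a two-sided z-test with known standard deviation (picked only to keep the code short), 4000 simulated experiments of 20 observations each, and a null hypothesis that is true in every single experiment by construction.

```python
import random
from statistics import NormalDist, mean


def two_sided_z_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: population mean = 0, with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


random.seed(0)  # fixed seed so the run is reproducible
n_experiments = 4000

# Every sample is drawn with true mean 0, so H0 holds in all experiments.
p_values = [
    two_sided_z_p_value([random.gauss(0.0, 1.0) for _ in range(20)])
    for _ in range(n_experiments)
]

# Share of experiments in which 1-p exceeds 0.5.
share_large = sum(1 for p in p_values if 1.0 - p > 0.5) / n_experiments
# Share of experiments rejected at the 5% significance level.
share_rejected = sum(1 for p in p_values if p < 0.05) / n_experiments
print(share_large, share_rejected)
```

Under a true null hypothesis the p-value is approximately uniform on [0, 1], so 1-p exceeds 0.5 in roughly half of the experiments even though the alternative is false in all of them. The quantity 1-p is therefore not a probability of the alternative, which does not by itself make every use of the expression 1-p a fallacy.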
@Haddock I am ready at any time to discuss data mining with you. It is regrettable, however, that you use such language when you run out of data mining arguments. It's good, though, that you attended a RapidMiner course. Obviously it is hard to make such brief courses comprehensive. If you want more, get one of the books I recommended in the list. In particular, this will also show you how p-values are used in data mining.