Whoa, but that's quite a difference: in SPSS all models are actually tested. (This can also be done with the PaREn extension during the evaluation step, or with a simple process in core RapidMiner, as Simon has pointed out.)
The cool thing about the PaREn extension is that it predicts which model is probably the best even without any testing. This is the first time I have actually seen this meta-learning approach really working, and this is probably the reason why we at Rapid-I and many others love it. Kudos to Christian and the team at the DFKI for this great extension!
@ Ingo: Note that SPSS Modeler has fewer, but very carefully chosen and, where possible, highly optimised algorithms (for example, it offers C5, as opposed to the C4.5 implemented in open source software). One can therefore afford to build models with most of the classification algorithms available in SPSS, and to retain the best ones, in a reasonable amount of time.
Factually speaking (and, by the way, as a fan of both RM and SPSS Modeler), there are obviously similarities and differences in the features we are discussing, and I am afraid the differences show that, for now, SPSS Modeler is far ahead: in the run time needed to build the best models, in the reliability and performance of the models (see my previous posting above regarding unexpectedly suboptimal optimised PaREn models), in combining the best models into one overall model, etc. As for PaREn, the estimated accuracies were quite far from the actual accuracies in most of my experiments, but the idea is interesting.
@ Christian et al.: I have an additional suggestion, which I had in mind when posting questions earlier in this topic. ROC analysis could be added to the search for the most accurate model when the output/label attribute is binominal. More precisely, after finding the best parameters for a learner on a given dataset, one can also read the optimised threshold off the ROC curve (as opposed to using the default threshold of 0.5), which gives the best accuracy on the data used to build the curve.
However, this suggestion is perhaps best considered after the ROC analysis implemented in RapidMiner has been revised, as it is still unreliable in this package: the AUC calculation needs corrections, as I have shown on the forum (http://rapid-i.com/rapidforum/index.php?PHPSESSID=18d6261d2d63b2ca946477f03c2552bc&topic=2237.0), and the Find Threshold operator does not find the best threshold as expected but returns suboptimal solutions. I emailed a complete report to the RM development team, with relevant processes illustrating this.
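To make the threshold suggestion concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It tries every distinct confidence value as a cut-off and keeps the one with the highest accuracy. The function names and the toy data are made up for illustration, and the result is only optimal for the data it was tuned on, so it should be checked on held-out examples.

```python
def accuracy_at(threshold, scores, labels):
    """Accuracy when an example is predicted positive if score >= threshold."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_accuracy_threshold(scores, labels):
    """Return (threshold, accuracy) with the highest accuracy on this data.

    Each distinct score is a candidate cut-off; infinity is added so that
    "predict everything negative" is also considered.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = accuracy_at(t, scores, labels)
        if acc > best_acc:  # strict: ties keep the lowest threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical confidences for the positive class and the true labels (1 = positive).
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.80, 0.90]
labels = [0, 0, 1, 1, 1, 1, 1, 1]

default_acc = accuracy_at(0.5, scores, labels)
tuned_t, tuned_acc = best_accuracy_threshold(scores, labels)
print("default threshold 0.5:", default_acc)
print("tuned threshold", tuned_t, ":", tuned_acc)
```

On this toy data the positives mostly receive confidences below 0.5, so the default cut-off misclassifies them while a lower cut-off does not. This is the kind of gain the suggestion is aiming at.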
PaREn is an excellent initiative towards RM's enrichment. However, the extension needs to be more practical and more accurate. It requires a relatively long processing time, and the models are not as optimised as expected: see the postings in this thread, where it is explained that both an ad hoc model created with no particular settings and a trivial model that blindly predicts the most frequent class are more accurate than the optimised, time-consuming PaREn model. Improvement here would be both beneficial and necessary for the extension. Other users of the extension may wish to build ad hoc models in addition to the PaREn models and compare their accuracies; this would be useful feedback for the development team.
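For anyone who wants to run that comparison outside RapidMiner, a minimal Python sketch of the trivial baseline follows. It predicts the most frequent training class for every test example and compares its accuracy with that of a model. The labels and the "model" predictions below are invented purely for illustration and are not results from PaREn.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(train_labels, test_labels):
    """Return (majority_class, accuracy of always predicting it on the test set)."""
    majority_class = Counter(train_labels).most_common(1)[0][0]
    predictions = [majority_class] * len(test_labels)
    return majority_class, accuracy(predictions, test_labels)


# Hypothetical data: 70% of the training labels are "no".
train_labels = ["no"] * 7 + ["yes"] * 3
test_labels = ["no", "no", "yes", "no"]
model_predictions = ["yes", "no", "yes", "yes"]  # stand-in for any model's output

majority_class, baseline_acc = majority_baseline(train_labels, test_labels)
model_acc = accuracy(model_predictions, test_labels)

print("baseline (always '%s'): %.2f" % (majority_class, baseline_acc))
print("model: %.2f" % model_acc)
if model_acc <= baseline_acc:
    print("The model does not beat the trivial baseline.")
```

A model that cannot beat this baseline on held-out data has learned nothing useful about the label, whatever time was spent optimising it.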
I hope the feedback and suggestions in this thread are useful to the PaREn team, as part of the community's contribution to improving the open source software. Good luck!
Regards,
Dan