Hi wessel,
I read this, but I did not understand what I actually read.
Ok, I'm glad you asked. I will try to clarify.
Is this a feature subset selection algorithm specially designed for microarray data?
How is the optimal feature subset defined?
As the feature subset that produces the classifier with minimal error on the test set?
The Plugin/Extension does not consist of one single feature selection algorithm or classifier.
It is a small collection of feature selection methods that were published with a focus on microarray data and were not available in RM (at least at that time). To try them out on my data, I had to implement them in RM. The list of implemented methods is in my previous post. I'll edit it and add some explanation of the operators.
What is wrong with the wrapper approach?
Does this have a more intelligent search?
Microarray data usually has LOTS of features, from thousands to hundreds of thousands. My particular dataset has more than 280,000 features. If you run a Forward Selection with a wrapper and a 10-fold cross-validation, using e.g. an SVM as the learner, then selecting only 10 features means training roughly 10 * 280,000 * 10 = 28,000,000 SVMs (10 selection rounds, about 280,000 candidate features per round, 10 folds per candidate). Don't even think about backward elimination, since there the models have to be built on very high-dimensional data.
Yes, you could reduce from 10-fold to e.g. 5-fold and use NaiveBayes instead of an SVM, but it would still take ages. And with the old* RM forward selection you would run out of memory.
*I know the new forward selection improved on this, but I haven't tried it yet.
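To make the arithmetic above concrete, here is a rough Python sketch that counts the model fits for a greedy forward selection with k-fold cross-validation. The function name is made up for illustration, and it assumes the plain scheme where every remaining feature is tried once per round. It has nothing to do with how RM implements the operator.

```python
def wrapper_forward_selection_fits(n_features, n_selected, n_folds):
    """Count model fits for greedy forward selection with k-fold CV.

    Assumption: in round r (0-based) each of the (n_features - r) remaining
    features is tried once, and every candidate costs n_folds model fits.
    """
    return sum((n_features - r) * n_folds for r in range(n_selected))


# Numbers from the post: 280,000 features, 10 selected, 10-fold CV.
exact = wrapper_forward_selection_fits(280_000, 10, 10)
approx = 10 * 280_000 * 10  # the back-of-the-envelope estimate

# Halving the folds only halves the work.
five_fold = wrapper_forward_selection_fits(280_000, 10, 5)

print(exact, approx, five_fold)
```

The exact count is only marginally below the estimate, and going down to 5-fold still leaves around 14 million fits.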
So that's the computational issue. Also, wrapper searches with a certain learner may overfit to that specific learner.
In the microarray literature it mostly boils down to using a statistical test to rank the features and then selecting the k most important ones. Over time, however, more sophisticated methods have been developed that take feature interactions into account. Examples are RFE, MRMR, CFS, CBFS and FCBF. Some of them define an optimal feature set, some (most) just rank the features.
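For illustration, the simple "rank with a statistical test, keep the top k" approach could look roughly like the sketch below. It assumes a two-class problem and uses Welch's t-statistic as the test. The function names and the toy data are made up, and this is not the code from the extension.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_k_features(X, y, k):
    """Rank features by |t| between class 1 and class 0; return the k best column indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (made up): 6 samples, 3 features; only feature 1 separates the classes.
X = [
    [0.1, 5.0, 1.0],
    [0.3, 5.2, 0.8],
    [0.2, 4.9, 1.2],
    [0.2, 1.0, 1.1],
    [0.1, 1.1, 0.9],
    [0.3, 0.9, 1.0],
]
y = [1, 1, 1, 0, 0, 0]

selected = top_k_features(X, y, k=1)
print(selected)
```

Each feature is scored on its own here, so this costs one pass over the features instead of millions of model fits, but it cannot see feature interactions.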
Hope I could shed some light on the topic.
Ben