|
RapidMiner (formerly YALE) and its plugins provide more than 400 operators
for all aspects of Data Mining.
Meta operators automatically optimize the experiment designs, so users no
longer need to tune single steps or parameters by hand.
A large number of visualization techniques and the option to place
breakpoints after each operator give insight into the success of your design,
even online while experiments are running.
On this page we discuss the main groups of operators and give operator
examples for each of the groups.
RapidMiner provides operators for:
|
Input and output: flexible operators for data input and output
in different file formats, including
- Known data mining and learning scheme formats (ARFF, C4.5, CSV ...)
- Sparse file formats (known from SVMlight, mySVM ...)
- Excel files
- SPSS files
- Data sets from databases (Oracle, MySQL, PostgreSQL, Microsoft SQL Server, Sybase ...)
- dBase
- Text files (Word Vector plugin) and Audio files (Value Series Plugin)
- and more
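
To illustrate what a sparse file format looks like, here is a minimal Python sketch that reads one line in the SVMlight style (`<label> <index>:<value> ... # comment`). It is not RapidMiner code: the function names `parse_sparse_line` and `to_dense` are hypothetical, and the sketch assumes 1-based attribute indices and no special tokens such as `qid`.

```python
def parse_sparse_line(line):
    """Parse '<label> <index>:<value> ... # comment' into (label, features)."""
    # Drop an optional trailing comment and surrounding whitespace.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None  # blank or comment-only line
    tokens = body.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features, num_attributes):
    """Expand a sparse {index: value} map (1-based indices) into a dense row."""
    row = [0.0] * num_attributes
    for index, value in features.items():
        row[index - 1] = value
    return row
```

Only the non-zero attributes are stored per example, which is why such formats suit high-dimensional data like word vectors.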
|
|
Machine learning algorithms: more than 100 learning schemes
for regression, classification, and clustering tasks, including:
- Support Vector Machines (SVM, LibSVM, SMO, mySVM ...)
- Decision Tree and Rule Learners (ID3, C4.5, PART, PRISM, RIPPER ...)
- Lazy Learners (Nearest Neighbors, K*, LBR ...)
- Bayesian Learners (Naive Bayes, Bayes Net, AODE ...)
- Logistic Learners (Logistic Regression, SimpleLogistic ...)
- Gaussian Processes
- Meta Learning (AdaBoost, Bagging, Stacking, BayesianBoosting ...)
- Association Rule Mining (Apriori, Tertius ...)
- Clustering (Clustering Plugin: k-Means, k-Medoids, DBSCAN, SVClustering ...)
- and more
|
|
Weka operators: all learning schemes and attribute evaluators
of the Weka learning environment are also available and can be used like any
other RapidMiner operator.
|
|
Data preprocessing: operators that often have to be applied
before the learning process, including
- Discretization (Binning, Frequency ...)
- Example and feature filtering (Conditioned, ValueTypeFilter ...)
- Normalization (Interval, Standardization, z-Transformation ...)
- Sampling (Simple, Stratified, ModelBased ...)
- Dimensionality Reduction (PCA, Kernel-PCA, GHA, ICA ...)
- Missing and infinite value replenishment
- Removal of useless features
- and more
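
As a rough illustration of two of the normalization schemes named above, the following Python sketch rescales a single numeric attribute. It is a stand-alone example, not the RapidMiner operator: the function names are hypothetical, and the z-transformation here assumes the population standard deviation, which may differ from what a given tool uses.

```python
from statistics import mean, pstdev


def z_transform(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant attribute
    return [(v - mu) / sigma for v in values]


def interval_scale(values, low=0.0, high=1.0):
    """Linearly map the observed range onto [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]  # constant attribute
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]
```

In practice the mean, deviation, minimum, and maximum would be estimated on the training data only and then reused for the test data.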
|
|
Feature operators:
- Feature Selection (Forward Selection, Backward Elimination,
Genetic Algorithms, WeightGuided ...)
- Feature Weighting and Relevance (ChiSquared, Correlation,
InfoGain, RelieF ...)
- Feature Construction (GGA, YAGGA ...)
- Feature Extraction from time series (Value Series Plugin)
- and more
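
The idea behind forward selection can be sketched in a few lines of Python. This is a generic greedy version, not the RapidMiner implementation: `forward_selection` and the `evaluate` callback are hypothetical names, and `evaluate` is assumed to return a performance estimate (higher is better) for a given attribute subset.

```python
def forward_selection(attributes, evaluate):
    """Greedily add the attribute that improves the score most; stop when none does."""
    selected = []
    best_score = float("-inf")
    remaining = list(attributes)
    while remaining:
        # Score every one-attribute extension of the current subset.
        scored = [(evaluate(selected + [a]), a) for a in remaining]
        score, attribute = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the current subset
        selected.append(attribute)
        remaining.remove(attribute)
        best_score = score
    return selected, best_score
```

Backward elimination works the other way round: it starts with all attributes and removes one at a time.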
|
|
Performance evaluation: several validation and
evaluation schemes to estimate the performance of learning or
preprocessing on your dataset, e.g.
- Cross-validation (stratified, shuffled, non-shuffled ...)
- Training and test set splitting (random, fixed ...)
- Leave-one-out
- Significance tests (ANOVA, paired t-Tests ...)
- A large number of performance criteria for classification and regression
(absolute, relative, accuracy, precision, recall, kappa, AUC ...)
- and more
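
For readers unfamiliar with stratified cross-validation, the following Python sketch shows one simple way to build the folds. It is an assumption-laden illustration rather than RapidMiner's procedure: the function names are hypothetical, and examples of each class are shuffled and then dealt round-robin across the folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of example indices with roughly equal class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the examples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Leave-one-out corresponds to the special case where the number of folds equals the number of examples, in which case stratification no longer applies.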
|
|
Meta operators: several optimization operators for experiment design, e.g.
- Parameter Optimization (Grid, Quadratic, Evolutionary ...)
- Learning Curves
- Experiment loops and iterations
- and more
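
Grid parameter optimization can be sketched as an exhaustive loop over all parameter combinations. The Python below is a generic illustration, not the RapidMiner operator: `grid_search`, the `evaluate` callback, and the toy objective `toy_score` with its parameters `C` and `gamma` are all hypothetical. In a real experiment `evaluate` would typically be a cross-validated performance estimate.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every parameter combination; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(**params)  # higher is assumed to be better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical stand-in objective that peaks at C=10, gamma=0.1."""
    return -((C - 10) ** 2) - (gamma - 0.1) ** 2


best_params, best_score = grid_search(
    toy_score, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
)
```

The cost grows with the product of the grid sizes, which is one reason evolutionary and other search strategies are offered as alternatives.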
|
|
Visualization: logging and presentation of results, including
- Online 1D, 2D, and 3D plots of your data and experiment results
(ExperimentLog, DataView, MetaDataView ...)
- Built-in color, histogram, and distribution plots
- Quartile / box plots
- Learned Models (Tree View, ClusterModel graphs ...)
- High-dimensional data (Andrews Curves, GridViz, Parallel, RadViz, Survey, SOM ...)
- SVM functions (HyperplaneProjection, AttributeFunction ...)
- ROC plots
- Lift Charts
- and more
|
|