Data Stream Plugin
The data stream plugin for RapidMiner (formerly YALE) provides operators for simulating data streams from one or several data sets, for simulating concept drift, and for handling concept drift on data streams, whether that drift is simulated or real.

 

Features

The key features of the data stream plugin for RapidMiner are:

  • enables data stream mining experiments and prototype application development and testing in RapidMiner
  • integrates with all RapidMiner data mining operators
  • 100% Java implementation
  • very easy to extend
  • operators for simulating data streams from one or several data sets (sources)
  • operators for simulating different types of concept drift for controlled experiments
  • operators for handling concept drift on data streams with simulated or real-world drift, including:
    • full memory: naive strategy always keeping all training examples for training classifiers, i.e. there is no forgetting of old examples
    • no memory: naive strategy keeping only the training examples in the most recent batch and forgetting all older examples
    • time window of fixed size: this strategy uses a time window of configurable fixed length on the data stream for training classifiers and discards examples outside the time window
    • time window of adaptive size: this strategy automatically adjusts the length of the time window on the data stream to the current amount of concept drift, so that the expected prediction error is minimized
    • example weighting: local, global, and combined example weighting strategies consider the age of examples and/or their helpfulness in predicting future example labels and assign corresponding weights to the examples during the training process, allowing gradual forgetting and performance-based weighting
    • example selection: this strategy is a special case of example weighting that uses only weights of zero or one, i.e. select or discard; it is more flexible than a time window of adaptive or fixed size because old data can be reconsidered if it becomes helpful again for classifying new instances; the approach selects the training examples that minimize the expected error rate of the resulting classification model; this strategy usually outperforms all of the above approaches in terms of accuracy
    • ensemble-based learning: knowledge-based sampling (KBS) on data streams (KBS-stream) is a very efficient and effective strategy for handling concept drift; it uses a boosted ensemble of base classifiers and difference modelling and typically achieves a higher accuracy than the other approaches; in RapidMiner, this operator is called BayBoostStream (Bayesian Boosting on Data Streams)
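
The window-based strategies above can be illustrated with a small, self-contained Python sketch. It does not use the plugin or any RapidMiner API: all function names are hypothetical, the base learner is a toy one-dimensional threshold rule, and the adaptive window is chosen with a simple hold-out estimate on the newest batch, which is an assumption standing in for the error estimate described in the publications below. Example weighting, example selection, and KBS-stream are not covered.

```python
def make_stream(n_batches, drift_at, batch_size=10):
    """Simulate a stream of batches; the concept flips abruptly at batch `drift_at`."""
    half = batch_size // 2
    return [
        [(x, int((x >= half) != (b >= drift_at))) for x in range(batch_size)]
        for b in range(n_batches)
    ]


def predict(model, x):
    threshold, positive_above = model
    return int((x >= threshold) == positive_above)


def error_rate(model, examples):
    return sum(predict(model, x) != y for x, y in examples) / len(examples)


def train_stump(examples):
    """Toy base learner (an assumption): the threshold rule with the fewest training errors."""
    candidates = [
        (t, above)
        for t in sorted({x for x, _ in examples})
        for above in (True, False)
    ]
    return min(candidates, key=lambda m: error_rate(m, examples))


def flatten(batches):
    return [example for batch in batches for example in batch]


def adaptive_size(batches):
    """Choose how many recent batches to keep, using hold-out error on the newest batch."""
    newest = batches[-1]
    fit_part, check_part = newest[0::2], newest[1::2]

    def holdout_error(h):
        # Train on the h-1 batches before the newest one plus half of the newest batch.
        model = train_stump(flatten(batches[-h:-1]) + fit_part)
        return error_rate(model, check_part)

    # Lowest hold-out error wins; ties go to the larger window (more data).
    return min(range(1, len(batches) + 1), key=lambda h: (holdout_error(h), -h))


def select_batches(batches, strategy, size=3):
    """Return the batches a strategy would train on."""
    if strategy == "full_memory":
        return batches
    if strategy == "no_memory":
        return batches[-1:]
    if strategy == "fixed_window":
        return batches[-size:]
    if strategy == "adaptive_window":
        return batches[-adaptive_size(batches):]
    raise ValueError(f"unknown strategy: {strategy}")


def evaluate(stream, strategy):
    """Mean error when each batch is predicted by a model trained on earlier batches."""
    errors = []
    for t in range(1, len(stream)):
        model = train_stump(flatten(select_batches(stream[:t], strategy)))
        errors.append(error_rate(model, stream[t]))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    stream = make_stream(n_batches=8, drift_at=4)
    for strategy in ("full_memory", "no_memory", "fixed_window", "adaptive_window"):
        print(f"{strategy}: mean error {evaluate(stream, strategy):.3f}")
```

On this toy stream with one abrupt drift, the full memory strategy keeps predicting the old concept after the drift and therefore has the highest mean error, while the strategies that forget old batches recover. The toy data is noise-free and deterministic, so it says nothing about how the strategies compare on real data.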

 

For the theoretical background on these methods for handling concept drift in data stream mining, please refer to the following two publications:

  • Ralf Klinkenberg: Learning Drifting Concepts: Example Selection vs. Example Weighting, Intelligent Data Analysis (IDA) Journal, Volume 8, Number 3, 2004, pages 281-300.
  • Martin Scholz and Ralf Klinkenberg: Boosting Classifiers for Drifting Concepts, Intelligent Data Analysis (IDA) Journal, Volume 11, Number 1, March 2007, Special Issue on Knowledge Discovery from Data Streams, pages 3-28.

 

Please note that this plugin is currently in beta status, i.e. it is not yet as well tested or as stable as the RapidMiner core.

 

Download and Documentation

The following files are available from the RapidMiner Plugins download page:

Type     Filename                                  Description
Plugin   rapidminer-datastream-XXX.jar             The main plugin as a jar file
Plugin   rapidminer-datastream-XXX-installer.exe   The main plugin as a Windows installer
Source   rapidminer-datastream-XXX-src.jar         The source code of the plugin
Javadoc  rapidminer-datastream-XXX-javadoc.jar     The Javadoc of the plugin

 