This training course is an introduction to knowledge discovery from unstructured data such as text documents. It focuses on the necessary preprocessing steps and the most successful methods for automatic text classification (including Naive Bayes and Support Vector Machines, SVM) and text clustering. Many practical exercises in different settings (for example e-mail spam detection, automatic e-mail routing, adaptive personal news filtering, and sentiment analysis of text documents such as news, web pages, blogs, e-mails, or PDF documents) will enable participants to transfer what they have learned to their own text mining problems.
After the training course, participants will be able to
- identify the processes for processing unstructured data,
- transform textual data into a structured format,
- apply different statistical text processing methods,
- perform text classification or text clustering,
- work on recent tasks like sentiment analysis or opinion mining.
This training course, together with the course “Data Mining / Predictive Analytics with RapidMiner and RapidAnalytics 3”, is the recommended preparation for the exam to become a certified Rapid-I Expert.
- Course ID: 105
- Duration: 2 days
- Location: Dortmund, Germany
- Target audience: users, analysts, developers, administrators
- Previous knowledge: foundations of data mining and RapidMiner
- Methods: lectures, discussions, individual and group work, exercises on realistic data.
Participants may bring their own work and project-specific questions in order to find solutions together with the trainer and the other participants. The training course addresses intermediate learners, and we recommend attending the courses “Data Mining / Predictive Analytics with RapidMiner and RapidAnalytics 1” and “Data Mining / Predictive Analytics with RapidMiner and RapidAnalytics 2” before this one.
Topics:
- Loading of texts
  - Loading from flat files
  - Loading from data sets
  - Loading from databases
  - Loading from process definitions
- Visualizing documents and tokens
  - High-dimensional visualizations for transformed documents
- Handling unstructured data
  - Preprocessing of textual data
  - Filtering of tokens
  - Term frequencies
  - Document frequencies
- Advanced modeling
  - Methods for high-dimensional data
  - Support Vector Machines
  - Text classification
  - Text clustering
- Web Mining
  - Crawling the web
  - Extracting information from web sites
  - Transforming web sites to documents
  - Information extraction using XPath or regular expressions
- Number of participants: 4 or more
- Price per participant:
Value added tax (VAT) may have to be added to these prices.