 Author Topic: Newbie: help with unsupervised anomaly detection with RapidMiner  (Read 1182 times)
max001
Newbie

Posts: 2

 « on: June 03, 2013, 10:11:10 AM »

Hello,

Having managed to build a project that does data classification, I would like to ask for advice on how to build one that does "unsupervised anomaly detection".
http://en.wikipedia.org/wiki/Anomaly_detection

As a hint, I would appreciate a pointer to the right model to use, or to a tutorial on this topic.

My problem... (with some simplifications):

I have a temperature sensor that reports the temperature every minute. I have 30 days of these readings as my "training data".

I have no idea whether the history contains any anomaly ("issue") related to the temperature, or when it occurred; I only have the data itself. So, as far as my newbie understanding goes, classification models aren't applicable.

I also have the temperature data for the last hour, again reported every minute.

My goal is to apply a reasonable heuristic that tells me the probability that this hour represents an "anomaly" compared to the training data. Right now I have some freedom in how I define "anomaly", but it should reflect real-world scenarios like "too high", "too low", "too volatile", "too steady".

In a second stage, I will need to analyze the data by day of the week (assuming the temperature changes follow some weekly "trends").

Thanks for any hint,

Max

Marius Helf
Administrator
Hero Member

Posts: 1805

 « Reply #1 on: June 11, 2013, 09:08:51 AM »

Hi Max,

You should have a look at the Outlier operators, especially Outlier Detection (LOF). It calculates the Local Outlier Factor for each example, a numeric score where higher values indicate a higher probability that the example is an outlier.
You can manually create a label which is true for all values above a certain threshold, and false otherwise. If you then create a descriptive model, e.g. a decision tree, which classifies the examples into true or false, you will know why the respective examples are outliers.
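
For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of the first two steps (LOF score, then a threshold label). It is not the RapidMiner operator, just a plain standard-library implementation of the usual LOF definition. The per-hour features (mean and standard deviation), the synthetic readings, k=10 and the threshold of 2.0 are all assumptions made up for illustration.

```python
import math
import random
import statistics

def hourly_features(readings, per_hour=60):
    """Summarise each full block of per_hour readings as (mean, std dev)."""
    feats = []
    for start in range(0, len(readings) - per_hour + 1, per_hour):
        hour = readings[start:start + per_hour]
        feats.append((statistics.fmean(hour), statistics.pstdev(hour)))
    return feats

def local_outlier_factors(points, k=5):
    """Plain LOF, O(n^2) in time and memory; fine for a few hundred points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point (the point itself excluded).
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return 1.0 / max(statistics.fmean(reach), 1e-12)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        statistics.fmean([lrds[j] for j in neigh[i]]) / lrds[i]
        for i in range(n)
    ]

# Synthetic data (an assumption): 10 days around 20 degrees, then one hot hour.
random.seed(0)
history = [random.gauss(20.0, 0.5) for _ in range(10 * 24 * 60)]
hot_hour = [random.gauss(26.0, 0.5) for _ in range(60)]

features = hourly_features(history) + hourly_features(hot_hour)
scores = local_outlier_factors(features, k=10)

# Manual label: true above an arbitrarily chosen threshold, false otherwise.
is_outlier = [score > 2.0 for score in scores]
print(scores[-1], is_outlier[-1])
```

The last entry of `scores` belongs to the new hour. The mean covers "too high" and "too low", the standard deviation covers "too volatile" and "too steady". In practice you would probably normalize the two features first so that neither dominates the distance.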

Best regards,
Marius

max001
Newbie

Posts: 2

 « Reply #2 on: June 11, 2013, 09:54:54 AM »

Thanks a lot,
Max