Author Topic: Do we need to normalize the Data in outlier detection?  (Read 1163 times)
Anki
Newbie
*
Posts: 39


« on: July 18, 2011, 10:30:35 AM »

Hello All,

Do we need to normalize the data before looking for outliers?
What is the impact of not normalizing?
Which outlier detection methods work best on normalized data?
And which methods are better if I don't want to normalize?

Thank you in advance

Yours
Anki
Ingo Mierswa
Administrator
Hero Member
*****
Posts: 1210



« Reply #1 on: July 31, 2011, 12:07:11 PM »

Hi,

Quote
Do we need to normalize the data before looking for outliers?

This is certainly recommended, yes. Most outlier detection methods depend on distances and/or densities, and distance calculations can be badly skewed on non-normalized data.


Quote
What is the impact of not normalizing?

If a single dimension has a much larger scale than the others, that dimension can dominate the distance calculation and drown out the rest.
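
To make this concrete, here is a minimal sketch in plain Python. The data (age in years, income in dollars), the z-score normalization, and the "distance to nearest neighbor" outlier score are all assumptions chosen for illustration; they are not taken from RapidMiner or from any particular operator.

```python
import math
import statistics

# Hypothetical toy data: (age in years, income in dollars).
# The last row is unusual in age only; its income is unremarkable.
rows = [
    (30, 20_000),
    (32, 35_000),
    (31, 50_000),
    (33, 65_000),
    (34, 80_000),
    (35, 100_000),
    (95, 60_000),
]

def z_score_columns(data):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*data))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in data
    ]

def nearest_neighbor_distances(data):
    """A very simple outlier score: Euclidean distance to the closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(data) if j != i)
        for i, p in enumerate(data)
    ]

def top_outlier(data):
    """Index of the row with the largest nearest-neighbor distance."""
    scores = nearest_neighbor_distances(data)
    return scores.index(max(scores))

raw_top = top_outlier(rows)                      # income dominates the distance
scaled_top = top_outlier(z_score_columns(rows))  # both columns contribute

print("top outlier on raw data:       ", rows[raw_top])
print("top outlier on normalized data:", rows[scaled_top])
```

On the raw values the income column decides almost everything, so the row with the largest income gap gets the highest score and the 95-year-old is overlooked. After normalization the age column contributes on an equal footing and that row comes out on top.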


Quote
Which outlier detection methods work best on normalized data?

It totally depends; there should probably also be a No-Free-Lunch theorem for outlier detection ;)
In general, I have had good experience with the local outlier factor (LOF) method.
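
For readers who have not met the local outlier factor before, the following is a small from-scratch sketch of the idea in plain Python. It is not RapidMiner's implementation. It makes two simplifying assumptions: each point uses exactly its `k` nearest neighbors (ties are broken by index rather than included), and the data contain no group of more than `k` identical points. The toy grid and the function name `local_outlier_factors` are made up for illustration.

```python
import math

def local_outlier_factors(points, k):
    """Return one LOF score per point; values well above 1 suggest an outlier."""
    n = len(points)
    neighbors = []   # for each point: its k nearest (distance, index) pairs
    k_distance = []  # for each point: distance to its k-th nearest neighbor
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append(ranked[:k])
        k_distance.append(ranked[k - 1][0])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a dense 4x4 grid plus one isolated point.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
points = grid + [(10.0, 10.0)]

scores = local_outlier_factors(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

Points inside the grid have roughly the same density as their neighbors, so their scores stay close to 1, while the isolated point is far less dense than its neighbors and scores much higher. Because the score is built from distances, the normalization advice above applies to LOF as well.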


Quote
And which methods are better if I don't want to normalize?

Don't do this (see above).

Cheers,
Ingo

Anki
Newbie
*
Posts: 39


« Reply #2 on: August 01, 2011, 07:42:06 AM »

Hi Ingo,

Thank you very much.

Yours
Anki
wessel
Sr. Member
****
Posts: 487


« Reply #3 on: August 07, 2011, 11:13:01 AM »

I agree with everything above.

I would like to add:
Visualization. Finding a good way to visualize your data can be time-consuming, but it is usually worth it.
You can often spot outliers yourself.

You can also look at misclassifications. The AdaBoost algorithm increases the weight of instances that are misclassified.
Outliers are often given a very high weight after only a few iterations.
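
A rough sketch of that effect, using a hand-written AdaBoost with one-dimensional decision stumps in plain Python. The data set, the single flipped label, and the helper names are assumptions made up for this illustration; a real boosting operator will differ in its details.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: returns `polarity` if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def adaboost_weight_history(xs, ys, rounds):
    """Run discrete AdaBoost and return the instance weights after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    unique = sorted(set(xs))
    # Candidate thresholds: one below all points, then midpoints between neighbors.
    thresholds = [unique[0] - 1.0] + [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    history = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(
                    w for x, y, w in zip(xs, ys, weights)
                    if stump_predict(x, t, polarity) != y
                )
                if best is None or err < best[0]:
                    best = (err, t, polarity)
        err, t, polarity = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified instances are scaled up, correct ones scaled down.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, t, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(weights)
    return history

# Hypothetical data: the class switches from -1 to +1 between x=4 and x=5,
# except for x=2, whose label has been flipped to play the role of an outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, 1, -1, -1, 1, 1, 1, 1]

history = adaboost_weight_history(xs, ys, rounds=3)
for round_no, weights in enumerate(history, start=1):
    print(round_no, [round(w, 3) for w in weights])
```

In this toy run the flipped instance at x=2 carries half of the total weight after the first round and stays the heaviest instance in the following rounds, so sorting instances by their boosting weight is one cheap way to get a shortlist of suspects.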

Best regards,

Wessel