column-1's value is based on column-2 value; should we remove one?
Topic: column-1's value is based on column-2 value; should we remove one? (Read 1228 times)
ruser
Newbie
Posts: 40
column-1's value is based on column-2 value; should we remove one?
« on: July 08, 2009, 02:14:12 PM »
The database table I am using for clustering has some columns that are calculated from the values in other columns.
E.g.
col-M   col-N=(5*col-M)   col-R   col-S   col-T
--------------------------------------------------------------
x       5x                a       c       a-c
y       5y                b       d       b-d
In such cases, is it better to remove the redundant columns (apart from the ones that will help to interpret the clustering results)?
steffen
Sr. Member
Posts: 374
Re: column-1's value is based on column-2 value; should we remove one?
« Reply #1 on: July 10, 2009, 11:30:55 AM »
Hello
Here is what I am thinking:
Removing "redundant" columns is not that easy:
To refer to your example
R   S   T(=R-S)
1   1   0
2   2   0
Using the Euclidean metric, the distance between these two rows restricted to R and S is sqrt(2), while the distance restricted to T is 0. So be careful when you are removing columns...
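To make the numbers above easy to check, here is a minimal Python sketch that computes the distance between the two example rows on different column subsets. The helper names (euclidean, distance_on) are made up for this illustration and are not part of RapidMiner.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The two rows from the example table, with T = R - S
row1 = {"R": 1, "S": 1, "T": 0}
row2 = {"R": 2, "S": 2, "T": 0}

def distance_on(cols):
    """Distance between row1 and row2 using only the given columns."""
    return euclidean([row1[c] for c in cols], [row2[c] for c in cols])

d_rs = distance_on(["R", "S"])        # sqrt(2): the rows differ in R and in S
d_t = distance_on(["T"])              # 0: both rows have the same T
d_all = distance_on(["R", "S", "T"])  # sqrt(2): T adds nothing for these rows

print(d_rs, d_t, d_all)
```

So if only T were kept, these two rows would look identical to the clustering, although they differ in R and S.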
On the other hand: assuming that the derived columns are necessary, keeping even the redundant columns will at worst increase the absolute distance between two items.
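A small sketch of what that increase can look like, using col-N = 5*col-M from the original question. The three items and their values are invented for illustration only. With the Euclidean metric, adding a derived column never makes a distance smaller, but it gives col-M more weight, so it may be worth checking whether the nearest neighbours stay the same.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up items with the original columns (M, R)
original = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0)}

def with_derived(row):
    """Append the redundant column N = 5*M to a row (M, R)."""
    m, r = row
    return (m, 5 * m, r)

extended = {name: with_derived(row) for name, row in original.items()}

def nearest_to_a(rows):
    """Which of B and C is closer to A?"""
    return min(("B", "C"), key=lambda k: euclidean(rows["A"], rows[k]))

nearest_original = nearest_to_a(original)  # B: distance 1 versus 2
nearest_extended = nearest_to_a(extended)  # C: distance sqrt(26) versus 2

print(nearest_original, nearest_extended)
```

In this toy data no distance shrinks, yet the item closest to A changes from B to C once the redundant column is included.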
regards,
Steffen
PS: I cannot resist remarking that the results (compared with using only the original columns) may change if your additional columns have been calculated with wicked (e.g. nonlinear) transformations which your metric cannot cope with. But I guess you are aware of that.
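As a rough illustration of the nonlinear case, assume a hypothetical derived column Q = M^2 (this column is not in the original table). Three items that are equally spaced in M stop being equally spaced once Q is added:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

m_values = [1.0, 2.0, 3.0]                     # equally spaced in M
plain = [(m,) for m in m_values]               # only the original column
with_square = [(m, m ** 2) for m in m_values]  # plus hypothetical Q = M^2

# Distances between neighbouring items (1st to 2nd, 2nd to 3rd)
gaps_plain = [euclidean(plain[i], plain[i + 1]) for i in range(2)]
gaps_square = [euclidean(with_square[i], with_square[i + 1]) for i in range(2)]

print(gaps_plain)   # both gaps are 1.0
print(gaps_square)  # sqrt(10) and sqrt(26): the second gap is now larger
```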
"I want to make computers do what I mean instead of what I say"
Read The Fantastic Manual