Topic: Extract coefficients from a linear regression model
MuehliMan

 « on: August 04, 2010, 10:47:37 PM »

Here is another problem I would like to share with the RapidMiner Community.

I have generated a Linear Regression model (imported from the stored .mod file).

prediction = a*x + b*y + c*z + ...

a, b, c, ... : coefficients of the Linear Regression model
x, y, z, ... : attribute values

I would like to multiply each coefficient from the .wgt file by the corresponding attribute value, to analyse which attributes contribute to the deviation of the prediction.

Desired: a*x, b*y, c*z
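A minimal sketch of the desired decomposition in plain Python. It assumes the coefficients have already been read into a dictionary; the attribute names, coefficient values and the intercept are hypothetical (the formula above shows no constant term, so the intercept is an assumption). Each term is coefficient times attribute value, and the terms plus the intercept add up to the prediction:

```python
def contributions(coefficients, example):
    """Return {attribute: coefficient * value} for a single example."""
    return {name: coef * example[name] for name, coef in coefficients.items()}


def predict(coefficients, intercept, example):
    """Prediction = intercept + sum of the per-attribute contributions."""
    return intercept + sum(contributions(coefficients, example).values())


if __name__ == "__main__":
    coefficients = {"x": 2.0, "y": -0.5, "z": 0.1}  # hypothetical a, b, c
    intercept = 1.0  # hypothetical constant term (assumption)
    example = {"x": 3.0, "y": 4.0, "z": 10.0}
    print(contributions(coefficients, example))  # {'x': 6.0, 'y': -2.0, 'z': 1.0}
    print(predict(coefficients, intercept, example))  # 6.0
```

Because the terms sum exactly to the prediction (minus the intercept), the largest terms in absolute value are the attributes that drive a given prediction.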

I hope I have explained the problem clearly enough.

Cheers,
Markus
Sebastian Land

 « Reply #1 on: August 05, 2010, 08:23:15 AM »

Hi Markus,
I'm curious: how do you want to calculate the deviation for a single attribute?

Anyway, I think you will have to use the Script operator to get access to the individual components of the formula. The linear regression model is a FormularProvider class, which should give you a good interface for retrieving the coefficients.
If you convert them into an AttributeWeight object, you can apply it to perform the multiplication on the whole example set. This might help, depending on how you answer my question above.
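To illustrate the idea outside RapidMiner, here is a plain-Python sketch. It assumes the formula text has one "coefficient * attribute" term per line followed by a constant line; that format and the helper names parse_formula and scale_by_weights are hypothetical and are not part of the RapidMiner API. The parsed weights are then applied to every example, which is the whole-example-set multiplication described above:

```python
import re

# ASSUMED (hypothetical) formula format: one signed "coefficient * attribute" term
# per line, plus one constant line for the intercept, e.g.
#     2.000 * x
#   - 0.500 * y
#   + 1.000
TERM = re.compile(
    r"^\s*([+-])?\s*(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)\s*(?:\*\s*(.+?))?\s*$"
)


def parse_formula(text):
    """Return (weights, intercept) parsed from the assumed formula text."""
    weights, intercept = {}, 0.0
    for line in text.strip().splitlines():
        match = TERM.match(line)
        if match is None:
            raise ValueError(f"cannot parse term: {line!r}")
        sign, number, attribute = match.groups()
        value = -float(number) if sign == "-" else float(number)
        if attribute is None:
            intercept += value  # constant term
        else:
            weights[attribute] = value
    return weights, intercept


def scale_by_weights(examples, weights):
    """Multiply each attribute by its weight (a*x, b*y, ...) for every example."""
    return [{name: w * row[name] for name, w in weights.items()} for row in examples]


if __name__ == "__main__":
    formula = "  2.000 * x\n- 0.500 * y\n+ 0.100 * z\n+ 1.000"
    weights, intercept = parse_formula(formula)
    print(weights, intercept)
    print(scale_by_weights([{"x": 3.0, "y": 4.0, "z": 10.0}], weights))
```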

Greetings,
Sebastian

Old World Computing - Expert Consulting and Training for RapidMiner
www.oldworldcomputing.com
MuehliMan

 « Reply #2 on: August 10, 2010, 04:24:41 PM »

Hey Sebastian,

Maybe I am just going around in circles without gaining any information (it would not be the first time...).

I was thinking of a way to visualize where the deviations come from. So I sort my examples by label - prediction in decreasing order, which tells me which examples are predicted badly. Of course, this does not tell me which attributes are responsible. That is why I want to see how much each attribute contributes to the final prediction.
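The sorting step can be sketched in plain Python, with hypothetical names and toy data. Each example's residual (label minus prediction) is computed together with its per-attribute terms, so the worst-predicted examples can be inspected attribute by attribute. Sorting by the signed residual in decreasing order, as described, puts large under-predictions first; the optional by_absolute flag is an addition, not part of the original approach, that also ranks large over-predictions near the top:

```python
def residual_report(examples, labels, coefficients, intercept, by_absolute=False):
    """Return (residual, prediction, terms) per example, largest residual first.

    residual = label - prediction; terms maps each attribute to coefficient * value.
    """
    rows = []
    for example, label in zip(examples, labels):
        terms = {name: coef * example[name] for name, coef in coefficients.items()}
        prediction = intercept + sum(terms.values())
        rows.append((label - prediction, prediction, terms))
    key = (lambda row: abs(row[0])) if by_absolute else (lambda row: row[0])
    return sorted(rows, key=key, reverse=True)


if __name__ == "__main__":
    coefficients = {"x": 2.0, "y": -0.5}  # hypothetical model
    examples = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 0.0}, {"x": 0.0, "y": 4.0}]
    labels = [5.0, 4.0, -1.0]
    for residual, prediction, terms in residual_report(examples, labels, coefficients, 1.0):
        print(f"residual={residual:+.2f} prediction={prediction:.2f} terms={terms}")
```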

This is where the coefficients come into play. It makes no sense to focus on attributes with a very small coefficient: even if their values are wrong, they hardly affect the prediction. What I want is to get rid of attributes that have a large coefficient but still show nearly no correlation with the label.
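One way to make "large coefficient but nearly no correlation" concrete is to compute the Pearson correlation of each attribute with the label and flag attributes that cross two thresholds. This is a sketch: the thresholds min_coef and max_corr are hypothetical and need tuning, coefficient sizes are only comparable if the attributes are on similar scales, and a low marginal correlation does not prove an attribute is useless, since it can still matter in combination with others:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def suspicious_attributes(coefficients, columns, labels, min_coef=1.0, max_corr=0.1):
    """Attributes with |coefficient| >= min_coef but |correlation| <= max_corr."""
    flagged = []
    for name, coef in coefficients.items():
        corr = pearson(columns[name], labels)
        if abs(coef) >= min_coef and abs(corr) <= max_corr:
            flagged.append((name, coef, corr))
    return flagged


if __name__ == "__main__":
    labels = [1.0, 2.0, 3.0, 4.0]
    columns = {
        "x": [1.0, 2.0, 3.0, 4.0],    # strongly correlated with the label
        "z": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the label
        "w": [4.0, 3.0, 2.0, 1.0],    # correlated, but tiny coefficient
    }
    coefficients = {"x": 2.0, "z": 5.0, "w": 0.01}  # hypothetical model
    print(suspicious_attributes(coefficients, columns, labels))
```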

Cheers,
Markus

 « Last Edit: August 11, 2010, 07:47:35 AM by MuehliMan »
Sebastian Land

 « Reply #3 on: August 23, 2010, 08:56:50 AM »

Hi Markus,
Have you normalized your data? If not, a coefficient might be small even though the attribute's contribution is large, depending on the attribute's scale.
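The scale effect can be shown with a small example: if the label equals 2 times a length in metres, the same relationship expressed in millimetres has a coefficient of only 0.002. Multiplying each coefficient by its attribute's standard deviation (which matches the coefficient you would get from fitting on z-score normalized attributes) puts both on the same footing. A plain-Python sketch, assuming the population standard deviation:

```python
import math


def population_std(column):
    """Population standard deviation of a list of numbers."""
    mean = sum(column) / len(column)
    return math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))


def z_score(column):
    """Normalize a column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    std = population_std(column)
    if std == 0:
        return [0.0] * len(column)  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


def standardized_coefficient(coef, column):
    """Coefficient expressed per standard deviation of its attribute."""
    return coef * population_std(column)


if __name__ == "__main__":
    metres = [1.0, 2.0, 3.0, 4.0]
    millimetres = [1000 * v for v in metres]
    # label = 2 * metres = 0.002 * millimetres: same effect, very different coefficients
    print(standardized_coefficient(2.0, metres))
    print(standardized_coefficient(0.002, millimetres))
    print(z_score(metres))
```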
Let me also add that you are essentially trying to model a shrinkage algorithm that is already built into the linear regression itself: there is a parameter that assigns a cost to large coefficients, so large coefficients on unimportant attributes are suppressed. Take a look at the parameter "ridge".
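For illustration, textbook ridge regression adds a penalty ridge * sum(beta_j^2) to the least-squares objective, which gives the closed form beta = (X^T X + ridge * I)^-1 X^T y on centered data. The plain-Python sketch below implements that formulation; whether RapidMiner's "ridge" parameter is scaled the same way, or whether it penalizes the intercept, is an assumption not confirmed here. Increasing ridge shrinks the coefficients toward zero:

```python
def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if a[col][col] == 0:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def ridge_fit(rows, labels, ridge):
    """Ridge regression: beta = (Xc^T Xc + ridge*I)^-1 Xc^T yc on centered data.

    The intercept is not penalized; it is recovered from the means afterwards.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    mean_y = sum(labels) / n
    xc = [[r[j] - means[j] for j in range(p)] for r in rows]
    yc = [y - mean_y for y in labels]
    xtx = [[sum(row[i] * row[j] for row in xc) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    intercept = mean_y - sum(b * m for b, m in zip(beta, means))
    return beta, intercept


if __name__ == "__main__":
    rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
    labels = [2.0, 4.0, 6.0, 8.0]  # label = 2 * first attribute
    for ridge in (0.0, 1.0, 10.0):
        beta, intercept = ridge_fit(rows, labels, ridge)
        print(ridge, [round(b, 3) for b in beta], round(intercept, 3))
```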

Another way to look at this is attribute selection, which might come in handy in your case.
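Attribute selection can take many forms. As one generic example, not necessarily what RapidMiner's selection operators do, greedy forward selection starts with no attributes and repeatedly adds the one that most reduces the residual sum of squares of a least-squares fit, stopping when the improvement falls below a hypothetical threshold min_gain. A plain-Python sketch:

```python
import math


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if a[col][col] == 0:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def rss(rows, labels, attrs):
    """Residual sum of squares of a least-squares fit (with intercept) on attrs."""
    n = len(rows)
    mean_y = sum(labels) / n
    yc = [y - mean_y for y in labels]
    if not attrs:
        return sum(v * v for v in yc)  # intercept-only model
    means = [sum(r[j] for r in rows) / n for j in attrs]
    xc = [[r[j] - m for j, m in zip(attrs, means)] for r in rows]
    k = len(attrs)
    xtx = [[sum(row[i] * row[j] for row in xc) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(xc, yc)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, y in zip(xc, yc))


def forward_selection(rows, labels, min_gain=1e-6):
    """Greedily add the attribute index that most reduces RSS; stop when gain < min_gain."""
    selected, remaining = [], list(range(len(rows[0])))
    current = rss(rows, labels, selected)
    while remaining:
        scores = {}
        for j in remaining:
            try:
                scores[j] = rss(rows, labels, selected + [j])
            except ValueError:  # singular fit, e.g. a constant or duplicated attribute
                scores[j] = math.inf
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


if __name__ == "__main__":
    rows = [[1, 5, 0], [2, 3, 1], [3, 8, 0], [4, 1, 1], [5, 7, 2]]
    labels = [3 * r[0] + r[2] for r in rows]  # attribute 1 is pure noise
    print(forward_selection(rows, labels))
```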

Greetings,
Sebastian