Author Topic: Repeating model building for multiple labels  (Read 2712 times)
keith
« on: August 27, 2008, 03:15:29 PM »

Hi,

I have a dataset containing about 20 attributes and 6 numerical label variables I want to predict.  I would like to use the same type of modeling process (NearestNeighbor with attribute weights determined by EvolutionaryWeighting, all inside a WrapperXValidation) to predict each label, allowing the attribute weights to be optimized separately for each label.

Ideally, I could iterate through each label to predict, using the same operator structure, rather than writing out 6 slightly different operator chains.  Something like this pseudo-code:

For (predictvar in list_of_predict_vars)
    Set label = predictvar
        Do XVal - EvoWeights - NearestNeighbor model fit
        Save model and performance results for this predictvar
    Go to next predictvar
Generate predictions on original data using all 6 models

I suspect that using macros could get me close to doing this, and there seem to be some related approaches mentioned at http://rapid-i.com/rapidforum/index.php/topic,32.msg47.html and http://rapid-i.com/rapidforum/index.php/topic,35.msg64.html.  But I haven't quite figured out how to iterate through a user-defined list of values and change the label variable of a dataset using that list.

Any suggestions?

Thanks,
Keith
Tobias Malbrecht
« Reply #1 on: August 27, 2008, 03:26:50 PM »

Hi Keith,

the operator MultipleLabelIterator was implemented for exactly that purpose. Simply load your example set, mark the labels as special attributes, and give them the names "label1", ..., "label6". Then put all of your model building inside the meta operator. When saving the model, you can use the macro %{a} in the file name string; it expands to the number of the current iteration of the outer operator chain.
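
A minimal sketch of that layout, for orientation only. The operator class MultipleLabelIterator and the %{a} macro come from the reply above; the ModelWriter operator, its model_file parameter, and the file name model_%{a}.mod are assumptions that should be checked against the operator reference of the installed version.

```xml
<operator name="MultipleLabelIterator" class="MultipleLabelIterator" expanded="yes">
    <!-- the model-building chain (cross-validation, weighting, learner) goes here -->

    <!-- assumed operator and parameter name: writes one model file per label,
         with %{a} replaced by the number of the current iteration -->
    <operator name="Save model for current label" class="ModelWriter">
        <parameter key="model_file" value="model_%{a}.mod"/>
    </operator>
</operator>
```

With six labels this would be expected to leave six files behind, one per iteration, which can then be read back for the application step.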

The model can be applied analogously afterwards.

Hope that helps,
Tobias

Tobias Malbrecht
Director of Product Marketing
RapidMiner
keith
« Reply #2 on: August 27, 2008, 04:39:20 PM »

Perfect!  I should have known RM was already prepared to handle the task.  :)

Thanks, Tobias!
keith
« Reply #3 on: August 27, 2008, 07:38:50 PM »

I'm running into a slight problem using the MultipleLabelIterator.  I have renamed the label variables to start with "label_", but I can't change all of them to be of role "label".  I'm using the ChangeAttributeRole operator to change one variable at a time to role "label", but only the last variable changed this way is retained.  Any variable that previously had role "label" gets deleted from the ExampleSet data.

Code:
        <operator name="Change label_var1 to label" class="ChangeAttributeRole" breakpoints="after">
            <parameter key="name" value="label_var1"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <!-- this works, changing label_var1 from regular to label -->

        <operator name="Change label_var2 to label" class="ChangeAttributeRole" breakpoints="after">
            <parameter key="name" value="label_var2"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <!-- this changes label_var2 from regular to label, but it deletes label_var1 from the data! -->

What am I doing wrong?  Is there a better way to change the role of a group of attributes to label?
Tobias Malbrecht
« Reply #4 on: August 27, 2008, 07:50:11 PM »

Hi,

there can only be one special attribute with the role "label" at a time. Nevertheless, you can mark them as special attributes with the roles label_1, label_2, etc. Without looking it up, I don't know whether the MultipleLabelIterator checks the attribute names or their "special names". But if you both name them that way and mark them as I mentioned above, this should be sufficient.
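
A minimal sketch of that suggestion, reusing the ChangeAttributeRole operator and the parameter keys from the question. It assumes the attributes have already been renamed to label_1 and label_2, and that target_role accepts custom role names such as label_1 in the installed version; neither has been verified here.

```xml
<!-- each attribute gets its own special role, so no role is overwritten -->
<operator name="Mark label_1 as special" class="ChangeAttributeRole">
    <parameter key="name" value="label_1"/>
    <parameter key="target_role" value="label_1"/>
</operator>
<operator name="Mark label_2 as special" class="ChangeAttributeRole">
    <parameter key="name" value="label_2"/>
    <parameter key="target_role" value="label_2"/>
</operator>
```

Because every target role is distinct, the second operator should no longer displace the attribute handled by the first, which is the behaviour seen when both were set to the single role "label".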

Regards,
Tobias
keith
« Reply #5 on: August 28, 2008, 06:46:37 PM »

Thanks, Tobias.  I got it working.  I was confused as to whether the attribute name or its type ("special attribute") was the one that needed to have the label prefix.  It's the latter, of course.

Followup question: Is there a way inside the MultipleLabelIterator inner operators to reference the current label attribute name?  The reason is that I need to convert the prediction, expressed in log-odds, back to a probability as I had previously asked about in http://rapid-i.com/rapidforum/index.php/topic,219.msg860.html.

Thus, I need to take the prediction attribute "prediction(y)", rename it to "pred_y", and then transform it by "exp(pred_y)/(1+exp(pred_y))".  Then do the same for prediction(z) -> pred_z -> exp(pred_z)/(1+exp(pred_z)).
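
As a quick sanity check on the log-odds-to-probability transform described above, here is a small worked example. The symbol x is introduced only for illustration and stands for the predicted log-odds; p is the resulting probability.

```latex
p = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}}, \qquad
x = 0 \;\Rightarrow\; p = 0.5, \qquad
x = 2 \;\Rightarrow\; p = \frac{e^{2}}{1 + e^{2}} \approx 0.881
```

A prediction of 0 in log-odds should therefore come out as exactly 0.5, which is an easy value to spot-check in the generated attribute.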

If I can get the current label attribute within the iterator in a macro variable, then I should be able to automate this process (assuming there are string functions that will allow me to append and/or take substrings of macro vars).

Alternatively, if there's an easier way to accomplish what I described above, I'd be open to that as well.

Thanks, as always.  These forums have been immensely helpful in getting me up and running with RM, and I'm most grateful.

Keith
Tobias Malbrecht
« Reply #6 on: August 28, 2008, 07:27:04 PM »

Hi Keith,

Quote:
Followup question: Is there a way inside the MultipleLabelIterator inner operators to reference the current label attribute name?

as far as I know, there is no way to directly access the complete label name via a macro. But like every iteration operator, the MultipleLabelIterator should define a macro which returns the number of the current iteration. Hence, if you name the labels label_1, label_2, label_3, and so on, you can access the label name by using the string label_%{a} in parameters, where %{a} returns the number of the current iteration.

Hope that helps,
Tobias
keith
« Reply #7 on: August 28, 2008, 08:11:47 PM »

Thanks Tobias.  I was hoping to be able to leave the attribute name unchanged since it's more meaningful than "label_1", but what you suggest does work, and I've implemented that now.

However, the %{a} macro doesn't seem to be usable inside the list of calculations in the FeatureGeneration operator.  For example, I have the following defined inside a MultipleLabelIterator node to apply a model, rename the prediction column to remove the parentheses, and then calculate the probability from the predicted log-odds value:

Code:
        <operator name="Generate Predictions" class="ModelApplier">
            <list key="application_parameters">
            </list>
            <parameter key="keep_model" value="true"/>
        </operator>
        <operator name="Select prediction column" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="prediction.*"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="process_special_attributes" value="true"/>
            <operator name="Rename prediction column" class="ChangeAttributeName" breakpoints="before,after">
                <parameter key="new_name" value="predict_%{a}"/>
                <parameter key="old_name" value="prediction(label_%{a})"/>
            </operator>
        </operator>
        <operator name="Calculate Predicted Probability" class="FeatureGeneration" breakpoints="after">
            <list key="functions">
              <parameter key="pred_odds_%{a}" value="exp(predict_%{a})"/>
              <parameter key="pred_plus1_%{a}" value="+(const[1](), pred_odds_%{a})"/>
              <parameter key="pred_prob_%{a}" value="/(pred_odds_%{a}, pred_plus1_%{a})"/>
            </list>
            <parameter key="keep_all" value="true"/>
        </operator>

The rename of the column works fine ("prediction(label_1)" gets renamed to "predict_1").  However, the FeatureGeneration node creates new attributes named "pred_odds_%{a}", "pred_plus1_%{a}", and "pred_prob_%{a}", taking the %{a} literally, not as a macro.  Am I doing something wrong, or is RM not set up to work this way?

Sorry to keep pestering you with these questions... but I do appreciate the help.

Keith
Tobias Malbrecht
« Reply #8 on: August 28, 2008, 09:13:46 PM »

Hi Keith,

hm, but are the function values at least computed correctly? The problem you experienced might be due to the parameter lists. As far as I remember, macros cannot be used in parameter lists. As a workaround, you can generate functions with generic names like pred_odds, which you then change to pred_odds_%{a} afterwards, again using the ChangeAttributeName operator.
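
A rough sketch of that workaround, built only from operators and parameter keys that already appear in this thread. The generic names predict, pred_odds, pred_plus1 and pred_prob are placeholders chosen for illustration, and the whole fragment is untested; in particular, the first rename may still need to sit inside the AttributeSubsetPreprocessing wrapper shown earlier so that the special prediction attribute is reachable.

```xml
<!-- macros are used only in plain parameters, never inside the functions list -->
<operator name="Rename prediction to generic name" class="ChangeAttributeName">
    <parameter key="old_name" value="prediction(label_%{a})"/>
    <parameter key="new_name" value="predict"/>
</operator>
<operator name="Calculate Predicted Probability" class="FeatureGeneration">
    <list key="functions">
      <parameter key="pred_odds" value="exp(predict)"/>
      <parameter key="pred_plus1" value="+(const[1](), pred_odds)"/>
      <parameter key="pred_prob" value="/(pred_odds, pred_plus1)"/>
    </list>
    <parameter key="keep_all" value="true"/>
</operator>
<!-- attach the iteration number afterwards, where the macro is expanded -->
<operator name="Rename probability for this label" class="ChangeAttributeName">
    <parameter key="old_name" value="pred_prob"/>
    <parameter key="new_name" value="pred_prob_%{a}"/>
</operator>
```

The intermediate attributes pred_odds and pred_plus1 (and the generic predict column) would probably need the same rename, or removal, before the next iteration to avoid name clashes.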

Hope that solves your problem,
Tobias
« Last Edit: August 28, 2008, 09:22:34 PM by Tobias Malbrecht »
