Topic: Decision Tree Model Not Visible
hgwelec
Newbie
*
Posts: 37


« on: October 01, 2008, 05:32:51 PM »

Hello again,



I am trying to use a decision tree learner for a problem. If I run the stream with just the input file node and the decision tree learner, the resulting decision tree is shown fine. However, when I run the following stream (essentially, I perform cross-validation), I cannot see the resulting tree (and hence the resulting model). Here is the setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename"   value="D:\MyDocumentsr\kvltrain.csv"/>
        <parameter key="label_name"   value="zkvl"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name"   value="(Age|Profession)"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <operator name="DecisionTree" class="DecisionTree">
            <parameter key="keep_example_set"   value="true"/>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="absolute_error"   value="true"/>
                <parameter key="accuracy"   value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error"   value="true"/>
                <parameter key="normalized_absolute_error"   value="true"/>
                <parameter key="root_mean_squared_error"   value="true"/>
                <parameter key="root_relative_squared_error"   value="true"/>
            </operator>
        </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
        <parameter key="filename"   value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
        <list key="log">
          <parameter key="accuracy"   value="operator.CSVExampleSource.value.null"/>
        </list>
    </operator>
    <operator name="GnuplotWriter" class="GnuplotWriter">
        <parameter key="additional_parameters"   value="set grid"/>
        <parameter key="name"   value="ProcessLog"/>
        <parameter key="output_file"   value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
        <parameter key="values"   value="accuracy"/>
        <parameter key="x_axis"   value="accuracy"/>
    </operator>
</operator>



Any idea as to why this is happening?



Thanks,


Harry
hgwelec


« Reply #1 on: October 02, 2008, 10:15:47 AM »

OK, I found out what happened: the model gets consumed (?) in the first operator of the cross-validation. However, if I save the model first and then read it back at the end of the process chain, the decision tree is shown fine.


Here is the setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename"   value="D:\MyDocuments\kvltrain.csv"/>
        <parameter key="label_name"   value="zkvl"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name"   value="(Age|Profession)"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="number_of_validations"   value="3"/>
        <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
            <operator name="DecisionTree" class="DecisionTree">
                <parameter key="keep_example_set"   value="true"/>
            </operator>
            <operator name="ModelWriter" class="ModelWriter">
                <parameter key="model_file"   value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
            </operator>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="absolute_error"   value="true"/>
                <parameter key="accuracy"   value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error"   value="true"/>
                <parameter key="normalized_absolute_error"   value="true"/>
                <parameter key="root_mean_squared_error"   value="true"/>
                <parameter key="root_relative_squared_error"   value="true"/>
            </operator>
        </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
        <parameter key="filename"   value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
        <list key="log">
          <parameter key="accuracy"   value="operator.CSVExampleSource.value.null"/>
        </list>
    </operator>
    <operator name="GnuplotWriter" class="GnuplotWriter">
        <parameter key="additional_parameters"   value="set grid"/>
        <parameter key="name"   value="ProcessLog"/>
        <parameter key="output_file"   value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
        <parameter key="values"   value="accuracy"/>
        <parameter key="x_axis"   value="accuracy"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file"   value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
    </operator>
</operator>
Tobias Malbrecht
Global Moderator
Sr. Member
*****
Posts: 293



« Reply #2 on: October 02, 2008, 12:49:57 PM »

Hi,

Quote: "OK, I found out what happened: the model gets consumed (?) in the first operator of the cross-validation. However, if I save the model first and then read it back at the end of the process chain, the decision tree is shown fine."

You are right that a decision tree is shown, but it's probably not the decision tree you want to look at. XValidation is a kind of loop: in each iteration it learns a model (by applying the DecisionTree learner) on one portion of the data and tests its performance on the complementary portion, and the portion chosen differs from iteration to iteration. Hence, if you save the model inside the XValidation operator, you always save a model that was learned on only a portion of the data.

If you want the complete model in addition to the performance estimate, simply turn on the parameter learn_complete_model in the parameters of the XValidation operator. The operator will then apply the learner once more on the complete data set and output the resulting model. If you compare that model to the one you wrote out during the cross-validation, you will probably see a difference between them.
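As a rough sketch of that suggestion, the XValidation block from the first post could look something like the fragment below, with ModelWriter and ModelLoader no longer needed. The parameter key learn_complete_model is taken from the wording of this reply and is an assumption here; the exact key may differ between versions, so check the parameter list of the XValidation operator in your installation. The number_of_validations value is simply copied from the second setup, and the performance operator is trimmed to accuracy for brevity.

```xml
<operator name="XValidation" class="XValidation" expanded="yes">
    <!-- Assumption: key name as mentioned in the reply; verify it in your version. -->
    <parameter key="learn_complete_model"   value="true"/>
    <parameter key="number_of_validations"   value="3"/>
    <!-- First inner operator: learns a model on the training portion of each fold. -->
    <operator name="DecisionTree" class="DecisionTree">
        <parameter key="keep_example_set"   value="true"/>
    </operator>
    <!-- Second inner operator: applies the fold's model and measures performance. -->
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="ClassificationPerformance" class="ClassificationPerformance">
            <parameter key="accuracy"   value="true"/>
        </operator>
    </operator>
</operator>
```

With this in place, the model delivered at the end of the process should be the one trained on the complete data set, while the reported performance still comes from the cross-validation folds.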

Regards,
Tobias

« Last Edit: October 02, 2008, 12:51:48 PM by Tobias Malbrecht »

Tobias Malbrecht
Director of Product Marketing
RapidMiner
hgwelec


« Reply #3 on: October 02, 2008, 02:28:01 PM »

Hello Tobias,


First, thanks for your reply. After some experimentation I found out about the learn_complete_model option, right after I sent my first reply, which I had posted in the spirit of "we share our knowledge with the community" :)

Your reply puts things in order. Thanks again!