Topic: Chart combining data and model

« on: January 18, 2013, 02:03:56 AM »


    I think I have an idea of how models and data can be visualized together: generate a new attribute with the model and add it as one of the series. However, this requires multiple steps, and setting up the proper formatting for the model values can be laborious. The result can also be unsatisfactory: the points generating the graph might not be dense enough in all regions, and sometimes the result is just a predicted (nominal) label where we would like to see a separator line.
    I guess not all models can be visualized this way, but most of them probably can be. When kernel methods are used, I think this would also help to understand some of the consequences, adjust the parameters, and select the proper functions.
What do you think? Would this be useful? Or is it already available?
Thanks, gabor

« Reply #1 on: January 18, 2013, 10:37:56 AM »

Hi Gabor,

I am not sure if I understand your proposal correctly. Do you mean something like the graphics on page 15/16 of this book?

That is already possible (more or less): use Generate Data to create a large dataset (say 1000 examples) with the same features as your true input data. Then apply the model, and use the Advanced Charts to create a chart with the predicted label on the color dimension and the true label as shape. That will create similar graphics.
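Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input data, the `fit_linear` and `predict` helpers, and the 0.5 threshold are all hypothetical stand-ins, and a plain least-squares fit takes the place of whatever model is actually trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the true input data: two features, binary label.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)


def fit_linear(X, y):
    """Least-squares fit of y ~ w1*x1 + w2*x2 + b (stand-in for any model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # (w1, w2, b)


def predict(coef, X):
    """Nominal prediction: class 1 where the linear score exceeds 0.5."""
    score = X @ coef[:2] + coef[2]
    return (score > 0.5).astype(int)


coef = fit_linear(X, y)

# Large synthetic dataset (1000 examples) with the same features and
# the same value ranges as the true input data.
lo, hi = X.min(axis=0), X.max(axis=0)
synthetic = rng.uniform(lo, hi, size=(1000, 2))

# Apply the model: the predicted label would go on the color dimension.
synthetic_pred = predict(coef, synthetic)

# For the true data, the true label would go on the shape dimension.
train_accuracy = (predict(coef, X) == y).mean()
print(synthetic.shape, synthetic_pred.shape, train_accuracy)
```

Plotting `synthetic` colored by `synthetic_pred`, with the true examples drawn on top using `y` as the marker shape, gives the kind of picture described above.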

If I did not understand your request correctly or if you have questions about what I have written, please let me know.

Best regards,


« Reply #2 on: January 18, 2013, 01:27:14 PM »

Hi Marius,

   Yes, I meant something like those figures. I tried to describe an idea similar to yours, although I think it is not so easy or trivial to carry out:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="695" width="840">
      <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="45" y="30">
        <parameter key="feature_selection" value="none"/>
        <parameter key="eliminate_colinear_features" value="false"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="180" y="30">
        <parameter key="target_function" value="grid function"/>
        <parameter key="number_examples" value="576"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="-1.3"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
        <list key="function_descriptions">
          <parameter key="att1" value="(att1 + 10) / 20 * 2.3 - 1.3"/>
          <parameter key="att2" value="(att2 + 10) / 20 * 1.4 - 0.2"/>
        </list>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="450" y="30">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="585" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="prediction(label)|att1|att2|"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="720" y="30"/>
      <connect from_port="input 1" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Linear Regression" from_port="exampleSet" to_op="Union" to_port="example set 2"/>
      <connect from_op="Linear Regression" from_port="weights" to_port="result 1"/>
      <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Union" to_port="example set 1"/>
      <connect from_op="Union" from_port="union" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
And the result is not as nice as in those figures. (A line showing the boundaries would be possible if the data and the model could be combined and plotted in one figure.)
I guess this is not such an important feature request, as something similar can be done for some of the models in R, for example, but it would be nice if this were possible within RapidMiner too, imho.
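For a linear model with two attributes, the separator line mentioned above can be computed directly from the coefficients rather than read off a grid of predictions. The following is a rough Python sketch, not a RapidMiner feature: `separator_line`, the coefficients, and the threshold are hypothetical, and only the att1 range of -1.3 to 1.0 is taken from the process above.

```python
import numpy as np


def separator_line(w1, w2, b, threshold, x1_min, x1_max, num=50):
    """Return points (x1, x2) satisfying w1*x1 + w2*x2 + b == threshold.

    Only valid when w2 != 0; otherwise the separator is the vertical
    line x1 = (threshold - b) / w1 and cannot be written as x2(x1).
    """
    if w2 == 0:
        raise ValueError("w2 is zero: separator is a vertical line")
    x1 = np.linspace(x1_min, x1_max, num)
    x2 = (threshold - b - w1 * x1) / w2
    return x1, x2


# Hypothetical coefficients of a fitted linear model.
w1, w2, b = 1.0, 2.0, -0.5

# att1 range as used in the process above.
x1, x2 = separator_line(w1, w2, b, threshold=0.0, x1_min=-1.3, x1_max=1.0)
```

The resulting `(x1, x2)` pairs could be added to the chart as one more series and drawn as a line over the data points. For kernel methods there is no such closed form, so a dense grid of predictions remains the fallback.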