 1 
 on: Today at 12:01:38 AM 
Started by DDelen - Last post by DDelen
I want to measure the relative contribution of each input variable to the predictive power/accuracy of a model (any classification or regression model). In some commercial tools, such as SPSS Modeler, this is done automatically by a so-called leave-one-out process. In each iteration, one input variable is left out of the modeling and the model is tested on a holdout sample (or via cross-validation); the accuracy is recorded (e.g., variable left out = A, accuracy = 82%). This process is repeated for each input variable. At the end you have a list of accuracies, one for each variable's absence from the model. The lower the accuracy, the higher the contribution/importance of the variable that was left out. Once done, these accuracies can be converted/inverted into relative importance measures (which can also be normalized) and shown in a horizontal bar chart illustrating the relative contribution of all variables.
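
To make the procedure above concrete outside RapidMiner, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration only: the toy data, the column names `signal` and `noise`, the 1-nearest-neighbour learner, and the choice to convert accuracies into importances as the (clipped) drop from the all-variables baseline, normalized to sum to 1. It is not how SPSS Modeler or RapidMiner implement it.

```python
from typing import Dict, List


def predict_1nn(train_X: List[List[float]], train_y: List[int], x: List[float]) -> int:
    """Predict the label of x from its nearest training row (squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def cv_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Cross-validated accuracy, holding out one example at a time."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict_1nn(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)


def variable_importance(X: List[List[float]], y: List[int], names: List[str]) -> Dict[str, float]:
    """Leave each variable out in turn and turn the accuracy drop into a relative importance."""
    baseline = cv_accuracy(X, y)  # accuracy with all variables present
    drops: Dict[str, float] = {}
    for j, name in enumerate(names):
        reduced = [row[:j] + row[j + 1:] for row in X]  # data set without column j
        drops[name] = max(baseline - cv_accuracy(reduced, y), 0.0)
    total = sum(drops.values())
    # Normalize so the importances sum to 1 (all zeros if no variable matters).
    return {name: (d / total if total else 0.0) for name, d in drops.items()}


# Hypothetical toy data: "signal" separates the classes, "noise" does not.
names = ["signal", "noise"]
X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [1.0, 0.3], [1.1, 0.8], [1.2, 0.6]]
y = [0, 0, 0, 1, 1, 1]

importance = variable_importance(X, y, names)
print(importance)
```

On this toy data, leaving out `signal` destroys the accuracy while leaving out `noise` changes nothing, so `signal` receives all of the importance. The resulting dictionary is what one would feed into a horizontal bar chart.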

I tried to do this in RapidMiner 7.0 with the Loop Attributes node. It did not work! I could not set it up properly because I am not all that familiar with RapidMiner constructs such as loop operators. The short descriptions were not sufficient for me to understand and use them properly for this process.

Can anyone create a simple process for a small data set like Golf, using Decision Trees and X-Validation, for the variable contribution procedure I described, and post it here so that we can all learn/benefit from it?

Thank you.

 2 
 on: February 05, 2016, 01:46:17 PM 
Started by jb1376 - Last post by jb1376
Marco,
Thanks for the fix. Ugh, I'm used to filtering with "contains" in other languages rather than with regex. Thanks again for your help.

 3 
 on: February 05, 2016, 01:13:15 PM 
Started by mob - Last post by mob
I'm trying to build up a string to use as a file name, but I'm having trouble getting the macro to generate correctly.
My Generate Macro expression is
str('Var1') + "_" + str('Var2')  +" _" + str('Var3') +".txt"
where Var1, Var2, and Var3 are attributes from the ExampleSet.

What I'd like to end up with is a macro value such as
Team2_Results_Summer.txt
Team2_Predict_Summer.txt
Team9_Results_Spring.txt

But all I get is a syntax error, even though the function expression dialog says "Expression is syntactically correct", and the process won't continue past the Generate Macro operator.

 4 
 on: February 05, 2016, 11:51:03 AM 
Started by inceptorfull - Last post by SvenVanPoucke
Hi,
You can write the results to a CSV file or another format.
What would be the advantage of using bars for an AUC graph?
Cheers
Sven

 5 
 on: February 05, 2016, 11:47:53 AM 
Started by SvenVanPoucke - Last post by SvenVanPoucke
At each time point, a sample generates features with different units, and different time points are compared. Should normalization be done per time point or on the complete dataset? Which is the right choice?
Thanks
Sven

 6 
 on: February 05, 2016, 10:59:30 AM 
Started by jb1376 - Last post by Marco Boeck
Hi,

the filter will only accept a file if it matches the full file name. In your case, your regex matches only a part of it. I have modified your process by changing the regex a bit and now it should work:

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="loop_files" compatibility="6.4.000" expanded="true" height="82" name="Loop Files" width="90" x="112" y="187">
        <parameter key="directory" value="C:\Users\jb1376\Desktop\test\"/>
        <parameter key="filter" value=".*_201512.*.csv"/>
        <parameter key="recursive" value="true"/>
        <process expanded="true">
          <operator activated="true" class="open_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Open File" width="90" x="514" y="34">
            <parameter key="filename" value="%{file_path}"/>
          </operator>
          <connect from_op="Open File" from_port="file" to_port="out 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="create_archive_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Create Archive File" width="90" x="246" y="34"/>
      <operator activated="true" class="add_entry_to_archive_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="103" name="Add Entry to Archive File" width="90" x="447" y="136"/>
      <operator activated="true" class="write_file" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Write File" width="90" x="648" y="136">
        <parameter key="filename" value="C:\Users\jb1376\Desktop\201512.zip"/>
      </operator>
      <connect from_op="Loop Files" from_port="out 1" to_op="Add Entry to Archive File" to_port="file input 1"/>
      <connect from_op="Create Archive File" from_port="archive file" to_op="Add Entry to Archive File" to_port="archive file"/>
      <connect from_op="Add Entry to Archive File" from_port="archive file" to_op="Write File" to_port="file"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
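
The difference between matching part of a file name and matching the whole name can be illustrated outside RapidMiner with Python's `re` module. This is only a sketch: the file names are hypothetical, the partial pattern `_201512` is an assumed stand-in for the original filter (which is not shown in this thread), and `re.fullmatch` is used as an analogue of the full-name matching described above.

```python
import re

partial_pattern = r"_201512"        # assumed stand-in for a filter that matches only part of the name
full_pattern = r".*_201512.*.csv"   # the filter value used in the process above

# Hypothetical file names for illustration.
names = ["sales_20151201.csv", "sales_20151101.csv", "report_201512.csv"]


def accepted(pattern: str, candidates: list) -> list:
    """Keep only the names that the pattern matches from start to end."""
    return [n for n in candidates if re.fullmatch(pattern, n)]


def found_anywhere(pattern: str, candidates: list) -> list:
    """Keep the names in which the pattern occurs somewhere ("contains" style)."""
    return [n for n in candidates if re.search(pattern, n)]


print(found_anywhere(partial_pattern, names))  # substring search finds the December files
print(accepted(partial_pattern, names))        # full-name matching rejects all of them
print(accepted(full_pattern, names))           # the extended pattern covers the whole name

# The dot before "csv" is unescaped, so it matches any character, not only ".".
print(bool(re.fullmatch(full_pattern, "x_201512_csv")))          # True
print(bool(re.fullmatch(r".*_201512.*\.csv", "x_201512_csv")))   # False
```

If only files with a literal `.csv` extension should pass, escaping the dot as `\.csv` is the stricter variant.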

Regards,
Marco

 7 
 on: February 05, 2016, 10:45:50 AM 
Started by inceptorfull - Last post by inceptorfull
Hi all,
I want to export the results in tabular form for academic purposes. How can I do that?
Also, I want to change the AUC graph so that it uses bars instead of lines.

 8 
 on: February 05, 2016, 08:21:35 AM 
Started by Karan Kansal - Last post by Karan Kansal

Thanks Martin and Marco for all the help.

 9 
 on: February 05, 2016, 07:49:18 AM 
Started by Karan Kansal - Last post by Karan Kansal

Thanks Marco,

Unchecking "fail for unknown macros" did the trick. Can this be taken further by dynamically passing columns to the child process? For example, in the child process I have to sum two columns and create a new attribute, but the column names are unknown there. I have the input data in the parent process and need to specify which two columns are to be summed in the child process. Can I do this by somehow referencing the columns and passing them to the child process through the Execute Process operator?


 10 
 on: February 04, 2016, 11:56:16 PM 
Started by aarapidi - Last post by aarapidi
Thank you!
