 1 
 on: Today at 09:29:43 AM 
Started by Muhammad - Last post by mschmitz
Hello Muhammad,

I've created an example process with the Iris data set where I learn on two classes and assign the "unsure" predictions (confidence between 0.2 and 0.8) to the third class.

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.0.000" expanded="true" height="112" name="Validation" width="90" x="246" y="30">
        <description>A cross-validation evaluating a decision tree model.</description>
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="45" y="30">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="label.does_not_equal.Iris-versicolor"/>
            </list>
          </operator>
          <operator activated="true" class="random_forest" compatibility="6.1.000" expanded="true" height="76" name="Random Forest" width="90" x="179" y="30">
            <parameter key="number_of_trees" value="25"/>
          </operator>
          <connect from_port="training" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Random Forest" to_port="training set"/>
          <connect from_op="Random Forest" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="5.0.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="6.1.000" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="165">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="\(|\)|-"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.1.000" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="165">
            <list key="function_descriptions">
              <parameter key="predictionlabel" value="if((confidenceIrissetosa &gt; 0.2 &amp;&amp; confidenceIrissetosa &lt;0.8),&quot;Iris-versicolor&quot;,predictionlabel)"/>
            </list>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="6.1.000" expanded="true" height="76" name="Rename by Replacing (2)" width="90" x="447" y="165">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="predictionlabel"/>
            <parameter key="replace_by" value="prediction(label)"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.0.000" expanded="true" height="76" name="Performance" width="90" x="581" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
          <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


This works quite well for me. I hope you can use it as a template.
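
The relabelling step done by Generate Attributes can be sketched outside RapidMiner as follows. This is only a minimal Python illustration of the logic, not part of the process: the function name `relabel_unsure`, its parameters, and the sample rows are hypothetical, while the thresholds 0.2 and 0.8 are taken from the expression in the process above.

```python
def relabel_unsure(prediction, confidence_setosa, low=0.2, high=0.8,
                   third_class="Iris-versicolor"):
    """Return third_class when the setosa confidence lies strictly between
    low and high; otherwise keep the model's original prediction."""
    if low < confidence_setosa < high:
        return third_class
    return prediction


# Hypothetical (prediction, confidence for Iris-setosa) pairs.
examples = [
    ("Iris-setosa", 0.95),
    ("Iris-setosa", 0.60),
    ("Iris-virginica", 0.10),
    ("Iris-virginica", 0.40),
]
relabelled = [relabel_unsure(p, c) for p, c in examples]
print(relabelled)
```

Only the rows with a confidence inside the open interval are reassigned; a confidence of exactly 0.2 or 0.8 keeps the original prediction, which mirrors the strict comparisons in the expression.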


Best,

Martin

 2 
 on: Today at 09:16:03 AM 
Started by scepxko - Last post by scepxko
Hi Andrew,

First, I own your book. I wish I had had it 3 years ago when I started playing with RapidMiner. Thank you for your work.

As for your suggestion, yes, RM6 has this more advanced filtering option.
I just tried it with "contains" and "matches", but I don't get the desired result.

In RM5, I introduced a new attribute "ToRemove" that compares the values of "location" and "location2", and then I filtered the examples with the ToRemove values of true/false.
That's a method I saw in other posts.

I tried "finds", "matches", and "if".
Here are my results:

** Filters only the exact match (see my first post, it would remove the "USA" row, but not the "Cayman" row)

finds(location,location2)
finds(location,%{location2})
matches(location,location2)
matches(location,%{location2})
matches(location,".*%{location2}.*")
if((location==location2), true, false)

** Filters nothing at all (input and output files identical)

finds(location,"%{location2}")
finds(location,".*%{location2}.*")
finds(location,"%{location2}")
finds(location,".*%{location2}.*")
if((location==".*location2.*"), true, false)
if((%{location}==".*%{location2}.*"), true, false)
if((%{location}=="(.*?)%{location2}(.*?)"), true, false)

I am surely doing something wrong with the wildcards, but I really have no idea what.

Any idea?
Thank you in advance

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="30">
        <parameter key="excel_file" value="E:\Rapidminer\example\input.xls"/>
        <parameter key="imported_cell_range" value="A1:G650"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="ID.true.integer.attribute"/>
          <parameter key="1" value="industry.true.polynominal.attribute"/>
          <parameter key="2" value="location.true.polynominal.attribute"/>
          <parameter key="3" value="title.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="165">
        <parameter key="excel_file" value="E:\Rapidminer\example\location-to-filter-out.xls"/>
        <parameter key="imported_cell_range" value="A1:E67"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="location2.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="join" compatibility="5.3.015" expanded="true" height="76" name="Join" width="90" x="246" y="165">
        <parameter key="join_type" value="left"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="location" value="location2"/>
        </list>
        <parameter key="keep_both_join_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="165">
        <list key="function_descriptions">
          <parameter key="toRemove" value="finds(location,location2)"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="514" y="165">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="toRemove != true"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="849" y="165">
        <parameter key="excel_file" value="E:\Rapidminer\example\output.xls"/>
      </operator>
      <connect from_op="Read Excel (3)" from_port="output" to_op="Join" to_port="left"/>
      <connect from_op="Read Excel (2)" from_port="output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
      <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
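
One possible reading of the "exact match only" results, assuming the left Join on location = location2 behaves like an equality join: location2 is only filled in on rows where it is identical to location, so any per-row comparison made after the join can never see a partial match. The Python sketch below imitates that situation. The helper names and the sample values are hypothetical and are not RapidMiner functions.

```python
def left_join_on_equality(locations, blacklist):
    """Pair each location with the identical blacklist entry, or None
    when no entry is exactly equal (as in a left join on equality)."""
    entries = set(blacklist)
    return [(loc, loc if loc in entries else None) for loc in locations]


def contains_entry(text, entry):
    """Per-row substring test; a missing entry never matches."""
    return entry is not None and entry in text


# Hypothetical sample data.
locations = ["USA", "Cayman Islands", "Gurgaon, India"]
blacklist = ["USA", "Cayman", "India"]

joined = left_join_on_equality(locations, blacklist)
to_remove = [contains_entry(loc, loc2) for loc, loc2 in joined]
kept = [loc for (loc, _), flag in zip(joined, to_remove) if not flag]
print(kept)
```

In this sketch only the "USA" row is removed, because it is the only row that received a blacklist value from the join. The rows with partial matches have nothing to be compared against.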



 3 
 on: Today at 09:10:58 AM 
Started by timc03 - Last post by mschmitz
Hello timc03!

First, regarding the centroids: if you take a look at the model itself, it has a "Centroid Table" tab. There you can find your centroids.

Furthermore, there is a way to display the "borders" of the clusters. To do so, you apply the clustering model to random values in a given range. The result is the picture below:



I modified Marco's process a bit so that it creates this picture, and connected the model:

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\Martin\Downloads\mouse.csv"/>
        <parameter key="column_separators" value="\s"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.real.attribute"/>
          <parameter key="1" value="att2.true.real.attribute"/>
          <parameter key="2" value="att3.true.polynominal.label"/>
        </list>
      </operator>
      <operator activated="true" class="k_means" compatibility="6.1.000" expanded="true" height="76" name="Clustering" width="90" x="380" y="30">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="514" y="255">
        <parameter key="number_examples" value="10000"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="6.1.000" expanded="true" height="94" name="Multiply" width="90" x="514" y="120"/>
      <operator activated="true" class="apply_model" compatibility="6.1.000" expanded="true" height="76" name="Apply Model" width="90" x="715" y="165">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_op="Multiply" to_port="input"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 1"/>
      <connect from_op="Generate Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
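
The idea behind the process above can be sketched in a few lines of Python, assuming that applying a k-means model assigns each example to its nearest centroid. The centroid values below are hypothetical placeholders; in practice they would come from the model's centroid table. The sample size of 10000 and the range 0.0 to 1.0 follow the Generate Data parameters in the process.

```python
import random


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point
    (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


# Hypothetical centroids; replace with the values from the centroid table.
centroids = [(0.25, 0.75), (0.75, 0.75), (0.5, 0.35)]

# Uniform random points in [0, 1) x [0, 1), seeded for reproducibility.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(10000)]
labels = [nearest_centroid(p, centroids) for p in points]

# Number of random points that fall into each cluster region.
counts = [labels.count(i) for i in range(len(centroids))]
print(counts)
```

A scatter plot of `points` coloured by `labels` would then show the region covered by each cluster, and the boundaries between the colours are the cluster borders.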

 4 
 on: Today at 08:37:38 AM 
Started by timc03 - Last post by Marco Boeck
Hi,

I used the following process to import the mouse data taken from here: http://elki.dbs.ifi.lmu.de/wiki/DataSets

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.001-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.1.001-SNAPSHOT" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\boeck\Desktop\mouse.csv"/>
        <parameter key="column_separators" value="\s"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.real.attribute"/>
          <parameter key="1" value="att2.true.real.attribute"/>
          <parameter key="2" value="att3.true.polynominal.label"/>
        </list>
      </operator>
      <operator activated="true" class="k_means" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Clustering" width="90" x="179" y="30">
        <parameter key="k" value="3"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

You can then simply use the Chart tab of the results to visualize this.



I'm not sure about your bonus question. I don't think there is an explicit option to see that, but I may be wrong.

Regards,
Marco

 5 
 on: Today at 06:18:00 AM 
Started by timc03 - Last post by timc03
I am running a k means clustering in v6.0.008.

I am looking to visualise the results of the clustering as shown here (k means clustering graph): http://en.wikipedia.org/wiki/K-means_clustering#mediaviewer/File:ClusterAnalysis_Mouse.svg

Any suggestions on how to achieve this? I would be happy to use PCA before K Means clustering if that helps.

Also, as an aside, where is the 'cluster centroid' or the mean for each cluster? I have the centroids for each attribute in each cluster in the Cluster Model's centroid table, but cannot find the cluster mean.
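
For reference, in standard k-means the centroid of a cluster is the vector of per-attribute means of its members, so the per-attribute centroid values and the cluster mean describe the same point. A minimal Python sketch with hypothetical data points:

```python
def centroid(points):
    """Per-attribute mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(values) / n for values in zip(*points))


# Hypothetical members of one cluster, two attributes each.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
print(centroid(cluster))  # (3.0, 5.0)
```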

Thanks

 6 
 on: Today at 04:38:25 AM 
Started by tejaswi - Last post by tejaswi
Hi,
   
I would like your help finding repositories of transaction data sets that I can use for association rule mining with the FP-Growth algorithm.

Please let me know of any resources as soon as possible.

 7 
 on: November 25, 2014, 10:55:01 PM 
Started by scepxko - Last post by awchisholm
In version 6 of RapidMiner, the Filter Examples operator lets you filter based on partial matches.

Andrew

 8 
 on: November 25, 2014, 10:28:13 AM 
Started by scepxko - Last post by scepxko
Thank you, that helped me get further.

So here with the updated code below, if I have an exact match between the attribute "location" and the blacklisted attribute "location2", the example is removed.

Problem: it only works with an exact match.

I would like to filter the attribute "location" using a wildcard, like this %{location}=(.*?)%{location2}(.*?)
If I put "India" in the blacklist, it would remove the example with "Gurgaon, India", "Delhi, India" etc.
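
To make the intended behaviour concrete, here is a minimal Python sketch of the filter I am after. It is not RapidMiner code: the function name `filter_blacklisted` and the sample rows are hypothetical, and the match is a plain case-sensitive substring test.

```python
def filter_blacklisted(rows, blacklist, key="location"):
    """Keep only rows whose `key` value contains none of the blacklist entries."""
    return [row for row in rows
            if not any(entry in row[key] for entry in blacklist)]


# Hypothetical sample rows.
rows = [
    {"ID": 1, "location": "Gurgaon, India"},
    {"ID": 2, "location": "Delhi, India"},
    {"ID": 3, "location": "Berlin, Germany"},
]
kept = filter_blacklisted(rows, ["India"])
print([row["ID"] for row in kept])
```

Every row is compared against every blacklist entry, which is different from a join that pairs a row only with an identical value.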

Any idea?
Thanks again


Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (3)" width="90" x="45" y="30">
        <parameter key="excel_file" value="E:\Rapidminer\example\input.xls"/>
        <parameter key="imported_cell_range" value="A1:G650"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="ID.true.integer.attribute"/>
          <parameter key="1" value="industry.true.polynominal.attribute"/>
          <parameter key="2" value="location.true.polynominal.attribute"/>
          <parameter key="3" value="title.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="165">
        <parameter key="excel_file" value="E:\Rapidminer\example\location-to-filter-out.xls"/>
        <parameter key="imported_cell_range" value="A1:E67"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="location2.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="join" compatibility="5.3.015" expanded="true" height="76" name="Join" width="90" x="246" y="165">
        <parameter key="join_type" value="left"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="location" value="location2"/>
        </list>
        <parameter key="keep_both_join_attributes" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="447" y="165">
        <parameter key="condition_class" value="no_missing_attributes"/>
        <parameter key="parameter_string" value="location=(.*?)location2(.*?)"/>
        <parameter key="invert_filter" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes" width="90" x="648" y="165">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|ID|industry|location|title"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="849" y="165">
        <parameter key="excel_file" value="E:\Rapidminer\example\output.xls"/>
      </operator>
      <connect from_op="Read Excel (3)" from_port="output" to_op="Join" to_port="left"/>
      <connect from_op="Read Excel (2)" from_port="output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Filter Examples (2)" to_port="example set input"/>
      <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>




 9 
 on: November 25, 2014, 06:23:30 AM 
Started by dolcos - Last post by dolcos
I have installed the R extension for RapidMiner Studio 6.1.0 on my Mac (Retina display) running OS X Yosemite 10.10.1.

My R version is 3.1.1.

But I cannot see any R perspective, and when I try to run an R script I get this error:

"Process failed.

Could no initiate session with native R. Try using server. Reason: Could not initialize R via JRI. Reason: Unable to initialize R."

I have rJava and JavaGD working fine.

My bash profile:
export R_HOME=/Library/Frameworks/R.framework/Resources
export JAVA_HOME=$(/usr/libexec/java_home)

java_home: /Library/Java/JavaVirtualMachines/jdk1.8.0_11.jdk/Contents/Home

jri: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/rJava/jri




 10 
 on: November 25, 2014, 01:27:13 AM 
Started by wessel - Last post by awchisholm
You could use Recall and Remember inside the loop and have nothing explicitly output by the loop itself. After the loop do a final Recall. This has the side effect that much less memory is used.
