 1 
 on: Today at 08:35:19 PM 
Started by startx25 - Last post by awchisholm
Hello

Is this what you need?

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document (2)" width="90" x="45" y="75">
        <parameter key="text" value="binominal parameter&#10;binominal attributes&#10;Binominal operator&#10;"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document (3)" width="90" x="45" y="165">
        <parameter key="text" value="at this&#10;had the&#10;of the&#10;"/>
      </operator>
      <operator activated="true" class="collect" compatibility="5.3.008" expanded="true" height="94" name="Collect" width="90" x="179" y="120"/>
      <operator activated="true" class="loop_collection" compatibility="5.3.008" expanded="true" height="76" name="Loop Collection" width="90" x="313" y="120">
        <parameter key="set_iteration_macro" value="true"/>
        <process expanded="true">
          <operator activated="true" class="text:process_documents" compatibility="5.3.000" expanded="true" height="94" name="Process Documents" width="90" x="112" y="75">
            <parameter key="vector_creation" value="Binary Term Occurrences"/>
            <parameter key="add_meta_information" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" name="Tokenize (3)">
                <parameter key="mode" value="regular expression"/>
                <parameter key="expression" value="[^a-zA-Z ]"/>
              </operator>
              <operator activated="true" class="text:replace_tokens" compatibility="5.3.000" expanded="true" name="Replace Tokens (3)">
                <list key="replace_dictionary">
                  <parameter key=" " value="_"/>
                </list>
              </operator>
              <connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
              <connect from_op="Tokenize (3)" from_port="document" to_op="Replace Tokens (3)" to_port="document"/>
              <connect from_op="Replace Tokens (3)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document (4)" width="90" x="112" y="300">
            <parameter key="text" value="This Example Process mostly focuses on the transform binominal parameter. &#10;All remaining parameters are mostly for selecting the attributes. &#10;The Select Attributes operator also has many similar parameters for selection of attributes.&#10;You can study the Example Process of the Select Attributes operator if &#10;you want an understanding of these parameters. The Retrieve operator is used to &#10;load the Golf data set. A breakpoint is inserted at this point so that you can &#10;have look at the data set before application of the Nominal to Binominal operator. &#10;You can see that the 'Outlook' attribute has three possible values &#10;i.e. 'sunny', 'rain' and 'overcast'. The 'Wind' attribute has two possible values &#10;i.e. 'true' and 'false'. All parameters of the Nominal to Binominal operator are &#10;used with default values. Run the process. First you will see the Golf data set. &#10;Press the run button again and you will see the final results. &#10;You can see that the 'Outlook' attribute is replaced by three binominal attributes, &#10;one for each possible value of the original 'Outlook' attribute. &#10;These attributes are ' Outlook = sunny', ' Outlook = rain', and ' Outlook = overcast'. &#10;Only the value of one of these attributes is true for a specific example, the value of &#10;the other attributes is false. Examples whose 'Outlook ' attribute had the value 'sunny'&#10;in the original ExampleSet, will have the attribute ' Outlook =sunny' value set to &#10;'true'in the new ExampleSet, the value of the 'Outlook =overcast' and 'Outlook =rain' &#10;attributes will be 'false'. The numeric attributes of the input ExampleSet remain &#10;unchanged. The 'Wind' attribute was not replaced by two binominal attributes, &#10;one for each possible value of the 'Wind' attribute because this attribute is already &#10;binominal. 
Still if you want to break it into two separate binominal attributes, &#10;this can be done by setting the transform binominal parameter to true.&#10;"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.3.000" expanded="true" height="94" name="Process Documents (2)" width="90" x="313" y="255">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" name="Tokenize (4)">
                <parameter key="expression" value="\\r\\n"/>
              </operator>
              <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.3.000" expanded="true" name="Generate n-Grams (2)">
                <parameter key="max_length" value="5"/>
              </operator>
              <connect from_port="document" to_op="Tokenize (4)" to_port="document"/>
              <connect from_op="Tokenize (4)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/>
              <connect from_op="Generate n-Grams (2)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="75">
            <list key="function_descriptions">
              <parameter key="group" value="&quot;Group_%{iteration}&quot;"/>
            </list>
          </operator>
          <connect from_port="single" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="word list" to_op="Process Documents (2)" to_port="word list"/>
          <connect from_op="Create Document (4)" from_port="output" to_op="Process Documents (2)" to_port="documents 1"/>
          <connect from_op="Process Documents (2)" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document (2)" from_port="output" to_op="Collect" to_port="input 1"/>
      <connect from_op="Create Document (3)" from_port="output" to_op="Collect" to_port="input 2"/>
      <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
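
For readers without RapidMiner at hand, here is a rough Python sketch of what the process above appears to do: each line of a needle document becomes one token with spaces replaced by underscores, and the haystack text is turned into word n-grams (up to length 5, joined by underscores) that are counted against those tokens. The function names are hypothetical, and the letters-only word split is an assumption about the default tokenizer, not a copy of RapidMiner's behavior.

```python
import re


def needles(text, pattern=r"[^a-zA-Z ]"):
    """Split on every character matching the pattern (here: the line breaks),
    drop empty pieces, and replace spaces with underscores."""
    pieces = [piece for piece in re.split(pattern, text) if piece]
    return [piece.replace(" ", "_") for piece in pieces]


def count_needles(haystack, needle_list, max_length=5):
    """Count how often each needle occurs as a word n-gram in the haystack.

    Assumption: words are maximal runs of letters, and n-grams are joined
    with underscores, mirroring the Generate n-Grams step above.
    """
    words = re.findall(r"[A-Za-z]+", haystack)
    counts = {needle: 0 for needle in needle_list}
    for size in range(1, max_length + 1):
        for start in range(len(words) - size + 1):
            gram = "_".join(words[start:start + size])
            if gram in counts:
                counts[gram] += 1
    return counts


group_1 = needles("binominal parameter\nbinominal attributes\nBinominal operator\n")
print(count_needles("the transform binominal parameter. three binominal attributes", group_1))
```

The match is case-sensitive, as in the process, because no case transformation is applied.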

regards

Andrew

 2 
 on: Today at 07:20:04 PM 
Started by Nils - Last post by Nils
This topic has been moved to Applications and Integration.

http://rapid-i.com/rapidforum/index.php?topic=6820.0

 3 
 on: Today at 06:18:38 PM 
Started by jfcg - Last post by jfcg
I'm starting to use RapidMiner after using Weka and there is one thing that is driving me crazy.

I'm using a simple dataset:

http://cs.uns.edu.ar/~cic/dm2007/downloads/datasets/titanic.arff

Four nominal columns: passenger class (4 classes), age (adult or child), sex (man or woman), survived? (yes or no).

I store that data in a repository and create a very simple process: retrieve the data and apply the W-Apriori operator with default parameters. I get a "no rules found" message.

I don't understand how this works. In the Weka GUI with the same parameters, rules are not only found but are also correct. If I insert an operator (with a multiplier) before W-Apriori that writes the input to an ARFF file, to check what input the W-Apriori operator is getting, and run Weka on that file, the same rules are obtained.

If I switch the algorithm to W-FPGrowth, the results are consistent: the same results in Weka and in RapidMiner. I also tried looking inside the Weka plugin. I did a binary comparison between my Weka.jar and the classes in the plugin, and they matched completely, so I'm using exactly the same Weka version.

Does anybody know what I am doing wrong?

I know that RapidMiner has its own operators for association rules, but my whole team is working with Weka, and my first step in convincing them to switch to RapidMiner is showing that everything we do in Weka is directly possible in RapidMiner.

Thanks in advance!

 4 
 on: Today at 05:12:59 PM 
Started by startx25 - Last post by startx25
Hi all,

I have read this wonderful tutorial on finding text needles in document haystacks:

https://docs.google.com/file/d/0BzlG_h9m5M7tVXUyeVl4cmhJZGc/edit?usp=sharing

It works fine, but now I want to add another text file in step 1: a second text needles file (with a label value, e.g. Groupe2), so that step 1 has two text files as input.

In the final result of the process, I want to identify which text needles file (Groupe1 or Groupe2) each word-list entry found in my text file from step 3 comes from.

Thank you for any help.




 5 
 on: Today at 02:31:34 PM 
Started by Jony - Last post by Jony
Hi,

My data set has around 1600 rows and 18 columns (the first 9 are filled with random values from 1 to 50, with each number appearing only once, and the last 9 are empty).

I want to fill the last 9 columns with 1 or 0, depending on whether a certain number appears in any of the first 9 columns of that row.

I tried using Impute Missing Values, but could not succeed; I am very new to RapidMiner.

Thanks
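
To make the desired transformation concrete, here is a minimal Python sketch of the row-wise logic, outside RapidMiner. It assumes each of the 9 empty columns corresponds to one number to look for; the `targets` list and the function name are hypothetical, since the post does not say which numbers are meant.

```python
def add_indicators(row_values, targets):
    """Return one 0/1 flag per target: 1 if the target number occurs
    anywhere in the row's first nine values, otherwise 0."""
    present = set(row_values)
    return [1 if target in present else 0 for target in targets]


# Hypothetical example row (first 9 columns) and hypothetical target numbers.
row = [3, 17, 42, 8, 25, 1, 50, 11, 36]
targets = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(row + add_indicators(row, targets))
```

Inside RapidMiner the same idea would mean computing each flag column from the first nine columns rather than imputing it, because the empty columns are derived values, not missing observations.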

 6 
 on: Today at 01:31:19 PM 
Started by rapidox - Last post by rapidox
Hi all,
RapidMiner is a fantastic tool that I am using.

I am trying to get the "Keyword clustering using web mining and text mining" example from http://www.simafore.com/blog/bid/116340/ to work, but I get a "Duplicate attribute name: Content-Type" error.

I have to read a MySQL database table and use the LINK column as an attribute.

The LINK attribute (MySQL) contains:

http://www.liberoquotidiano.it/news/cronaca/1261117/Veneto--Zaia--necessario-assicurarsi-contro-eventi-catastrofici.html
http://www.liberoquotidiano.it/news/sostenibilita/1257087/L-Agenzia-europea-per-l-ambiente-lancia-l-allarme-clima--rischio-permanente----.html
http://www.liberoquotidiano.it/news/cronaca/1254046/Maltempo--Grosseto--sopralluogo-di-Marras-con-D-Angelis-in-zone-alluvione.html

I'd like to get keyword clusters based on the content of those web pages.

Do you know a way to get this process working?

I attach the XML process here.

Thank you in advance for your help!

Have a good day.
Alex

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="75">
        <parameter key="define_connection" value="url"/>
        <parameter key="connection" value="libero"/>
        <parameter key="database_url" value="jdbc:mysql://localhost:3306/libero"/>
        <parameter key="username" value="root"/>
        <parameter key="password" value="***********************"/>
        <parameter key="define_query" value="table name"/>
        <parameter key="table_name" value="textmine"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="5.3.000" expanded="true" height="60" name="Get Pages" width="90" x="179" y="30">
        <parameter key="link_attribute" value="Link"/>
        <parameter key="page_attribute" value="PAGE"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="delay" value="random"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="210">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="attribute" value="PAGEOUTPUT"/>
        <parameter key="attributes" value="PAGEOUTPUT"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="380" y="75">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="web:extract_html_text_content" compatibility="5.3.000" expanded="true" height="60" name="Extract Content (2)" width="90" x="447" y="210">
            <parameter key="ignore_non_html_tags" value="false"/>
          </operator>
          <connect from_port="document" to_op="Extract Content (2)" to_port="document"/>
          <connect from_op="Extract Content (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.3.008" expanded="true" height="94" name="Multiply" width="90" x="380" y="345"/>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes (2)" width="90" x="648" y="390">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|text"/>
        <parameter key="numeric_condition" value="&lt;5"/>
      </operator>
      <operator activated="true" class="k_medoids" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="849" y="435"/>
      <connect from_op="Read Database" from_port="output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 4"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>



 7 
 on: Today at 12:26:37 PM 
Started by uday - Last post by uday
Dear Marius,

Thanks for the reply :) I just need one clarification: can we represent the output of the process in a graphical format, like a tree view?

Kindly help me in this regard.

Thanks & Regards,
Uday.

 8 
 on: Today at 11:14:22 AM 
Started by star - Last post by star
Thanks very much, Andrew. It was very helpful, but in my case it didn't work. :(

 9 
 on: Today at 10:43:44 AM 
Started by raa - Last post by raa
Does anyone know how to solve this error:
 "Error creating renderer: java.lang.ArrayIndexOutOfBoundsException: 48"?

 10 
 on: Today at 09:44:52 AM 
Started by iqra - Last post by iqra
Hello,


I have created a web service in RA, but there are some issues.

1. When I select the 'XML' output format and the MIME type 'application/xml;charset=utf-8' and test the web service, it gives an error:

Quote
A server error 500 occurred: Process /home/iqra/processes/clustering_corelation_report could not be executed: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified. . The error has been recorded.

The same error occurs after selecting XML XSLT (server and client side).


With the Table, HTML, and JSON formats it works fine.
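
The error message suggests that the data contains a character that XML 1.0 does not allow (for example a control character such as NUL). Whether that is the cause here is an assumption, but as a minimal sketch, this Python function removes every character outside the XML 1.0 `Char` range; the function name is hypothetical and this is not part of RA.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, line feed, carriage return, and the ranges below.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove every character that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# NUL and vertical tab are removed; the accented letter and newline are kept.
print(repr(strip_illegal_xml_chars("caf\u00e9\x00 report\x0b\n")))
```

Cleaning the offending text attribute in a similar way before the result is rendered as XML would be one way to test this hypothesis.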


2. The second issue: I could not use the web service in a C# application. When I use the direct link in a C# application, Visual Studio asks me to download the file (that is, the created web service in JSON format). After downloading, nothing happens: it does not show up in the Solution Explorer of the C# application, so I cannot add it using the "Add Web Reference" option in Visual Studio.

One more thing: while testing the web service in RA we get the process output. Where can we find the web service code?


I hope someone will understand my issues. Any help would be greatly appreciated.
There should be more video tutorials for RA :(

Best Regards.
