Author Topic: How to use Read RSS Feed  (Read 3231 times)
Posts: 2

« on: July 04, 2012, 10:43:29 PM »


I've made a webpage of 87 "Best RapidMiner Videos", linked above.  After a few *hours* of googling I could find no quickstart, tutorial or examples for using "Read RSS Feed"....  I want to process web feeds, primarily Twitter feeds.  I could also find no quickstart, tutorial or examples for using RapidMiner with Twitter.  The main thing I want to be able to do is filter tweets on non-identical similarity, presumably via classification.  I would also like to try using RapidMiner for link metrics.  I have searched this forum for "rss" and "twitter" without finding anything helpful.  It is also not clear to me how to access RapidMiner results programmatically, like an API.  It seems RapidMiner would be much more accessible if it were available in the cloud as a web service.  Ultimately, I hope to use RapidMiner in conjunction with Yahoo! Pipes.  (And I could find no quickstart, tutorial or examples for using RapidMiner with Yahoo! Pipes either....)
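(Aside: the "filter tweets on non-identical similarity" step doesn't strictly need a classifier. Outside RapidMiner, a minimal sketch of near-duplicate filtering can be done with Python's standard-library difflib; the 0.8 threshold below is an arbitrary illustrative cutoff, not a recommendation.)

```python
from difflib import SequenceMatcher

def filter_near_duplicates(texts, threshold=0.8):
    """Keep each text only if it is not too similar to any text already kept.

    Pairs whose SequenceMatcher similarity ratio exceeds `threshold`
    are treated as near-duplicates and dropped.
    """
    kept = []
    for text in texts:
        if all(SequenceMatcher(None, text.lower(), k.lower()).ratio() <= threshold
               for k in kept):
            kept.append(text)
    return kept

tweets = [
    "RapidMiner 5.3 released today!",
    "RapidMiner 5.3 released today!!",   # near-duplicate, dropped
    "How do I read an RSS feed in RapidMiner?",
]
print(filter_near_duplicates(tweets))
```

Inside RapidMiner itself the analogous step would be a similarity/clustering operator on the tokenized example set, but the idea is the same: compare each new item against what you've already kept.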
Hero Member
Posts: 887

« Reply #1 on: July 05, 2012, 08:19:55 AM »

I Googled "rapidminer Read RSS Feed", and the first result says, "This workflow connects RapidMiner to Twitter and downloads the timeline". Are we using the same Google?

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

T.S.Eliot ~ Choruses from the Rock 1934
Posts: 2

« Reply #2 on: July 05, 2012, 02:40:45 PM »

I also got that, but could not get it to work....  There seems to be some kind of inconsistency or incompatibility between "Read RSS Feed" and "Process Documents from Data".  Can you reproduce it in a working version?  Can you post the steps you took to get it working?
Posts: 11

« Reply #3 on: April 15, 2013, 09:57:38 PM »

I am having the same issue.  Has this been resolved?

Posts: 25

« Reply #4 on: April 17, 2013, 12:03:18 AM »

This is the example that haddock cited. I just added my Twitter feed URL and my user agent to "Read RSS Feed" and it works fine (RapidMiner 5.3.000):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="394" width="547">
      <operator activated="true" class="web:read_rss" compatibility="5.3.000" expanded="true" height="60" name="Read RSS Feed" width="90" x="45" y="75">
        <parameter key="url" value=""/>
        <parameter key="user_agent" value="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="75">
        <parameter key="prune_method" value="percentual"/>
        <parameter key="prune_below_percent" value="5.0"/>
        <parameter key="prune_above_percent" value="80.0"/>
        <parameter key="prune_below_rank" value="5.0"/>
        <parameter key="prune_above_rank" value="5.0"/>
        <list key="specify_weights"/>
        <process expanded="true" height="374" width="547">
          <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.3.000" expanded="true" height="60" name="Transform Cases" width="90" x="179" y="30"/>
          <operator activated="true" class="text:filter_by_length" compatibility="5.3.000" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="313" y="30">
            <parameter key="min_chars" value="3"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.000" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="447" y="30"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:wordlist_to_data" compatibility="5.3.000" expanded="true" height="76" name="WordList to Data" width="90" x="313" y="75"/>
      <connect from_op="Read RSS Feed" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="WordList to Data" from_port="word list" to_port="result 1"/>
      <connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
