Author Topic: I have a list of URLs and data should be crawled only from those URLs using XPath
vpkris
« on: December 24, 2012, 02:01:52 AM »

Dear Team,

I am very confused and stuck.

I have 1,000 URLs and I need to extract data from each of them.

I have stored the 1,000 URLs in a CSV file.

I have also watched the tutorials at http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html and http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html. They are excellent, but I am not sure where I am getting lost.

I have enabled all the extensions.

Is there a video tutorial that explains the process of importing the URLs and extracting the data?

I really want to learn this and I am very interested. Please guide me.

I have been trying for the past two days, but I am still missing something.

Marius
« Reply #1 on: January 02, 2013, 10:20:28 AM »

Hi,

I am not sure where exactly you got stuck, but if your problem is accessing the URLs stored in your file in the first place, the Get Pages operator is for you. Just load your CSV file containing the URLs, then pass that data to Get Pages and specify in the link_attribute parameter which column contains the URLs.
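
Outside of RapidMiner, the same three steps (read the CSV, fetch each page, apply an XPath query) would look roughly like the Python sketch below. The column name "url", the file name, and the XPath expression are placeholders for illustration only:

```python
import csv

import requests
from lxml import html

# Read the URLs from the CSV file (assumes a column named "url").
with open("urls.csv", newline="") as f:
    urls = [row["url"] for row in csv.DictReader(f)]

# Fetch each page and extract data with an XPath expression.
for url in urls:
    response = requests.get(url, timeout=10)
    tree = html.fromstring(response.content)
    # Placeholder query: grab the page title; replace with your own XPath.
    titles = tree.xpath("//title/text()")
    print(url, titles)
```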

Best regards,
Marius

alphabeto
« Reply #2 on: September 30, 2013, 11:56:20 AM »

Hi,
Can RapidMiner do an automated, regular search (say, daily) for a list of words across a list of URLs, and return each page link?
I have a list of words, and I want to regularly get every web link where any of these words appears on any page from my predefined URL list.


E.g. word list: qwe, rty
URL list: www.asd.com, www.zxc.com

What would the process look like in order to get, daily and automatically, each web link where the words "qwe" and/or "rty" appear on www.asd.com and/or www.zxc.com?


Many thanks
Dan
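
For illustration only, a minimal Python sketch of such a scan, under stated assumptions: the word match is a plain substring test over the page text, the link extraction is a simple XPath over anchor tags, and the daily schedule would come from an external scheduler such as cron rather than from the script itself:

```python
import requests
from lxml import html
from urllib.parse import urljoin

words = ["qwe", "rty"]                                      # the word list
start_urls = ["http://www.asd.com", "http://www.zxc.com"]   # the URL list

for url in start_urls:
    response = requests.get(url, timeout=10)
    tree = html.fromstring(response.content)
    # Simple substring match over the visible page text.
    text = tree.text_content().lower()
    if any(word in text for word in words):
        # Collect every hyperlink on the matching page, resolved to absolute URLs.
        links = [urljoin(url, href) for href in tree.xpath("//a/@href")]
        print(url, links)
```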