Author Topic: Crawling and storing onclick element data  (Read 433 times)
RPDMUSR
Newbie
*
Posts: 2


« on: May 20, 2013, 08:21:51 PM »

Hi, I was wondering if you could help with this page:
http://www.hockeyligan.se/elitserien-arena/38238/live/
Specifically, I would like to store the data under the tab marked "Sammanfattning", but I cannot get RapidMiner to crawl to that data.
Marcin
Global Moderator
Full Member
*****
Posts: 165


« Reply #1 on: May 21, 2013, 09:03:13 AM »

Have you tried the web extension? Do you have a specific problem? Can you share the process you have already used?
RPDMUSR
Newbie
*
Posts: 2


« Reply #2 on: May 21, 2013, 04:51:58 PM »

Hi. Thanks for the reply.

I'm currently using Crawl Web and follow_link_with_matching_URL to arrive at the
http://www.hockeyligan.se/elitserien-arena/38238/live/
page.
When I attempt to use store_with_matching_URL on the page, the data contained in the tabs is not stored, only the data in the margins.

When I inspect the element, the tab data shows up as list items whose content is generated by the onclick event.
I'm not sure how to capture the data that event produces.
Marcin
Global Moderator
Full Member
*****
Posts: 165


« Reply #3 on: May 22, 2013, 12:49:40 PM »

Hmmm, it seems that this page does some crazy JavaScript magic to load the tab content after the page itself has loaded. Since our crawler does not execute JavaScript inside a page, you are not getting that content.

The only way to get the content is to check which URLs the page calls to fetch it, and then crawl those URLs directly; they typically deliver their data in JSON format. The developer tools in Chrome (or any other network monitoring tool) are useful for watching the network traffic.
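Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the ENDPOINT value is a made-up placeholder (the real URL has to be copied from the browser's network tab), the helper names fetch_json and flatten are hypothetical, and it assumes the response really is JSON, which may not be true for this site.

```python
import json
import urllib.request

# Placeholder only: replace with the request URL seen in the browser's network tab.
ENDPOINT = "http://example.com/path/to/summary.json"


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def flatten(value, prefix=""):
    """Turn nested JSON into a flat dict mapping dotted paths to leaf values."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Drop the trailing dot that was added by the parent level.
        flat[prefix[:-1]] = value
    return flat


# Example usage (needs network access and a real endpoint):
# for path, leaf in flatten(fetch_json(ENDPOINT)).items():
#     print(path, "=", leaf)
```

Flattening the response into path/value pairs makes it easy to store as a table afterwards. If the endpoint turns out to return HTML fragments instead of JSON, the fetch step still applies but the parsing step would need to change.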