Rapid-I Blog
Web Mining Video Series
by Ingo Mierswa, 5 Apr 2011
Tags: Web Scraping, Web Mining, Videos

Neil McGuigan, who already made a great series of Text Mining videos, has started a new video series about web crawling and web scraping. So far, the series consists of three parts:

 

Web Mining and Web Scraping

 

Part 1: Web Scraping with Google Spreadsheets and XPath

In his first video, Neil demonstrates how to grab parts of a web page (scraping) using Google Docs Spreadsheets and XPath. Although RapidMiner is not used here, his explanation of XPath expressions and his list of useful XPath constructs are helpful if you want to set up a web scraping process with RapidMiner.
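To give a feel for what such XPath expressions look like, here is a minimal sketch in Python rather than in a spreadsheet. It is not taken from the video: the page fragment, the class names, and the queries are invented for illustration, and the standard library's ElementTree only supports a limited subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (invented for this sketch).
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Widget</h2>
      <span class="price">9.99</span>
      <a href="/widget">details</a>
    </div>
    <div class="product">
      <h2>Gadget</h2>
      <span class="price">19.50</span>
      <a href="/gadget">details</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# ".//" searches all descendants; "[@class='product']" filters on an attribute.
names = [h2.text for h2 in root.findall(".//div[@class='product']/h2")]

# Select every <span class="price"> anywhere in the page and convert to numbers.
prices = [float(span.text) for span in root.findall(".//span[@class='price']")]

# Attribute values are read from the matched element, here the link target.
links = [a.get("href") for a in root.findall(".//div[@class='product']/a")]

print(names, prices, links)
```

Real pages are rarely well-formed XML, so a more forgiving HTML parser would usually be needed in practice; the XPath ideas stay the same.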

 

Part 2: Web Crawling with RapidMiner

Here, Neil shows how to crawl about 500 pages from a site with a simple RapidMiner process. He also discusses user agents, crawling rules, and robot exclusion files (robots.txt).
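The three ideas mentioned here (a user agent, crawling rules, and the robots exclusion file) can be sketched outside RapidMiner as well. The following Python sketch is an assumption-laden illustration, not the process from the video: the user agent string, the robots.txt content, the follow rule, and the in-memory "site" are all hypothetical, and no network access takes place.

```python
import re
from collections import deque
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler/0.1"  # hypothetical user agent string

# Hypothetical robots.txt content, parsed from memory instead of being fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# Crawling rule: only follow links that stay on the (hypothetical) site.
FOLLOW_RULE = re.compile(r"^https://example\.com/")

# Hypothetical link graph standing in for real pages.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b", "https://other.org/"],
    "https://example.com/b": ["https://example.com/"],
}


def may_crawl(url):
    """A URL is crawled only if it matches the rule and robots.txt allows it."""
    return bool(FOLLOW_RULE.match(url)) and robots.can_fetch(USER_AGENT, url)


def crawl(start, get_links, max_pages=500):
    """Breadth-first crawl that stops after max_pages visited pages."""
    seen, queue, visited = {start}, deque([start]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not may_crawl(url):
            continue
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("https://example.com/", lambda url: SITE.get(url, []))
print(pages)
```

The page limit plays the same role as the "about 500 pages" cap in the video: without it, a crawl over a large site may never finish.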

 

Part 3: Web Scraping with RapidMiner and XPath

In this video, Neil shows how to load the 500 HTML files from the previous web crawl, loop through each of them, use XPath to grab values from each page, and put them in a data table for later analysis. Here the XPath introduction from Part 1 comes in handy.
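The same load, loop, extract, and tabulate pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names, the page structure, and the extracted fields are invented, the files are generated on the fly so the sketch runs on its own, and ElementTree requires well-formed markup, which real crawled pages often are not.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def scrape_dir(directory):
    """Loop over saved pages and collect one table row (a dict) per file."""
    rows = []
    for path in sorted(Path(directory).glob("*.html")):
        root = ET.parse(path).getroot()
        rows.append({
            "file": path.name,
            # findtext returns the text of the first match, or None.
            "title": root.findtext(".//h1"),
            "price": root.findtext(".//span[@class='price']"),
        })
    return rows


# Stand-in for the crawl output: two hypothetical pages written to a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("Widget", "9.99"), ("Gadget", "19.50")]
    for i, (title, price) in enumerate(samples):
        Path(tmp, f"page{i}.html").write_text(
            f"<html><body><h1>{title}</h1>"
            f"<span class='price'>{price}</span></body></html>",
            encoding="utf-8",
        )
    table = scrape_dir(tmp)

for row in table:
    print(row)
```

Each dictionary corresponds to one row of the resulting data table, with one column per XPath query.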

 

Thanks, Neil, for this second great series!
