Rapid-I Blog
Web Mining Video Series
by Ingo Mierswa, 5 Apr 2011
Tags: Web Scraping, Web Mining, Videos

Neil McGuigan, who already made a great series of Text Mining videos, has started a new video series about web crawling and web scraping. So far, the series consists of three parts:

 

Web Mining and Web Scraping

 

Part 1: Web Scraping with Google Spreadsheets and XPath

In his first video, Neil demonstrates how to grab parts of a web page (scraping) using Google Docs Spreadsheets and XPath. Although RapidMiner is not used here, the explanation of XPath expressions and his list of useful XPath constructs are really helpful if you want to set up a web scraping process with RapidMiner. 
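
To give a feel for the kind of XPath constructs the video covers, here is a minimal sketch in Python. It is not taken from the video: the page fragment, its class names, and the values are hypothetical, and the standard-library ElementTree module used here supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <h1>Product list</h1>
    <div class="product">
      <span class="name">Widget</span>
      <span class="price">9.99</span>
      <a href="/widget">details</a>
    </div>
    <div class="product">
      <span class="name">Gadget</span>
      <span class="price">19.50</span>
      <a href="/gadget">details</a>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# ".//tag" selects matching elements at any depth below the current node.
headline = root.find(".//h1").text

# "[@attr='value']" keeps only elements whose attribute has that value.
names = [e.text for e in root.findall(".//span[@class='name']")]
prices = [float(e.text) for e in root.findall(".//span[@class='price']")]

# A child step after a predicate: links inside product blocks only.
# ElementTree cannot select attributes directly, so read them with .get().
links = [a.get("href") for a in root.findall(".//div[@class='product']/a")]

print(headline, names, prices, links)
```

The same path expressions (descendant search, attribute predicates, child steps) carry over to other XPath tools, although full XPath engines accept more syntax than this subset.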

 

Part 2: Web Crawling with RapidMiner

Here, Neil shows how to crawl about 500 pages from a site with a simple RapidMiner process. He also discusses user agents, crawling rules, and robots exclusion files (robots.txt).
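
The three ideas mentioned here (user agent, crawling rules, robots exclusion) can also be sketched outside RapidMiner. The following Python snippet is only an illustration under assumptions: the user agent string, the example.com URLs, the robots.txt content, and the helper names are all hypothetical, and nothing is actually fetched from the network.

```python
import re
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration only.
USER_AGENT = "example-crawler/0.1"
MAX_PAGES = 500
# Crawling rule: only follow URLs on this (made-up) site.
FOLLOW_RULE = re.compile(r"^https://www\.example\.com/")

# A made-up robots.txt; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, pages_fetched):
    """Return True if the page limit, crawling rule, and robots.txt all allow the URL."""
    if pages_fetched >= MAX_PAGES:
        return False
    if not FOLLOW_RULE.match(url):
        return False
    return robots.can_fetch(USER_AGENT, url)


def build_request(url):
    """Build (but do not send) a request that identifies the crawler."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

A crawl loop would call `may_crawl` before every download and pass the result of `build_request` to `urllib.request.urlopen`, so that the site owner can see who is crawling and the exclusion rules are respected.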

 

Part 3: Web Scraping with RapidMiner and XPath

In this video, Neil shows how to load the 500 HTML files from the previous web crawl, loop through each of them, use XPath to grab values from each page, and put them in a data table for later analysis. This is where the XPath introduction from Part 1 comes in handy.
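
The load, loop, extract, and tabulate steps described above could look roughly like this in Python. This is a sketch under assumptions, not the RapidMiner process from the video: the file names, the `title` and `price` fields, and the helper functions are hypothetical, and the pages are assumed to be well-formed so that the standard-library parser can read them (real crawled HTML often needs a more tolerant parser).

```python
import csv
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

FIELDS = ["file", "title", "price"]  # hypothetical columns


def extract_row(path):
    """Apply XPath expressions to one saved page and return one table row."""
    root = ET.parse(str(path)).getroot()
    return {
        "file": path.name,
        "title": root.findtext(".//title", default=""),
        "price": root.findtext(".//span[@class='price']", default=""),
    }


def scrape_directory(directory):
    """Loop over all saved pages in a directory, in file-name order."""
    return [extract_row(p) for p in sorted(Path(directory).glob("*.html"))]


def write_table(rows, csv_path):
    """Store the rows as a CSV table for later analysis."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Small demonstration with two made-up pages instead of 500 crawled ones.
with tempfile.TemporaryDirectory() as tmp:
    for i, price in enumerate(["9.99", "19.50"], start=1):
        Path(tmp, f"page{i}.html").write_text(
            f"<html><head><title>Item {i}</title></head>"
            f"<body><span class='price'>{price}</span></body></html>",
            encoding="utf-8",
        )
    rows = scrape_directory(tmp)
    write_table(rows, Path(tmp, "table.csv"))
    table_text = Path(tmp, "table.csv").read_text(encoding="utf-8")

print(table_text)
```

Returning an empty string for a missing value keeps one row per page, so pages where an XPath expression matches nothing remain visible in the table instead of silently disappearing.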

 

Thanks, Neil, for this second great series!
