Author Topic: FeatureExtraction - xpath - span  (Read 1482 times)
ullash
Newbie
*
Posts: 3


« on: March 18, 2009, 11:05:31 AM »

Hi,

I am trying to use FeatureExtraction to extract some text from a web page (XHTML), but it does not seem to work.
The XPath location I get from Firebug in Mozilla Firefox is: /html/body/span/div/div/h2
The XPath query I am using in RapidMiner is: /h:html/h:body/h:span/h:div/h:div/h:h2/text()
This query returns nothing in RapidMiner.

But if I remove the span tag from the web page, the corresponding XPath query works: /h:html/h:body/h:div/h:div/h:h2/text()

So my question is: how do I extract text from a web page in which the target element is nested inside a span tag?

Thanks

« Last Edit: March 19, 2009, 02:27:13 PM by ullash » Logged
Ingo Mierswa
Administrator
Hero Member
*****
Posts: 1226



WWW
« Reply #1 on: March 23, 2009, 12:21:00 PM »

Hi,

I don't have a definite answer right now, but the problem could be that "span" is an inline element and is not allowed to contain block-level tags such as div. Maybe this is why the XPath query fails here.
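
One way to narrow this down is to run the same namespaced path outside RapidMiner on a well-formed copy of the page. The sketch below is only a stand-in: it uses Python's standard xml.etree.ElementTree, not RapidMiner, and the two sample documents and the helper name extract_h2_text are made up for illustration. ElementTree supports only a subset of XPath (no text() step, and paths are relative to the root html element), so the text is read from the matched elements instead. If the span path matches here, XPath itself is probably not the issue, and it may be worth checking whether the page is being restructured before the query runs (that part is an assumption, not something confirmed for RapidMiner).

```python
import xml.etree.ElementTree as ET

# Prefix "h" bound to the XHTML namespace, as in the queries above.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample pages, one with the span wrapper and one without.
WITH_SPAN = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<span><div><div><h2>Title</h2></div></div></span>"
    "</body></html>"
)
WITHOUT_SPAN = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<div><div><h2>Title</h2></div></div>"
    "</body></html>"
)


def extract_h2_text(xhtml, path):
    """Return the text of every element matched by a namespaced path.

    The path is evaluated relative to the root <html> element.
    """
    root = ET.fromstring(xhtml)
    return [element.text for element in root.findall(path, XHTML_NS)]


# Same structure as the two queries from the question.
with_span_result = extract_h2_text(WITH_SPAN, "h:body/h:span/h:div/h:div/h:h2")
without_span_result = extract_h2_text(WITHOUT_SPAN, "h:body/h:div/h:div/h:h2")

# A descendant search does not depend on the exact nesting at all.
descendant_result = extract_h2_text(WITH_SPAN, ".//h:h2")

print(with_span_result, without_span_result, descendant_result)
```

If the exact nesting turns out to be unreliable, a descendant query such as //h:h2/text() might be a more robust thing to try in RapidMiner, since it does not depend on whether the span is still the parent.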

Cheers,
Ingo
Logged
