Topic: Crawl Web Link-Page pairs are incorrect
CharlieFirpo
Newbie
*
Posts: 49


« on: May 30, 2013, 09:48:12 AM »

Dear all!

When I use the Crawl Web operator with the "add pages as attribute" parameter checked, the result consists of Link-Page pairs (the number of examples depends on the "max pages" parameter). But when I open the URL stored in the Link attribute (Url 1) and look at its HTML, the real HTML content is different from the content of the Page attribute. How can that be?

If I don't store the page in Crawl Web but instead use a Get Pages operator, with its "link attribute" parameter set to the Link attribute (from Crawl Web) and its "page attribute" parameter set to Page, the Link and Page pairs still don't match (just as with Crawl Web). In the output of Get Pages I also see a URL attribute next to the Link and Page attributes (plus some other attributes). The URL attribute contains the real URL (Url 2) that the Page attribute's value belongs to. So the HTML content at the URL attribute's value (Url 2) is the same as the Page attribute's content, but it differs from the content at the Link attribute's value (Url 1).

But I don't understand why the Link attribute and the URL attribute are different, or why the Page attribute's values don't belong to the Link attribute's values.
« Last Edit: May 30, 2013, 10:08:07 AM by CharlieFirpo »
CharlieFirpo
Newbie
*
Posts: 49


« Reply #1 on: May 30, 2013, 01:31:14 PM »

Ahh..

So the difference between the content at the input URL (the Link attribute's value) and the Page value (HTML content) arises because cookies are not enabled in Crawl Web. In my web browser cookies are enabled, so when I opened the URLs and looked at their content, it was different from the value of Crawl Web's Page attribute.

The Get Pages operator lets you enable cookies, but Crawl Web has no parameter for enabling cookies. Or does it?
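
For anyone who wants to see the cookie effect outside RapidMiner, here is a minimal Python sketch. It is not how Crawl Web or Get Pages work internally. The local test server, its HTML bodies, the cookie name "session" and the function names are all made up for illustration. The server returns a placeholder page to a client without the cookie and the real page to a client that sends it back, so the same URL gives two different HTML contents.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieSensitiveHandler(BaseHTTPRequestHandler):
    """Hypothetical site: the HTML depends on whether a cookie was sent."""

    def do_GET(self):
        has_cookie = "session=1" in self.headers.get("Cookie", "")
        if has_cookie:
            body = b"<html>real article</html>"
        else:
            body = b"<html>please enable cookies</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        if not has_cookie:
            # Offer the cookie; only a cookie-aware client will return it.
            self.send_header("Set-Cookie", "session=1")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Download url with the given opener and return the decoded HTML."""
    with opener.open(url) as response:
        return response.read().decode("utf-8")


def compare_fetches():
    """Fetch the same URL without and with cookie handling."""
    server = HTTPServer(("127.0.0.1", 0), CookieSensitiveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
    try:
        # Client without cookie support (stands in for a cookie-less crawler).
        plain = urllib.request.build_opener(no_proxy)
        # Client with a cookie jar (stands in for a web browser).
        jar = http.cookiejar.CookieJar()
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        fetch(with_jar, url)  # first visit only stores the cookie
        return {
            "without_cookies": fetch(plain, url),
            "with_cookies": fetch(with_jar, url),
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for client, html in compare_fetches().items():
        print(client, "->", html)
```

The client without a cookie jar keeps getting the placeholder page however often it asks, which matches what I saw: same link, different page content than in the browser.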
Marius
Administrator
Hero Member
*****
Posts: 1794



« Reply #2 on: June 11, 2013, 11:09:39 AM »

Unfortunately, Crawl Web does not support the option to set cookies. But isn't it sufficient to log in with the Get Page operator and then use Crawl Web?

Best regards,
Marius
« Last Edit: June 11, 2013, 11:11:17 AM by Marius »
