How are links returned by search engines?

Parsing

Tirend, 2016-01-11 14:23:30

Hello, I'm writing a parser. It works as follows: given a list of queries, the program sends each query in turn to a search engine and receives the results page. It then has to take the first search result and download everything at that URL.
Two problems came up. First, I extracted the link of the first result, but when I follow it, the browser gets JavaScript back. In other words, the link does not point to the target resource but to somewhere in the bowels of the search engine, which returns JavaScript.
Second, how do I actually download the resource?

3 answers

Dimonchik, 2016-01-11
@dimonchik2013

Break the task into two parts:
1) get a direct link from Google
2) download the site from that link
Handle (2) separately: see, for example, "Question for experienced Python's and Scrapy's" (you need Scrapy or Grablib), or use wget; see, for example, "How scrawl webarchive?"
For (1), see https://addons.mozilla.org/en-US/firefox/addon/goo...
You can either dig into how that add-on works or examine the response and process it with regexps yourself.
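For part (1), a rough sketch in Python of pulling the target out of a redirect-style result link. It assumes the search engine puts the real address into a query parameter of its own link (for example `/url?q=...`); the parameter names `q` and `url` and the function name `extract_direct_url` are my assumptions for illustration, so check them against the actual HTML you receive.

```python
from urllib.parse import urlsplit, parse_qs


def extract_direct_url(href, param_names=("q", "url")):
    """Return the target URL embedded in a redirect-style link, or None.

    param_names is an assumption: adjust it to whatever parameter
    the search engine actually uses in its result links.
    """
    # parse_qs also undoes percent-encoding of the parameter values.
    query = parse_qs(urlsplit(href).query)
    for name in param_names:
        for value in query.get(name, []):
            # Accept only values that look like an absolute web address;
            # opaque tokens are skipped.
            if value.startswith(("http://", "https://")):
                return value
    return None
```

If the parameter holds an opaque token instead of an address, the function returns None, and the only option left is to request the link and see where the server sends you.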

VZVZ, 2016-01-11
@VZVZ

In 99.99% of cases this is AJAX, i.e. JavaScript making HTTP requests.
You can intercept them with a sniffer such as Fiddler and then make the same requests from your programming language.
P.S.
In 0.01% of cases it is not HTTP/HTTPS but another protocol, for example something over raw sockets. An HTTP sniffer is no use then.
But this is EXTREMELY rare.
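A minimal sketch in Python of repeating a request seen in the sniffer, standard library only. The endpoint, the parameters and the headers are all hypothetical here: copy the real ones from what Fiddler shows. The assumption that the server answers with JSON is also mine.

```python
import json
import urllib.parse
import urllib.request


def build_ajax_request(endpoint, params, extra_headers=None):
    """Build a GET request that imitates a browser's AJAX call."""
    query = urllib.parse.urlencode(params)
    url = f"{endpoint}?{query}" if query else endpoint
    # Placeholder headers; replace them with the ones captured by the sniffer.
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; example-parser)",
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


def send_ajax_request(request, timeout=10):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building and sending are kept apart so that the URL and headers can be compared with the sniffer capture before anything goes over the network.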

uwini, 2016-01-12
@uwini

Good afternoon.
Tirend: I'm talking about Baidu, which is a Chinese search engine.
Yes, it has links that go through a redirect, like:

www.baidu.com/link?url=BG93Jq_BObOnCzspyHAmb_UtfnV...

You can get the direct link from this URL with, e.g., curl, or, if you are using PHP, with PHP alone.
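The same idea sketched in Python rather than curl or PHP: request the redirect link, let the HTTP client follow the redirects, then read the final address and the body. This assumes the search engine answers with an ordinary HTTP redirect (301/302 and the like); if it redirects through JavaScript or a meta tag instead, this will not be enough. The function name `fetch_following_redirects` is made up for the example.

```python
import urllib.request


def fetch_following_redirects(url, timeout=10):
    """Return (final_url, body_bytes) after following HTTP redirects."""
    request = urllib.request.Request(
        url,
        # Some sites reject the default Python user agent; this value is a placeholder.
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-parser)"},
    )
    # urlopen follows HTTP redirects on its own;
    # geturl() then reports the address that was finally retrieved.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Call it with the full redirect link taken from the results page. The first element of the result is the direct link, the second is the downloaded page. This fetches only that one document; for a whole site you still need something like wget or Scrapy.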