Google
pahdom1, 2016-01-06 22:20:39

Parsing Google search results: how do I get the full link?

I'm trying to parse URLs out of Google search results, but I only get shortened links. How can I get the full links?
The code:

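import sys
from grab import Grab

g = Grab()
qsi = 0  # result offset; each results page holds 10 entries
while qsi < 51:
  g.go('https://www.google.ru/search?q=inurl:?id='+sys.argv[1]+'&hl=ru&start='+str(qsi))
  for elem in g.doc.select('//cite'):
    print(elem.text())
  qsi = qsi+10  # advance once per page, not once per result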

What I get as a result:
domen.mobi/site_download-video.xht...
domen.in/music/index.php?cat=/...?id=zver
www.domen.rs/.../profile.php?id=zver
domen.com/watch-eiger-nor...
www.domen.si/index.php?p...id=zvER...
domen.info/.../id/ZveR-IQ3otk

1 answer(s)
sivabur, 2016-01-06
@sivabur

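The full links are in the page's HTML. The cite element holds only the shortened display text; the complete URL is in the href attribute of the a element inside h3 class="r". For example, one result block looks like this: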

<div class="rc" data-hveid="79">
  <h3 class="r">
    <a href="http://habrahabr.ru/post/169409/" onmousedown="return rwt(this,'','','','8','AFQjCNFnsS8s7iJf98knI5sbhEHWMhPKBg','pDjHuW4-NiyLV03SAOr2kA','0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQFghQMAc','','',event)" wotsearchprocessed="true">Парсинг сайтов-магазинов. Личный опыт и немного how-to</a>
    <div wotsearchtarget="habrahabr.ru" style="cursor: pointer; display: inline-block;width: 16px; height: 16px;">&nbsp;</div>
  </h3>
  <div class="s">
    <div>
      <div class="f kv _SWb" style="white-space:nowrap">
        <cite class="_Rm">habrahabr.ru/post/169409/</cite>
        <div class="action-menu ab_ctl">
          <a class="_Fmb ab_button" href="#" id="am-b7" aria-label="Result details" aria-expanded="false" aria-haspopup="true" role="button" jsaction="m.tdd;keydown:m.hbke;keypress:m.mskpe" data-ved="0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQ7B0IUTAH" wotsearchprocessed="true"><span class="mn-dwn-arw"></span></a>
          <div class="action-menu-panel ab_dropdown" role="menu" tabindex="-1" jsaction="keydown:m.hdke;mouseover:m.hdhne;mouseout:m.hdhue" data-ved="0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQqR8IUjAH">
            <ul>
              <li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="http://webcache.googleusercontent.com/search?q=cache:PwmzQGr7D3QJ:habrahabr.ru/post/169409/+&amp;cd=8&amp;hl=en&amp;ct=clnk&amp;gl=ua" onmousedown="return rwt(this,'','','','8','AFQjCNELOpA34BVi-o6dCnCV5hO4EkJ7_g','pGbRdVAUf_tYY1wB5R911w','0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQIAhTMAc','','',event)" wotsearchprocessed="true">Cached</a></li>
              <li class="action-menu-item ab_dropdownitem" role="menuitem"><a class="fl" href="/search?biw=1680&amp;bih=905&amp;q=related:habrahabr.ru/post/169409/+%D0%BF%D0%B0%D1%80%D1%81%D0%B8%D0%BD%D0%B3&amp;tbo=1&amp;sa=X&amp;ved=0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQHwhUMAc" wotsearchprocessed="true">Similar</a></li>
            </ul>
          </div>
        </div>
        <a class="fl" href="https://translate.google.com.ua/translate?hl=en&amp;sl=ru&amp;u=http://habrahabr.ru/post/169409/&amp;prev=search" onmousedown="return rwt(this,'','','','8','AFQjCNGVnBzb8kedmpcx8BJmoVel4dUECQ','RMwJM8-ZG5nx6AZjCHCg7w','0ahUKEwjeppzQ-ZXKAhVJ_nIKHc1hAekQ7gEIVjAH','','',event)" wotsearchprocessed="true">Translate this page</a>
      </div>
      <span class="st"><span class="f">Feb 14, 2013 - </span>Разделим <em>парсинг</em> (скраппинг) сайтов на две подзадачи. Собственно сам <em>парсинг</em> – поиск данных, которые нам интересны на&nbsp;...</span>
    </div>
  </div>
</div>
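
To make this concrete, here is a minimal sketch of pulling the full URLs out of markup shaped like the snippet above. It assumes the result link is an a element nested inside h3 class="r", which is what the 2016 markup shows; Google's class names change over time, so treat that selector as an assumption. The names ResultLinkParser, extract_result_links and sample are hypothetical, and the sketch uses only the Python 3 standard library rather than Grab so it can run on its own. With an XPath-based tool, the analogous query would presumably be //h3[@class="r"]/a/@href in place of //cite.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <h3 class="r">."""

    def __init__(self):
        super().__init__()
        self.in_result_heading = False  # True while inside <h3 class="r">
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "r" in (attrs.get("class") or "").split():
            self.in_result_heading = True
        elif tag == "a" and self.in_result_heading and attrs.get("href"):
            # The href holds the complete URL; <cite> holds only display text.
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_result_heading = False


def extract_result_links(html):
    """Return the full result URLs found in a results-page HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


# Trimmed-down stand-in for one result block (hypothetical sample data).
sample = (
    '<div class="rc"><h3 class="r">'
    '<a href="http://habrahabr.ru/post/169409/">title</a></h3>'
    '<div class="s"><cite class="_Rm">habrahabr.ru/post/169409/</cite>'
    '<a class="fl" href="/search?q=related">Similar</a></div></div>'
)
print(extract_result_links(sample))
```

Links outside the h3 heading, such as the "Cached" and "Similar" entries, are skipped because the parser only records anchors seen while it is inside the result heading.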
