Python
zigen, 2014-08-14 10:33:04

How do I parse article titles from Habrahabr using Grab in Python?

Good afternoon. A simple question: how do I parse topic titles from Habrahabr? I don't understand which XPath query to put in the selector in place of the "repo_listing" one. The topics come with post_number headers.

from grab import Grab

g = Grab()

g.go('http://habrahabr.ru/hub/infosecurity/')

for elem in g.doc.select('//ul[@id="repo_listing"]/li/a'):
    print ('%s: %s' % (elem.text(), elem.attr('href')))

2 answer(s)
Pavel, 2014-08-14
@zigen

You need to look at the HTML structure of the Habr page:

<h1 class="title">
    <a href="http://habrahabr.ru/post/233297/" class="post_title">Опыт работы эникейщиком/системным администратором в бюджетной организации</a>
    <a href="/sandbox/" class="flag flag_sandbox" title="Перейти в песочницу">из песочницы</a>
</h1>

So the selector you need is:
g.doc.select('//h1[@class="title"]/a')
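
Note that in the markup quoted above the h1 holds two links: the title itself and the sandbox flag, so a plain /a step would pick up both. A minimal sketch of narrowing it to the title link follows. It uses the standard library's ElementTree on a made-up sample (SAMPLE, its English link text and the extract_titles helper are illustrative assumptions, not part of Grab) so that it runs without network access. ElementTree only supports a subset of XPath and needs a relative path starting with './/'. With Grab the equivalent expression would presumably be '//h1[@class="title"]/a[@class="post_title"]', but that has not been checked against the live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the h1 block quoted above,
# wrapped in a <div> so the document has a single root element.
SAMPLE = """
<div>
  <h1 class="title">
    <a href="http://habrahabr.ru/post/233297/" class="post_title">Example post title</a>
    <a href="/sandbox/" class="flag flag_sandbox" title="sandbox">from the sandbox</a>
  </h1>
</div>
"""


def extract_titles(markup):
    """Return (title, href) pairs for links with class="post_title"."""
    root = ET.fromstring(markup)
    # The extra [@class='post_title'] predicate skips the sandbox flag link.
    links = root.findall(".//h1[@class='title']/a[@class='post_title']")
    return [(a.text.strip(), a.get("href")) for a in links]


titles = extract_titles(SAMPLE)
for text, href in titles:
    print("%s: %s" % (text, href))
```

In the question's loop, only the string passed to g.doc.select would change. The elem.text() and elem.attr('href') calls can stay as they are.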

zed, 2014-08-14
@zedxxx

What GRUB is: https://en.wikipedia.org/wiki/GNU_GRUB
What Grab is, in the Python world: en.wikibooks.org/wiki/Grab
face :)