Python
Lim_Drake, 2020-11-08 19:02:40

Is it possible to extract information from several blocks that share one class when parsing a site?

Hello! I want to write a parser for the Cian site ( https://cian.ru/ ). I'm writing it in Python with BeautifulSoup.
I ran into the following problem: the information I need is spread across several blocks under one class (I'm attaching a screenshot in case I have described the problem poorly).
5fa80e9b75657911751534.png
Using the selector for this class:

address_block = item.select_one('div._93444fe79c--labels--1J6M3')

I get results like this:
<div class="_93444fe79c--labels--1J6M3">
<a class="_93444fe79c--link--10mjQ" data-name="GeoLabel" href="https://saransk.cian.ru/kupit-kvartiru-mordoviya/" target="_blank">Республика Мордовия</a>, 
<a class="_93444fe79c--link--10mjQ" data-name="GeoLabel" href="https://saransk.cian.ru/kupit-kvartiru/" target="_blank">Саранск</a>, 
<a class="_93444fe79c--link--10mjQ" data-name="GeoLabel" href="https://saransk.cian.ru/kupit-kvartiru-mordoviya-saransk-oktyabrskiy-044297/" target="_blank">р-н Октябрьский</a>, 
<a class="_93444fe79c--link--10mjQ" data-name="GeoLabel" href="https://saransk.cian.ru/kupit-kvartiru-mordoviya-saransk-volgogradskaya-ulica-0231169/" target="_blank">Волгоградская улица</a>, 
<a class="_93444fe79c--link--10mjQ" data-name="GeoLabel" href="/cat.php?deal_type=sale&amp;engine_version=2&amp;house%5B0%5D=2192661&amp;offer_type=flat" target="_blank">124</a></div>

If I instead select by the class of the 'a' tag, I get only the first block with that class (that is, only the string 'Республика Мордовия', "Republic of Mordovia").
Is it possible to process this so that I get the full address?
P.S. Please go easy on me, I'm just starting to learn this area. I also apologize if my question is poorly worded!

3 answer(s)
Lim_Drake, 2020-11-10
@Lim_Drake

The solution was provided by Evgeny Palych in the comments on the post, a little above:

address_block = item.select_one('div._93444fe79c--labels--1J6M3')
print(address_block.get_text(" "))

# for all divs on the page with the class _93444fe79c--labels--1J6M3
address_blocks = item.select('div._93444fe79c--labels--1J6M3')

for e in address_blocks:
  print(e.get_text(" "))
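For anyone who wants to try this without hitting the site, here is a minimal self-contained sketch of the same idea. The HTML is a trimmed copy of the output from the question (hrefs and some attributes dropped), `soup` stands in for `item`, and the helper name `clean_address` is made up for illustration. It calls `get_text()` without a separator and then collapses the whitespace, so the commas already present in the markup stay attached to the words.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup from the question (assumption: the real
# page has the same structure; hrefs are replaced with "#").
html = """
<div class="_93444fe79c--labels--1J6M3">
<a data-name="GeoLabel" href="#">Республика Мордовия</a>,
<a data-name="GeoLabel" href="#">Саранск</a>,
<a data-name="GeoLabel" href="#">Волгоградская улица</a>,
<a data-name="GeoLabel" href="#">124</a></div>
"""


def clean_address(block):
    """Return all text inside the block as a single line.

    get_text() concatenates every text node (link texts plus the commas
    and newlines between them); split()/join() collapses the whitespace.
    """
    return " ".join(block.get_text().split())


# "html.parser" is the parser bundled with Python, so lxml is not required.
soup = BeautifulSoup(html, "html.parser")
address_block = soup.select_one("div._93444fe79c--labels--1J6M3")
address = clean_address(address_block)
print(address)
```

With `get_text(" ")`, as in the snippet above, a space is also inserted before each comma, so which variant is nicer depends on how the result will be used.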

Evgeny Palych, 2020-11-08
@xzKakoyLogin

Check out the BeautifulSoup documentation for many examples.

Nikita Undefined, 2020-11-08
@Privetiq

I don’t know Python or this library, but I’ll take a guess:

address_block = item.select_one('div._93444fe79c--labels--1J6M3 a')
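A note on this guess: in BeautifulSoup, `select_one` returns only the first element matching the selector, so with the descendant selector it would still give just the first link. `select` returns a list of all matches. A rough sketch on a trimmed copy of the markup from the question (here `soup` stands in for `item`, and the hrefs are replaced with "#"):

```python
from bs4 import BeautifulSoup

# Trimmed sample of the markup from the question (assumption: the real
# page has the same structure).
html = """
<div class="_93444fe79c--labels--1J6M3">
<a data-name="GeoLabel" href="#">Республика Мордовия</a>,
<a data-name="GeoLabel" href="#">Саранск</a>,
<a data-name="GeoLabel" href="#">р-н Октябрьский</a>,
<a data-name="GeoLabel" href="#">124</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
selector = "div._93444fe79c--labels--1J6M3 a"

# select_one: first matching <a> only.
first_link = soup.select_one(selector)

# select: every matching <a>, in document order.
all_links = soup.select(selector)

# Take the text of each link and join the parts into one address string.
parts = [a.get_text(strip=True) for a in all_links]
address = ", ".join(parts)

print(first_link.get_text())
print(address)
```

Having the parts as a list can also be handy if the region, city and street need to be stored separately.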
