Python
Taya, 2019-05-14 15:53:32

How to parse a website in Python?

I need to parse the site https://koleso.ru/shops/ and get data about each store (address, phone, opening hours, coordinates).
The problem is that I cannot find where the coordinates come from on the page.
Please help.

3 answers
Alexander, 2019-05-14
@NeiroNx

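For some tasks Selenium is overkill. Here the coordinates are embedded in the page's JavaScript, so a regular expression over the raw HTML is enough: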

>>> import re
>>> from urllib.request import urlopen
>>> regex = r"createObject\(\"Placemark\",\s?new\sYMaps\.GeoPoint\(([\d\s\.\,]+)\),\s?\"(\w+)\",\s?'([^']+)'\s?\);"
>>> text = str(urlopen("https://koleso.ru/shops/").read(),"windows-1251")
>>> result = [list(x.groups()) for x in re.finditer(regex, text, re.MULTILINE)]
>>> result[0]
['37.834803,55.776082', 'Koleso', '<div><a class="MenuNav_YmapsBalloonPreButton" style="font-size:11px;" href="/shops/3653118/">Карточка магазина</a></div><div class="MenuNav_YmapsBalloonComment"><b>г. Москва</b><br />ш. Энтузиастов, д. 63<br />тел.: +7(499)308-59-93</div>']
>>>

The catch is that you need to be able to write a regex.
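
As a follow-up, here is a minimal sketch of how one such record could be split into separate fields. It works on the sample record copied from the output above, so it does not touch the network. The function name `parse_shop` and the dictionary keys are hypothetical, and the coordinate order (longitude first, then latitude) is an assumption based on the sample values. Opening hours do not appear in this balloon text; they may be on the per-shop page that the `href` points to, which is not checked here.

```python
import re
from html import unescape

# Sample record copied from the REPL output above:
# [coordinates, label, balloon HTML].
sample = [
    '37.834803,55.776082',
    'Koleso',
    '<div><a class="MenuNav_YmapsBalloonPreButton" style="font-size:11px;" '
    'href="/shops/3653118/">Карточка магазина</a></div>'
    '<div class="MenuNav_YmapsBalloonComment"><b>г. Москва</b><br />'
    'ш. Энтузиастов, д. 63<br />тел.: +7(499)308-59-93</div>',
]


def parse_shop(record):
    """Split one [coords, label, balloon_html] record into named fields.

    Hypothetical helper; the key names are illustrative only.
    """
    coords, _label, balloon = record

    # Assumption: the pair is "longitude,latitude", as the sample suggests.
    lon, lat = (float(value) for value in coords.split(","))

    # Relative link to the shop's own page, if present.
    link = re.search(r'href="([^"]+)"', balloon)

    # Text of the comment block, split on <br /> into separate lines.
    comment = re.search(
        r'class="MenuNav_YmapsBalloonComment">(.*?)</div>', balloon, re.DOTALL
    )
    lines = []
    if comment:
        for chunk in re.split(r"<br\s*/?>", comment.group(1)):
            text = unescape(re.sub(r"<[^>]+>", "", chunk)).strip()
            if text:
                lines.append(text)

    # The line starting with "тел." is the phone; the rest form the address.
    phone = None
    address_parts = []
    for line in lines:
        if line.startswith("тел.") and ":" in line:
            phone = line.split(":", 1)[1].strip()
        else:
            address_parts.append(line)

    return {
        "lon": lon,
        "lat": lat,
        "address": ", ".join(address_parts),
        "phone": phone,
        "url": link.group(1) if link else None,
    }


shop = parse_shop(sample)
print(shop)
```

With the live page, the same function could be applied to every item of `result` from the snippet above, for example `[parse_shop(r) for r in result]`, provided the other records follow the same layout as the sample.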

Stanislav Pugachev, 2019-05-14
@Stqs

https://scrapy.org/

Aleksandr, 2019-05-14
@QQQ-RRR

I recently started studying this area myself, and I use Selenium.
It is not complicated if you have a minimal knowledge of HTML, though you can manage even without it.
