How to parse content generated by JS in Python. What do you advise?

Python

mRelby, 2020-09-24 18:45:46

I'm new to this, so please don't throw stones at me right away :)

These days JS is everywhere, which makes parsing significantly harder, at least in Python.
The two libraries I use, Requests together with bs4, are not enough. Or maybe I'm missing something, in which case I'd be grateful if you poked my nose where it belongs (into the documentation).

So the actual question is: what is the best way, and with which tools, to parse content that a page generates with JS?

Thanks in advance for your replies.

6 answer(s)

Dr. Bacon, 2020-09-24
@bacon

"Or maybe I'm missing something, in which case I'd be grateful if you poked my nose"

You need to poke your nose into Google: Selenium is mentioned there at every turn.
P.S. I looked at your previous question, and you were already pointed to it there, so apparently it's no use.

Vladislav Lyskov, 2020-09-24
@Vlatqa

selenium
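
For anyone landing here, a minimal sketch of what the Selenium route can look like. It assumes Chrome is installed locally and that the content you want can be matched by a CSS selector; the function name `rendered_texts`, the URL and the selector in the usage note are made up for illustration.

```python
def rendered_texts(url, css_selector, timeout=10):
    """Open `url` in headless Chrome and return the text of every element
    matching `css_selector` once the page's JS has rendered it."""
    # Imported inside the function so this sketch can be loaded
    # even where Selenium is not installed (pip install selenium).
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JS has inserted at least one matching element;
        # raises TimeoutException if nothing appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors
```

Something like `rendered_texts("https://example.com/catalog", "div.product-title")` would then return the rendered titles as plain strings. If you prefer to keep using bs4, `driver.page_source` gives you the rendered HTML to feed into it instead.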

DrrRos, 2020-09-24
@DrrRos

It depends on the content. Either use Selenium or, if what you need is data the page loads through an API, intercept that API request and write your own implementation with requests or aiohttp, to taste.
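
A rough sketch of the second option, under assumptions: the endpoint URL, the `page`/`per_page` parameters and the `"items"` key are all hypothetical, and in practice you would copy the real ones from the request you intercepted. The function takes a session object (anything that behaves like `requests.Session`), so it is not tied to one HTTP client.

```python
def fetch_all_items(session, api_url, page_size=50, max_pages=100):
    """Collect items from a paginated JSON endpoint.

    `session` is expected to behave like requests.Session.
    The parameter names and the "items" key are assumptions;
    replace them with whatever the real API uses.
    """
    items = []
    for page in range(1, max_pages + 1):
        response = session.get(
            api_url,
            params={"page": page, "per_page": page_size},
            # Some backends only answer requests that look like XHR.
            headers={"X-Requested-With": "XMLHttpRequest"},
            timeout=10,
        )
        response.raise_for_status()  # fail loudly on 4xx/5xx
        batch = response.json().get("items", [])
        if not batch:  # an empty page means we reached the end
            break
        items.extend(batch)
    return items


# Usage (requires `pip install requests`; the URL is a placeholder):
#   import requests
#   with requests.Session() as session:
#       items = fetch_all_items(session, "https://example.com/api/items")
```

No browser is started here, so this is usually much faster than Selenium when the API is reachable.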

Sergey Ilyin, 2020-09-25
@sunsexsurf

Seconded. Selenium is not always needed (and it can be slow at times). First, study how the server delivers the content (or post a link to the service). Sometimes you just need to dig into which requests go out and what comes back, and no Selenium is needed at all.
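
One possible first step in that investigation, as a sketch: look at the raw HTML the server returns and check whether the data is already in it, for example as JSON inside a `<script>` tag. This assumes the page uses `application/json` or `application/ld+json` script tags, which your page may or may not do; the names `EmbeddedJsonFinder` and `find_embedded_json` are made up here. Only the standard library is used, though bs4 would do the same job.

```python
import json
from html.parser import HTMLParser

JSON_SCRIPT_TYPES = {"application/json", "application/ld+json"}


class EmbeddedJsonFinder(HTMLParser):
    """Collect JSON payloads found inside <script> tags with a JSON type."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in JSON_SCRIPT_TYPES:
            self._in_json_script = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # the tag did not contain valid JSON; skip it


def find_embedded_json(html):
    """Return a list of parsed JSON objects embedded in `html`."""
    finder = EmbeddedJsonFinder()
    finder.feed(html)
    finder.close()
    return finder.payloads
```

You would feed it the text of a plain `requests.get(url)` response. If the list comes back empty, the data is probably fetched by a separate request after the page loads, and the next thing to inspect is the requests the page itself sends.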

Dimonchik, 2020-09-24
@dimonchik2013

There are somewhat lighter tools that can process JS, but they are all harder to use, so: Selenium.

IDzone-x, 2020-09-24
@IDzone-x

Selenium and more