Why doesn't requests return the full HTML of the page?

Python

r4khic, 2019-11-11 07:22:34

I'm making an ordinary GET request with the requests library, but I'm not getting the page's full HTML.
The code:

r = requests.get(resource_main_url, timeout=30, headers={"content-type":"text"})
return r.text

This resource returns incomplete HTML to me.
The HTML in the site's response is missing tags I need, such as:

<h2 class="five_news_title onlytext">

Here is the HTML that is actually returned to me.
My first thought was that the site loads these tags via AJAX, but I checked and that is not the case.
What else could explain why I'm not getting the page's full HTML?

1 answer(s)

dollar, 2019-11-11

I suspect you are getting the mobile version, or the version served to bots. To avoid this, pretend to be a browser: send the same headers a browser sends, starting with a User-Agent from Chrome or Firefox.
You might also be getting a 403 (Forbidden) error or something similar, again because of the headers.
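Roughly, the two suggestions above could look like the sketch below. The header values are only an assumed example of what a desktop Chrome might send (copy the real ones from your own browser's developer tools), and the names BROWSER_HEADERS, fetch_html and has_marker are made up for illustration. raise_for_status() turns a 403 or similar into an exception, so an error page is not mistaken for the real one.

```python
import requests

# Assumed example values; copy the real ones from your browser's dev tools.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_html(url, timeout=30):
    """Fetch a page with browser-like headers; raise on 4xx/5xx responses."""
    response = requests.get(url, timeout=timeout, headers=BROWSER_HEADERS)
    # Surfaces a 403 (or similar) instead of silently returning an error page.
    response.raise_for_status()
    return response.text


def has_marker(html, marker='class="five_news_title onlytext"'):
    """Return True if the expected markup appears in the returned HTML."""
    return marker in html
```

If has_marker(fetch_html(resource_main_url)) is still False with these headers, the site is probably telling you apart from a browser by something other than the User-Agent.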
Another possibility is that you have gotten in their way: they have detected you and are now deliberately interfering so that you stop scraping them. In that case, besides disguising yourself, reduce your request rate (RPS) to a very low level so that you don't get caught; at that rate you won't be bothering anyone either.
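One possible way to keep the request rate low is a small throttle that enforces a minimum pause between requests. This is only a sketch: the Throttle class is a hypothetical helper, not part of requests, and a suitable interval depends on the site, so any number you pass in is a guess.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Sleep just long enough that calls are min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one instance, for example throttle = Throttle(10.0), and call throttle.wait() right before each requests.get(...). The first call returns immediately, and later calls pause only as long as needed.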