How do I parse AJAX-loaded content with BeautifulSoup in Python?
I decided to try writing a job-listing parser. The site serves only the first 20 links, followed by a "More" button.
In the "Network" tab I looked at the request the button sends. I pulled out CSRF_TOKEN with a hack (it only works every other time) and made the request myself, but I get status code 403.
Website: https://jobs.dou.ua/vacancies/?category=Ruby
Code:
import requests
from bs4 import BeautifulSoup

HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
URL = "https://jobs.dou.ua/vacancies/?category=Ruby"

session = requests.Session()


def get_html(url):
    r = session.get(url, headers=HEADERS)
    return r


def get_links(response):
    if response.status_code == 200:
        html = BeautifulSoup(response.text, "html.parser")
        lis = html.find_all('li', class_="l-vacancy")
        # Number of vacancies before clicking "More"
        print(len(lis))
        # Hacky way of pulling out the csrf token
        script = str(html.select('script')[5])
        csrf = str(script[32:32+64])
        print(script)
        print(csrf)
        load_data = {
            'csrfmiddlewaretoken': csrf,
            'count': 20}
        response = session.post('https://jobs.dou.ua/vacancies/xhr-load/?category=Ruby', data=load_data)
        print(response.status_code)
        html = BeautifulSoup(response.text, "html.parser")
        lis = html.find_all('li', class_="l-vacancy")
        # Number of vacancies after clicking "More"
        print(len(lis))
    else:
        return 'Connection error!'


get_links(get_html(URL))
The site is blocked in the Russian Federation, but here are some general recommendations; they should work.
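A minimal sketch of those general recommendations, untested against the site itself. Two things in the question's code commonly lead to a 403 on a CSRF-protected endpoint: the token is cut out by fixed character offsets, and the POST is sent without the browser-like headers the GET used. The `csrfmiddlewaretoken` field name suggests Django, whose CSRF check on HTTPS also requires a matching `Referer` header. Everything below that is not in the question is an assumption: the inline script is assumed to look like `window.CSRF_TOKEN = "...";`, the fallback cookie is assumed to be named `csrftoken` (Django's default), the `X-Requested-With` header may not be needed, and the helper names `extract_csrf`, `build_xhr_request` and `load_more` are hypothetical.

```python
import re

BASE_URL = "https://jobs.dou.ua/vacancies/?category=Ruby"
XHR_URL = "https://jobs.dou.ua/vacancies/xhr-load/?category=Ruby"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)

# Assumption: the page defines the token in an inline script as
#   window.CSRF_TOKEN = "<token>";
CSRF_RE = re.compile(r"CSRF_TOKEN\s*=\s*[\"']([^\"']+)[\"']")


def extract_csrf(page_html):
    """Return the token found in the page source, or None if absent."""
    match = CSRF_RE.search(page_html)
    return match.group(1) if match else None


def build_xhr_request(csrf, count):
    """Return (headers, form data) for the request behind the "More" button."""
    headers = {
        "User-Agent": USER_AGENT,
        # Django rejects HTTPS POSTs whose Referer is missing or foreign.
        "Referer": BASE_URL,
        # Assumption: mimics a browser XHR; the site may not require it.
        "X-Requested-With": "XMLHttpRequest",
    }
    data = {"csrfmiddlewaretoken": csrf, "count": count}
    return headers, data


def load_more(session, count=20):
    """Fetch the listing page, then POST for the next batch.

    `session` is expected to be a requests.Session, so the cookies set by
    the GET are sent back automatically with the POST.
    """
    page = session.get(BASE_URL, headers={"User-Agent": USER_AGENT})
    page.raise_for_status()
    # Fall back to the cookie (assumed name) if the script pattern differs.
    csrf = extract_csrf(page.text) or session.cookies.get("csrftoken")
    if csrf is None:
        raise RuntimeError("CSRF token not found in page or cookies")
    headers, data = build_xhr_request(csrf, count)
    response = session.post(XHR_URL, headers=headers, data=data)
    response.raise_for_status()
    return response
```

With the real library this would be called as `load_more(requests.Session())`. The format of the response body is not shown in the question, so check it in the "Network" tab before passing it to BeautifulSoup; XHR endpoints often wrap the HTML fragment in JSON.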