How to parse a dynamic site?

Python

black_dis, 2021-10-22 13:53:37

https://forum.malinovka.org/topic/13323-list-action...
I need to parse the leaders and the information about them from this site.
With a plain requests call I get "Please turn JavaScript on and reload the page." instead of the page, so I can't get the information I need.

The code I am using, which does not work:

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent; the server still replies with the JavaScript stub.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"}
res = requests.get("https://forum.malinovka.org/topic/13323-список-действующих-лидеров/", headers=headers)
soup = BeautifulSoup(res.content, "html.parser")

# Collect the post body divs (the BeautifulSoup method is find_all).
all_liders = soup.find_all("div", class_="ipsType_normal ipsType_richText ipsContained")

2 answer(s)

Vladimir Kuts, 2021-10-22
@fox_12

Use Selenium: it drives a real browser, so the page's JavaScript runs and you get the rendered HTML.

jerwright, 2021-10-22
@jerwright

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

URL = 'https://forum.malinovka.org/topic/13323-список-действующих-лидеров/'

options = webdriver.ChromeOptions()
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service("chromedriver.exe"), options=options)
driver.get(url=URL)
time.sleep(2)  # give the page's JavaScript time to render the content
needed_html_code = driver.page_source
driver.quit()  # closes all windows and stops the driver process

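soup = BeautifulSoup(needed_html_code, "html.parser")

# First matching post body; find() returns None if the page did not render.
content_div = soup.find('div', class_='cPost_contentWrap ipsPad')
if content_div is None:
  raise SystemExit("Post content not found; the page may not have finished loading.")
for p in content_div.find_all('p')[1:]:  # skip the first paragraph
  for text in p.stripped_strings:  # non-empty text fragments inside the paragraph
    print(text)
  print("-"*15)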

For this code to work you need a browser driver installed (chromedriver for Chrome in my case). If you deploy the code to a host such as Heroku, you will have to install the driver there as well.
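A fixed time.sleep(2) can be too short on a slow connection, so here is a rough sketch of the same fetch with an explicit wait and headless Chrome, which is usually what you want on a server. The names is_js_challenge and fetch_rendered_html are made up for illustration. It also assumes a recent Selenium 4 release that can locate the driver by itself and a Chrome build that accepts --headless=new. I have not run it against this forum.

```python
JS_CHALLENGE_MARKER = "Please turn JavaScript on and reload the page."


def is_js_challenge(html: str) -> bool:
    """Return True if the HTML is the 'enable JavaScript' stub, not the real page."""
    return JS_CHALLENGE_MARKER in html


def fetch_rendered_html(url: str, css_selector: str, timeout: float = 15.0) -> str:
    """Load url in headless Chrome and return the HTML once css_selector is present."""
    # Imported lazily so is_js_challenge works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)  # assumption: Selenium finds the driver
    try:
        driver.get(url)
        # Raises selenium's TimeoutException if the element never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always stop the browser, even on errors
```

Usage would be something like html = fetch_rendered_html(URL, "div.cPost_contentWrap"), and then you pass html to BeautifulSoup as in the code above. is_js_challenge(res.text) can be used on a plain requests response to decide whether the browser is needed at all.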