Python
cegthgtlhj, 2019-10-14 22:15:54

Parsing: how do I correctly form a request to a website's search box?

A request like this does not return any search results:

def fg_list_bot(_name_element, _output_file):
    print(_name_element)
    s = requests.Session()
    _data = {"searchValue": _name_element, "searchSubmit": "submit"}
    _headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
    # Post through the session so any cookies it holds are sent along
    r = s.post(_url, data=_data, headers=_headers)
    with open(_output_file, "w", encoding='utf-8') as f:
        f.write(r.text)
    print(r.status_code)
#
# code begins
#
import requests
from bs4 import BeautifulSoup
_url = "https://hifi-filter.com/en/catalog/recherche-equivalence.html"
_output_file = "IDLE_HIFI.html"
_name_element = "pi3115"   
fg_list_bot(_name_element, _output_file)

When I add a captcha token, as below, the program works for a while, as long as the token is valid. After that it stops returning results, just like the previous version:
def fg_list_bot(_name_element, _output_file):
    print(_name_element)
    s = requests.Session()
    _data = {"searchValue": _name_element, "searchSubmit": "submit",
             "g-recaptcha-response": "03AOLTBLSGgHKYeeU_WgH-tOhoUV8UXkBejUCAhxgfuyBKE0QA0PeDOcTlrhTd0zlhTyCVIjjkZrfxWBnBfd6R5_G_XU15ZN8s3nqHljYjvXMHpijXj4TZUIu0t_hBHu65rJb7op28Iz1EplJxP0lbfXJbm3Mif-O6jg-eXb-v_spSH4W2aW4nSvMMrHGy-7iJpOns4O-Ff-P2kit_E7jbrKF6jakyR1f0FlcLGFHAPNaf0w2BhnXvxlFmo6ghDR58jqJmWiRRj0BK8nAMIw0FVI4J1j3hoWDxxNX6bnHXxw-mQb-FEhwM4oHMVCvj-NqzG2gX__H9AXuSU7Ehnl9YwtMi3ssW6V4FuEmVIwpZDPy-nIfSdi7NyuycZj6tgLFyKfefj91oaWCNoNqH48I0MfE6zkfim7KlTfbG0LxGIFpH4MMH1_iNunJ0LJU9s_o8jUA3HP5bL-1jPVAbFC6pnxe07GmyKaSutQ"}
    _headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
    # Post through the session so any cookies it holds are sent along
    r = s.post(_url, data=_data, headers=_headers)
    with open(_output_file, "w", encoding='utf-8') as f:
        f.write(r.text)
    print(r.status_code)
#
# code begins
#
import requests
from bs4 import BeautifulSoup
_url = "https://hifi-filter.com/en/catalog/recherche-equivalence.html"
_output_file = "IDLE_HIFI.html"
_name_element = "pi3115"   
fg_list_bot(_name_element, _output_file)
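
The behaviour described above is consistent with how reCAPTCHA tokens work: a `g-recaptcha-response` value is short-lived (Google documents a two-minute lifetime and single use), so a hard-coded token can only work briefly. Before posting, it may help to check which fields the search form really submits. Below is a minimal sketch using only the standard library's `html.parser`. The names `FormFieldCollector` and `collect_form_fields`, the `csrf` field, and the sample markup are all hypothetical; the real page's form may look different.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect named form controls from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "textarea", "button"):
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Default to an empty string when no value attribute is present.
            self.fields[name] = attr.get("value") or ""


def collect_form_fields(html):
    """Return a dict of field name -> default value found in the HTML."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.fields


# Made-up markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<form method="post">
  <input type="text" name="searchValue">
  <input type="hidden" name="csrf" value="abc123">
  <textarea name="g-recaptcha-response"></textarea>
  <button type="submit" name="searchSubmit" value="submit">Search</button>
</form>
"""

fields = collect_form_fields(SAMPLE_HTML)
fields["searchValue"] = "pi3115"  # fill in the search term
print(fields)
```

In practice the HTML would come from a `GET` on `_url` made with the same `requests.Session` that later sends the `POST`, so hidden fields and cookies match. The `g-recaptcha-response` field would still need a fresh token for every request.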

This is what the site looks like (screenshot: q02.png).

1 answer(s)
grinat, 2019-10-15
@cegthgtlhj

Try pyppeteer (https://pypi.org/project/pyppeteer/) and use it to click on the captcha if necessary, and so on. That said, I seriously doubt you will get past the Google captcha this way. In general, a solving service such as https://anti-captcha.com/mainpage will do: a couple of dollars is enough for thousands of captchas.
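
A rough sketch of the pyppeteer route, under several assumptions: the form controls carry the `name` attributes `searchValue` and `searchSubmit` (taken from the POST data in the question), submitting the form triggers a page navigation, and any captcha is solved by hand in the visible browser window. `name_selector` and `fetch_results` are hypothetical helper names, and none of this has been checked against the live site.

```python
import asyncio

URL = "https://hifi-filter.com/en/catalog/recherche-equivalence.html"


def name_selector(field_name):
    """Build a CSS selector matching a form control by its name attribute."""
    return f'[name="{field_name}"]'


async def fetch_results(query, url=URL):
    """Open the search page in a browser, submit the query, return the HTML."""
    # Imported here so the module can be loaded without pyppeteer installed.
    from pyppeteer import launch

    # headless=False opens a visible window, so a captcha can be solved by hand.
    browser = await launch(headless=False)
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Assumption: the search input is named "searchValue".
        await page.type(name_selector("searchValue"), query)
        # Start waiting for the navigation before the click triggers it.
        await asyncio.gather(
            page.waitForNavigation(),
            page.click(name_selector("searchSubmit")),
        )
        return await page.content()
    finally:
        await browser.close()
```

Calling `asyncio.run(fetch_results("pi3115"))` would then return the result page's HTML, which can be written to a file the same way as `r.text` in the question. If the site loads results without a navigation, the `waitForNavigation` call would time out and would need to be replaced with a wait on a result element.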
