Python Selenium: how to check whether a file already exists in the target directory before downloading it?
Good evening, dear connoisseurs.
I have a parser that collects data and also downloads files into a folder on my laptop's disk using Selenium.
The catch is that the same files come up repeatedly (different product types share the same documentation), and they are not small, so the parser downloads them again every time. I would like to add a check for whether a file is already present.
How can I make the parser check, before downloading, whether such a file is already in the folder? I can't get the file name in advance, because the download link is generated when you click.
Product link (you need to log in to see the file): https://stomshop.pro/hlw-31-45b#tab-documentation
An example of my file download code:
import time
from selenium import webdriver

options = webdriver.ChromeOptions()
# options.add_argument(f"user-agent={user_agent.random}")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--headless")
options.add_experimental_option('prefs', {
    # save downloads straight into the target folder, without a dialog
    "download.default_directory": path_registration_documents,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    # download PDFs instead of opening them in the built-in viewer
    "plugins.always_open_pdf_externally": True,
})
driver = webdriver.Chrome(
    executable_path=f"{base_path}/chromedriver",
    options=options
)
driver.find_element_by_id("tab-documentation-li").click()
time.sleep(0.5)
documents = driver.find_elements_by_class_name("docext-container")
for document in documents:
    # each click starts a download of the corresponding document
    document.click()
    time.sleep(1)
I'm sure Selenium has plenty of special handlers and the like for getting information about a file being downloaded, but since nobody has pointed one out so far, I suggest a workaround: build the file request manually and read the file name from the response headers, without downloading the response body.
import requests
import re
import os
#...
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
documents = driver.find_elements_by_class_name("docext-container")
for document in documents:
    # find the parent element; it holds the ID we need
    document_id = document.find_element_by_xpath('..').get_attribute('data-documentation-id')
    # fill the payload with the form fields and insert our ID
    payload = 'cr_documentation_action=download&documentation_id={}&email='.format(document_id)
    # the request URL is the current page
    # stream=True is required so that the file body is not downloaded right away
    r = requests.post(driver.current_url, headers=headers, data=payload, stream=True)
    # the file name is always present in the response headers (r.headers),
    # under a key that looks like "attachment; filename*=UTF-8''hlw-shiptsy-ortodonticheskie-reg.pdf",
    # so we simply pull it out with a regular expression
    document_name = re.search(r'\'\'(.+?\.pdf)', r.headers['Content-Disposition']).group(1)
    # release the connection; the body was never read
    r.close()
    # next, check whether the file is already in the folder
    # as I understand it, the download folder path is in path_registration_documents, so:
    if document_name in os.listdir(path_registration_documents):
        print('Already downloaded')
    else:
        print('New document')
        document.click()
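The regular expression above only matches names that end in .pdf and are given in the filename*=UTF-8''... form. If the site ever serves other file types, percent-encoded names, or the plain filename="..." form, a slightly more tolerant parser may help. Below is a minimal sketch using only the standard library; the helper names filename_from_content_disposition and already_downloaded are my own, not something from Selenium or requests, and I am assuming the browser saves the file under exactly the name given in the header.

```python
import os
import re
from urllib.parse import unquote


def filename_from_content_disposition(header_value):
    """Return the file name from a Content-Disposition value, or None."""
    # Extended form: filename*=UTF-8''percent-encoded-name
    match = re.search(r"filename\*\s*=\s*[^']*'[^']*'([^;]+)",
                      header_value, re.IGNORECASE)
    if match:
        # Decode %XX escapes as UTF-8 and drop any path components
        return os.path.basename(unquote(match.group(1).strip()))
    # Plain form: filename="name.ext" or filename=name.ext
    match = re.search(r'filename\s*=\s*"?([^";]+)"?',
                      header_value, re.IGNORECASE)
    if match:
        return os.path.basename(match.group(1).strip())
    return None


def already_downloaded(header_value, download_dir):
    """True if the file named in the header already exists in download_dir."""
    name = filename_from_content_disposition(header_value)
    return name is not None and os.path.isfile(os.path.join(download_dir, name))
```

In the loop it would be used roughly like this: if already_downloaded(r.headers.get('Content-Disposition', ''), path_registration_documents) is false, call document.click(). Using .get() with a default avoids a KeyError when the header is missing, in which case the file is simply treated as new.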