R
R
RedHairOnMyHead2016-05-20 17:27:38
Python
RedHairOnMyHead, 2016-05-20 17:27:38

How to programmatically download multiple files from a website?

Can anyone give some information on how to programmatically download several files from the site?
There are several dates and for these dates you need to download files (for each date a separate file).

Answer the question

In order to leave comments, you need to log in

3 answer(s)
A
Alexey Ukolov, 2016-05-20
@ThePyzhov

What is "download file" from a programmer's point of view? This is a) send an http request to a specific address and b) store the response in the right place.
Now, armed with this sacred knowledge, you can safely go to the search engine, adding to each of these items the technology stack that is relevant to you.

M
Mercury13, 2016-05-20
@Mercury13

1. Get the principle by which links are built. You may have to disassemble the code of the site itself for this.
2. Use any HTTP library: Indy (Delphi), cURL (DLL with lots of bindings) or something else. Unfortunately, I didn't work with others. Well, or just run a suitable EXE loader (wget, curl ...).
3. Bypass anti-DDoS measures. Many content distribution networks check if the client looks like a legitimate browser - our bot must bypass all these measures.
I know that the weakness of the latest versions of Indy is the focus on .NET and the limitations that follow from there; also Indy is “too smart” and complex REST services do not work, because you build the body, calculate the simulated insertion - and the service says that they do not correspond to each other; Apparently, Indy rigged something. By the way, when I realized that requests without a body work, but not with a body, curl.exe became the first test, and only then I connected libcurl through my own Delphi binding.
The weakness of cURL is that the subtleties of the protocols (say, headers and encodings) are left to the application programmer; FILE* oriented and in all languages ​​except C, writing to a file is a bit difficult; is vararg-oriented and if you work without a type-safe wrapper, you need to be extremely careful.

T
TomasHuk, 2016-05-20
@TomasHuk

На python скачивал по прямым ссылкам так:

import urllib.request

# Задаем адрес в интернете
url = 'https://www.python.org/static/img/python-logo.png'
# Скачиваем файл, сохраняем его с именем logo.png
urllib.request.urlretrieve(url, "logo.png")

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question