How to read file paths from Excel, download the files, and sort them into folders?
The idea is this: there is an Excel file in which each row has a field with a unique ID, followed by fields that each contain an external file path (URL).
The task is to go through all the rows, download the files, and save them into a folder that also needs to be created, named after the unique ID from that row.
I am an amateur at programming, but I suspect a task like this can be solved with some Python libraries. I would be glad to get recommendations, or maybe there are already ready-made solutions for such tasks?
You can convert the Excel file to CSV - it will be much easier to work with. This task can be solved in a couple of minutes.
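That conversion step can be sketched with pandas in a couple of lines; the file names "links.xlsx" and "links.csv" here are placeholders, not names from the question:

```python
# Sketch: convert the spreadsheet to CSV once, then work with the
# simpler CSV format. The file names are assumptions.
import pandas as pd

def xlsx_to_csv(xlsx_path: str, csv_path: str) -> None:
    # read_excel needs the openpyxl package installed for .xlsx files
    table = pd.read_excel(xlsx_path)
    # index=False keeps the DataFrame row index out of the CSV
    table.to_csv(csv_path, index=False)
```

Usage: `xlsx_to_csv("links.xlsx", "links.csv")`.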
With openpyxl (or pandas, where it takes just a few lines) you read your Excel file, then in a loop you download the files with requests. If there are a lot of files, try writing an asynchronous downloader; it can be roughly ten times faster (aiohttp and aiofiles will help). To create the folders you will need os.
This is how you can pick up an entire column from Excel:

import pandas as pd
from glob import glob

# take the first .xlsx file found in the current directory
file = glob('*.xlsx')[0]
table = pd.read_excel(file)
# replace 'Column name' with the actual header of the URL column
urls_list = table['Column name'].to_list()
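Before reaching for asyncio, the full task from the question - one folder per unique ID, with the row's files downloaded into it - can be sketched synchronously with requests. The column name "id" and the contents of url_columns are assumptions about the spreadsheet layout:

```python
# Sketch: iterate the rows, create a folder named after each row's
# unique ID, and download that row's files into it.
# Column names here are assumptions about the real spreadsheet.
import os

import pandas as pd
import requests

def download_rows(xlsx_path: str, url_columns: list[str]) -> None:
    table = pd.read_excel(xlsx_path)
    for _, row in table.iterrows():
        folder = str(row["id"])
        os.makedirs(folder, exist_ok=True)  # one folder per unique ID
        for col in url_columns:
            url = row[col]
            if pd.isna(url):  # skip empty cells
                continue
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            # use the last URL path segment as the local file name
            file_name = url.rstrip("/").rsplit("/", 1)[-1] or col
            with open(os.path.join(folder, file_name), "wb") as f:
                f.write(resp.content)
```

This is the simple version; for many files the asynchronous approach below will be noticeably faster.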
import asyncio
import os
from os.path import join as pth_join

import aiofiles
import aiohttp

DWNLD_FLDR = "Download"
# optional request headers (e.g. a User-Agent); empty by default
HEADERS: dict[str, str] = {}


async def download_file(session: aiohttp.ClientSession, link: str, file_name: str):
    async with session.get(link) as resp:
        if resp.status == 200:
            # aiofiles writes the file without blocking the event loop
            async with aiofiles.open(pth_join(DWNLD_FLDR, file_name), "wb") as f:
                await f.write(await resp.read())
        else:
            print(f"Error: {resp.status}")


async def gather_files(files_urls: list[dict]):
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = []
        for item in files_urls:
            try:
                # skip files that are already downloaded (non-empty on disk)
                if os.stat(pth_join(DWNLD_FLDR, item["file_name"])).st_size:
                    continue
            except FileNotFoundError:
                pass
            task = asyncio.create_task(
                download_file(session, item["file_link"], item["file_name"])
            )
            tasks.append(task)
        await asyncio.gather(*tasks)


def main(file_list):
    os.makedirs(DWNLD_FLDR, exist_ok=True)
    asyncio.run(gather_files(file_list))


if __name__ == "__main__":
    main([{"file_name": "test.txt", "file_link": "http://file_url"}])
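The list of dictionaries that main() above expects can be built straight from the spreadsheet; the column names "file_name" and "file_link" are assumptions about the Excel layout:

```python
# Sketch: turn the Excel rows into the list[dict] structure that the
# async downloader consumes. Column names are assumptions.
import pandas as pd

def build_file_list(xlsx_path: str) -> list[dict]:
    table = pd.read_excel(xlsx_path)
    return [
        {"file_name": row["file_name"], "file_link": row["file_link"]}
        for _, row in table.iterrows()
    ]
```

Usage: `main(build_file_list("links.xlsx"))`.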