Why does asyncio hang with big data?
import asyncio
from aiohttp import ClientSession

async def run(r):
    tasks = []
    sem = asyncio.Semaphore(1000)
    async with ClientSession() as session:
        for url in r:
            task = asyncio.ensure_future(bound_fetch(sem, url, session))
            tasks.append(task)
        responses = await asyncio.gather(*tasks)

with open('0.txt') as f:
    urls = f.read().splitlines()

que = []
for url in urls:
    que.append(url)
    if len(que) == 5000:
        loop = asyncio.get_event_loop()
        future = asyncio.ensure_future(run(que))
        loop.run_until_complete(future)
        que = []

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(que))
loop.run_until_complete(future)
You're loading the entire million addresses into memory, twice: first f.read() pulls the whole file in as a single string, and then .splitlines() builds a second copy of it as a list of individual lines.
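A minimal illustration of the difference, using the same 0.txt from the question; the lazy version keeps only one line in memory at a time:

# eager: the whole file as one string, then a second copy as a list of lines
with open('0.txt') as f:
    urls = f.read().splitlines()

# lazy: iterate the file object directly, one line at a time
with open('0.txt') as f:
    for line in f:
        url = line.strip()
        # ...hand url off for fetching here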
And yes, a million individual tasks is a bad idea in itself: asyncio still has to keep checking every one of them to see whether it can make progress.
I would make a fixed-size pool of worker tasks and have each worker call f.readline() itself in a loop to get the next URL to load (see the sketch below). That way you don't need to hold the whole list in memory, and you get much better control over the number of concurrent tasks.
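A minimal sketch of that worker-pool idea, assuming aiohttp; the pool size, fetch() and worker() are illustrative names, not part of the question's code:

import asyncio
from aiohttp import ClientSession

N_WORKERS = 1000  # assumed pool size, tune to your connection limits

async def fetch(url, session):
    # placeholder for the real request handler
    async with session.get(url) as resp:
        return await resp.read()

async def worker(f, session):
    # each worker pulls its own next line from the shared file object;
    # readline() is synchronous, so coroutines can't interleave inside it
    while True:
        line = f.readline()
        if not line:
            break  # end of file, this worker is done
        url = line.strip()
        if url:
            await fetch(url, session)

async def main():
    async with ClientSession() as session:
        with open('0.txt') as f:
            workers = [asyncio.ensure_future(worker(f, session))
                       for _ in range(N_WORKERS)]
            await asyncio.gather(*workers)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Only N_WORKERS requests are ever in flight at once, and nothing beyond the current line of the file needs to be held in memory.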