Python
addison-cochran, 2020-02-08 20:36:04

How can I get around the server's per-second request limit?

I want to scrape the entire wall of one VKontakte group. There are a lot of posts.
I learned that there is a wall.get method, but it can only be called 2500 times a day. This is not enough.
Then I found out that from the mobile version, when scrolling down the page, the following request is executed:
POST https://m.vk.com/clubXYZ?offset=35&own=1
I tried it with requests and it works; removing own=1 also works.
I also found out that VKontakte loads only 10 posts per request.
That is, if the group has 70,000 posts, about 7,000 requests are needed. Each request takes 0.2 s, so roughly 23 minutes per group (and there are a lot of such groups).
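The arithmetic above can be checked directly (values taken from the question):

```python
# Back-of-the-envelope check of the numbers in the question:
posts = 70_000       # posts on the wall
per_request = 10     # posts returned per request
seconds_each = 0.2   # observed time per request

requests_needed = posts // per_request               # 7000
total_minutes = requests_needed * seconds_each / 60  # about 23 minutes
```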
I tried threads, and it did not help; threads with a proxy did not help either, nor did plain asynchronous requests.
I tried asynchronous requests with a proxy, but the result is a hack on top of a hack and still nothing works.
What can I do to ensure that VKontakte does not ban my requests?
And how to use a proxy, if necessary?

The code

import random
import asyncio
import aiohttp
import aiohttp_socks
from aiohttp import ClientSession
from aiohttp_socks import SocksConnector
import pickle

storage = []

proxies = [
    '46.4.96.137:1080', '134.0.116.219:1080', '207.154.231.212:1080',
    '207.154.231.213:1080', '138.68.161.60:1080', '82.196.11.105:1080',
    '178.62.193.19:1080', '188.226.141.127:1080', '207.154.231.211:1080',
    '207.154.231.216:1080', '88.198.50.103:1080', '188.226.141.61:1080',
    '188.226.141.211:1080', '176.9.119.170:1080', '207.154.231.217:1080',
    '138.68.161.14:1080', '138.68.165.154:1080', '176.9.75.42:1080',
    '95.85.36.236:1080', '138.68.173.29:1080', '139.59.169.246:1080',
]


async def fetch(url, i):
    size = 0
    body = b''
    # Retry until the server returns a full page; a short body usually
    # means the request was rejected or rate-limited.
    while size < 10000:
        await asyncio.sleep(random.randint(0, 10))
        proxy = random.choice(proxies)
        try:
            # Route through one SOCKS proxy via the connector only.
            # Do not also pass proxy= to session.post(): that tried to
            # send the request through a second (HTTP) proxy on top of
            # the SOCKS one, and the two conflict.
            async with ClientSession(connector=SocksConnector.from_url('socks5://' + proxy)) as session:
                async with session.post(url, data={'offset': i}) as response:
                    body = await response.read()
                    size = len(body)
                    print(size)
        except (aiohttp.ServerDisconnectedError, aiohttp_socks.ProxyError):
            await asyncio.sleep(3)
    storage.append(body)
    return body


async def bound_fetch(sem, url, i):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, i)


async def run(r):
    url = 'https://m.vk.com/sketch.books'
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(1000)

    # Schedule one task per page: each wall page returns 10 posts,
    # so the offsets step by 10.
    for i in range(0, r + 1, 10):
        # the semaphore caps how many fetches run concurrently
        task = asyncio.ensure_future(bound_fetch(sem, url, i))
        tasks.append(task)

    await asyncio.gather(*tasks)


number = 70610
loop = asyncio.get_event_loop()

future = asyncio.ensure_future(run(number))
loop.run_until_complete(future)

print(len(storage))
with open('sketch_books_2.vk', 'wb') as f:
    pickle.dump(storage, f)
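One more thing worth noting: the while loop in fetch retries forever, which keeps hammering a proxy even after it has been banned. A bounded retry with exponential backoff is one common alternative. A minimal sketch (the helper name and the exception set are illustrative, not from any library):

```python
import asyncio

async def fetch_with_retry(coro_factory, attempts=5, base_delay=0.5):
    # Call coro_factory() up to `attempts` times, doubling the pause
    # after each failure; re-raise if every attempt fails.
    delay = base_delay
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except (OSError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2
```

In fetch, this would wrap the session.post call in place of the unbounded while loop.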

1 answer
Anton Shamanov, 2020-02-08
@addison-cochran

Use a proxy, or parse the pages directly.
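On the proxy side, one detail in the question's code is worth flagging: random.choice can pick the same address many times in a row, concentrating load on one proxy. A round-robin rotation spreads requests evenly across the pool. A small sketch (the helper name is made up; the socks5:// prefix matches what SocksConnector.from_url expects):

```python
import itertools

def proxy_rotation(proxies):
    # Endless round-robin over the pool, each entry prefixed with
    # the socks5:// scheme.
    return itertools.cycle('socks5://' + p for p in proxies)

rotation = proxy_rotation(['46.4.96.137:1080', '134.0.116.219:1080'])
# each call to next(rotation) yields the next proxy URL in order
```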
