Protecting social networks or how to bypass parsing blocking?
Hello. I am writing a research paper for my institute on the topic "Imperfect social networks", about how poorly social networks protect our personal information. To back up my claims rather than just assert them, I wrote lightweight parsers that collect information from pages on many social networks, including vk, twitter, instagram, and some dating forums. All of those gave way at the first attempt and handed over their data, but facebook does not work at all: after 30 requests it does not just ban the IP address, it cuts off the account completely, even though I use selenium with page scrolling, IP substitution, User-Agent substitution, and imitation of mouse movements. Please tell me how to overcome this giant.
P.S. I have already read the articles that come up on Google.
FB needs to be parsed on powerful virtual machines, each running 5-10 Chrome instances, with no selenium. Rewrite your scripts as userscripts. A large pool of IPs is obtained by purchasing a premium proxy list. Repeating the same type of search over and over is useless. You need to look at photos, read comments, and keep random delays. Like something occasionally. Don't forget to emulate different screen resolutions and different window sizes.
Your paper has failed, because practice does not confirm its stated thesis.
In addition to what has already been said:
1. Parse not by iterating over the links in a list, but by following "deep" linked links.
Afterwards, sort them and check how much of the desired list the parsed data covers; that part can be done offline, at home.
2. Parsing profile: each social network account has its own User-Agent (mobile!), which must stay constant. Use no more than 5 different IPs from one city per hour, and no more than 20-30 different IPs from one city (or from one region or country, which is worse) per day. In other words, they should repeat the way they would when moving around the same city with a mobile phone: in a strict order along the "chain", and with the same duration (time interval) of use.
3. Parse only displayed links, not what is in the page code.
4. Follow the same timings as with manual navigation.
5. Simulate all user text input in fields and all link navigation by touch, completely and correctly! Keep the percentage of erroneous touches the same as in normal use.