Web scraping in Python: which approach is better?

Python

TchernyavskyDaniil, 2018-04-25 06:46:57

Good morning.
I need to write parsers for certain sites, preferably without using their APIs.
Among these sites are LinkedIn and Instagram.
I have a rather unusual (probably) question. I am completely new to this, and I decided to do it with the help of libraries: BS4, fake_user, Scrapy, Selenium, Requests. When I got to Instagram, I ran into the problem that the page is loaded dynamically, that is:
The HTML for the Instagram photos is loaded in pieces. If you open a profile, there are 12 of them; if you scroll down, more appear, but still only a part of the total. As I understand it, this is done with Ajax, and Instagram itself is built on React and the Virtual DOM (if I'm wrong, sorry, please correct me). I found a solution in Selenium.
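With Selenium, this kind of lazy loading usually comes down to scrolling, waiting, and checking whether the page has grown. Below is a minimal sketch of that loop, not a definitive implementation. The names `scroll_to_bottom`, `open_and_scroll`, `pause` and `max_rounds` are hypothetical, and the pause length is an assumption that would need tuning for a real site.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls that loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the current bottom of the page to trigger the next Ajax load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed delay; gives the request time to finish
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, so we assume the end was reached
        last_height = new_height
        loaded += 1
    return loaded


def open_and_scroll(url):
    """Hypothetical usage with a real browser (needs Selenium and Chrome)."""
    from selenium import webdriver  # imported lazily so the loop above has no dependency

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return scroll_to_bottom(driver)
    finally:
        driver.quit()
```

The `max_rounds` limit is there so that a very long profile cannot keep the loop running forever.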
What I am doing:
I launch the program; Selenium opens the browser, logs in, goes to the user I want to scrape, and scrolls down using a JS scroll script :) I collect the photos, the hashtags for each photo, and the basic page information, as well as the likes and the photo itself. Then I open the list of followers, scroll through a certain number of them, and collect them into a list. I export everything to Excel / CSV and save the photos to a folder.
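The export step at the end of that workflow can be sketched with the standard library alone. This is only an illustration under assumptions: the post fields (`photo_url`, `likes`, `caption`), the function names and the column layout are all hypothetical, and the hashtags are pulled out of the caption text with a simple regular expression.

```python
import csv
import re

# Assumption: a hashtag is "#" followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption):
    """Return the hashtags found in a caption, without the leading '#'."""
    return HASHTAG_RE.findall(caption)


def write_posts_csv(path, posts):
    """Write one CSV row per post; `posts` is a list of dicts (hypothetical layout)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["photo_url", "likes", "hashtags"])
        writer.writeheader()
        for post in posts:
            writer.writerow({
                "photo_url": post["photo_url"],
                "likes": post["likes"],
                # Several hashtags are stored in one cell, separated by spaces.
                "hashtags": " ".join(extract_hashtags(post["caption"])),
            })
```

Excel can open the resulting file directly, so a separate Excel export may not be needed.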
Can you please tell me the easiest way to do this? I understand that maybe I'm doing it ALL wrong, but I have no experience.
With LinkedIn, the situation is about the same (Selenium, log in, search for 'Python Developer', take a certain number of results, open each one, take the info and add it).
Next in line is Twitter, but I'll probably use the API :)

2 answers

l1l1l1, 2018-04-25
@l1l1l1

Use the API. There is nothing complicated about it; on the contrary, it will greatly simplify your work.

Fixid, 2018-04-25
@Fixid

In general, your approach is correct and it works, but working through an API is always easier and more convenient.
As an additional bonus of driving a real browser, you are disguised as a normal user.