Web scraping in Python: which approach is better?
Good morning.
I have a task to write scrapers for certain sites, preferably without using their APIs. Among these sites are LinkedIn and Instagram.
I have a rather unusual (probably) question. I'm completely new to this, and I decided to use these libraries: BS4 (BeautifulSoup), fake_useragent, Scrapy, Selenium, Requests. When I got to Instagram, I ran into the problem that the page is loaded dynamically. That is:
Instagram's content (the HTML itself) is loaded in pieces: if you open a profile, 12 photos appear; scroll down and you get not the full set but only the next batch. As I understand it, this happens via AJAX, since Instagram itself is built on React with a virtual DOM (if I'm wrong, please correct me). The way out I found was Selenium.
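For context, a minimal sketch of why a plain Requests + BeautifulSoup fetch falls short on such pages (the profile URL is a placeholder): only the initial server-rendered HTML comes back, and anything injected later by JavaScript never reaches the parser.

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://www.instagram.com/some_user/",  # hypothetical profile URL
    headers={"User-Agent": "Mozilla/5.0"},
)
soup = BeautifulSoup(resp.text, "html.parser")

# Only elements present in the initial payload are visible here;
# photos appended by client-side rendering will be missing.
print(len(soup.find_all("img")))
```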
What I'm doing:
I launch the program; Selenium opens the browser, logs in, navigates to the user I want to scrape, and, using a small JS snippet, scrolls down the page. I collect the photos, the hashtags for each photo, the basic profile info, and the likes per photo; then I open the followers list, scroll through a certain number of them, and collect that list too. I export everything to Excel/CSV and save the photos themselves to a folder. (A rough sketch of this loop is below.)
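A rough sketch of the scroll-and-collect loop, assuming Selenium 4 and a driver that is already authenticated; the profile URL is a placeholder, and image alt text stands in here for captions/hashtags.

```python
import csv
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.instagram.com/some_user/")  # hypothetical profile

# Scroll a fixed number of times, pausing so each AJAX-loaded batch renders.
for _ in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)

# Collect image URLs and alt text after the page has grown.
rows = [
    (img.get_attribute("src"), img.get_attribute("alt"))
    for img in driver.find_elements(By.TAG_NAME, "img")
]

with open("photos.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "alt_text"])
    writer.writerows(rows)

driver.quit()
```

Instead of a fixed number of scrolls, you can read `document.body.scrollHeight` before and after each scroll and stop once it no longer grows, i.e. when no new batch was loaded.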
Could you please tell me the simplest way to do this? I realize I may be doing it COMPLETELY wrong, but I have no experience.
With LinkedIn the situation is about the same: Selenium logs in, searches for 'Python Developer', I take a certain number of results, open each one, grab the info, and save it. (See the sketch below.)
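A sketch of that flow under the same assumptions (an already-logged-in driver; the search URL and CSS selector are placeholders, since LinkedIn's real markup differs and changes often).

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(
    "https://www.linkedin.com/search/results/people/?keywords=Python%20Developer"
)

# Extract hrefs first so navigation doesn't invalidate the elements.
links = [
    a.get_attribute("href")
    for a in driver.find_elements(By.CSS_SELECTOR, "a.result-link")  # placeholder selector
    if a.get_attribute("href")
]

for url in links[:10]:  # take a certain number of results
    driver.get(url)
    # ...extract the profile info you need here...

driver.quit()
```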
Next in line is Twitter, but I'll probably use the API :)
Answer:
Use the API: there is nothing complicated about it, and on the contrary, it will greatly simplify your work.
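For Twitter in particular, a minimal sketch with tweepy (Twitter API v2); the bearer token and username are placeholders you obtain from the developer portal.

```python
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Look up the user, then pull their recent tweets.
user = client.get_user(username="some_user")
tweets = client.get_users_tweets(id=user.data.id, max_results=10)

for tweet in tweets.data or []:
    print(tweet.text)
```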