Why does the parser get a 403 even after specifying Cookie and User-Agent?
I tried to write a parser to download pictures from artstation.com. I took a random profile; almost all of the content there is loaded via JSON, and I found the GET request it uses. The URL opens fine in the browser, but requests.get returns 403. Everyone on Google advises setting the User-Agent and Cookie headers, so I used requests.Session and set a User-Agent, but the picture is still the same. What gives?
import requests

url = 'https://www.artstation.com/users/kuvshinov_ilya'
json_url = 'https://www.artstation.com/users/kuvshinov_ilya/projects.json?page=1'
header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

session = requests.Session()
r = session.get(url, headers=header)
json_r = session.get(json_url, headers=header)
print(json_r)

<Response [403]>
The 403 is Cloudflare's doing; cfscrape helped me bypass it.
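You can confirm this yourself before reaching for cfscrape: a blocked response typically arrives with a Server: cloudflare header and a challenge page in the body. A minimal check (these markers are typical of Cloudflare blocks, not guaranteed):

import requests

r = requests.get('https://www.artstation.com/users/kuvshinov_ilya/projects.json?page=1')
print(r.status_code)                   # 403
print(r.headers.get('Server'))         # typically 'cloudflare' when the WAF blocks you
print('cloudflare' in r.text.lower())  # the challenge page usually mentions Cloudflare

If those markers show up, copying more headers alone won't get you through; the challenge has to be solved. Here is the full script: create the session through cfscrape.create_scraper() and from then on work with it like a regular requests.Session.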
import requests
import cfscrape

def get_session():
    # Send the full set of headers a real browser sends, not just the User-Agent
    session = requests.Session()
    session.headers = {
        'Host': 'www.artstation.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'ru,en-US;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache'}
    # Wrap the session so cfscrape solves the Cloudflare challenge;
    # use the result like a regular requests.Session
    return cfscrape.create_scraper(sess=session)

def artstation():
    page_url = 'https://www.artstation.com/users/kyuyongeom/projects.json'
    post_pattern = 'https://www.artstation.com/projects/{}.json'
    session = get_session()
    absolute_links = []

    # The listing is paginated, 50 projects per page
    response = session.get(page_url, params={'page': 1}).json()
    pages, modulo = divmod(response['total_count'], 50)
    if modulo:
        pages += 1

    for page in range(1, pages + 1):
        if page != 1:
            response = session.get(page_url, params={'page': page}).json()
        for post in response['data']:
            # Each project has its own JSON with the full-size image URLs
            shortcode = post['permalink'].split('/')[-1]
            inner_resp = session.get(post_pattern.format(shortcode)).json()
            for img in inner_resp['assets']:
                if img['asset_type'] == 'image':
                    absolute_links.append(img['image_url'])

    with open('links.txt', 'w') as file:
        file.write('\n'.join(absolute_links))

if __name__ == '__main__':
    artstation()
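The script only collects the URLs into links.txt. A minimal follow-up sketch for actually downloading them, reusing the same cfscrape-wrapped session (download_all and the images/ directory are my own naming, not from the script above):

import os

def download_all(links_file='links.txt', out_dir='images'):
    os.makedirs(out_dir, exist_ok=True)
    session = get_session()  # the cfscrape-wrapped session from above
    with open(links_file) as fh:
        for i, link in enumerate(fh.read().splitlines()):
            resp = session.get(link)
            resp.raise_for_status()
            # Derive a file name from the URL, dropping any query string
            name = link.rsplit('/', 1)[-1].split('?')[0] or f'{i}.jpg'
            with open(os.path.join(out_dir, name), 'wb') as out:
                out.write(resp.content)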
Also, it makes sense to send all of the header fields a real browser sends, not just the User-Agent.
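One caveat: cfscrape appears unmaintained, so newer Cloudflare challenges may defeat it. The cloudscraper package is a maintained alternative that mirrors cfscrape's interface, so the swap is a one-liner (a sketch, assuming cloudscraper keeps the sess parameter cfscrape uses):

import cloudscraper
import requests

def get_session():
    # Same browser-like headers as above would go on this session
    session = requests.Session()
    session.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; '
                                     'rv:69.0) Gecko/20100101 Firefox/69.0')
    return cloudscraper.create_scraper(sess=session)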