Why does it give a 504 Gateway Time-out error when parsing multiple pages?
I need to parse several pages from a site. Parsing 1-2 pages works fine, but if I specify 5, a 504 Gateway Time-out error occurs and an empty result is returned.
Here is the parser code:
script = '''
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(5.0))
    treat = require('treat')
    result = {}
    pages = splash:select('.shopee-mini-page-controller__total')
    for i = 1, 5, 1 do
        for j = 1, 2, 1 do
            assert(splash:runjs("window.scrollBy(0, 1300)"))
            assert(splash:wait(1.0))
        end
        result[i] = splash:html()
        assert(splash:runjs('document.querySelector(".shopee-icon-button--right").click()'))
        assert(splash:wait(5.0))
    end
    return treat.as_array(result)
end
'''

def start_requests(self):
    urls = [
        'https://shopee.sg/search?keyword=hdmi'
    ]
    for link in urls:
        yield SplashRequest(url=link, callback=self.parse, endpoint='execute', args={'wait': 1.5, 'lua_source': self.script}, dont_filter=True)

def parse(self, response):
    for page in response.data:
        sel = Selector(text=page)
        yield {
            'urls': sel.xpath("//div[contains(@class, 'shopee-search-item-result__item')]//a[*]/@href").getall()
        }
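One possible explanation, offered as a rough sketch rather than a confirmed diagnosis: the script spends a fixed amount of time in splash:wait() calls, and that total grows with the number of pages. The snippet below only adds up the waits visible in the Lua script above. The 30-second figure is an assumption based on Splash's documented default render timeout, and the function name estimated_script_seconds is made up for this illustration. Page load and rendering time are not counted, so real runs would take longer.

```python
def estimated_script_seconds(pages, initial_wait=5.0, scrolls_per_page=2,
                             scroll_wait=1.0, page_wait=5.0):
    """Sum the explicit splash:wait() calls made by the Lua script in the question."""
    # Each page: two scroll waits of 1 s, then a 5 s wait after clicking "next".
    per_page = scrolls_per_page * scroll_wait + page_wait
    return initial_wait + pages * per_page


# Assumption: Splash's default render timeout, in seconds.
SPLASH_DEFAULT_TIMEOUT = 30.0

for pages in (1, 2, 5):
    total = estimated_script_seconds(pages)
    print(pages, total, total > SPLASH_DEFAULT_TIMEOUT)
```

With these numbers, 1 or 2 pages stay under 30 seconds (12 and 19 seconds), while 5 pages need at least 40 seconds, which would match the behavior described. If that is indeed the cause, passing a larger 'timeout' value in the SplashRequest args (for example 'timeout': 90) might help. Splash caps this value at its --max-timeout setting, so anything above the cap would also require restarting Splash with a higher --max-timeout.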