Python
Vlad_beg, 2019-07-29 19:55:38

Why do I get a 504 Gateway Time-out when parsing multiple pages?

I need to parse several pages from a site. With 1 or 2 pages everything works fine, but if I set it to 5, I get a 504 Gateway Time-out error and the result comes back empty.
Here is the parser code:

import scrapy
from scrapy import Selector
from scrapy_splash import SplashRequest


class ShopeeSpider(scrapy.Spider):
    name = 'shopee'  # spider name is illustrative; the original post did not show the class header

    script = '''
        function main(splash, args)
          assert(splash:go(args.url))
          assert(splash:wait(5.0))
          local treat = require('treat')
          local result = {}
          local pages = splash:select('.shopee-mini-page-controller__total')

          -- for each of 5 result pages: scroll to trigger lazy loading, save the HTML, click "next"
          for i=1,5,1 do
            for j=1,2,1 do
              assert(splash:runjs("window.scrollBy(0, 1300)"))
              assert(splash:wait(1.0))
            end

            result[i] = splash:html()
            assert(splash:runjs('document.querySelector(".shopee-icon-button--right").click()'))
            assert(splash:wait(5.0))
          end
          return treat.as_array(result)
        end
    '''

    def start_requests(self):
        urls = [
            'https://shopee.sg/search?keyword=hdmi'
        ]
        for link in urls:
            yield SplashRequest(url=link, callback=self.parse, endpoint='execute',
                                args={'wait': 1.5, 'lua_source': self.script}, dont_filter=True)

    def parse(self, response):
        # response.data is the decoded JSON array of page HTML returned by the Lua script
        for page in response.data:
            sel = Selector(text=page)
            yield {
                'urls': sel.xpath("//div[contains(@class, 'shopee-search-item-result__item')]//a[*]/@href").getall()
            }



1 answer(s)
Evgen, 2019-07-30
@Verz1Lka

In general, Splash returns a 504 when your script does not finish within the render timeout, which is 30 seconds by default.
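To see why 5 pages fail while 1 or 2 succeed, it helps to add up the fixed `splash:wait()` calls in the question's script. The sketch below counts only those waits (the constant and function names are mine, with values taken from the Lua code), so it is a lower bound: real page loading, scrolling and rendering add more time on top.

```python
# Rough lower bound on how long the question's Lua script keeps Splash busy.
# Only the explicit splash:wait() calls are counted; page loads and rendering are not.

INITIAL_WAIT = 5.0             # splash:wait(5.0) right after splash:go()
SCROLLS_PER_PAGE = 2           # inner loop: for j=1,2
SCROLL_WAIT = 1.0              # splash:wait(1.0) after each scroll
CLICK_WAIT = 5.0               # splash:wait(5.0) after clicking "next page"
SPLASH_DEFAULT_TIMEOUT = 30.0  # default render timeout mentioned above


def min_script_seconds(pages: int) -> float:
    """Sum of the fixed waits for the given number of pages."""
    per_page = SCROLLS_PER_PAGE * SCROLL_WAIT + CLICK_WAIT
    return INITIAL_WAIT + pages * per_page


for pages in (1, 2, 3, 5):
    total = min_script_seconds(pages)
    status = "exceeds" if total > SPLASH_DEFAULT_TIMEOUT else "is within"
    print(f"{pages} page(s): at least {total:.0f} s, which {status} the 30 s default")
```

On waits alone, 5 pages need at least 40 s, already over the 30 s default, while 1 or 2 pages stay well under it. Note also that the loop clicks "next" and waits 5 s after the last page has been captured, so that final click and wait are spent on a page that is never saved.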
This section of the Splash FAQ may help: https://splash.readthedocs.io/en/stable/faq.html#t...
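One way to act on that advice is to pass a larger `timeout` in the Splash args. The helper below is a sketch under stated assumptions: the 90 s cap reflects Splash's default maximum timeout, which can be raised by starting Splash with `--max-timeout` (for example `docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 300`), while the helper name `splash_args_for` and the 15 s safety margin are hypothetical. The original `'wait': 1.5` is left out because the script never reads `args.wait`.

```python
# Sketch: size the Splash render timeout to fit the script's fixed waits.
# Assumptions: Splash's default max timeout of 90 s; the helper name and the
# 15 s margin are hypothetical, not part of scrapy-splash.

SPLASH_MAX_TIMEOUT = 90.0      # default cap on 'timeout'; raise with --max-timeout
SAFETY_MARGIN = 15.0           # hypothetical allowance for page loads and rendering
INITIAL_WAIT = 5.0             # wait after splash:go()
WAIT_PER_PAGE = 2 * 1.0 + 5.0  # two 1 s scroll waits + 5 s wait after clicking "next"


def splash_args_for(lua_source: str, pages: int) -> dict:
    """Hypothetical helper: build SplashRequest args with a timeout sized for `pages`."""
    needed = INITIAL_WAIT + pages * WAIT_PER_PAGE + SAFETY_MARGIN
    if needed > SPLASH_MAX_TIMEOUT:
        raise ValueError(
            f"{pages} pages need about {needed:.0f} s, more than the "
            f"{SPLASH_MAX_TIMEOUT:.0f} s allowed; start Splash with a larger --max-timeout"
        )
    return {
        "lua_source": lua_source,
        "timeout": needed,  # Splash render timeout in seconds (30 by default)
    }
```

In `start_requests` this would replace the inline dict, e.g. `args=splash_args_for(self.script, pages=5)`, which requests a 55 s timeout.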
