Where should I insert a Python timer to make it work?
I think I found a solution for getting the desired HTML page. Previously, Scrapy was getting empty HTML.
That is, now, before collecting data from the page, Scrapy will wait 5 seconds (during that time the JS code will have time to request the required HTML), and only then will the necessary information be collected.
Please help me figure out how to insert the timer.
# -*- coding: utf-8 -*-
import scrapy
from threading import Timer


class ExampleSpider(scrapy.Spider):
    name = 'bc'
    start_urls = [
        'https://www.greatcircus.ru/',
    ]

    def parse(self, response):
        for ticket in response.css('.col-xs-12 schedule-main-tickets-container'):
            event_name = ticket.css('.schedule-main-tickets-show-title::text').extract(),
            place = ticket.css('.schedule-main-tickets-location::text').extract(),
            url = ticket.css('.text-center a::text').extract(),
            yield {
                'event_name': event_name,
                'place': place,
                'url': url,
            }

    # The timer is started at class level and is given the plain parse function,
    # which is what triggers the TypeError below.
    t = Timer(5.0, parse)
    t.start()
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\programs for work\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\programs for work\lib\threading.py", line 1166, in run
    self.function(*self.args, **self.kwargs)
TypeError: parse() missing 2 required positional arguments: 'self' and 'response'
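For context, the TypeError happens because Timer(5.0, parse) hands the timer the plain function object, so when it fires it calls parse() with no arguments and neither self nor response is supplied. A minimal standalone sketch of how threading.Timer passes arguments (the Demo class and the fake response string are made up for illustration; even with the correct arguments this would not make Scrapy execute JS):

from threading import Timer


class Demo:
    def parse(self, response):
        print("parse called with:", response)


d = Demo()
# Timer calls the given callable with the given args after the delay.
# Passing the bound method d.parse supplies self automatically,
# and args=... supplies the response argument.
t = Timer(5.0, d.parse, args=("fake response",))
t.start()
t.join()  # block until the timer thread has run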
You correctly wrote in the comments that it is useless to wait here: Scrapy will not execute the JS.
As for the code: why do you need a Timer from threading? It seems time.sleep(5) would be enough for your task. But it still won't help if your HTML is modified by JS after the page loads; for that you need Selenium.
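A minimal sketch of the Selenium route, assuming the selenium package and a ChromeDriver binary are installed; the CSS selectors are copied from the question, and the headless option and the fixed 5-second wait are assumptions, not a verified recipe for this site:

# -*- coding: utf-8 -*-
import time

import scrapy
from selenium import webdriver


class ExampleSpider(scrapy.Spider):
    name = 'bc'
    start_urls = [
        'https://www.greatcircus.ru/',
    ]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        self.driver = webdriver.Chrome(options=options)

    def parse(self, response):
        # Load the page in a real browser so the JS can build the HTML,
        # then hand the rendered source back to Scrapy's selectors.
        self.driver.get(response.url)
        time.sleep(5)  # crude wait; an explicit WebDriverWait would be more robust
        rendered = scrapy.Selector(text=self.driver.page_source)
        # Selector copied from the question; it may need adjusting to the real markup.
        for ticket in rendered.css('.col-xs-12 schedule-main-tickets-container'):
            yield {
                'event_name': ticket.css('.schedule-main-tickets-show-title::text').extract(),
                'place': ticket.css('.schedule-main-tickets-location::text').extract(),
                'url': ticket.css('.text-center a::text').extract(),
            }

    def closed(self, reason):
        # Shut down the browser when the spider finishes.
        self.driver.quit()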