Python
Max Payne, 2019-06-25 17:04:41

Is it correct to use the Meta class in this case?

I use the Scrapy framework and have two sites that are parsed with roughly the same algorithm, so I want to abstract and combine their functionality. The problem is that each spider has its own database model, its own data schema for the external service, its own Scrapy ItemLoader, and so on. My idea is to start with an abstract class that contains no implementation at all:

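from typing import Any

import scrapy


class AbstractSpider(scrapy.Spider):
    # Concrete spiders are expected to override these attributes.
    name = ''
    allowed_domains = ()
    start_urls = ()

    class Meta:
        # Per-spider configuration; every value is a placeholder here.
        model = None
        schema = None
        history_model = None

        _loader = None

    def parse(self, response: scrapy.http.HtmlResponse) -> Any:
        raise NotImplementedError

    def _gen_item(self, response: scrapy.http.HtmlResponse) -> scrapy.Item:
        raise NotImplementedError

    # Meta._loader is None at this level, so this annotation is only a hint.
    def _add_fields(self, loader: Meta._loader) -> None:
        raise NotImplementedError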

Next, I implement a common class for both spiders:

class AutoSpider(AbstractSpider):
    class Meta:
        model = models.AutoModel
        schema = schemas.AUTO_SCHEMA
        history_model = models.HistoryAutoModel

        _loader = AutoItemLoader
        full_loader = FullAutoItemLoader
        short_loader = ShortAutoItemLoader

    def parse(self, response: scrapy.http.HtmlResponse) -> Any:
        raise NotImplementedError

    def _gen_item(self, response: scrapy.http.HtmlResponse) -> scrapy.Item:
        ...

    def _add_fields(self, loader: Meta._loader) -> None:
        if isinstance(loader, self.Meta.full_loader):
            self._add_from_full(loader)
        elif isinstance(loader, self.Meta.short_loader):
            self._add_from_short(loader)

    def _add_from_full(self, loader: Meta.full_loader) -> None:
        raise NotImplementedError

    def _add_from_short(self, loader: Meta.short_loader) -> None:
        raise NotImplementedError
...

Each spider then inherits from this common class and overrides the Meta class with the classes that change the common behavior.
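
To make the inheritance step concrete, here is a minimal sketch that uses plain Python stand-ins instead of the real Scrapy classes. The names SiteASpider, SiteBSpider, SiteAModel and SiteBModel are hypothetical and only serve the illustration. It also shows one thing to watch for with this layout: a nested Meta in a subclass replaces the parent's Meta entirely unless it explicitly inherits from it.

```python
class AutoItemLoader:
    """Stand-in for the shared ItemLoader (the real one would be a Scrapy loader)."""


class FullAutoItemLoader(AutoItemLoader):
    """Stand-in for the loader used on full pages."""


class SiteAModel:
    """Hypothetical database model for the first site."""


class SiteBModel:
    """Hypothetical database model for the second site."""


class AutoSpider:
    """Stand-in for the common spider; the real one would extend scrapy.Spider."""

    class Meta:
        model = None
        _loader = AutoItemLoader
        full_loader = FullAutoItemLoader


class SiteASpider(AutoSpider):
    name = "site_a"

    # Inheriting from the parent's Meta keeps _loader and full_loader
    # and overrides only what differs for this site.
    class Meta(AutoSpider.Meta):
        model = SiteAModel


class SiteBSpider(AutoSpider):
    name = "site_b"

    # Without inheritance this Meta shadows the parent's Meta completely,
    # so _loader and full_loader are no longer reachable through it.
    class Meta:
        model = SiteBModel
```

With this layout, SiteASpider.Meta.full_loader still resolves to FullAutoItemLoader, whereas SiteBSpider.Meta has no full_loader attribute at all.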
How correct is this approach? Is it acceptable to access Meta from outside the spider (for example, from pipelines)? Or is there another way to increase cohesion and reduce coupling?
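
For the pipeline case, this is a sketch of what such access might look like. A Scrapy pipeline receives the spider in process_item(self, item, spider), so it could read spider.Meta from there. SiteAModel, its create method and the pipeline name are hypothetical stand-ins, and plain classes replace the Scrapy ones so the snippet runs on its own.

```python
class SiteAModel:
    """Hypothetical database model; it stores rows in a list instead of a database."""

    saved = []

    @classmethod
    def create(cls, **fields):
        cls.saved.append(fields)
        return fields


class SiteASpider:
    """Stand-in for a concrete spider; the real one would extend scrapy.Spider."""

    name = "site_a"

    class Meta:
        model = SiteAModel


class SaveToDatabasePipeline:
    """Hypothetical pipeline that picks the model from the spider's Meta."""

    def process_item(self, item, spider):
        # Look the model up defensively: not every spider has to declare a Meta.
        meta = getattr(spider, "Meta", None)
        model = getattr(meta, "model", None)
        if model is None:
            return item  # nothing to save for this spider, pass the item on
        model.create(**dict(item))
        return item
```

The pipeline depends only on the attribute name Meta.model, not on a specific spider class, which is the coupling I am unsure about.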
