Python
gadzhi15, 2016-07-17 12:04:18

Scrapy in Python3. Where is the mistake?

I created a spider:

#!/usr/bin/env python3.5

from quoka.items import QuokaItem
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.loader.processors import TakeFirst
from scrapy.loader import XPathItemLoader
from scrapy.selector import HtmlXPathSelector

class QuokaLoader(XPathItemLoader):
    default_output_processor = TakeFirst()


class QuokaSpider(CrawlSpider):

    name = "quoka"
    allowed_domains = ["quoka.de"]
    start_urls = ["http://www.quoka.de/immobilien/bueros-gewerbeflaechen/"]

    rules = (
             Rule(LinkExtractor(allow=('kleinanzeigen/cat_27_2710_ct_0_page_')), follow=True),
             Rule(LinkExtractor(allow=('immobilien/bueros-gewerbeflaechen/')), callback='parse_item'),
             )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        l = QuokaLoader(QuokaItem(), hxs)

        #
        l.add_xpath('date',response.xpath("/html/body/div[3]/div[2]/div[1]/main/div[8]/div/div[2]/strong/span/text()").extract())
        l.add_xpath('cost',response.xpath("/html/body/div[3]/div[2]/div[1]/main/div[8]/div/div[3]/div[2]/div[2]/text()").extract())
        l.add_value('url', response.url)

        return l.load_item()

The items file:
# -*- coding: utf-8 -*-


import scrapy
from scrapy.item import Item, Field


class QuokaItem(Item):
    name = Field()
    date = Field()
    cost = Field()
    length = Field()
    description = Field()

When I run the spider, the following message is displayed:
quoka_spider.py:10: ScrapyDeprecationWarning: __main__.QuokaLoader inherits from deprecated class scrapy.loader.XPathItemLoader, please inherit from scrapy.loader.ItemLoader. (warning only on first subclass, there may be others)
  class QuokaLoader(XPathItemLoader):

What's the problem?

2 answer(s)
Данил Бирюков-Романов, 2016-07-17
@ gadzhi15

This is not an error; it is a warning.
The warning says that the class scrapy.loader.XPathItemLoader has been declared deprecated and that scrapy.loader.ItemLoader should be used instead.
What does this mean?
It means the developers decided to retire scrapy.loader.XPathItemLoader in favor of scrapy.loader.ItemLoader. To avoid breaking existing software, they added a warning at startup. So everything works as before, but the log tells you to update your code.
What can you do?
The right way: use scrapy.loader.ItemLoader.
The wrong way: ignore it. It works, after all.

gadzhi15, 2016-07-17
@gadzhi15

So as not to start a new topic:
I ran scrapy crawl quoka_spider.py from the console, and the output was the
following:

/home/gadzhibala/PycharmProjects/test_project/quoka/quoka/spiders/quoka_spider.py:11: ScrapyDeprecationWarning: quoka.spiders.quoka_spider.quokaLoader inherits from deprecated class scrapy.loader.XPathItemLoader, please inherit from scrapy.loader.ItemLoader. (warning only on first subclass, there may be others)
  class quokaLoader(XPathItemLoader):
2016-07-17 12:30:11 [scrapy] INFO: Scrapy 1.1.1 started (bot: quoka)
2016-07-17 12:30:11 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': ['quoka.spiders'], 'BOT_NAME': 'quoka', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'quoka.spiders'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/scrapy/spiderloader.py", line 41, in load
    return self._spiders[spider_name]
KeyError: 'quoka_spider.py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 162, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 190, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 194, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/local/lib/python3.5/dist-packages/scrapy/spiderloader.py", line 43, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: quoka_spider.py'

As I understand it, it does not like something about the file name.
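
Judging by the traceback, the lookup in spiderloader.py is done by spider name (`self._spiders[spider_name]`), so `scrapy crawl` most likely expects the value of the spider's `name` attribute ("quoka") rather than the file name. The sketch below is a simplified stand-in written for illustration, not Scrapy's actual SpiderLoader; the class `TinySpiderLoader` is hypothetical.

```python
class QuokaSpider:
    # The identifier the crawl command looks up, taken from the question.
    name = "quoka"


class TinySpiderLoader:
    """Hypothetical, simplified model of a name-to-class spider registry."""

    def __init__(self, spider_classes):
        # Spiders are indexed by their `name` attribute, not by file name.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def load(self, spider_name):
        try:
            return self._spiders[spider_name]
        except KeyError:
            # Mirrors the two chained KeyErrors seen in the traceback.
            raise KeyError("Spider not found: {}".format(spider_name))


loader = TinySpiderLoader([QuokaSpider])

found = loader.load("quoka")  # lookup by spider name succeeds

try:
    loader.load("quoka_spider.py")  # lookup by file name fails
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

If that reading is right, running `scrapy crawl quoka` from the project directory should find the spider. To run a spider from a file path instead, Scrapy also provides `scrapy runspider quoka_spider.py`.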
