Scrapy
tispoint, 2016-03-17 10:03:28

Why does Scrapy leave some links unprocessed?

Good afternoon.
I was sure that the rule

rules = (
    # allow takes a regex or a sequence of regexes; the trailing comma makes this a real tuple
    Rule(LinkExtractor(allow=(r'/bedroom-melissa/',)), callback='parse_item', follow=True),
)

calls the callback on every matching page the spider visits.
In the result file, however, only 20-30 percent of the expected rows were actually parsed. In the other 70-80 percent of cases only the visited link is written and the remaining fields are empty. Roughly like this:
http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-803/		
  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-846/		
  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-843/		
  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/closet-hq-840-melissa-oak-sonoma/		
Шкаф ШК-845 Мелисса	http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-845/	12 935 руб.	МДФ, Зеркало
  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/cabinet-shk-826-melissa-oak-sonoma/		
  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-802/
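
To see exactly which pages came back empty, the result file can be split into rows that carry item data and rows that hold only a URL. This is a minimal sketch, assuming the file is tab-separated as in the sample above; the function name split_rows is made up for illustration.

```python
def split_rows(lines):
    """Separate rows that carry item data from rows that hold only a URL."""
    parsed, url_only = [], []
    for line in lines:
        # Assumption: fields are tab-separated and the URL field starts with "http".
        fields = [field.strip() for field in line.split("\t")]
        urls = [field for field in fields if field.startswith("http")]
        data = [field for field in fields if field and not field.startswith("http")]
        if not urls:
            continue  # blank or malformed line, nothing to classify
        (parsed if data else url_only).append(urls[0])
    return parsed, url_only


# Two rows copied from the sample output above.
sample = [
    "  http://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-803/\t\t",
    "Шкаф ШК-845 Мелисса\thttp://pastelmebel.ru/shop/bedroom-furniture/bedroom-melissa/shk-845/\t12 935 руб.\tМДФ, Зеркало",
]
parsed, url_only = split_rows(sample)
print(len(parsed), "parsed;", len(url_only), "URL-only")
```

The URL-only list is the set of pages worth opening by hand to check whether the selectors in parse_item match their markup.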


1 answer(s)
Dimonchik, 2016-03-17
@dimonchik2013

Uh... what result file? Aren't you inserting into NoSQL?
Always grab the current URL in your items; that way you will know whether the spider processed a page or not:
item['url'] = response.request.url
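
A slightly fuller sketch of that advice: store the URL first, then mark items whose selectors matched nothing, so empty pages stand out in the output. The CSS selectors, the "empty" field, and the example.com URL are hypothetical; the real selectors depend on the page markup. In a spider this would be the body of the parse_item method (with self as the first argument). The stand-in response at the bottom only imitates the two attributes the callback uses, so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def parse_item(response):
    """Callback sketch: record the URL before extracting anything else."""
    item = {"url": response.request.url}
    # Hypothetical selectors, not taken from the original spider.
    item["name"] = response.css("h1::text").get(default="").strip()
    item["price"] = response.css(".price::text").get(default="").strip()
    # Flag pages where nothing was extracted instead of writing a bare URL.
    item["empty"] = not (item["name"] or item["price"])
    return item


class _NoMatch:
    """Imitates a selector result that matched nothing."""

    def get(self, default=None):
        return default


# Stand-in for a response from a page the selectors do not fit.
fake_response = SimpleNamespace(
    request=SimpleNamespace(url="http://example.com/some-page/"),
    css=lambda query: _NoMatch(),
)
item = parse_item(fake_response)
print(item)
```

If most items come out flagged as empty, the callback is being called but the selectors do not fit those pages, which is a different problem from the rule not following the links.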
