Why does it throw an error when using find()?
Source:
# -*- coding: utf-8 -*-
import pymongo as pymongo
from dictionary import dictionary


class DiplomaPipeline(object):
    collection_name = 'DiplomaItem'
    arr = ['да', 'нет']

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        print('~~~~~~~~~~!!!!', )
        for word in self.arr:
            print(word)
            print(item['Comment'])
            print(item['Comment'].find(word))
        # self.db[self.collection_name].insert(dict(item))
        # logging.debug("Post added to MongoDB")
        return item
Output:
да
Вообще это не смешно, а практично, заменить двух мальеньких нигеров на два вместительных бака.
2018-04-06 15:27:06 [scrapy.core.scraper] ERROR: Error processing {'Comment': u'\u0412\u043e\u043e\u0431\u0449\u0435 \u044d\u0442\u043e \u043d\u0435 \u0441\u043c\u0435\u0448\u043d\u043e, \u0430 \u043f\u0440\u0430\u043a\u0442\u0438\u0447\u043d\u043e, \u0437\u0430\u043c\u0435\u043d\u0438\u0442\u044c \u0434\u0432\u0443\u0445 \u043c\u0430\u043b\u044c\u0435\u043d\u044c\u043a\u0438\u0445 \u043d\u0438\u0433\u0435\u0440\u043e\u0432 \u043d\u0430 \u0434\u0432\u0430 \u0432\u043c\u0435\u0441\u0442\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u0431\u0430\u043a\u0430.',
'MainPageUrl': u'https://pikabu.ru/story/bezyiskhodnost_5826272'}
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/mymac/Work/crawler/diploma/diploma/pipelines.py", line 36, in process_item
    print(item['Comment'].find(word))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
Answer:
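As is usually the case, errors like this come from a poor understanding of the language. The bottom line is that I was mixing two different types in Python 2: the words in arr are plain byte strings (str), while item['Comment'] is a unicode string. When find() on a unicode string receives a byte string, Python 2 first tries to decode that byte string with the ASCII codec, and this fails on 0xd0, the first byte of 'д' in UTF-8. Once I understood this, all that remained was to make both sides the same type. I converted the unicode string to a byte string: item['Comment'].encode('utf-8')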
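A minimal sketch of the two ways to make the types match, assuming the comment text is UTF-8 when it arrives as bytes. The names WORDS, find_words and find_words_bytes are hypothetical helpers for illustration, not part of the original pipeline. The first variant keeps everything as unicode (usually the simpler choice), the second mirrors the .encode('utf-8') fix described above.

```python
# -*- coding: utf-8 -*-

# Unicode literals (u'...') so no implicit ASCII decoding is ever needed.
# WORDS is a hypothetical stand-in for the pipeline's `arr` attribute.
WORDS = [u'да', u'нет']


def find_words(comment, words=WORDS):
    """Search unicode in unicode; returns {word: character index or -1}."""
    if isinstance(comment, bytes):
        # Assumption: byte input is UTF-8 encoded.
        comment = comment.decode('utf-8')
    return {word: comment.find(word) for word in words}


def find_words_bytes(comment, words=WORDS):
    """Search bytes in bytes, as with item['Comment'].encode('utf-8').

    Note that the returned positions are byte offsets, not character offsets.
    """
    data = comment.encode('utf-8')
    return {word: data.find(word.encode('utf-8')) for word in words}
```

Both variants run on Python 2.7 and Python 3. The unicode variant reports character positions, while the byte variant reports positions in the encoded data, so the two numbers differ for Cyrillic text where each letter takes two bytes in UTF-8.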