Shlyahten, 2016-05-16 16:17:20
Python

How to fix Cyrillic encoding in Python?

All Cyrillic text turns into something like this: \xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82
It is also written to the file in this form.
How can this be fixed?

from chatterbot import ChatBot

# Create a new instance of a ChatBot
bot = ChatBot("Alice",
    storage_adapter="chatterbot.adapters.storage.JsonDatabaseAdapter",
    logic_adapters=[
        "chatterbot.adapters.logic.MathematicalEvaluation",
        "chatterbot.adapters.logic.TimeLogicAdapter",
        "chatterbot.adapters.logic.ClosestMatchAdapter"
    ],
    input_adapter="chatterbot.adapters.input.TerminalAdapter",
    output_adapter="chatterbot.adapters.output.TerminalAdapter",
    database="database.db"
)

bot.train(
    "Привет",
    "Привет)",
    "Как дела?",
    "Отлично)",
    "А у тебя?",
    "Хорошо",
)

print("Type something to begin...")

# The following loop will execute each time the user enters input
while True:
    try:
        # We pass None to this method because the parameter
        # is not used by the TerminalAdapter
        bot_input = bot.get_response(None)

    # Press ctrl-c or ctrl-d on the keyboard to exit
    except (KeyboardInterrupt, EOFError, SystemExit):
        break

2 answer(s)
Vladimir Kuts, 2016-05-16
@Shlyahten

Specify the source encoding in a header comment on the first line of the file.
Which version of Python are you using?
If it is Python 2, import unicode_literals from __future__ right after the encoding declaration.
Or mark the string literals as Unicode explicitly:

    u"Привет",
    u"Привет)",
    u"Как дела?",
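
A minimal sketch of this advice, not taken from the thread: it combines the encoding header with explicit u"" literals and, as an added assumption, uses io.open with an explicit encoding to show that such strings reach a file as readable Cyrillic. The file name phrases.txt and the variable names are hypothetical, and chatterbot itself is not involved.

```python
# -*- coding: utf-8 -*-
# The header above declares the source encoding. Python 2 needs it when the
# file contains non-ASCII literals; Python 3 assumes UTF-8 by default.
import io
import os
import tempfile

# Explicit u"" prefixes make these Unicode strings on Python 2 as well.
# On Python 2, `from __future__ import unicode_literals` placed right after
# the header would give the same result without the prefixes.
phrases = [u"Привет", u"Привет)", u"Как дела?"]

# Hypothetical output file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "phrases.txt")

# io.open accepts an explicit encoding on both Python 2 and Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\n".join(phrases))

# Read the file back with the same encoding and compare with the originals.
with io.open(path, "r", encoding="utf-8") as f:
    restored = f.read().split(u"\n")

print(restored == phrases)
```

If the comparison holds, the text survived the round trip through the file unchanged, which suggests the escaped form in the question comes from how the data is serialized rather than from the strings themselves.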

sim3x, 2016-05-16
@sim3x

In [1]: s = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

In [2]: s
Out[2]: b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

In [3]: print(s)
b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

In [4]: print(s.decode('utf8'))
Привет

You are misunderstanding how encodings work.
What you see is Cyrillic text encoded as UTF-8 and displayed as ASCII-safe escape sequences; the data itself is intact.
Look through the chatterbot settings to see whether ensure_ascii=False can be set there.
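
To illustrate what ensure_ascii controls, here is a small sketch that uses only the standard-library json module. Whether chatterbot's JsonDatabaseAdapter exposes this option is an assumption that the thread does not confirm, so the example shows the flag itself, not a chatterbot setting.

```python
import json

text = "Привет"

# By default (ensure_ascii=True) every non-ASCII character is escaped.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(text, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode back to the same string, so no data is lost either way.
print(json.loads(escaped) == json.loads(readable) == text)
```

The escaped form is harder to read but still valid JSON; the flag only changes how the file looks, not what it contains.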
