Python
rkfddf, 2020-06-03 10:05:21

How do I get human-readable characters in UTF-8?

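Here's the code:

import csv

# encoding must be passed by keyword: the third positional argument of open() is buffering
with open('grace.csv', 'a', encoding='utf-8', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerow([name_title, city, price, descrip])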

I'm writing Cyrillic text to a CSV file. The file really does come out in UTF-8, but it isn't human-readable: I get something like РџСЂРѕРґР°Рј СѓС‡Р°СЃС‚РѕРє - 35 instead of Продам участок - 35 ("Selling a plot - 35").
Is there a way to solve this problem?



4 answer(s)
# ., 2020-06-03
@rkfddf

Put this on the first line of your code: # -*- coding: utf-8 -*-

galaxy, 2020-06-03
@galaxy

it comes out as something like РџСЂРѕРґР°Рј СѓС‡Р°СЃС‚РѕРє - 35

Are you opening it in Excel?
Excel needs to be told the encoding when you import the CSV.
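If the file is meant to be opened in Excel directly rather than through the import dialog, one common workaround, offered here as a hedged sketch, is to write it with Python's `utf-8-sig` codec, which prepends a UTF-8 byte order mark (BOM, the bytes EF BB BF); Excel generally relies on that mark to detect UTF-8. The file name and the sample row are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample row; the real script writes name_title, city, price, descrip.
row = ["Продам участок", "Москва", "35", "описание"]

# Hypothetical file name in the system temp folder.
path = os.path.join(tempfile.gettempdir(), "grace_excel.csv")

# "utf-8-sig" writes the BOM bytes EF BB BF before the first row, which lets
# a viewer such as Excel recognise the file as UTF-8 (assumed viewer behaviour).
with open(path, "w", encoding="utf-8-sig", newline="") as csv_file:
    writer = csv.writer(csv_file, delimiter=";")
    writer.writerow(row)

with open(path, "rb") as f:
    raw = f.read()

print(raw[:3].hex(" "))  # ef bb bf
```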

Sergey Pankov, 2020-06-03
@trapwalker

РџСЂРѕРґР°Рј СѓС‡Р°СЃС‚РѕРє - 35

It looks like you are viewing UTF-8 text in a misconfigured viewer that tries to read it as if it were in the 1251 encoding (cp1251, also called Windows-1251).
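This diagnosis can be reproduced in a few lines of Python (a minimal sketch): encode the text as UTF-8, which is what the CSV file contains, then decode those bytes as cp1251, as a misconfigured viewer would. Because every byte in this particular string maps to a cp1251 character, the damage can also be undone.

```python
text = "Продам участок - 35"

# What the CSV file actually contains: UTF-8 bytes.
raw = text.encode("utf-8")

# What a viewer that assumes cp1251 shows: one character per byte.
garbled = raw.decode("cp1251")
print(garbled)  # РџСЂРѕРґР°Рј СѓС‡Р°СЃС‚РѕРє - 35

# Undo the mistake: turn the garbled characters back into the original
# bytes, then decode those bytes with the correct encoding.
restored = garbled.encode("cp1251").decode("utf-8")
print(restored)  # Продам участок - 35
```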
Let me explain. Unicode is not an encoding; it is just a catalogue of a huge variety of characters, each with its own number. There are far more of these characters than 256, and 256 is exactly how many different values one byte can hold.
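To see the numbers involved, Python's built-in `ord()` returns a character's Unicode number (its code point). A Latin letter fits in one byte; a Cyrillic letter or the euro sign does not. The sample characters are an arbitrary choice.

```python
# ord() returns a character's Unicode code point (its number in the catalogue).
code_points = {ch: ord(ch) for ch in "AП€"}

for ch, cp in code_points.items():
    # One byte can hold only the values 0..255.
    verdict = "fits in one byte" if cp < 256 else "needs more than one byte"
    print(ch, cp, hex(cp), verdict)
```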
Let me remind you that in our computers everything is stored as bytes and sequences of bytes. But one byte is too small for computation, so bytes were combined into "words". In short, at first processor registers were 2 bytes (16 bits), then 4 bytes (32 bits), and now 64-bit architectures are taking over everywhere. Roughly speaking, this means the processor handles those 4 or 8 bytes in a single operation. But in what order do the bytes go within such a group? That also differs from platform to platform, and it is a real mess. But somehow we live with it.
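A small illustration of the byte-order issue, using Python's standard `struct` module: the same 32-bit number becomes the same four bytes, but in opposite order for the little-endian and big-endian layouts. The number 0x41F (1055) is an arbitrary choice.

```python
import struct
import sys

number = 0x0000041F  # 1055, packed as an unsigned 32-bit integer

little = struct.pack("<I", number)  # little-endian, as on x86
big = struct.pack(">I", number)     # big-endian, also called network order

print(little.hex(), big.hex())  # 1f040000 0000041f
print(sys.byteorder)            # the native order of this machine: 'little' or 'big'
```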
So we need to write these very numerous characters as bytes, and those bytes must survive transmission over the network to another computer, possibly with a different architecture and a different byte order within a "word". Many people do not need all the characters at all: for some, English letters, digits and a few symbols are enough, a set of no more than 128 characters (the ASCII encoding).
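A quick check of the ASCII limit, sketched with a hypothetical helper `ascii_bytes`: English text encodes to bytes that are all below 128, while Cyrillic text cannot be encoded in ASCII at all.

```python
def ascii_bytes(text):
    """Return the ASCII bytes of text, or None if text is not pure ASCII."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        # ASCII has codes only for 0..127; Cyrillic letters are outside that range.
        return None


print(ascii_bytes("Plot 35"))  # b'Plot 35'
print(ascii_bytes("Продам"))   # None
```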
What is an encoding? It is a way of representing each character as one or more bytes. Different encodings cover different sets of characters. Text is a sequence of characters; in a specific encoding it is represented by a sequence of bytes. To read the text, you need to know its encoding: you take the bytes, use the encoding to work out which characters they stand for, and then draw those characters.
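To illustrate the definition, here is one word encoded three ways in Python (a minimal sketch): the characters are the same, the byte sequences differ, and decoding gives the right text only when the reading encoding matches the writing one.

```python
word = "Продам"

for encoding in ("utf-8", "cp1251", "cp866"):
    data = word.encode(encoding)
    # Same characters, different byte sequences (even different lengths).
    print(f"{encoding:7} {len(data):2} bytes: {data.hex(' ')}")

cp1251_bytes = word.encode("cp1251")
print(cp1251_bytes.decode("cp1251"))  # Продам
print(cp1251_bytes.decode("cp866"))   # wrong encoding: different characters
```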
Historically, Windows with a Russian locale uses two different single-byte encodings at once: cp1251 and cp866. The first half of their 256 characters is the same in both (it matches ASCII), but in the second half the Russian letters sit in completely different places. Russian text can be converted unambiguously from one encoding to the other, but the byte values have to be remapped according to a conversion table.
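Such a conversion table can be sketched directly in Python: a 256-entry byte map built with `bytes.maketrans` that moves the Russian letters from their cp866 positions to their cp1251 positions and leaves ASCII alone. This hand-made table is only an illustration covering the Russian alphabet (cp866's box-drawing characters, for instance, are not mapped); in practice, decoding with one codec and encoding with the other does the same job.

```python
# The Russian alphabet; only these letters need remapping, because
# bytes 0..127 (ASCII) are identical in cp866 and cp1251.
LETTERS = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# A 256-entry table: index = cp866 byte, value = cp1251 byte.
CP866_TO_CP1251 = bytes.maketrans(LETTERS.encode("cp866"), LETTERS.encode("cp1251"))

dos_bytes = "Продам участок - 35".encode("cp866")
win_bytes = dos_bytes.translate(CP866_TO_CP1251)

print(dos_bytes.hex(" "))
print(win_bytes.hex(" "))
print(win_bytes.decode("cp1251"))  # Продам участок - 35

# Python's codecs perform the same remapping by going through Unicode:
print(win_bytes == dos_bytes.decode("cp866").encode("cp1251"))  # True
```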
Then people wanted all sorts of other symbols: hieroglyphs, pictograms and so on. None of this fits into a byte, so multi-byte encodings appeared. In UTF-8, for example, ASCII characters take one byte, frequently used characters take two or three, and rare ones take four. A fairly involved algorithm specifies how to turn the Unicode character with a given number into a sequence of bytes, and this algorithm is different for each encoding. For single-byte encodings it is much simpler.
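For UTF-8, that algorithm can be written out by hand. The sketch below uses a hypothetical helper `utf8_encode`, for illustration only (it does not reject surrogates or values above U+10FFFF); it spreads the bits of the code point over 1 to 4 bytes following the UTF-8 bit patterns and compares the result with Python's built-in encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustration only)."""
    if code_point < 0x80:
        # 0xxxxxxx: ASCII stays a single byte.
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: e.g. Cyrillic letters.
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: e.g. the euro sign, CJK characters.
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: rarer characters such as emoji.
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "AП€😀":
    manual = utf8_encode(ord(ch))
    # The last column compares against Python's built-in encoder.
    print(ch, len(manual), manual.hex(" "), manual == ch.encode("utf-8"))
```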
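The regularly repeating characters Р and С in your example string are the first bytes of those two-byte sequences: in UTF-8 every Russian letter starts with the byte 0xD0 or 0xD1, which cp1251 displays as Р or С. Note that the ASCII characters (the spaces, the hyphen, the digits) come through unchanged, because UTF-8 stores them as one byte each. So this is UTF-8 being viewed as cp1251 or some other single-byte encoding.
In other words, the viewer renders each byte as a separate character.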
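Breaking the example down character by character shows where the repeating Р and С come from. The hypothetical helper `as_seen_in_cp1251` below renders the UTF-8 bytes of one character the way a cp1251 viewer would, one character per byte.

```python
def as_seen_in_cp1251(ch: str) -> str:
    """Render the UTF-8 bytes of one character as a cp1251 viewer would."""
    # Each byte is decoded on its own, so one letter becomes several characters.
    return "".join(bytes([b]).decode("cp1251") for b in ch.encode("utf-8"))


print(as_seen_in_cp1251("П"))  # Рџ

text = "Продам участок - 35"
for ch in text[:3] + text[-3:]:
    raw = ch.encode("utf-8")
    print(repr(ch), raw.hex(" "), "->", repr(as_seen_in_cp1251(ch)))
```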
In summary:
Your file is in UTF-8. Open it in a suitable editor with the right settings and you will see everything correctly. Or encode the file in cp1251 when writing it; then it will open readably for you as well.
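Both fixes as a minimal Python sketch, assuming the row values are plain strings (the sample row and the file names are hypothetical): either keep UTF-8 and read the file back with the same encoding, or write cp1251 directly. Note that cp1251 covers only Cyrillic, Latin and a limited set of symbols; writing any other character through `open(..., encoding="cp1251")` raises `UnicodeEncodeError` unless an `errors=` policy such as `"replace"` is given.

```python
import csv
import os
import tempfile

# Hypothetical row standing in for name_title, city, price, descrip.
row = ["Продам участок - 35", "Москва", "35", "описание"]
folder = tempfile.gettempdir()

# Option 1: keep UTF-8, and read the file back with the same encoding.
utf8_path = os.path.join(folder, "grace_utf8.csv")  # hypothetical file name
with open(utf8_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f, delimiter=";").writerow(row)
with open(utf8_path, encoding="utf-8", newline="") as f:
    print(next(csv.reader(f, delimiter=";")))  # the original row, readable

# Option 2: write cp1251, which a viewer set up for Russian Windows expects.
cp1251_path = os.path.join(folder, "grace_cp1251.csv")  # hypothetical file name
with open(cp1251_path, "w", encoding="cp1251", newline="") as f:
    csv.writer(f, delimiter=";").writerow(row)
```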

cactuss, 2014-06-04
@cactuss

<svg width="640" height="480" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
 <g>
  <path id="svg_7" d="m166.18999,120.48652c-67.97241,166.12807 -53.12786,130.22753 -53.64875,130.2275c-0.5209,-0.00003 320.85067,-4.92749 320.32973,-4.92752c-0.5209,-0.00003 72.39981,-140.7865 71.87888,-140.7865c-0.5209,0 -338.55986,15.48652 -338.55986,15.48652z" stroke-width="2" stroke="#000000" fill="none"/>
 </g>
</svg>

Here it is in SVG.
