How to change file encoding in Python to UTF-8-BOM without \ufeff?

Python

Dmitry Prilepsky, 2021-07-19 01:13:42

I am writing a script that automatically translates a game. All of the game's localization is in a file with utf-8-bom encoding. I translate the text and write the translation to a file, which should also be utf-8-bom, but Python insists on making it plain utf-8. To convert it to utf-8-bom, I wrote this function:

def encod_utf8_bom(self, path_on_file: str):
        file = open(path_on_file, encoding='utf-8', mode='r')
        encoding_file = [line.encode('utf-8-sig') for line in file]
        file.close()
        file = open(path_on_file, 'wb')
        [file.write(line) for line in encoding_file]
        file.close()

But it puts \ufeff in front (displayed as a dot at the start).

As a result, the translation does not work. However, if I write the translation as utf-8 and change the encoding to utf-8-bom in Notepad++, the translation works (and is displayed without dots). How can I do the same in Python?

1 answer(s)

javedimka, 2021-07-19
@HartX

\ufeff

This is the BOM.
You should not encode each line separately; encode the whole data at once. It would be worth at least opening Wikipedia to understand what you are working with:

According to the Unicode specification, a marker can only appear at the very beginning of a file or stream.
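A small sketch of why the per-line approach goes wrong (the sample lines here are made up for illustration): every call to `encode('utf-8-sig')` prepends its own marker, so encoding line by line leaves a BOM at the start of each line, while encoding the joined text once produces a single marker at the very beginning.

```python
import codecs

# Hypothetical sample data standing in for lines of a localization file.
lines = ["key1=value1\n", "key2=value2\n"]

# Encoding each line separately: one BOM per line.
per_line = b"".join(line.encode("utf-8-sig") for line in lines)

# Encoding the whole text once: one BOM at the start.
whole = "".join(lines).encode("utf-8-sig")

bom_count_per_line = per_line.count(codecs.BOM_UTF8)
bom_count_whole = whole.count(codecs.BOM_UTF8)
```

The extra markers in the middle of the data are presumably what shows up as dots and confuses the game's parser.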

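import shutil

def encode_utf8_bom(self, path_on_file: str):
    tmp_path = path_on_file + ".tmp"
    # Read as plain UTF-8 and write the whole content once as utf-8-sig,
    # so the BOM is emitted a single time at the start of the file.
    with open(path_on_file, encoding="utf-8") as f_in, open(tmp_path, encoding="utf-8-sig", mode="w") as f_out:
        f_out.write(f_in.read())
    # Replace the original only after both files have been closed.
    shutil.move(tmp_path, path_on_file)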
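If the file might already carry a BOM (for example, after the function has run once), a variant along these lines may be safer. It is only a sketch: the names `ensure_utf8_bom` and `starts_with_single_bom` are made up here, and it assumes the file is small enough to read into memory. Reading with `utf-8-sig` strips a leading marker if there is one, so repeated runs should not stack markers, and `newline=""` keeps the original line endings untouched.

```python
import codecs
import os


def ensure_utf8_bom(path: str) -> None:
    """Rewrite `path` so that it starts with exactly one UTF-8 BOM."""
    # On read, "utf-8-sig" drops a leading BOM if present and otherwise
    # behaves like plain UTF-8.
    with open(path, encoding="utf-8-sig", newline="") as f_in:
        text = f_in.read()

    tmp_path = path + ".tmp"
    # On write, "utf-8-sig" emits the BOM once, before the first character.
    with open(tmp_path, "w", encoding="utf-8-sig", newline="") as f_out:
        f_out.write(text)

    # Swap the new file in only after both handles are closed.
    os.replace(tmp_path, path)


def starts_with_single_bom(path: str) -> bool:
    """Check that the file begins with one BOM and not two in a row."""
    with open(path, "rb") as f:
        head = f.read(6)
    bom = codecs.BOM_UTF8
    return head.startswith(bom) and not head[len(bom):].startswith(bom)
```

If the script writes the translation itself, opening the output file with `encoding="utf-8-sig"` in the first place would avoid the separate conversion step altogether.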