Python
Denis, 2020-07-25 20:55:37

How to replace all hieroglyphs (CJK characters) in a Python string?

Good afternoon, Habr users. I've run into the following problem. Using vk_api, I need to attach a .csv document to a bot message; to do that, the file first has to be uploaded to the server. But when the file contains Asian characters, the request to add it fails with the error

'charmap' codec can't decode byte 0x98 in position 254: character maps to <undefined>
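A minimal reproduction of where this message can come from. It assumes (the question does not say so) that the script runs on Windows with a Cyrillic locale, where open(FILE, 'r') without an encoding argument falls back to cp1251, a single-byte code page that Python implements as a "charmap" codec. UTF-8 stores each CJK character as several bytes, and some of those bytes, such as 0x98, have no mapping in cp1251. The helper try_decode and the sample character U+5600 are hypothetical, chosen only for illustration.

```python
# U+5600 is a CJK ideograph; its UTF-8 encoding is the three bytes E5 98 80
RAW = '\u5600'.encode('utf-8')


def try_decode(raw: bytes, encoding: str):
    """Hypothetical helper: return the decoded str, or the UnicodeDecodeError raised."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# cp1251 has no character for byte 0x98, so decoding fails at that byte
print(try_decode(RAW, 'cp1251'))
# Decoding with the encoding the bytes were actually written in works
print(try_decode(RAW, 'utf-8') == '\u5600')
```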

with open(FILE, 'r', encoding='utf-8', newline='') as file:
    reader = csv.reader(file, delimiter=",")
    data = list(reader)
    row_count = len(data) - 1  # number of rows minus one (the header)
print('Opening file!')
document = open(FILE, 'r')  # the file
print('Uploading file!')
document_url = vk_session.method("docs.getMessagesUploadServer", {"type": "doc", "peer_id": userid})  # get the server to upload the file to
print('Post!')
try:
    document_post = requests.post(document_url["upload_url"], files={"file": document}).json()  # POST
except Exception as exc:
    print(f'{exc}')
print('Saving file!')
document_save = vk_session.method("docs.save", {"file": document_post["file"], "title": f"Search_{userid}"})  # save the file
document = document_save.get('doc')
document_url = document['url']
document_url = document_url[:document_url.find('?')]
# send the message
print('Sending message')
vk_session.method('messages.send', {'user_id': userid, "message": f'Results found for your query: {row_count} \n View them all as a table by downloading the document at: {document_url}', 'random_id': 0})


Is there a way to replace all the hieroglyphs (CJK characters) with, say, '?'?
Thank you!
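If the goal really is to replace the characters rather than change how the file is read, one standard-library option (not taken from the thread) is to round-trip the text through cp1251 with errors='replace': when encoding, every character cp1251 cannot represent becomes '?'. This is broader than "hieroglyphs only", since any other character outside cp1251, such as an emoji, is replaced too. The function name to_cp1251_safe is hypothetical.

```python
def to_cp1251_safe(text: str) -> str:
    """Hypothetical helper: replace every character cp1251 cannot encode with '?'."""
    # On encoding, errors='replace' substitutes '?' for unencodable characters;
    # Cyrillic and ASCII survive the round trip unchanged
    return text.encode('cp1251', errors='replace').decode('cp1251')


print(to_cp1251_safe('Поиск: 漢字, Search'))
```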


3 answer(s)
Alexander Pikeev, 2020-07-25
@Baryon

import re

# Replace every CJK Unified Ideograph (U+4E00..U+9FFF) in the string `text` with '?'
text = re.sub('[\u4e00-\u9fff]', '?', text)

soremix, 2020-07-25
@SoreMix

Open the file for the upload with an explicit encoding too, just as you did in the first line, i.e. document = open(FILE, 'r', encoding='utf-8').
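A minimal sketch of this advice, assuming the CSV was written as UTF-8: parse it in text mode with an explicit encoding for the csv module, and, as an alternative for the upload step, open it in binary mode ('rb'), since a multipart upload sends raw bytes and no decoding is needed at all. The helper names and the temporary sample file are hypothetical, for illustration only.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Parse the CSV in text mode with an explicit encoding (assumed to be UTF-8)."""
    # An explicit encoding makes the result independent of the OS locale (e.g. cp1251)
    with open(path, 'r', encoding='utf-8', newline='') as f:
        return list(csv.reader(f))


def read_for_upload(path):
    """Return the file's raw bytes; binary mode performs no decoding."""
    with open(path, 'rb') as f:
        return f.read()


def demo():
    """Write a small UTF-8 CSV with CJK text to a temp file and read it back both ways."""
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        with open(path, 'w', encoding='utf-8', newline='') as f:
            csv.writer(f).writerows([['id', 'name'], ['1', '漢字']])
        return read_rows(path), read_for_upload(path)
    finally:
        os.remove(path)


rows, raw = demo()
print(rows)
print(raw)
```

For the upload itself, requests accepts either the 'rb' file object or a (filename, content) tuple in files, for example files={"file": (os.path.basename(FILE), raw)}.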

Andrey Gladchenko, 2021-01-29
@AndreyGlad

Found a solution:

# Code point ranges of CJK ideographs: the main Unified Ideographs block,
# Extension A, Extensions B to F and the Compatibility Ideographs Supplement
cjk_ranges = [
    ( 0x4E00,  0x62FF),
    ( 0x6300,  0x77FF),
    ( 0x7800,  0x8CFF),
    ( 0x8D00,  0x9FCC),
    ( 0x3400,  0x4DB5),
    (0x20000, 0x215FF),
    (0x21600, 0x230FF),
    (0x23100, 0x245FF),
    (0x24600, 0x260FF),
    (0x26100, 0x275FF),
    (0x27600, 0x290FF),
    (0x29100, 0x2A6DF),
    (0x2A700, 0x2B734),
    (0x2B740, 0x2B81D),
    (0x2B820, 0x2CEAF),
    (0x2CEB0, 0x2EBEF),
    (0x2F800, 0x2FA1F)
]


def is_cjk(char):
    """Return True if the single character `char` falls in one of cjk_ranges."""
    code = ord(char)
    for bottom, top in cjk_ranges:
        if bottom <= code <= top:
            return True
    return False
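The answer defines is_cjk but stops short of the replacement itself. A sketch of that last step using a different technique from the per-character check: merge the same ranges and compile them into a single regular-expression character class, so re.sub replaces everything in one pass. CJK_RANGES, CJK_PATTERN and replace_cjk are hypothetical names. Note that these ranges cover CJK ideographs only, not Japanese kana or Korean Hangul.

```python
import re

# The ranges from the answer, with adjacent ranges merged
CJK_RANGES = [
    (0x3400, 0x4DB5),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FCC),    # CJK Unified Ideographs
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B734),  # Extension C
    (0x2B740, 0x2B81D),  # Extension D
    (0x2B820, 0x2CEAF),  # Extension E
    (0x2CEB0, 0x2EBEF),  # Extension F
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

# One character class covering all ranges; CJK characters have no special
# meaning in a regex, so they can be placed in the class directly
CJK_PATTERN = re.compile(
    '[' + ''.join(f'{chr(lo)}-{chr(hi)}' for lo, hi in CJK_RANGES) + ']'
)


def replace_cjk(text: str, repl: str = '?') -> str:
    """Replace each CJK ideograph in text with repl."""
    return CJK_PATTERN.sub(repl, text)


print(replace_cjk('abc漢字def'))
```

With the answer's own function, the same result comes from ''.join('?' if is_cjk(ch) else ch for ch in text); the compiled pattern simply avoids looping over the range list for every character.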
