Python
Alexander Mamaev, 2016-06-06 20:00:34

How to fix Unicode JSON in Python?

To put it simply, I need to get information from a page. Unfortunately, I don't know whether I'm moving in the right direction. So instead of asking how to re-encode this page when decode("utf-8") does not work, I'd rather ask directly: how do I get data from this page in a form that can be parsed?
I'm interested in a Python solution; here is what I have so far. Thanks in advance.

# -*- coding: utf-8 -*-
import requests
import json

def GetJSON(word):
  url = "https://ru.wiktionary.org//w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json"
  url = url%word
  answ = requests.get(url).text
  data = json.load(answ)
  return data
print(GetJSON("кот"))

P.S. The hyperlink isn't working for some reason, so just copy this:
https://ru.wiktionary.org/w/api.php?action=query&titles=%D0%BA%D0%BE%D1%82&prop=revisions&rvprop=content&format=json
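
For readers hitting the same error: the json.load(answ) call in the code above is the likely culprit. json.load expects a file-like object with a .read() method, while the .text attribute of a requests response is already a string, which is what json.loads is for. A minimal sketch of the difference, using a made-up payload that only mimics the shape of an API reply (the page id "123" is hypothetical) and no network access:

```python
import io
import json

# Made-up payload that only mimics the shape of a MediaWiki API reply.
body = '{"query": {"pages": {"123": {"title": "\\u043a\\u043e\\u0442"}}}}'

# json.loads parses a string that is already in memory.
from_string = json.loads(body)

# json.load expects an object with a .read() method, such as an open file.
from_file = json.load(io.StringIO(body))

# The \uXXXX escapes are decoded to real Cyrillic characters by the parser.
title = from_string["query"]["pages"]["123"]["title"]
print(title)
```

With requests, calling .json() on the response does the parsing in one step.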

Solved!
Solution:
# coding: utf-8
import requests

def WikiSearch(word):
  # Ask the ru.wiktionary.org API for the wikitext of the page titled `word`
  req = requests.get('https://ru.wiktionary.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json' % word)
  pages = req.json()["query"]["pages"]

  for key in pages:
    if key == "-1": return None  # page not found
    text = str(pages[key]["revisions"])
  # The part we need starts right after the word "слогам" (6 characters)
  a = text.find("слогам") + 6
  text = text[a:a + text[a:].find("}")]
  text = text.replace("\u0301", "'")  # combining acute accent -> apostrophe
  text = text.replace("|", "")
  return text

print(WikiSearch(input()))

The program prints the word with its stress mark.
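
The string slicing above can also be written as a regular expression. This is only a sketch: it assumes the wikitext is already available as a plain string, the function name stressed_form is hypothetical, and the sample fragment is made up for illustration, so the real template on a given page may look different.

```python
import re

# Made-up wikitext fragment; it only contains the "слогам" marker that the
# solution above searches for. Real pages may differ.
sample = "{{по-слогам|ко\u0301ш|ка}}"

def stressed_form(wikitext):
    """Return the text after 'слогам|' up to '}', without '|' separators
    and with the combining acute accent replaced by an apostrophe."""
    match = re.search(r"слогам\|([^}]*)\}", wikitext)
    if match is None:
        return None  # marker not found
    word = match.group(1).replace("|", "")
    return word.replace("\u0301", "'")

print(stressed_form(sample))
```

Returning None when the marker is missing avoids the silent wrong slice that str.find produces when it returns -1.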

3 answer(s)
GavriKos, 2016-06-06
@virtual_universe

Unfortunately, I don't have an example that uses a URL, but I do have code that loads JSON from a UTF-8 encoded file whose contents are in Russian. Here it is:

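import json
import os

def LoadJson(filname):
    if not os.path.isfile(filname):
        return None
    # Read the whole file as UTF-8 text
    with open(filname, "r", encoding="utf-8") as data_file:
        text = data_file.read()
    # Parse, re-serialize without \uXXXX escapes, then parse again
    data = json.loads(text)
    text = json.dumps(data, ensure_ascii=False, indent=4)
    return json.loads(text)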

I think the last three lines are what you need to pay attention to. After this bit of shamanism, everything worked. Perhaps it can be done more simply; I didn't look into it very deeply.
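
As far as I can tell, the piece of those lines that matters for readable output is ensure_ascii=False: it only changes how the serialized text looks, not the data that json.loads gives back. A small sketch with a made-up dictionary:

```python
import json

data = {"word": "кот"}  # made-up sample data

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keeps Cyrillic as-is

print(escaped)   # Cyrillic shown as \uXXXX escapes
print(readable)  # Cyrillic shown directly

# Both strings decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == data
print(same)
```

So the dumps/loads round trip is mostly useful when you want to print or save the JSON text itself in readable form.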

Dimonchik, 2016-06-06
@dimonchik2013

stackoverflow.com/questions/4004431/text-with-unic...

Valdemar Smorman, 2020-01-03
@smorman

I managed it this way:

Fixing unicode JSON in python
import json
import requests

ace_data = requests.get('https://api.aceхххх.хх/хххххххххххххххх_api_key').content.decode('unicode-escape', 'ignore')
ace_json = json.dumps(ace_data)
ace_json_load = json.loads(ace_json)
print(ace_json_load)

That is, to
requests.get('url')
we append
.content.decode('unicode-escape', 'ignore')
and get proper Cyrillic output, as expected! And if you remove
.content.decode('unicode-escape', 'ignore')
from
ace_data = requests.get('https://api.aceхххх.хх/хххххххххххххххх_api_key').content.decode('unicode-escape', 'ignore')
then the Cyrillic output naturally comes out as complete garbage, like:
\u04e7
even though it is UTF-8.
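
A word of caution for anyone copying this: the 'unicode-escape' codec treats every byte that is not part of an escape as Latin-1, so it can mangle a response that contains real UTF-8 characters. If the body is JSON, json.loads decodes \uXXXX escapes on its own. A sketch with a made-up payload instead of a real API call:

```python
import json

# Bytes as a server might send them: JSON with \uXXXX escapes (made-up payload).
raw = b'{"name": "\\u043a\\u043e\\u0442"}'

# json.loads decodes the escapes itself (bytes input is accepted since Python 3.6).
parsed = json.loads(raw)

# By contrast, 'unicode-escape' reads non-escape bytes as Latin-1,
# so UTF-8 text that is NOT escaped gets mangled.
utf8_bytes = "кот".encode("utf-8")
mangled = utf8_bytes.decode("unicode-escape")

print(parsed["name"])
print(mangled == "кот")
```

Whether this applies to a given API depends on what it actually returns, so it is worth checking the raw bytes first.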
