How to fix Unicode JSON in Python?
Put simply, I need to get some data. Unfortunately, I'm not sure I'm heading in the right direction, so instead of asking how to re-encode this page when decode("utf-8") doesn't work, I'd rather ask directly: how do I get data I can actually parse from this page?
I'm interested in a Python solution. Here's what I've managed so far. Thanks in advance.
# -*- coding: utf-8 -*-
import requests
import json

def GetJSON(word):
    url = "https://ru.wiktionary.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json"
    url = url % word
    answ = requests.get(url).text
    data = json.loads(answ)  # json.loads parses a str; json.load expects a file object
    return data

print(GetJSON("кот"))
https://ru.wiktionary.org/w/api.php?action=query&titles=%D0%BA%D0%BE%D1%82&prop=revisions&rvprop=content&format=json
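A minimal sketch of the two steps the question is stuck on, assuming Python 3 and using only the standard library so it runs offline: urllib.parse.urlencode percent-encodes the Cyrillic title exactly as in the URL above, and json.loads (not json.load) turns the \uXXXX escapes in the response text into ordinary str values. The sample response body and its page id "1" are hypothetical and trimmed; with requests, requests.get(url).json() performs the same parsing step.

```python
import json
from urllib.parse import urlencode

API = "https://ru.wiktionary.org/w/api.php"

def build_query_url(word):
    # urlencode percent-encodes the title as UTF-8, e.g. "кот" -> %D0%BA%D0%BE%D1%82
    params = {
        "action": "query",
        "titles": word,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return API + "?" + urlencode(params)

def parse_response(text):
    # json.loads takes a str; \uXXXX escapes come back as real characters
    return json.loads(text)

# Hypothetical, trimmed response body with escaped Cyrillic
SAMPLE = '{"query": {"pages": {"1": {"title": "\\u043a\\u043e\\u0442"}}}}'

data = parse_response(SAMPLE)
title = data["query"]["pages"]["1"]["title"]
print(title)  # кот
# ensure_ascii=False keeps Cyrillic readable when printing the JSON text
print(json.dumps(data, ensure_ascii=False))
```

Once parsed, the escapes are gone: they were only the transport representation of the text, not something to re-encode by hand.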
# coding: utf-8
import requests

def WikiSearch(word):
    req = requests.get('https://ru.wiktionary.org/w/api.php?action=query&titles=%s&prop=revisions&rvprop=content&format=json' % word)
    req = req.json()["query"]["pages"]
    for key in req:
        if key == "-1": return None  # 404 page not found
        req = str(req[key]["revisions"])
        a = req.find("слогам")+6
        req = req[a:a+req[a:].find("}")]
        req = req.replace("\u0301","'")
        req = req.replace("|","")
        return req

print(WikiSearch(input()))
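The same extraction can be sketched without calling str() on the revisions list, assuming the API's legacy JSON layout (no formatversion=2), where a missing page carries a "missing" key and a revision's wikitext sits under the "*" key. It keeps the logic of WikiSearch (text after "слогам" up to the next "}", stress mark U+0301 replaced by an apostrophe, "|" removed) but returns None when the marker is absent instead of slicing from the wrong offset. The helper names and the sample wikitext are hypothetical.

```python
import json

MARKER = "слогам"
STRESS = "\u0301"  # combining acute accent that marks stress

def extract_syllables(wikitext):
    # Same idea as WikiSearch: take the text after MARKER up to the next "}"
    start = wikitext.find(MARKER)
    if start == -1:
        return None  # marker absent: no syllable template on the page
    start += len(MARKER)
    end = wikitext.find("}", start)
    if end == -1:
        return None  # unterminated template
    chunk = wikitext[start:end]
    return chunk.replace(STRESS, "'").replace("|", "")

def wiki_search_from_json(text):
    # text is the raw JSON body returned by the query used in WikiSearch
    pages = json.loads(text)["query"]["pages"]
    for page in pages.values():
        if "missing" in page:
            return None  # page does not exist (its key is "-1")
        # Assumption: legacy layout, wikitext under the "*" key
        return extract_syllables(page["revisions"][0]["*"])
    return None
```

With requests, the text argument would be requests.get(url).text, with url built as in WikiSearch.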
Unfortunately, I don't have an example that uses a URL, but here is code that loads JSON from a UTF-8 encoded file whose contents are in Russian:
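import json
import os

def LoadJson(filname):
    if not os.path.isfile(filname):
        return None
    # Read with an explicit UTF-8 encoding so Russian text decodes
    # correctly regardless of the system locale
    with open(filname, "r", encoding="utf-8") as data_file:
        text = data_file.read()
    # json.loads returns ordinary str values; ensure_ascii=False only matters
    # when the data is serialized back to text with json.dumps
    data = json.loads(text)
    return data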
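A hedged sketch of the distinction behind this answer: ensure_ascii only affects how json.dump/json.dumps write text. The parsed data is identical either way; what changes is whether Cyrillic is written as-is (which then requires opening the file as UTF-8) or as \uXXXX escapes. The function names and the temporary-file demo are hypothetical.

```python
import json
import os
import tempfile

def save_json(path, data):
    # ensure_ascii=False writes Cyrillic as-is instead of \uXXXX escapes,
    # so the file must be opened with an explicit UTF-8 encoding
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

def load_json(path):
    if not os.path.isfile(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)  # json.load takes a file object; json.loads takes a str

def roundtrip_via_file(data):
    # Write to a throwaway file, then return both the raw text and the parsed data
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        save_json(path, data)
        with open(path, "r", encoding="utf-8") as f:
            raw = f.read()
        return raw, load_json(path)
```

For comparison, json.dumps({"слово": "кот"}) with the default ensure_ascii=True produces the escaped form {"\u0441\u043b\u043e\u0432\u043e": "\u043a\u043e\u0442"}, which json.loads turns back into the same dict.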
I managed it this way:
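import json
import requests

# Decode the raw bytes, expanding \uXXXX escape sequences into characters
ace_data = requests.get('https://api.aceхххх.хх/хххххххххххххххх_api_key').content.decode('unicode-escape', 'ignore')
# Note: json.dumps followed by json.loads returns the same str, not a parsed dict
ace_json = json.dumps(ace_data)
ace_json_load = json.loads(ace_json)
print(ace_json_load)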
In short: append .content.decode('unicode-escape', 'ignore') to requests.get('url'); this turns escape sequences such as \u04e7 in the response into the corresponding characters.
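A small offline sketch of what .content.decode('unicode-escape', 'ignore') does, with hypothetical helper names and sample bytes: unicode-escape expands \uXXXX sequences, but it reads every other byte as Latin-1, so a response that already contains raw UTF-8 Cyrillic comes out garbled. Letting json.loads handle the escapes avoids that and returns a dict, whereas json.loads(json.dumps(ace_data)) in the answer above just gives back the same str. With requests, response.json() does the parsing step directly.

```python
import json

def decode_escaped(raw: bytes) -> str:
    # What the answer does: bytes -> str, expanding \uXXXX escapes.
    # Bytes that are not part of an escape are read as Latin-1.
    return raw.decode("unicode-escape", "ignore")

def parse_json_bytes(raw: bytes):
    # Alternative: decode as UTF-8 and let the JSON parser expand the escapes
    return json.loads(raw.decode("utf-8"))

escaped = b'{"name": "\\u043a\\u043e\\u0442"}'   # API-style escaped body
utf8 = '{"name": "кот"}'.encode("utf-8")         # body with raw UTF-8 text

print(parse_json_bytes(escaped))  # {'name': 'кот'}
print(parse_json_bytes(utf8))     # {'name': 'кот'}
```

The JSON-parser route works for both bodies, while the unicode-escape route only works when the server escapes every non-ASCII character.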