Python
D55RUS, 2020-08-22 16:31:45

How to extract json from html?

The code:

<script type="application/json" id="requirejs.config">
    {some_json}
  </script>

I don’t understand how to get {some_json} out of this markup. Could someone help a beginner, please?

2 answer(s)
Ternick, 2020-08-22
@D55RUS

You need a library that can parse HTML.
In Python, BeautifulSoup is one example.
It can get the text out of any tag, including script.
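
A minimal sketch of that suggestion, assuming the bs4 package is installed. The JSON payload below is a made-up placeholder standing in for {some_json} from the question:

```python
import json
from bs4 import BeautifulSoup

# Placeholder markup: the JSON content is invented for illustration
html = '''
<script type="application/json" id="requirejs.config">
  {"name": "John", "age": 30}
</script>
'''

soup = BeautifulSoup(html, "html.parser")

# Look the tag up by its id, then read the text between the tags
script = soup.find("script", id="requirejs.config")
data = json.loads(script.string)  # json.loads ignores surrounding whitespace

print(data["name"])  # John
```

Note that soup.find returns None when no matching tag exists, so real code would probably check for that before reading .string.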

Sergey Karbivnichy, 2020-08-22
@hottabxp

Solution:

script = soup.find('script',id='requirejs.config').string

script will contain everything between <script type="application/json" id="requirejs.config"> and </script>.

You can also do it like this:

import json
from bs4 import BeautifulSoup

html = '''
<!DOCTYPE html>
<html>
  <body>
<script type="application/json" id="requirejs.config">
{
  "name": "John",
  "age": 30,
  "isAdmin": false,
  "courses": ["html", "css", "js"],
  "wife": null
}
  </script>
  </body>
</html>
'''
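soup = BeautifulSoup(html, "html.parser")

# Find the first <script> tag whose type is application/json
script = soup.find('script', type="application/json")
# Drop the opening tag plus the newline after it (55 characters) and
# the closing tag plus the whitespace before it (12 characters)
my_json = str(script)[55:-12]
print(json.loads(my_json)['name'])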

Output:
John
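The most interesting thing here is the line "my_json = str(script)[55:-12]":
55 - cuts the opening tag (and the newline after it) off the beginning: <script type="application/json" id="requirejs.config">
-12 - cuts the closing tag (and the whitespace before it) off the end: </script>
As a result, my_json holds everything that is inside the script tag. These offsets fit only this exact tag and layout: if the attributes or the surrounding whitespace change, the numbers have to be recounted, which the .string approach avoids.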
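
The offsets 55 and -12 can be checked with plain string arithmetic. The sketch below rebuilds by hand the text that str(script) yields for the example (same tags and whitespace, with a shortened placeholder payload) and assumes the parser reproduces the tag exactly as written. The names opening_tag, closing_tag and element are illustrative only:

```python
import json

opening_tag = '<script type="application/json" id="requirejs.config">'
closing_tag = '</script>'

# Same layout as the example: a newline after the opening tag,
# a newline plus two spaces before the closing tag
element = opening_tag + '\n{"name": "John"}\n  ' + closing_tag

print(len(opening_tag))  # 54, so index 55 is the first character after the newline
print(len(closing_tag))  # 9, so -12 also removes the newline and two spaces

my_json = element[55:-12]
print(my_json)  # {"name": "John"}

# Computing the bounds from the tags instead of hard-coding them
start = len(opening_tag)
end = len(element) - len(closing_tag)
print(json.loads(element[start:end])["name"])  # John
```

The computed bounds leave the surrounding whitespace in place, which is fine because json.loads ignores it.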
