Python
Artem, 2019-03-17 00:37:10

How to extract JSON object from script tag content?

I feel like I'm circling the solution but can't quite find it. Help, please: while parsing a site, I noticed that the information I need is inside a script tag with the following content (shortened to fit the character limit; the full version of the tag content can be found here):

<script>
            var __RELAY_BOOTSTRAP__ = "";
          </script>

I want to extract the JSON object enclosed in it. To do that, I removed " var __RELAY_BOOTSTRAP__ = " and ";", tried to strip the character escaping with scr.replace('\\', ''), and then tried to load the resulting string with json.loads(scr). But I keep running into errors like json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 84 (char 83). If I understand correctly, the problem lies in correctly "normalizing" the string before converting it into a JSON object, and that is where I'm stuck.
Is there a general approach to this kind of problem?

2 answer(s)
dollar, 2019-03-17
@Malodar

\\\"
This looks like double encoding.
Decode it manually first, even if only in the browser console. The advantage of this approach is that you can see what you have at each stage. Doing it this way, I got two more JSON strings (and if you look closely, it gets more complicated still).
Each of those then needs to be decoded again. Good luck.
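
The stage-by-stage decoding described above can also be done in Python instead of the browser console. This is only a minimal sketch: the payload here is a made-up stand-in (the real __RELAY_BOOTSTRAP__ value is not shown in the question), and it assumes the value is a JSON document that was serialized a second time as a string. It also shows why stripping backslashes with replace() breaks the string instead of decoding it.

```python
import json

# Hypothetical payload, not the real page data: a JSON object that is
# serialized once, then serialized again as a quoted string literal.
inner = {"user": {"name": "Artem", "bio": 'say "hi"'}}
script_value = json.dumps(json.dumps(inner))

# Stage 1: decode the outer string literal. The result is still a str,
# but now it holds ordinary JSON text with single-level escaping.
stage1 = json.loads(script_value)

# Stage 2: decode that JSON text into Python objects.
stage2 = json.loads(stage1)

print(type(stage1).__name__, type(stage2).__name__)

# Removing every backslash does not undo the escaping: it also destroys
# the escapes that protect quotes inside values, so parsing fails.
naive = script_value.replace("\\", "")
try:
    json.loads(naive)
    naive_failed = False
except json.JSONDecodeError:
    naive_failed = True
```

Inspecting stage1 before decoding it again is the Python equivalent of looking at each stage in the console. If some values inside stage2 turn out to be JSON strings themselves, they need their own json.loads call.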

Artem, 2019-03-17
@Malodar

Ah, I sort of figured it out :)
I used json.loads(json.loads(text))
Thanks for the double encoding tip!
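
For anyone landing here later, the whole path from the script tag to the parsed object might look roughly like this. It is a sketch under assumptions: the HTML fragment and its contents are invented for illustration, the regular expression is just one way to capture the quoted value, and it only works when the JavaScript string literal is also a valid JSON string (double quotes, JSON-style escapes).

```python
import json
import re

# Hypothetical page fragment; the real value is much longer.
html = r'''<script>
            var __RELAY_BOOTSTRAP__ = "{\"viewer\": {\"id\": 1}}";
          </script>'''

# Capture the double-quoted literal, quotes included, so that the first
# json.loads call can treat it as a JSON string. The inner group skips
# over escaped characters such as \" instead of stopping at them.
pattern = r'var __RELAY_BOOTSTRAP__\s*=\s*("(?:[^"\\]|\\.)*")\s*;'
match = re.search(pattern, html, re.S)
if match is None:
    raise ValueError("__RELAY_BOOTSTRAP__ assignment not found")

literal = match.group(1)

# First loads: string literal -> JSON text. Second loads: JSON text -> dict.
data = json.loads(json.loads(literal))
print(data["viewer"]["id"])
```

Keeping the surrounding quotes is the important detail: once they are stripped, the first json.loads has nothing left to unescape.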
