How to parse JSON from HTML using Python?

Jay Marlow, 2016-03-10 10:25:56

Python

Good day to all.
I have some code that fetches an HTML page with requests.get.
From the resulting HTML, I need to extract the JSON:

<script>
    try{
        try{
            var card = JSON.parse('{"CardPAN":"1234567890123456789","EndDate":"01.01.2016","TicketTypeDesc":"00.04 CardName","CityName":"City","CardSum":99,"Time":"01.09.2016 11:11:01"}');
        }
 ...

from which I need the CardSum value.
I tried BeautifulSoup with calls like soup.find_all('try') and soup.find_all('var'), and every one returned an empty result, evidently because I don't understand how soup itself works, even though this isn't my first time through the documentation.
But even if those calls had returned something, I would still only have a JSON string.
I don't know how to proceed from there.
Perhaps the result could be stored in a variable, say card, and then read with
card.json()['CardSum'], but this is only a theory.
Could you suggest how to handle this case?

2 answer(s)

Dimonchik, 2016-03-10
@dimonchik2013

decoded = json.loads(html_body)
After that, decoded is an ordinary dictionary.
html_body must hold the JSON string itself, i.e.:

html_body = '{"CardPAN":"1234567890123456789","EndDate":"01.01.2016","TicketTypeDesc":"00.04 CardName","CityName":"City","CardSum":99,"Time":"01.09.2016 11:11:01"}'
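As a minimal sketch of that step, assuming the JSON text has already been cut out of the page into a string (the sample string from the question is used here, and the variable name card is only borrowed from the page's JavaScript), json.loads returns a plain dict that can be indexed directly:

```python
import json

# Sample JSON string copied from the question; in practice it would first
# have to be extracted from the downloaded HTML.
html_body = '{"CardPAN":"1234567890123456789","EndDate":"01.01.2016","TicketTypeDesc":"00.04 CardName","CityName":"City","CardSum":99,"Time":"01.09.2016 11:11:01"}'

card = json.loads(html_body)  # an ordinary dict
card_sum = card["CardSum"]    # the value the question asks for
print(card_sum)
```

Note that .json() is a method of a requests response object, not of a string, so card.json() would not work on text pulled out of the page; json.loads is the call for a string.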

Vladimir Kuts, 2016-03-10
@fox_12

As I understand it, the main problem is getting the string itself. Regular expressions can do that.
Something like this:

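import json
import re

# your_html_body_by_line: the HTML split into lines, e.g. response.text.splitlines()
for line in your_html_body_by_line:
    res = re.search(r"JSON\.parse\('(.*?)'\)", line)  # capture the quoted JSON
    if res:
        print(json.loads(res.group(1))['CardSum'])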
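
The same idea can also be applied to the whole page at once instead of line by line. The sketch below is only an illustration: extract_card is a hypothetical helper name, sample_html is a simplified stand-in for the real page, and the pattern assumes the JSON sits in a single-quoted JSON.parse('...') call with no single quotes or JavaScript escape sequences inside it. It also shows why soup.find_all('try') and soup.find_all('var') come back empty: try and var are JavaScript inside a <script> element, not HTML tags.

```python
import json
import re

# Matches the single-quoted argument of JSON.parse('...').
# Assumption: the JSON holds no single quotes or JS escape sequences.
JSON_PARSE_RE = re.compile(r"JSON\.parse\('(.*?)'\)", re.DOTALL)


def extract_card(html):
    """Return the dict passed to JSON.parse in the page, or None if absent."""
    match = JSON_PARSE_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Simplified stand-in for the text of the downloaded page
# (e.g. the .text attribute of a requests response).
sample_html = """<script>
    var card = JSON.parse('{"CardSum":99,"CityName":"City"}');
</script>"""

card = extract_card(sample_html)
print(card["CardSum"])
```

Returning None when nothing matches lets the caller distinguish a page without the card data from a page with malformed JSON, which would raise an exception from json.loads instead.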