Python
Jay Marlow, 2016-10-05 22:47:11

How to properly parse data from bs4 ResultSet?

Good day.
Using the following line I get a ResultSet:

abc = soup.findAll('script', text = re.compile('Data'))

The resulting ResultSet looks like this:
[<script type="text/javascript">
data = {"url":"haha.com", "id":"12345", "name":"haha",};
... function() {abc.devg....})'
...

From all of this, I need to extract the parameters in data, for example the url and id values.
I have no idea how to do it. I tried various parsing approaches with soup, and the line above is the closest I have come to what I need.
I would be grateful for any advice.


2 answers
Yuri, 2016-10-06
@kolumbou

I haven't fully tested this: my laptop battery died, so I'm writing from my phone.

import re
import json
from ast import literal_eval

s = abc[0].string  # text of the first <script> tag in the question's ResultSet

pattern = re.compile(r'data[= ]+(?P<dict>.*);')
raw = pattern.search(s).groupdict()

try:
    # the object literal is valid JSON
    d = json.loads(raw['dict'])
except ValueError:
    # fallback: it looks like a valid Python dict (e.g. it has a trailing comma)
    d = literal_eval(raw['dict'])
print(d['id'], d['url'])

Update: answer revised.
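A self-contained check of the approach above, run on the sample line from the question instead of a live page (the rest of the real script is elided in the question, so only the data line is used). With this sample, json.loads rejects the trailing comma before "}", so the literal_eval fallback is what actually produces the dict.

```python
import re
import json
from ast import literal_eval

# Sample script text copied from the question; the rest of the real script is elided there
s = '''
data = {"url":"haha.com", "id":"12345", "name":"haha",};
'''

pattern = re.compile(r'data[= ]+(?P<dict>.*);')
raw = pattern.search(s).group('dict')

try:
    d = json.loads(raw)
except ValueError:
    # json rejects the trailing comma before "}", so this branch runs for the sample
    d = literal_eval(raw)

print(d['id'], d['url'])
```

Note that literal_eval only accepts Python literals, so it is a safe fallback (no code is executed), but it will fail on JavaScript-only tokens such as true, false or null.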

Dimonchik, 2016-10-05
@dimonchik2013

soup is junk; go with lxml.
And be more specific about the task: are you searching for substrings, or what?
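If a substring search is enough, a minimal sketch is to pull each value straight out of the script text with one regular expression per key, without parsing the whole object. It assumes the values are double-quoted strings with no escaped quotes, as in the question's sample; the helper name get_value is hypothetical.

```python
import re

# Script text copied from the question's sample
s = 'data = {"url":"haha.com", "id":"12345", "name":"haha",};'

def get_value(text, key):
    """Return the double-quoted string value for key, or None if it is absent.

    Assumes values contain no escaped quotes (true for the sample).
    """
    m = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(key), text)
    return m.group(1) if m else None

print(get_value(s, 'url'), get_value(s, 'id'))
```

This breaks on nested objects, numeric values or escaped quotes; in those cases parsing the whole object, as in the first answer, is the more robust choice.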
