How do I correctly extract a piece of text using bs4?


Parsing

Sanya Hihi Haha, 2021-02-26 06:37:34

Good day.
I'm trying to parse this piece of markup:

<p class="order-quantity j-orders-count-wrapper" data-link="class{merge: selectedNomenclature^ordersCount < 1 toggle='hide'}">Купили
 <span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}">
<script type="jsv#29_"></script>    
<script type="jsv#27^"></script>
<script type="jsv#30_"></script>        
<script type="jsv#26^"></script>более 700 раз<script type="jsv/26^">
</script>   
 <script type="jsv/30_"></script>
<script type="jsv/27^"></script>
<script type="jsv/29_"></script>
</span>
</p>

So far I have not succeeded, except for pulling out the word "Купили" ("Bought").
The text method returns only that word.

But I would like to get the number, in this case 700 (from "более 700 раз", "more than 700 times").

I suspect I need a regular expression, but I'm not at all sure that it will help.
There is also a kilometre-long chunk of JS code where all this data is stored. I could try to parse that instead, but it is a nightmare to work with.

So I would like to extract the number from this piece of HTML.
Thank you.


1 answer(s)


datka, 2021-02-26
@ValarMayar

from bs4 import BeautifulSoup
import re

html = """ 
<p class="order-quantity j-orders-count-wrapper" data-link="class{merge: selectedNomenclature^ordersCount < 1 toggle='hide'}">Купили
 <span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}">
<script type="jsv#29_"></script>    
<script type="jsv#27^"></script>
<script type="jsv#30_"></script>        
<script type="jsv#26^"></script>более 700 раз<script type="jsv/26^">
</script>   
 <script type="jsv/30_"></script>
<script type="jsv/27^"></script>
<script type="jsv/29_"></script>
</span>
</p>
"""
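# Name the parser explicitly; omitting it makes bs4 emit a warning
soup = BeautifulSoup(html, "html.parser")

# Flatten the paragraph text: newlines become spaces, repeated spaces collapse
full_text = re.sub(' +', ' ', soup.find('p').get_text().strip().replace('\n', ' '))
print(full_text)

# findall returns a list of digit strings found in the paragraph text
number = re.findall(r"[0-9]+", soup.find('p').get_text())
print(number)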
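
As a variation on the snippet above, here is a minimal sketch that returns the count as an integer instead of a list of strings. The function name extract_order_count and the shortened sample markup are made up for illustration; the class name j-orders-count-wrapper is taken from the HTML in the question. It assumes the count is written as one unbroken run of digits.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def extract_order_count(html: str) -> Optional[int]:
    """Return the first integer in the order-count paragraph, or None.

    Hypothetical helper: the name and the None fallback are illustrative.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Look the paragraph up by one of its classes (taken from the question's markup)
    paragraph = soup.find("p", class_="j-orders-count-wrapper")
    if paragraph is None:
        return None
    # A space separator keeps text from neighbouring nodes from running together
    text = paragraph.get_text(" ", strip=True)
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Shortened version of the markup from the question
sample = (
    '<p class="order-quantity j-orders-count-wrapper">Купили '
    '<span><script type="jsv#26^"></script>более 700 раз'
    '<script type="jsv/26^"></script></span></p>'
)
print(extract_order_count(sample))  # prints 700
```

If the site writes larger counts with a space or other separator between digit groups, the pattern would need adjusting. Also, if the live page fills in the count with JavaScript, the HTML you download may not contain the number at all, in which case no parsing of this paragraph will find it.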