Python
sazhyk, 2017-10-25 13:34:39

How to find specific text in HTML?

I have a well-structured HTML file. Its content looks something like this:

file
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Example</title>
    </head>
    <body>
        <div id="simple">
            <p class="one">
                Здесь какой-то текст
            </p>
            <p class="two">
                Здесь какой-то текст
            </p>
            <p class="three">
                Здесь какой-то текст
            </p>
            <p class="four">
                Здесь какой-то текст
            </p>
            <p class="five">
                Ваш уникальный идентификатор: 0123456789
            </p>
        </div>
    </body>
</html>

Using bs4, I find the element I need:
<p class="five">
     Ваш уникальный идентификатор: 0123456789
</p>
The phrase "Ваш уникальный идентификатор:" ("Your unique identifier:") is always the same, but the identifier value that follows it differs from document to document. I need to extract this value from different documents.
Right now my code looks like this:
def find_id(document):
    with open(document) as fp:
        soup = BeautifulSoup(fp, "lxml")
    find_p = soup.find_all("p", {"class": "five"})
    # somewhere here I need to find that identifier
    return uni_id  # and then return it

The question is: how do I find and return the value of the identifier?

1 answer(s)
Rostislav Grigoriev, 2017-10-25
@sazhyk

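from bs4 import BeautifulSoup


def find_id(document):
    with open(document) as fp:
        soup = BeautifulSoup(fp, "lxml")
    # find() returns the first matching <p class="five">, or None if there is none
    find_p = soup.find("p", {"class": "five"})
    text = find_p.get_text(strip=True) if find_p else ''
    if ':' in text:
        # the identifier is whatever follows the last colon
        return text.split(':')[-1].strip()
    # return an empty string, raise an error, or handle it some other way
    return ''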
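
If the class of the paragraph cannot be relied on, matching the fixed phrase with a regular expression is another option. Below is a minimal sketch, assuming the label is always exactly "Ваш уникальный идентификатор:" and the identifier consists only of digits, as in the sample file. The names ID_PATTERN and extract_identifier are illustrative and do not come from the question or the answer.

```python
import re

# Assumption: the label is fixed and the identifier is digits only,
# as in the sample document from the question.
ID_PATTERN = re.compile(r"Ваш уникальный идентификатор:\s*(\d+)")


def extract_identifier(text):
    """Return the digits that follow the fixed label, or None if the label is absent."""
    match = ID_PATTERN.search(text)
    return match.group(1) if match else None


# Text as it might come out of the <p class="five"> element, whitespace included.
sample = """
    Ваш уникальный идентификатор: 0123456789
"""
print(extract_identifier(sample))  # prints 0123456789
```

Inside find_id this could be called on find_p.get_text(), or on soup.get_text() to search the whole document regardless of markup. Returning None instead of an empty string is a design choice here; it makes "not found" easy to tell apart from an empty value.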
