How to extract text from a tag and then replace it with BeautifulSoup in Python?

Python

chrispsow, 2019-06-05 14:00:41

There is this piece of HTML code:

<div class="f-subheader subheader f-subheader-sm" data-editable="true" data-main-class="subheader" data-param="subheader">
            <p>
             holding educativo internacional
            </p>
            <p>
             Academia STANDART LONDRES
            </p>
           </div>
           <div class="f-header header f-header-72" data-editable="true" data-main-class="header" data-param="header">
            <p>
             <br/>
            </p>
            <p>
             <br/>
            </p>
            <p>
             <br/>
            </p>
            <h1>
             Модные курсы и семинары
             <br/>
             парикмахеров, стилистов, визажистов, косметологов И мастеров маникюра
            </h1>
           </div>
           <div class="f-desc description f-desc-xl" data-editable="true" data-main-class="description" data-param="description">
            <p>
             <strong>
              ЕВРОПЕЙСКИЙ СТАНДАРТ ОБУЧЕНИЯ В Мексике и Колумбии
              <br/>
              ОТ ЭКСПЕРТОВ КРАСОТЫ ИЗ ЛОНДОНА
             </strong>
             <br/>
            </p>
           </div>
           <div class="buttons" data-main-class="buttons">
            <button class="btn f-btn btn-success" id="button3504888" style="color: #FFFFFF; background-color: #E31e24; " type="button">
             Ver todos los cursos
            </button>

The text that is not in Russian was successfully extracted, translated, and inserted back. The text that remains untranslated was not found and, accordingly, was not processed.
Python script:

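from bs4 import BeautifulSoup as Soup
from googletrans import Translator  # assumption: translate(..., dest=...) and .text match googletrans

# html (the page source) and lang (the target language code) are defined elsewhere
translator = Translator()
errors = 0
soup = Soup(html, features="html.parser")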
tags = ['span', 'p', 'b', 'a', 'div', 'li', 'h1', 'h2', 'h3', 'button', 'small', 'strong', 'td', 'img', 'input']

for tag in tags:
  for htmltag in soup.find_all(tag):
    try:
      # print(f'Text: {htmltag.text}, string: {htmltag.string}')
      if htmltag.string and len(htmltag.string) > 0:
        # if tag == 'span' and 'Copyright' in htmltag.string : continue
        # print(f'Tag <{tag}> String: {htmltag.string}')
        translated = translator.translate(htmltag.string, dest=lang)
        print(f'<{tag}> {htmltag.string} > {translated.text}')
        htmltag.string.replace_with(translated.text)
      elif tag == 'img' and 'alt' in htmltag.attrs and len(htmltag["alt"]) > 0:
        # print(f'Tag <{tag}> Alt: {htmltag["alt"]}')
        translated = translator.translate(htmltag['alt'], dest=lang)
        print(f'<{tag}> {htmltag["alt"]} > {translated.text}')
        htmltag['alt'] = translated.text
      elif tag == 'input' and 'placeholder' in htmltag.attrs and len(htmltag["placeholder"]):
        # print(f'Tag <{tag}> Placeholder: {htmltag["placeholder"]}')
        translated = translator.translate(htmltag['placeholder'], dest=lang)
        print(f'<{tag}> {htmltag["placeholder"]} > {translated.text}')
        htmltag['placeholder'] = translated.text
    except Exception as e:
      # Report the failing tag and keep going with the rest of the page
      print(f'*** ERROR Tag: {tag} , htmltag: {htmltag} , Str: {htmltag.string} / Err: {e} ***')
      errors += 1

htmltag.text does find the text, but it also picks up the code of a <script> tag when one sits inside the <div> block, which htmltag.string does not do.
As far as I understand, .string does not find the text of a tag that also contains <br/> or some other nested element.
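
The .string behaviour described above can be reproduced in isolation. This is only a minimal sketch: the two fragments are hypothetical, shortened versions of the HTML in the question, and the names simple, mixed and text_nodes are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments modelled on the HTML in the question.
simple = BeautifulSoup("<p>Ver todos los cursos</p>", "html.parser").p
mixed = BeautifulSoup("<h1>first line<br/>second line</h1>", "html.parser").h1

# One child (a single text node): .string returns that text.
print(repr(simple.string))

# Three children (text, <br/>, text): .string cannot pick one, so it is None.
print(repr(mixed.string))

# The text is still reachable as separate text nodes.
text_nodes = list(mixed.strings)
print(text_nodes)
```

So the <h1> and <strong> blocks are probably skipped because the `if htmltag.string` check fails for them, not because the text is missing from the tree.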
How can I extract the text and then replace it in every tag that contains it?
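
One possible direction, offered as a sketch rather than a definitive answer, is to loop over the text nodes themselves instead of over tags, and skip the nodes that sit inside <script> or <style>. In the code below, fake_translate is a hypothetical stand-in for translator.translate(text, dest=lang).text, and translate_text_nodes and SKIP_PARENTS are names invented for this example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, Doctype

SKIP_PARENTS = {"script", "style"}  # tags whose text should stay untouched


def fake_translate(text):
    """Hypothetical stand-in for translator.translate(text, dest=lang).text."""
    return text.upper()


def translate_text_nodes(soup, translate):
    """Replace every visible text node in place and return how many changed."""
    changed = 0
    # find_all returns a list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=True):
        if isinstance(node, (Comment, Doctype)):
            continue  # not page text
        if node.parent.name in SKIP_PARENTS:
            continue  # leave JavaScript and CSS alone
        text = str(node)
        core = text.strip()
        if not core:
            continue  # whitespace-only node between tags
        # Keep the original leading/trailing whitespace around the new text.
        lead = text[: len(text) - len(text.lstrip())]
        trail = text[len(text.rstrip()):]
        node.replace_with(lead + translate(core) + trail)
        changed += 1
    return changed


html = "<div><script>var a = 1;</script><h1> one<br/>two </h1><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")
count = translate_text_nodes(soup, fake_translate)
print(count, soup)
```

Because each text node is handled separately, the pieces on either side of a <br/> are translated as separate strings, which may cost some context for the translator. The alt and placeholder branches from the original script work on attributes, not text nodes, so they would still need their own loop.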