Python
FewSeconds, 2021-06-30 13:05:27

Why does the algorithm eat internal tags instead of the tag text?

Hello. I have the following algorithm:

from bs4 import BeautifulSoup
from word2word import Word2word
from tqdm import tqdm
import nltk

tr = Word2word("en", "ru")
soup = BeautifulSoup(html, "lxml")  # "html" is the source page markup, loaded earlier (not shown here)

for tag in tqdm(soup.find_all()):
    if tag.string:
        try:
            batch = nltk.word_tokenize(tag.string) # split the string into words

            # translate each word, assemble a full sentence, and write it back into the tag
            str_to_paste = ""
            for i in batch:
                str_to_paste += tr(i)[0] + " "
            tag.string = str_to_paste
        except:
            continue

with open("index.html", "w", encoding="utf-8") as file:
    file.write(soup.prettify())


The problem is that it eats link tags where their text should be; a minimal sketch of this behaviour follows the examples below.

Example of original html page (before translation):
https://jsfiddle.net/7hnm2kwq/

Example of translated html page (after translation):
https://jsfiddle.net/5ubnqw9L/
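
Below is a minimal sketch of what seems to be happening, using only bs4/lxml with a made-up snippet (the <p>/<a> markup is hypothetical, not taken from the page above): when a tag's only child is another tag, .string "falls through" to that child, and assigning .string replaces the outer tag's entire contents with a plain string, so the inner tag disappears.

from bs4 import BeautifulSoup

# Hypothetical snippet, only to reproduce the behaviour.
snippet = '<p><a href="/docs">documentation</a></p>'
soup = BeautifulSoup(snippet, "lxml")

p = soup.p
print(p.string)            # "documentation" -- .string falls through to the only child, the <a> tag
p.string = "документация"  # assignment clears <p> and inserts a plain string in its place
print(p)                   # <p>документация</p> -- the <a> tag is gone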
