Parsing HTML tables yields long runs of spaces; how do I get rid of them?

Python

Mikhail Muntyan, 2022-02-23 14:27:40

I'm scraping a site and am almost done, but when parsing, the text of the elements in the n-th column (under the td tag) comes out padded with a lot of spaces. Is there any way to get rid of them?

The code:

import requests
from bs4 import BeautifulSoup as BS

s = requests.Session()

auth_html = s.get()
auth_bs = BS(auth_html.content, "html.parser")
csrf = auth_bs.select("meta[name=csrf-token]")[0]["content"]

payload = {
    "_token": csrf,
    "email": "",
    "password": ""
}

answ = s.post("", data = payload)

for page in range(1, 37):  # pagination
    parse = s.get(f"")

    soup = BS(parse.content, "lxml")
    items = soup.find_all(class_="clickable-row")

    # Use a separate name for the row so it does not shadow the page counter
    for row in items:
        item = row.find_all_next("td")

        if item[6].parent.find(class_="editMarketplaceCategoryBlock") is None:
            print("""ID: {}
    Каталог: {}
    Название категории: {}
    Родители категории: {}
    Связь с категорией маркетплейса: {}\n""".format(item[2].text.strip(),
                                                   item[3].text.strip(),
                                                   item[4].text.strip(),
                                                   item[5].text.replace("\n", "").replace(" ", ""),
                                                   "Не заполнено"))

        else:
            print("""ID: {}
    Каталог: {}
    Название категории: {}
    Родители категории: {}
    Связь с категорией маркетплейса: {}\n""".format(item[2].text.strip(),
                                                           item[3].text.strip(),
                                                           item[4].text.strip(),
                                                           item[5].text.replace("\n", ""),
                                                           item[6].parent.find(class_="editMarketplaceCategoryBlock").text.replace("\n", "").replace(" ", "")))

    break

The output is something like this:

ID: 3
    Каталог: Самсон
    Название категории: Бумага белая марок А, В, С
    Родители категории:                     Офис                            /                                Бумага для офисной техники

1 answer(s)

Vladimir Kuts, 2022-02-23
@limontasher

import re
data = '''ID: 3
    Каталог: Самсон
    Название категории: Бумага белая марок А, В, С
    Родители категории:                     Офис                            /      '''

out = re.sub(' +', ' ', data)
print(out)

#ID: 3
# Каталог: Самсон
# Название категории: Бумага белая марок А, В, С
# Родители категории: Офис /
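
The regex above collapses runs of spaces only, so newlines and tabs inside a cell survive and each line keeps one leading space. A minimal sketch of a stricter variant follows, assuming the cell text looks roughly like the output in the question. The helper names `normalize_ws` and `normalize_ws_re` and the sample string `raw` are made up for illustration and are not part of the original code.

```python
import re


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces
    return " ".join(text.split())


def normalize_ws_re(text: str) -> str:
    """Same result using a regular expression: \\s+ matches newlines and tabs, not just spaces."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample imitating the text of one <td> cell from the question
raw = "\n                    Офис\n                            /\n                                Бумага для офисной техники\n"

print(normalize_ws(raw))     # Офис / Бумага для офисной техники
print(normalize_ws_re(raw))  # Офис / Бумага для офисной техники
```

In the loop from the question, something like `normalize_ws(item[5].text)` could presumably stand in for the `.replace("\n", "")` chains. Note that `.replace(" ", "")` deletes every space, including the ones inside names, whereas this approach keeps a single space between words.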