Python
Katerina92_lomova, 2021-10-01 09:34:51

How to simplify query parsing in Python?

Could you tell me whether this code can be simplified?
I scrape vacancy titles from the site and group them by job type.
The same technology can be spelled in different ways: sometimes with a capital letter, sometimes lowercase, sometimes with a Russian letter in place of an English one. So in the condition I have to write Java or JAVA or java. Is there a way to simplify the if statements in this case?

<source lang="python">
import requests
from bs4 import BeautifulSoup as bs  # assumed: "bs" is the usual BeautifulSoup alias

num_of_page = 39
job_elements = []   # C# vacancies
job_elements1 = []  # Java vacancies

for i in range(num_of_page):
    URL = "https://career.habr.com/vacancies?divisions[]=backend&page=" + str(i + 1) + "&type=all"
    page = requests.get(URL)
    soup = bs(page.text, "html.parser")
    vacancies_names = soup.find_all('a', class_='vacancy-card__title-link')
    for name in vacancies_names:
        if 'C# ' in name.get_text() or 'С#' in name.get_text() or '#C' in name.get_text():
            job_elements.append(name.get_text())
        elif 'Java' in name.get_text() or 'java' in name.get_text() or 'JAVA' in name.get_text():
            job_elements1.append(name.get_text())
</source>


1 answer(s)
Vladimir Kuts, 2021-10-01
@Katerina92_lomova

One option:

collected_data = [
  {'pattern': ['#c', 'c#'], 'result': []},
  {'pattern': ['java'], 'result': []}
]

...

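for name in vacancies_names:
    text = name.get_text()
    lowered = text.lower()  # compare case-insensitively
    for data in collected_data:
        if any(x in lowered for x in data['pattern']):
            data['result'].append(text)
            break  # first matching group wins, as with if/elif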

After the loop, the 'result' list of each entry in collected_data holds the vacancy titles that matched its patterns.
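
The same idea can be tried without hitting the site. Below is a minimal self-contained sketch, not a definitive implementation: the function name `group_titles`, the group keys and the sample titles are made up for illustration, and the extra pattern with a Cyrillic "с" is an assumption based on the question's remark about Russian letters.

```python
# Hypothetical group names and patterns; adjust them to the titles you actually see.
PATTERNS = {
    "csharp": ["c#", "#c", "с#"],  # the last pattern starts with a Cyrillic "с"
    "java": ["java"],
}


def group_titles(titles, patterns=PATTERNS):
    """Put each title into the first group whose pattern it contains."""
    groups = {key: [] for key in patterns}
    for title in titles:
        lowered = title.casefold()  # case-insensitive comparison
        for key, needles in patterns.items():
            if any(needle in lowered for needle in needles):
                groups[key].append(title)
                break  # first match wins, like an if/elif chain
    return groups


# Made-up sample titles; the second one begins with a Cyrillic "С".
sample = ["Senior JAVA developer", "С# backend engineer", "Go developer"]
print(group_titles(sample))
```

Keep in mind that this is plain substring matching, so the pattern "java" also matches titles that contain "JavaScript".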
If you also need to catch misspellings, such as "iava" instead of "java", look into the Levenshtein distance.
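
A rough sketch of what that could look like, using only the standard library. The helper names `levenshtein` and `fuzzy_contains` and the threshold `max_distance=1` are assumptions chosen for illustration; a dedicated library may be a better fit for real use.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_contains(title, keyword, max_distance=1):
    """True if any whitespace-separated word of title is within max_distance edits of keyword."""
    return any(
        levenshtein(word, keyword) <= max_distance
        for word in title.casefold().split()
    )


print(fuzzy_contains("Senior Iava developer", "java"))  # "iava" is one edit away from "java"
print(fuzzy_contains("Go developer", "java"))
```

A small threshold matters here: with short keywords, even one allowed edit also lets through unrelated words such as "lava".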
