How to parse both the title and the description of a vacancy from Habr in Python?
The task is to parse vacancies from Habr and sort them by type.
The code has a list of keywords; if a vacancy contains any of them, it goes into the corresponding list.
If not, such vacancies are added to a separate list.
Tell me how I can implement the following:
if a vacancy does not contain any of the keywords, collect into that list not only its description, for example, but also the name of the vacancy, that is, is there another tag on the page I can pull?
<source lang="python">
num_of_page = 40
other_vacancies = [] # остальные вакансии будут валиться сюда
collected_data = [
{'pattern': ['angular'], 'result': [] },
{'pattern': ['react'], 'result': []},
{'pattern': ['vue','js'], 'result': []}
]
for i in range(num_of_page):
URL ="https://career.habr.com/vacancies?divisions[]=frontend&page=" + str(i+1)+ "&type=all"
page = requests.get(URL)
soup = bs(page.text, "html.parser")
vacancies_names = soup.find_all('a', class_='vacancy-card__title-link')
for name in vacancies_names:
for data in collected_data:
pattern_found = False
if any([x in name.get_text().lower() for x in data['pattern']]):
data['result'].append(name.get_text())
pattern_found = True
break
if not pattern_found:
other_vacancies.append(name.get_text())
</source>
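One possible way, as a sketch: walk from each title link up to its parent card and read the neighbouring elements, then store the title together with the extra text for the non-matching vacancies. The class names `vacancy-card` and `vacancy-card__skills` below are assumptions about the current markup, so check them in the browser dev tools first. This would replace the inner loop in the code above:
<source lang="python">
# Sketch: collect title + extra card info for vacancies that match no pattern.
# Assumption: the parent card uses class 'vacancy-card' and the skills line uses
# 'vacancy-card__skills' — verify both in dev tools, the markup may change.
for name in vacancies_names:
    title = name.get_text()
    card = name.find_parent('div', class_='vacancy-card')        # the whole vacancy card
    skills_tag = card.find(class_='vacancy-card__skills') if card else None
    description = skills_tag.get_text(strip=True) if skills_tag else ''

    pattern_found = False
    for data in collected_data:
        if any(x in title.lower() for x in data['pattern']):
            data['result'].append(title)
            pattern_found = True
            break
    if not pattern_found:
        # keep both pieces of information instead of a single string
        other_vacancies.append({'title': title, 'description': description})
</source>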
It's simple. Why scrape the DOM at all when there is an API? Just open the Network tab in your browser and click through the pages.
The only thing you need to pass in the request headers is the X-Csrf-Token, and that can be pulled out of the page with a simple regular expression or an ordinary DOM lookup.
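A rough sketch of that approach, under two assumptions that you should verify yourself: that the token is exposed in a `<meta name="csrf-token" content="...">` tag on the page, and that the JSON endpoint is the one you see in the Network tab (the `/api/frontend/vacancies` path below is only a placeholder, substitute the URL your browser actually calls):
<source lang="python">
import re
import requests

session = requests.Session()

# Pull the CSRF token out of the HTML with a regex.
# Assumption: it sits in a <meta name="csrf-token" content="..."> tag;
# if not, inspect the page source and adjust the pattern.
html = session.get("https://career.habr.com/vacancies").text
match = re.search(r'name="csrf-token"\s+content="([^"]+)"', html)
token = match.group(1) if match else None

# Assumption: the endpoint path and parameters below are placeholders —
# copy the real request from the Network tab.
resp = session.get(
    "https://career.habr.com/api/frontend/vacancies",
    params={"divisions[]": "frontend", "page": 1, "type": "all"},
    headers={"X-Csrf-Token": token} if token else {},
)
data = resp.json()  # structured JSON instead of parsing the DOM
</source>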