Python
rkhokhorin, 2020-08-03 08:40:53

Why does bs4 periodically fail to find an element on a page, even though the element is there?

I ran into the following problem. I have a person's first name, last name, the year from which they studied, and the city where they studied. I need to find their scientific papers (or make sure there aren't any). Here is an example of a page I was parsing:

https://cyberleninka.ru/search?q=%D0%B2%D0%B0%D0%B...

A brief explanation of the idea: if the first name, last name, and date match the given ones, I take the link so that the city can be checked afterwards. Here is the function that does this:
def __parse_links(self, name, page_count, year, geo):
    try:
        list_links = []
        for page in range(1, page_count + 1):
            url = 'https://cyberleninka.ru/search?q={}&page={}'.format(name + ' ' + geo, page)
            soup = self.__get_url_raw(url)
            time.sleep(4)
            all_variation_name = self.__creat_all_variation_name(name.split())
            all_teg_li_in_list_page = soup.find('ul', {'id': 'search-results'})
            all_teg_li_in_list_page = all_teg_li_in_list_page.find_all('li')
            for li in all_teg_li_in_list_page:
                name_text = li.find('span').text.lower()
                date_text = li.find_all('span')[-1].text.lower()
                if (self.__check_exist_name_in_text(name_text, all_variation_name)) and (self.__check_exist_date_in_text(date_text, year)):
                    list_links.append(li.find('a').get('href'))
        return list_links
    except Exception as e:
        if self.log is not None:
            log_error(self.log, e, '__parse_links')

But here I have a problem. The line all_teg_li_in_list_page = all_teg_li_in_list_page.find_all('li') periodically fails because the preceding soup.find('ul', {'id': 'search-results'}) returns None, i.e. it does not find the element, although the element is on the page (the link I gave as an example is one of the pages where it was not found).
What's unusual is that I can't locate this element with the browser's page search either (Ctrl+F reports a match but doesn't show it). The problem also doesn't occur every time: the element can be found 20 times in a row, and then the lookup fails with ERROR: 'NoneType' object has no attribute 'find_all'. Initially I thought the page had not finished loading (which is why time.sleep is there), but that did not help. I also suspected that the content was generated by scripts, but I don't see any scripts loading on the page. Why would bs4 periodically fail to find an element that is definitely there?
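
One way to narrow this down, offered as a rough sketch rather than a diagnosis: guard against the None result, retry a few times, and save the HTML that lacked the element so it can be compared with what the browser shows. The names locate_with_retry, fetch, locate and dump_dir are hypothetical and not part of the code above; fetch is assumed to return the raw HTML as a string and locate is assumed to return None when the element is missing.

```python
import time
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def locate_with_retry(
    fetch: Callable[[], str],
    locate: Callable[[str], Optional[T]],
    attempts: int = 3,
    delay: float = 4.0,
    dump_dir: Optional[Path] = None,
) -> Optional[T]:
    """Fetch a page and look for an element, retrying while it is missing.

    fetch  -- hypothetical callable returning the page's raw HTML
    locate -- hypothetical callable returning the element, or None
    """
    for attempt in range(1, attempts + 1):
        html = fetch()
        found = locate(html)
        if found is not None:
            return found
        # Keep the HTML that did not contain the element, so that what the
        # server really sent can be inspected afterwards.
        if dump_dir is not None:
            dump_dir.mkdir(parents=True, exist_ok=True)
            dump_file = dump_dir / f"missing_attempt_{attempt}.html"
            dump_file.write_text(html, encoding="utf-8")
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            time.sleep(delay)
    return None
```

With bs4, locate could be something like lambda html: BeautifulSoup(html, 'html.parser').find('ul', {'id': 'search-results'}), and fetch could wrap whatever request __get_url_raw makes (its implementation is not shown, so that part is an assumption). If the saved files turn out to contain a different page than the browser shows, the problem is in what the server returns rather than in bs4.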
