Python
Alexander Andropov, 2020-08-28 18:42:22

How do I do parsing correctly?

Good day. I am learning Python and decided to try my hand at parsing (web scraping). I use BeautifulSoup and requests.

There is an airline website, and I want to extract its special offers: itinerary, price and link.
The page I get the information from is https://www.alrosa.aero/info/special-offers , and this is what I ended up with:

import requests
from bs4 import BeautifulSoup

import config  # local module (not shown) holding the urls and headers dicts


def alrosa(link, header):
    r = requests.get(link, headers=header)
    html = BeautifulSoup(r.content, 'html.parser')
    citylist = []; pricelist = []; linkdeplist = []
    for city in html.select('.city'):
        citylist.append(city.text)

    for price in html.select('.price'):
        pricelist.append(price.text)

    for linkdep in html.select('.button a'):
        linkdeplist.append(linkdep.get('href'))
    i = 0
    while i < len(citylist):
        print('Route: ' + citylist[i] + '. Price: ' + pricelist[i] + '. Link: ' + linkdeplist[i])
        i += 1

alrosa(config.urls['alrosa'], config.headers)

Everything works, but my solution looks wrong to me. Three loops and then the output... in general, the way I did it bothers me, even though it works.
Please tell me how to do this properly, so that the code is well written. I would like to hear the opinion of experienced developers. Thank you very much for your advice.

1 answer(s)
Dmitry, 2020-08-28
@LazyTalent

I would rewrite it in an object-oriented style, something like this (be sure to add error handling!):

from typing import Dict, List

import requests
from bs4 import BeautifulSoup, Tag


class Alrosa:
    def __init__(self, page_source: bytes):
        self.html = BeautifulSoup(page_source, 'html.parser')

    @property
    def citylist(self) -> List[str]:
        return [elem.text for elem in self._get_elements('.city')]

    @property
    def pricelist(self) -> List[str]:
        return [elem.text for elem in self._get_elements('.price')]

    @property
    def linkdeplist(self) -> List[str]:
        return [elem.get('href') for elem in self._get_elements('.button a')]

    def _get_elements(self, selector: str) -> List[Tag]:
        # select() is BeautifulSoup's CSS-selector lookup; it returns the matching tags
        return self.html.select(selector)

    def to_string(self, i: int) -> str:
        return f'Route: {self.citylist[i]}. Price: {self.pricelist[i]}. Link: {self.linkdeplist[i]}'


def get(url: str, headers: Dict[str, str]) -> requests.Response:
    return requests.get(url, headers=headers)


# url and headers are the same values that are passed to alrosa() in the question
r = get(url, headers)
alrosa = Alrosa(r.content)
for i in range(len(alrosa.citylist)):
    print(alrosa.to_string(i))
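
As for the error handling mentioned above, here is a minimal sketch of one part of it: checking the three scraped lists before combining them. The `Offer` tuple and the `build_offers` function are hypothetical names made up for this illustration, and the sample values at the bottom are invented, not taken from the real page. It only assumes that the city names, prices and links have already been collected into three lists of strings.

```python
from typing import List, NamedTuple, Optional, Sequence


class Offer(NamedTuple):
    """One special offer (hypothetical container, not part of the code above)."""
    route: str
    price: str
    link: str


def build_offers(
    cities: Sequence[str],
    prices: Sequence[str],
    links: Sequence[Optional[str]],
) -> List[Offer]:
    """Combine the three scraped lists into offers, failing loudly on bad data."""
    # If the page layout changes, one selector may match more or fewer
    # elements than the others; indexing by position would then pair up
    # the wrong values or raise IndexError.
    if not (len(cities) == len(prices) == len(links)):
        raise ValueError(
            f'Mismatched counts: {len(cities)} cities, '
            f'{len(prices)} prices, {len(links)} links'
        )

    offers = []
    for city, price, link in zip(cities, prices, links):
        # Tag.get('href') returns None when the attribute is missing.
        if link is None:
            raise ValueError(f'No link found for route {city.strip()!r}')
        offers.append(Offer(city.strip(), price.strip(), link))
    return offers


if __name__ == '__main__':
    # Made-up sample values standing in for the scraped lists.
    sample = build_offers(
        [' Moscow - Mirny '],
        ['from 10 000 RUB'],
        ['/booking/example'],
    )
    for offer in sample:
        print(f'Route: {offer.route}. Price: {offer.price}. Link: {offer.link}')
```

Iterating with zip() also removes the need for an index counter. The network side probably deserves the same care: requests.get() accepts a timeout argument, Response.raise_for_status() raises an exception on 4xx/5xx responses, and both can be wrapped in a try/except for requests.RequestException.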
