Python
SoulHunter033, 2021-06-13 15:15:05

Why does parsing return the same link for every ad?

Hello everyone, I ran into a small problem while scraping. I am scraping a Turkish car ads site (if you do not have a Turkish IP, you need to register to enter the site). I scrape it with Selenium because when I used BS4 the site immediately detected a bot.
I extract the title, price, mileage, etc. with a for loop that iterates over each listing via the common "searchResultsItem" class. Here is the HTML for one ad:

<tr data-id="895749612"
    class="searchResultsItem     ">
    <td class="searchResultsLargeThumbnail">
            <a href="/ilan/vasita-otomobil-renault-fluence-ect-1.6-16-v-895749612/detay" title="fluence ect 1.6 16 v">
    <img class="searchResultThumbnailPlaceholder otherNoImage"
             src="https://s0.shbdn.com/assets/images/iconHasMegaPhotoLarge:d9417b1d5ff2b476ea61565150588a96.png"
             alt="fluence ect 1.6 16 v #895749612" title="Megafotolu ilan"/>
    </a>
</td>
    <td class="searchResultsTagAttributeValue">
                        1.6 Extreme</td>
                <td class="searchResultsTitleValue ">

                    <input id="favoriteClassifiedsVisibility" type="hidden" value="true"/>
<div class="action-wrapper" data-classified-id="895749612">
                            <div class="add-to-favorites last favorite">
        <a href="#"
           class="action classifiedAddFavorite trackClick trackId_favorite  hidden"
           data-content="Favorilerime Ekle">
        </a>
        <a href="#"
           class="action classifiedRemoveFavorite trackClick trackId_favorite disable"
           data-content="Favorilerimde">
      </a>
    </div>
<div class="compare hidden">
    <a class="facetedCheckbox action compare-classified" data-content="İlan Karşılaştır">
        <i></i>
    </a>
</div>
</div>
                    <a class=" classifiedTitle"
    title="fluence ect 1.6 16 v"
    href="/ilan/vasita-otomobil-renault-fluence-ect-1.6-16-v-895749612/detay">
    fluence ect 1.6 16 v</a>

<!-- This file has been included to desktop(classic, list, gallery view) and responsive -->

</td>
            <td class="searchResultsAttributeValue">
                    2011</td>
            <td class="searchResultsAttributeValue">
                    170.000</td>
            <td class="searchResultsAttributeValue">
                    Lacivert</td>
            <td class="searchResultsPriceValue">
                        <div> 69.000 TL</div></td>
                <td class="searchResultsDateValue">
                        <span>31 Mayıs</span>
                        <br/>
                        <span>2021</span>
                    </td>
                <td class="searchResultsLocationValue">
                        Kocaeli<br/>Derince</td>
                <td class="ignore-me">
    <a href="#" class="mark-as-ignored" title="Bu ilanla ilgilenmiyorum, gizle."></a>
    <a href="#" class="mark-as-not-ignored disable" data-content="Göster"></a>
</td>
</tr>


As you can see, the A tag with the classifiedTitle class has an href attribute, but when I try to read this attribute, the code gives me the same link every time: there are 20 ads on the page, but I get 20 identical links. I have already tried everything I know, please help me figure this out! Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import random
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Chrome('../chromedriver/chromedriver')

def get_first_ads():
  try:
    browser.get('https://www.sahibinden.com/renault-fluence?sorting=price_asc')
    time.sleep(random.randrange(2, 4))
      
    for card in browser.find_elements_by_class_name('searchResultsItem     '):

      card_titles = card.find_elements_by_class_name('classifiedTitle')
      for card_title in card_titles:
        #print(card_title.text)
        pass

      card_descs = card.find_elements_by_class_name('searchResultsAttributeValue')
      for card_desc in card_descs:
        #print(card_desc.text)
        pass

      card_prices = card.find_elements_by_class_name('searchResultsPriceValue')
      for card_price in card_prices:
        #print(card_price.text)	
        pass

      card_models = card.find_elements_by_class_name('searchResultsTagAttributeValue')
      for card_model in card_models:
        pass

      card_hrefs = browser.find_elements_by_class_name('classifiedTitle')
      for card_href in card_hrefs:
        card_url = card_href.get_attribute('href')			
        #card_id = card_href.get_attribute('href')
      print(card_url)	

      #print(f"{card_title.text} | {card_desc.text} | {card_price.text} | {card_model.text} | {card_id}")


    browser.close()
    browser.quit()

  except Exception as ex:
    print(ex)
    browser.close()
    browser.quit()

get_first_ads()


1 answer(s)
Evgeniy _, 2021-06-13
@SoulHunter033

print(card_url) is outside the inner loop, so on each pass only the last link found is printed. Change this:

      card_hrefs = browser.find_elements_by_class_name('classifiedTitle')
      for card_href in card_hrefs:
        card_url = card_href.get_attribute('href')
      print(card_url)

to this, with print moved inside the loop:

      card_hrefs = browser.find_elements_by_class_name('classifiedTitle')
      for card_href in card_hrefs:
        card_url = card_href.get_attribute('href')
        print(card_url)
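
One more thing to watch: the snippet above still calls find_elements_by_class_name on browser, so on every pass over a card it finds the title links of all the ads on the page, not just the current one. Below is a minimal sketch of searching inside the current card instead, the same way the question's code already does for titles and prices. The function name collect_ad_links is made up for illustration, and the sketch assumes each ad row contains one classifiedTitle link, as in the HTML from the question.

```python
def collect_ad_links(browser):
    """Return one href per ad row, in page order."""
    links = []
    for card in browser.find_elements_by_class_name('searchResultsItem'):
        # Search inside this card only. Calling the same method on
        # browser would return the title links of every ad on the page.
        titles = card.find_elements_by_class_name('classifiedTitle')
        if titles:
            # Assumption: the first match is the ad's title link.
            links.append(titles[0].get_attribute('href'))
    return links
```

Inside get_first_ads this could be used as: for url in collect_ad_links(browser): print(url). The find_elements_by_class_name helpers match the Selenium version used in the question; in recent Selenium 4 releases the equivalent call is find_elements(By.CLASS_NAME, 'classifiedTitle') with By imported from selenium.webdriver.common.by.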
