Python
Maxim, 2017-06-22 13:02:42

How do I find a tag that has multiple classes, and search by CSS selector and XPath?

Good afternoon. I am parsing Medium and have run into difficulties.
Here is what I use:

from selenium import webdriver
import requests


def md():
    url = 'https://medium.com/@Tacenda/the-universe-is-all-of-space-and-time-spacetime-and-its-contents-9-which-includes-planets-3c9a58475b14'

    driver = webdriver.PhantomJS(executable_path='/usr/local/lib/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs')
    driver.get(url)
    driver.implicitly_wait(5)

    # avatar = driver.find_element_by_css_selector('#_obv\2e shell\2e _surface_1498118264269 > div > main > article > footer > div.u-padding0.u-clearfix.u-backgroundGrayLightest.u-print-hide.supplementalPostContent.js-responsesWrapper > div > div > div.responsesStream.js-responsesStream > div > div > div > div > div.u-clearfix.u-marginBottom10 > div > div > div.postMetaInline-avatar.u-flex0 > a')
    # avatar = driver.find_element_by_xpath('//*[@id="_obv.shell._surface_1498118264269"]/div/main/article/footer/div[5]/div/div/div[4]/div/div/div/div/div[1]/div/div/div[1]/a')
    # avatar = driver.find_elements_by_class_name('link avatar u-baseColor--link')
    avatar = driver.find_elements_by_class_name('link')
    print([i.text for i in avatar])

md()

All of the commented-out lines raise an error saying the element cannot be found.
The selector and XPath were copied from the browser.
My question: why can't I find a tag that has several classes?
I read online that people write .link.avatar, but that did not help.
Checking whether the class string is in the page source:
'link avatar u-baseColor--link' in driver.page_source

Returns True
Here is the part I need to parse, specifically the href attribute of the link and the src attribute of the img:
<a class="link avatar u-baseColor--link" href="https://medium.com/@Tacenda" data-action="show-user-card" data-action-value="17f98bbae51d" data-action-type="hover" data-user-id="17f98bbae51d" dir="auto"><img src="https://cdn-images-1.medium.com/fit/c/36/36/1*Jtm9sTbkCLI_0A0AadvVzw.jpeg" class="avatar-image u-size36x36 u-xs-size32x32" alt="Go to the profile of Adriano Celentano"></a>

Before this I used bs4 and lxml, and everything was easy: you specify a tag and a class, the element is found, and you read its attributes.
How do I do this properly here? This is only my second day working with this.


1 answer(s)
Dmitry Eremin, 2017-06-22
@maximkv25

.link.avatar.u-baseColor--link

This CSS selector returns an array of 7 elements. Take any of them (they are identical), read its href attribute, and find the nested img inside it.
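In Selenium this could look roughly like the sketch below. As far as I know, the class-name locator expects a single class name, which would explain why 'link avatar u-baseColor--link' is rejected; a compound CSS selector avoids that. The function names classes_to_css and avatar_links are made up for this example, and the code assumes a driver that has already loaded the page.

```python
def classes_to_css(class_attr):
    """Turn a class attribute value such as 'link avatar' into '.link.avatar'."""
    return "".join("." + name for name in class_attr.split())


def avatar_links(driver, class_attr="link avatar u-baseColor--link"):
    """Return (href, img src) pairs for every <a> matching all the classes.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    that has already opened the page.
    """
    # Imported here so classes_to_css stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    selector = classes_to_css(class_attr)
    pairs = []
    for link in driver.find_elements(By.CSS_SELECTOR, selector):
        # Look for the img nested inside this particular link.
        img = link.find_element(By.CSS_SELECTOR, "img")
        pairs.append((link.get_attribute("href"), img.get_attribute("src")))
    return pairs
```

Calling avatar_links(driver) after driver.get(url) should give one pair per matched link. If some matched links have no nested img, find_element will raise, so that case would need its own handling.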
There is an easy way to work out a CSS selector:
  1. Find the element on the page.
  2. Right-click -> Inspect element.
  3. The developer tools will open with the page source, and the element will be highlighted.
  4. Look for a CSS selector that identifies the element uniquely within its local context.
  5. Go to the browser console.
  6. Type "$$('your css selector')".
  7. If an array with one element is returned, you have chosen a convenient selector.
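The same check from steps 6 and 7 can be repeated from Python once the page is loaded. This is only a sketch: count_matches and is_unique are names invented here, and the string "css selector" is used because it is the value of Selenium's By.CSS_SELECTOR, which keeps the snippet free of imports.

```python
# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def count_matches(driver, selector):
    """Return how many elements on the loaded page match the CSS selector."""
    return len(driver.find_elements(CSS_SELECTOR, selector))


def is_unique(driver, selector):
    """True when the selector matches exactly one element, as in step 7."""
    return count_matches(driver, selector) == 1
```

For the selector above, count_matches(driver, '.link.avatar.u-baseColor--link') would be expected to agree with what $$ shows in the console, provided the driver renders the same page the browser does.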
