How to get all links on a website page?
How can I get all the links on a web page from the command line? Or does it have to be done in Python — are there really no other options? If there's no way around it (which would be less desirable), how can it be done in Python?
From the command line:
curl https://yandex.ru | grep -oE 'href="[^"]*"' | sed 's/href="//' | sed 's/"$//' | sort | uniq
# //yandex.ru/opensearch.xml
# //yastatic.net/jquery/2.1.4/jquery.min.js
# https://afisha.yandex.ru/rostov-na-donu/cinema/cyrano-2022?utm_source=yamain&utm_medium=yamain_afisha_kp
# https://afisha.yandex.ru/rostov-na-donu/cinema/dog-2021?utm_source=yamain&utm_medium=yamain_afisha_kp
# https://afisha.yandex.ru/rostov-na-donu/cinema/kroletsyp-i-khomiak-tmy?utm_source=yamain&utm_medium=yamain_afisha_kp
...
Or in Python, using requests and lxml:

import io

import requests
from lxml import etree

# Download the page and parse it with lxml's HTML parser
data = requests.get('https://yandex.ru').text
parser = etree.HTMLParser()
tree = etree.parse(io.StringIO(data), parser)

# Print the href attribute of every <a> element
for a in tree.xpath('//a'):
    print(a.get('href'))
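Note that some of the extracted hrefs are relative or protocol-relative (for example //yandex.ru/opensearch.xml in the output above). If you need absolute URLs, here is a minimal sketch, assuming the same requests/lxml approach, that resolves each href against the page URL with urllib.parse.urljoin:

import io
from urllib.parse import urljoin

import requests
from lxml import etree

base_url = 'https://yandex.ru'
data = requests.get(base_url).text
tree = etree.parse(io.StringIO(data), etree.HTMLParser())

# urljoin turns relative and protocol-relative hrefs into absolute URLs
# and leaves already-absolute URLs unchanged; a set removes duplicates
links = {urljoin(base_url, href) for href in tree.xpath('//a/@href')}
for link in sorted(links):
    print(link)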