How to parse IP addresses from web pages in Python?
The task is to follow the links on the site www.zone-h.org/archive, open pages like www.zone-h.org/mirror/id/22714269, and copy the IP address field from each of them into a single text file.
How would you recommend implementing this in Python? Which libraries or examples would you suggest? Thanks for the advice.
from urllib import request
import re

def getIP(urls):
    # Download each mirror page and return the IP addresses found on them
    ips = []
    for link in urls:
        result = request.urlopen(link).read().decode('utf8')
        # Take the first dotted IPv4 string after the "IP" label instead of fixed offsets
        match = re.search(r'\d{1,3}(?:\.\d{1,3}){3}', result[result.find('IP'):])
        if match:
            ips.append(match.group())
    return ips
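Locating the address by character position is fragile, and a bare dotted-number regex also accepts strings like 999.1.1.1. A minimal sketch that takes the first candidate after the "IP" label and validates it with the standard `ipaddress` module, assuming the mirror page prints the address somewhere after that label (not verified against zone-h's actual markup); `extract_ip` is a hypothetical helper name:

```python
import ipaddress
import re

# Candidate dotted quad; each hit is validated afterwards with the ipaddress module
IPV4_CANDIDATE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')


def extract_ip(html):
    """Return the first valid IPv4 address after the 'IP' label, or None.

    Assumption: the mirror page shows the address somewhere after the text 'IP'.
    """
    label = html.find('IP')
    if label == -1:
        return None
    # Start scanning at the label so addresses earlier in the page are ignored
    for match in IPV4_CANDIDATE.finditer(html, label):
        try:
            # Rejects strings such as 999.1.1.1 that the regex alone would accept
            return str(ipaddress.IPv4Address(match.group()))
        except ipaddress.AddressValueError:
            continue
    return None


if __name__ == '__main__':
    sample = '<li><strong>IP address:</strong> 203.0.113.7</li>'
    print(extract_ip(sample))  # 203.0.113.7
```

Separating the parsing from the download this way also lets you test the extraction on a saved copy of a page without hitting the site.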
from urllib import request

def getUrls():
    # Collect unique links of the form http://www.zone-h.org/mirror/id/<digits>
    result = request.urlopen('http://www.zone-h.org/archive').read().decode('utf8')
    marker = 'mirror/id/'
    urls = []
    pos = result.find(marker)
    while pos != -1:  # the original "for findIt in result" looped once per character
        start = end = pos + len(marker)
        while end < len(result) and result[end].isdigit():
            end += 1
        url = 'http://www.zone-h.org/mirror/id/' + result[start:end]
        if end > start and url not in urls:
            urls.append(url)
        pos = result.find(marker, end)
    return urls
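Searching the raw page text for `mirror/id` can also hit occurrences inside scripts or comments. A sketch using only the standard library's `html.parser`, assuming the archive page links to mirrors through ordinary `<a href=".../mirror/id/<number>">` tags (not checked against the live site); `MirrorLinkParser` and `find_mirror_links` are hypothetical names:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches the path of a mirror page, e.g. /mirror/id/22714269
MIRROR_PATH = re.compile(r'/mirror/id/\d+')


class MirrorLinkParser(HTMLParser):
    """Collects absolute, de-duplicated links to mirror pages from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and MIRROR_PATH.search(href):
            # Resolve relative links like /mirror/id/123 against the archive URL
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def find_mirror_links(html, base_url='http://www.zone-h.org/archive'):
    """Return mirror-page URLs found in the given archive HTML."""
    parser = MirrorLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The HTML is passed in as a string, so the same function works on a freshly downloaded page or on a saved copy.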
Using existing Python tools:
1) Use urllib2 (urllib.request in Python 3) to download the page www.zone-h.org/archive.
2) Find all the needed links on the page, for example by searching with a regex.
3) Go through the collected links, download each page with urllib2, extract the needed line (the IP address) from it and write it to a file.
4) ...
5) PROFIT
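Steps 1-3 above can be sketched in Python 3, where `urllib2` became `urllib.request`. This is a minimal sketch, assuming mirror links appear as `href=".../mirror/id/<number>"` and that each mirror page prints the IP after an "IP" label; neither assumption is checked against the live site, and `fetch`, `collect_ips` and `get_page` are hypothetical names:

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

ARCHIVE_URL = 'http://www.zone-h.org/archive'
# Step 2: links to mirror pages, found with a regex (assumed href format)
MIRROR_LINK = re.compile(r'href="([^"]*/mirror/id/\d+)"')
# Step 3: first dotted IPv4 string after an "IP" label (assumed page layout)
IP_AFTER_LABEL = re.compile(r'IP\D*?(\d{1,3}(?:\.\d{1,3}){3})')


def fetch(url):
    """Steps 1 and 3: download a page as text (urllib2.urlopen in Python 2)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode('utf8', errors='replace')


def collect_ips(out_path, archive_url=ARCHIVE_URL, get_page=fetch):
    """Write one IP per line to out_path and return the list written."""
    archive_html = get_page(archive_url)
    links = []
    for href in MIRROR_LINK.findall(archive_html):
        url = urljoin(archive_url, href)  # handle relative links
        if url not in links:
            links.append(url)
    ips = []
    for url in links:
        match = IP_AFTER_LABEL.search(get_page(url))
        if match:
            ips.append(match.group(1))
    # All addresses go into a single text file, one per line
    with open(out_path, 'w', encoding='utf8') as out:
        out.write(''.join(ip + '\n' for ip in ips))
    return ips
```

Usage: `collect_ips('ips.txt')`. The `get_page` parameter is a design choice that lets you swap in another downloader (for example one that sends extra headers) or a dict of saved pages when testing without network access.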
How to download a page with urllib2 is easy to find online.
Finding a specific line in a large text isn't a problem either.