Python
JRazor, 2013-11-27 21:44:57

How do I write a parser that finds the files on a website (Python)?

Good day, gentlemen and a few ladies!
The situation is this: we have a site URL and need to find all the files on that site.
Question: what should I use? Regular expressions ( '\.(php|txt|css)' )? A substitution method? And how?
Thank you!

3 answer(s)
alternativshik, 2013-11-27
@alternativshik

Download the site with wget and then pick out the files you need?
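If the site has already been mirrored to a local folder (for example with wget -r), picking out the right files could look roughly like this. It is only a sketch: find_files and the default extension list are hypothetical, taken from the regexp in the question.

```python
from pathlib import Path


def find_files(root, extensions=(".php", ".txt", ".css")):
    """Return the files under root whose extension is in extensions, sorted."""
    # Compare suffixes case-insensitively so that style.CSS also counts.
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Calling find_files on the folder that wget created returns Path objects for every matching file, including those in subfolders.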

borgch, 2013-11-30
@borgch

I used regular expressions (the re module). Specifically, I first fetched the page source:

from urllib import request
...
html = request.urlopen(your_url).read().decode('utf-8')

And then something like
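A minimal sketch of such a call, assuming the extensions from the question. The sample html string is made up so the snippet runs on its own, and the character class is a slightly stricter stand-in for "no spaces" that also stops at quotes and angle brackets:

```python
import re

# Hypothetical page source standing in for the html fetched above.
html = '<a href="index.php">home</a> <link href="style.css"> <a href="notes.txt">notes</a>'

# A run of characters without whitespace, quotes or angle brackets,
# ending in one of the wanted extensions. (?:...) does not capture,
# so findall returns the whole match.
filenames = re.findall(r'[^\s"\'<>]+\.(?:php|txt|css)\b', html)
print(filenames)  # ['index.php', 'style.css', 'notes.txt']
```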
This stores in filenames every file name (without spaces) that ends in one of the desired extensions. If the file names are wrapped in specific tags, for example
<tag1><div class='filenames'>file name.txt</div><br></tag1>
then you can easily pick out just the part of the regexp match that you need by putting it in parentheses.
Read the documentation for this module and do what you need by analogy.
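The parentheses trick might look like the sketch below. The markup is hypothetical and modelled on the example tag, and the pattern assumes the file name is the only content of the div:

```python
import re

# Hypothetical markup modelled on the example tag.
page = (
    "<tag1><div class='filenames'>report.txt</div><br></tag1>"
    "<tag1><div class='filenames'>main.css</div><br></tag1>"
    "<div class='other'>ignored.txt</div>"
)

# The outer parentheses form a capturing group, so findall returns only
# the file name and not the surrounding tags.
pattern = re.compile(r"<div class='filenames'>([^<]+\.(?:php|txt|css))</div>")
names = pattern.findall(page)
print(names)  # ['report.txt', 'main.css']
```

Because the group uses [^<]+ rather than \S+, file names containing spaces are matched as well.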

maxaon, 2013-12-16
@maxaon

Grab is not ideal, but it is a perfectly workable spider. It can crawl sites and search for whatever you need, including with XPath and regular expressions.
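Grab's own API is not shown in this thread, so as a stand-in here is a rough standard-library sketch of the step any such spider performs on each page: collect the href and src attributes, resolve them against the page URL, and keep the ones with the wanted extensions. LinkCollector, file_links and the extension list are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, value))


def file_links(html, base_url, extensions=(".php", ".txt", ".css")):
    """Return the sorted links whose URL path ends in a wanted extension."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(
        url for url in collector.links
        if urlparse(url).path.lower().endswith(tuple(extensions))
    )
```

A full spider would repeat this for every same-site link it discovers, keeping a set of already visited URLs to avoid loops.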
