How to parse a .doc file in Python?
I need to parse a .doc file that contains the school schedule for a Telegram bot. The question is: how do I get the link to the file from the site?
You can use a regular expression. It matches a path that starts with /diff/, followed by a date-like numeric combination (for example 16-09), up to the first "doc".
https://regex101.com/r/XLJ1t4/1
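To check what the pattern captures without requesting the site, here is a small sketch that runs it against a hypothetical HTML fragment (the real markup of info.php is an assumption). It uses the answer's pattern with the dot before "doc" escaped as `\.`, because `.?` would also accept any other character, or none, in that position.

```python
import re

# The answer's pattern, with the dot escaped so only a literal "." is accepted before "doc".
pattern = r"(/diff/\d{1,2}-\d{1,2}\.doc)"

# Hypothetical HTML fragment; the actual markup of http://1311.ru/info/info.php may differ.
sample_html = """
<a href="/diff/16-09.doc">16.09</a>
<a href="/diff/17-09.doc">17.09</a>
<a href="/news/index.html">News</a>
"""

# With exactly one capture group, re.findall returns a list of the captured strings.
links = re.findall(pattern, sample_html)
print(links)  # ['/diff/16-09.doc', '/diff/17-09.doc']
```

The matches are relative paths, which is why the site address has to be prepended afterwards.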
import re
import urllib.request

# Matches paths like /diff/16-09.doc; the dot before "doc" is escaped so it matches only a literal "."
regexp1 = r'(/diff/\d{1,2}-\d{1,2}\.doc)'
f = urllib.request.urlopen('http://1311.ru/info/info.php')  # opens the URL and returns an HTTP response object (not text)
b = f.read()  # reads the response body as bytes
text = b.decode()  # decodes bytes to text; UTF-8 is the default encoding, so decode() needs no argument
out = re.findall(regexp1, text)
# the matches are relative paths, so prepend the site address
for i in out:
    print("http://1311.ru" + i)
Output:
http://1311.ru/diff/16-09.doc
http://1311.ru/diff/17-09.doc
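A regex tied to the /diff/NN-NN.doc shape stops matching if the file naming changes. As an alternative sketch (not the answer author's method), the standard library's html.parser can collect every `<a href>` that ends in .doc, and urllib.parse.urljoin can build full URLs instead of concatenating strings. It assumes the links are ordinary `<a href>` tags; the class name DocLinkParser, the function find_doc_links and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DocLinkParser(HTMLParser):
    """Hypothetical helper: collects absolute URLs of <a> tags whose href ends in .doc."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for valueless attributes
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".doc"):
                # Resolve relative paths such as /diff/16-09.doc against the page URL
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:  # skip duplicates, keep page order
                    self.links.append(absolute)


def find_doc_links(html, base_url):
    """Returns the .doc links found in html, resolved against base_url."""
    parser = DocLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup; the real page may be structured differently.
sample_html = (
    '<a href="/diff/16-09.doc">16.09</a>'
    '<a href="/diff/17-09.doc">17.09</a>'
    '<a href="/diff/16-09.doc">16.09</a>'
    '<a href="/news/index.html">News</a>'
)
print(find_doc_links(sample_html, "http://1311.ru/info/info.php"))
# ['http://1311.ru/diff/16-09.doc', 'http://1311.ru/diff/17-09.doc']
```

On the live page, pass the decoded page text (`text` in the answer's code) as `html`. urljoin also handles hrefs that are relative to the current directory or already absolute, which plain string concatenation does not.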