How do I download pages with wget by a URL pattern, without knowing in advance which pages exist?
There is a link like this: somename.livejournal.com/593.html
The number before .html can be anything. There is no list of posts, and the numbers do not increase sequentially: the next post might be somename.livejournal.com/22593.html. I do know the last number, though.
Is it possible to download all existing posts with a single wget command? If so, how? Please post a ready-made answer; I've read the man pages and still couldn't get it to work.
My idea was to somehow put a regular expression into the URL. A bash script that calls wget would also work for me.
Thanks in advance.
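If every post number falls in a known range, one approach is to generate every candidate URL and let wget skip the missing ones (pages that return 404 are reported and wget simply moves on to the next URL). A minimal sketch, assuming the posts are numbered somewhere between 1 and 22593 and that somename is a placeholder:

```shell
#!/bin/bash
# Build candidate URLs for every number in the range.
# 1..22593 is an assumption; substitute the real first and last post numbers.
first=1
last=22593
seq "$first" "$last" \
  | sed 's|^|https://somename.livejournal.com/|; s|$|.html|' > urls.txt

# Feed the list to wget; nonexistent pages return 404 and are skipped.
# Uncomment to actually fetch (it issues one request per candidate number):
# wget --tries=1 --input-file=urls.txt
```

Note that this makes one HTTP request per candidate number, so over a large range it is slow and noisy; LiveJournal may also rate-limit it.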
#!/bin/bash
start=593    # post number to start brute-forcing from
end=22593    # post number to stop at

for (( i=start; i<=end; i++ )); do
    uri="https://somename.livejournal.com/$i.html"
    # keep the page only if the server answers 200; otherwise wget fails
    # and we remove the empty file that -O created
    if ! wget -q "$uri" -O "$i.html"; then
        rm -f "$i.html"
    fi
done
I have an idea where to dig:
- Every LJ post contains links to the next and the previous post (a link like /www.livejournal.com/go.bml?journal=someone&itemid=123456&dir=next or dir=prev).
You could try to emulate "following" that link and then extract the id of the post it leads to. How exactly to do that, I haven't figured out yet.
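The "follow the next-post link" idea above could be sketched like this. The curl part is an untested assumption: it relies on go.bml answering with an HTTP redirect whose Location header points at the next post, which curl can capture via the %{redirect_url} write-out variable without following it. Only the id-extraction helper below is certain to work:

```shell
#!/bin/bash
# Extract the numeric post id from a LiveJournal post URL,
# e.g. https://somename.livejournal.com/22593.html -> 22593
extract_id() {
  basename "$1" .html
}

# Untested assumption: go.bml redirects to the next post's URL, and
# %{redirect_url} exposes the Location header without following it.
# next_url=$(curl -s -o /dev/null -w '%{redirect_url}' \
#   "https://www.livejournal.com/go.bml?journal=somename&itemid=593&dir=next")
# extract_id "$next_url"
```

Looping this until dir=next stops redirecting would visit only the posts that actually exist, instead of brute-forcing the whole number range.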