Wget
Evgenii Borovoi, 2020-01-02 15:06:56

How do I download pages with wget by a mask, without knowing which pages exist?

There is a link like this: somename.livejournal.com/593.html
Any number can appear before .html. There is no list, and the numbers do not increase in order: the next post might be somename.livejournal.com/22593.html. I do know the last number, though.
Is it possible to download all existing posts with a single wget command? If so, how? Please write a ready-made answer: I have read the man pages, but couldn't get it to work.
The idea is to somehow insert a regular expression there. A bash script that calls wget would also work for me.
Thanks in advance.


2 answer(s)
O. J, 2020-01-02
@EugeneOne77

#!/bin/bash
start=593    # first post number to brute-force
end=22593    # last post number

for (( i=start; i<=end; i++ ))
do
    uri="https://somename.livejournal.com/$i.html"
    # keep the file only if wget succeeded (i.e. the server answered 200)
    if wget -q "$uri" -O "$i.html"; then
        echo "saved $i.html"
    else
        rm -f "$i.html"    # -O creates an empty file even on a 404
    fi
done
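For the "one wget command" part of the question, Bash brace expansion can generate the whole numeric range and hand it to a single wget invocation. A minimal sketch, assuming the 593..22593 range and hostname from the question; without -O, wget leaves no file for numbers that return 404, so only existing posts remain on disk:

```shell
#!/bin/bash
# Brace expansion builds the full list of candidate URLs in the shell.
urls=( https://somename.livejournal.com/{593..22593}.html )
echo "${#urls[@]} URLs to try"    # 22001

# Hand the whole list to one wget call (commented out here; needs network):
# wget --no-verbose "${urls[@]}"
```

The trade-off versus the loop above: one process instead of 22001, but no per-number control over what happens on failure.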

Germanjon, 2020-01-03
@Germanjon

I have an idea of where to dig:
- Every LJ post links to the next and the previous post (a link like www.livejournal.com/go.bml?journal=someone&itemid=123456&dir=next, or dir=prev).
You could try to emulate following that link and then pull the id of the resulting post out of the response. How exactly to do that, I haven't figured out yet.
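One way the "follow the next link" idea could look, as a sketch: curl's `-w '%{redirect_url}'` prints where a redirect points without following it, and the next post id can be parsed out of that URL. The go.bml parameters are taken from the answer above; whether the endpoint still redirects this way is an assumption, so only the parsing step is exercised here:

```shell
#!/bin/bash
# extract_id: pure parsing step, pulls the trailing numeric id out of
# a .../NNNN.html URL read from stdin.
extract_id() {
    grep -o '[0-9]*\.html$' | sed 's/\.html$//'
}

# next_id: hypothetical network step (assumes go.bml redirects to the
# next post's URL; journal name "somename" is from the question).
next_id() {
    curl -s -o /dev/null -w '%{redirect_url}' \
        "https://www.livejournal.com/go.bml?journal=somename&itemid=$1&dir=next" \
        | extract_id
}

# Crawl sketch: start at the first known post, stop when there is no "next".
# id=593
# while [ -n "$id" ]; do
#     wget -q "https://somename.livejournal.com/$id.html"
#     id=$(next_id "$id")
# done
```

Compared with brute-forcing the whole range, this makes one request per existing post instead of one per candidate number.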
