How do I make wget download the sites listed in a file and save each page as site.ru instead of index.html?

Sergey Karbivnichy, 2020-03-12 17:45:54

linux

I have a file that lists *.ru sites:

01-PLAN.RU
01-POKROV.RU
01-PRINT.RU
01-PROFI.RU
01-PTM.RU
01-R.RU
01-REGION.RU
01-REMONT.RU
01-RU.RU
01-S.RU
01-SB.RU
01-SBERBANK.RU
(about 5 million more entries follow)

I need to use wget to download the main page of each site, running several downloads in parallel (provided the site is up, of course).
I found this:
cat ru.txt | xargs -t -P 20 -n1 wget
It works, but it saves the HTML files as 'index.html', 'index.html.1', and so on. I need each file to be named after the site, exactly as it is written in the list.
I know this is easy to do in bash, but I got stuck.

3 answers

Vitaly Karasik, 2020-03-12
@hottabxp

cat sites.txt | xargs -I % -t -P 20 wget % -O %
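
A possible refinement, offered as a rough sketch rather than a tested recipe: wget -O creates the output file before the download starts, so sites that are down leave empty files behind. The helper name fetch_all, the 10-second timeout, and the single attempt per site are assumptions for illustration, not part of the answer above.

```shell
# fetch_all FILE [JOBS]: download every site listed in FILE (one per line),
# JOBS downloads at a time, saving each page under the site's own name.
# Hypothetical helper; the timeout and retry values are assumptions.
fetch_all() {
    list=$1
    jobs=${2:-20}
    # -I % passes each input line as one argument; sh -c receives it as $1
    xargs -P "$jobs" -I % sh -c '
        # -q: quiet, -T: timeout in seconds, -t: number of tries, -O: output file
        wget -q -T 10 -t 1 -O "$1" "$1"
        # wget -O creates the file up front, so drop empty leftovers
        [ -s "$1" ] || rm -f -- "$1"
    ' _ % < "$list"
}
```

It would be called as fetch_all ru.txt 20. Only non-empty files remain afterwards, which roughly matches the "if the site is working" requirement.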

Sergey Pankov, 2020-03-12
@trapwalker

The previous, rather terse answer means that wget has an option, -O, that sets the name under which the download is saved.
The link to the manual hints at the right way to find solutions like this, and I am very impressed by such an educational approach.
However, I recommend using this instead of xargs:

while read -r; do wget "$REPLY" -O "$REPLY.html"; done < ru.txt
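
With about 5 million entries the loop will probably be interrupted and restarted at some point, so here is a minimal sketch of a resumable variant. The function name fetch_list, the timeout, and the skip-if-already-saved rule are assumptions added for illustration; the .html suffix follows the loop above.

```shell
# fetch_list FILE: download each site listed in FILE (one per line) to SITE.html.
# Hypothetical helper; the timeout and retry values are assumptions.
fetch_list() {
    # "|| [ -n "$site" ]" also handles a last line without a trailing newline
    while IFS= read -r site || [ -n "$site" ]; do
        [ -n "$site" ] || continue       # skip blank lines
        out="$site.html"
        if [ -s "$out" ]; then
            continue                     # already downloaded: skip, so reruns resume
        fi
        # on failure, remove the empty file that wget -O leaves behind
        wget -q -T 10 -t 1 -O "$out" "$site" || rm -f -- "$out"
    done < "$1"
}
```

Usage would be fetch_list ru.txt. Unlike the xargs variant, this runs one download at a time.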

Victor Taran, 2020-03-13
@shambler81

wget $(cat ru.txt)
How about this?
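
One caveat with this form: the whole list is expanded onto a single command line, and the operating system limits how long that can be. The check below is only a rough sketch. The function name is hypothetical, and the comparison ignores the space taken by environment variables.

```shell
# list_fits_on_command_line FILE: succeed if FILE is smaller than ARG_MAX,
# the system limit on the combined length of command-line arguments.
# Rough check only: the environment also counts toward the limit.
list_fits_on_command_line() {
    size=$(($(wc -c < "$1")))        # file size in bytes
    limit=$(getconf ARG_MAX)         # system limit in bytes
    [ "$size" -lt "$limit" ]
}
```

A list of roughly 5 million domains is likely far beyond that limit. wget can read the list itself with wget -i ru.txt, but in both cases the pages still get the default index.html names.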