linux
Sergey Karbivnichy, 2020-03-12 17:45:54

How do I make wget download sites listed in a file and save each page as site.ru instead of index.html?

I have a file with a list of *.ru sites:

01-PLAN.RU
01-POKROV.RU
01-PRINT.RU
01-PROFI.RU
01-PTM.RU
01-R.RU
01-REGION.RU
01-REMONT.RU
01-RU.RU
01-S.RU
01-SB.RU
01-SBERBANK.RU
(followed by about 5 million more entries)

I need to use wget to download the main page of each site in several parallel processes (if the site is up, of course).
I found this command:
cat ru.txt | xargs -t -P 20 -n1 wget
It works, but it saves the HTML files as 'index.html', 'index.html.1', and so on. I need each file to be named after the corresponding line in the list.
I know this is easy to do in bash, but I'm stuck.

3 answer(s)
Vitaly Karasik, 2020-03-12
@hottabxp

cat ru.txt | xargs -I % -t -P 20 wget % -O %

Sergey Pankov, 2020-03-12
@trapwalker

The previous, rather laconic, answer means that wget has an -O option that sets the name under which the download is saved.
The link to the manual hints at the right way to find such solutions, and I am very impressed by this educational approach.
However, I recommend using this instead of xargs:

while read -r; do wget "$REPLY" -O "$REPLY.html"; done < ru.txt
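
The loop above downloads one site at a time. As a rough sketch only, the -O naming could be combined with the parallel xargs from the question along the lines below. The function name fetch_all and the WGET override variable are made up for illustration, the 10-second timeout and single try are arbitrary values, and prefixing each entry with http:// is an assumption about how the list should be read.

```shell
# Sketch only: fetch_all and the WGET override are hypothetical names, not part of wget.
# Usage: fetch_all ru.txt
fetch_all() {
    # -P 20 runs up to 20 downloads at once; -I % passes one line of the list per command.
    xargs -P 20 -I % sh -c '
        # $1 is one domain from the list; -O saves the page under that name.
        # --tries and --timeout keep dead sites from stalling a worker.
        "${WGET:-wget}" -q --tries=1 --timeout=10 -O "$1" "http://$1/" ||
            rm -f -- "$1"   # wget -O leaves an empty file behind when the download fails
    ' _ % < "$1"
}
```

Calling fetch_all ru.txt in an empty directory should leave one file per reachable site, named exactly as in the list. Removing the output after a failed download is a design choice: without it, every dead site would leave an empty file behind.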

Victor Taran, 2020-03-13
@shambler81

wget $(cat ru.txt)
How about this?
