Parsing
Dmitry, 2018-09-24 17:52:24

How do I remove URLs from a list when their pages contain the text "Not found"?

Greetings, friends.
I have a list of site URLs (more than 1000).
Task:
1. Find out which URLs lead to pages whose content contains the phrase "not found".
2. Remove these URLs from the list, or mark them so I can delete them manually later.
What is the easiest way to do this?
Thank you.

1 answer
Lander, 2018-09-24
@usdglander

1. Take the n-th URL from the list.
2. Request the page content via cURL.
3. Search the content for the string "not found".
4. If the string was found, increase n by 1 and go to step 1.
5. If the string was not found, append the URL to the output file, increase n by 1 and go to step 1.
6. Stop once every URL in the list has been processed; the output file then contains only the URLs whose pages do not contain "not found".
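A minimal Python sketch of the steps above, assuming Python 3 and using the standard library's `urllib.request` in place of cURL. The case-insensitive match, the User-Agent string, and the decision to also flag pages that fail to load (including HTTP 404 responses, which `urlopen` raises as errors) are assumptions, not part of the answer. Flagged URLs are collected separately, which covers the "mark them for manual deletion" option from the question.

```python
import urllib.error
import urllib.request

# Phrase from the question; matched case-insensitively (an assumption, since the
# question writes both "Not found" and "not found").
NEEDLE = "not found"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Step 2: download the page and return its body as text."""
    # Some servers reject requests that carry no User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def contains_needle(content: str, needle: str = NEEDLE) -> bool:
    """Step 3: case-insensitive substring search."""
    return needle.lower() in content.lower()


def filter_urls(urls, fetch_page=fetch):
    """Steps 1, 4 and 5: walk the list and split it into (keep, flagged)."""
    keep, flagged = [], []
    for url in urls:
        try:
            content = fetch_page(url)
        except (OSError, ValueError, LookupError):
            # HTTPError (404 etc.) and URLError are OSError subclasses; a URL
            # without a scheme raises ValueError; an unknown charset raises
            # LookupError. Flag these for manual review instead of keeping them.
            flagged.append(url)
            continue
        if contains_needle(content):
            flagged.append(url)
        else:
            keep.append(url)
    return keep, flagged


def run(input_path: str, keep_path: str, flagged_path: str) -> None:
    """Read one URL per line and write the two result lists to separate files."""
    with open(input_path, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    keep, flagged = filter_urls(urls)
    with open(keep_path, "w", encoding="utf-8") as f:
        f.writelines(u + "\n" for u in keep)
    with open(flagged_path, "w", encoding="utf-8") as f:
        f.writelines(u + "\n" for u in flagged)
```

Usage, with hypothetical file names: `run("urls.txt", "good_urls.txt", "not_found_urls.txt")`. `filter_urls` accepts any `fetch_page` callable, so the filtering logic can be checked without network access. For about 1000 URLs a sequential loop is workable; if it is too slow, the fetches could be spread over a `concurrent.futures.ThreadPoolExecutor`.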
