PHP
PO6OT, 2015-10-18 18:23:40

Why doesn't the indexing script crawl the site?

Here's the script:
godaemon.tk/script.txt
It keeps adding the same link to the list when it shouldn't:

http://habrahabr.ru/
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
http://tmfeed.ru?utm_source=tm_habrahabr&utm_medium=tm_top_panel&utm_campaign=tm_promo
...

UPD:
I found the cause: gethrefs() returns only the first link instead of all of them. I don't know why.

1 answer(s)
PO6OT, 2015-10-18
@woonem

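In the gethrefs function, replace preg_match with preg_match_all. preg_match stops at the first match, so only one link is ever returned; preg_match_all collects every match in the page.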
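
To illustrate the difference, here is a minimal sketch. The original gethrefs() from script.txt is not shown here, so the function names gethrefs_first and gethrefs_all, the regular expression, and the sample HTML are all assumptions made up for this example, not the actual script.

```php
<?php
// Hypothetical stand-ins for gethrefs(); the real script may use a different pattern.

// Mimics the reported bug: preg_match stops after the first match.
function gethrefs_first(string $html): array
{
    if (preg_match('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m)) {
        return [$m[1]]; // $m[1] is the first capture group of the single match
    }
    return [];
}

// The suggested fix: preg_match_all keeps scanning and collects every match.
function gethrefs_all(string $html): array
{
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m);
    // With the default PREG_PATTERN_ORDER flag, $m[1] is an array holding
    // the first capture group of each match (empty if nothing matched).
    return $m[1];
}

// Sample input, invented for the example.
$html = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>';

print_r(gethrefs_first($html)); // one link only
print_r(gethrefs_all($html));   // both links
```

If the crawler always gets back just the first link of a page, it will keep queueing that same URL, which would match the repeated tmfeed.ru line in the output above.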
