CommonNewbie, 2020-01-28 00:29:11
PHP

What is the best way to optimize a crawler for a site?

The task is to search the ibb.co website for images whose EXIF data contains a specific phone model (this information is displayed directly on the image page).
When an image is uploaded, its link is formed as ibb.co + 7 characters (0-9, a-z, A-Z). I wrote my own script:

<?php
$string = "HUAWEI";
$permitted_chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$log = "log.txt";
$found = "found.txt";
$shit = "smth.txt";
 
function generate_string($input, $strength = 16) {
    $input_length = strlen($input);
    $random_string = '';
    for($i = 0; $i < $strength; $i++) {
        $random_character = $input[mt_rand(0, $input_length - 1)];
        $random_string .= $random_character;
    } 
    return $random_string;
} 
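// Probe random 7-character IDs: log every live URL, then save those whose page contains $string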
for ($i = 1; $i <= 5000000; $i++) {
    $generate = generate_string($permitted_chars, 7);
    $url = "https://ibb.co/".$generate;
    $headers = get_headers($url);
    if ($headers[0] == "HTTP/1.1 200 OK"){
        file_put_contents($shit, $url . "\n", FILE_APPEND); // log every live URL on its own line
        $content = file_get_contents($url);
        $pos = strpos($content, $string);
        if ($pos !== false) {
            $format_url = $url . "\n";
            file_put_contents($found, $format_url, FILE_APPEND);
        }
    }
}

But it has drawbacks: the links are generated randomly, so the whole site will never be fully covered, and it runs slowly.
After deploying it to 20 servers, each server averages only about 1 million requests per day, which is very disappointing.
Suggest how this can be done more efficiently.
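
For illustration only, a minimal sketch of one possible direction, assuming the PHP cURL extension is available and that ibb.co IDs really are plain 7-character base-62 strings: walk the ID space with a sequential counter (so nothing is skipped or checked twice) and check a batch of candidates concurrently with curl_multi instead of one blocking get_headers()/file_get_contents() call per link. The batch size, timeouts, loop bounds and file names are arbitrary placeholders, not part of the original script.

<?php
// Sketch only: deterministic enumeration of the 7-character ID space plus
// concurrent checking with curl_multi.

$needle          = "HUAWEI";
$found           = "found.txt";
$permitted_chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Turn a sequential counter into a fixed-length base-62 ID so every ID is
// visited exactly once instead of relying on mt_rand() (assumption: IDs are
// plain base-62 strings).
function id_from_counter($n, $chars, $len = 7) {
    $base = strlen($chars);
    $id = '';
    for ($i = 0; $i < $len; $i++) {
        $id = $chars[$n % $base] . $id;
        $n  = intdiv($n, $base);
    }
    return $id;
}

$batch_size = 50;   // requests kept in flight at the same time
$start      = 0;    // a different offset could be given to each of the 20 servers

for ($offset = $start; $offset < $start + 1000; $offset += $batch_size) {
    $mh      = curl_multi_init();
    $handles = [];

    // Queue one cURL handle per candidate URL.
    for ($n = $offset; $n < $offset + $batch_size; $n++) {
        $url = "https://ibb.co/" . id_from_counter($n, $permitted_chars);
        $ch  = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers in the batch concurrently.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);
        }
    } while ($active && $status == CURLM_OK);

    // Inspect the results and record pages that mention the phone model.
    foreach ($handles as $url => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $body = curl_multi_getcontent($ch);
        if ($code == 200 && is_string($body) && strpos($body, $needle) !== false) {
            file_put_contents($found, $url . "\n", FILE_APPEND);
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}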


1 answer
Anton Shamanov, 2020-01-28
@SilenceOfWinter

collect links with wget
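
For illustration only, one concrete form this suggestion could take, assuming the public pages are ordinary HTML that wget can follow; the start URL, crawl depth and output file are placeholders:

wget --spider --recursive --level=2 --no-verbose https://ibb.co/ 2>&1 \
  | grep -oE 'https://ibb\.co/[0-9A-Za-z]{7}' | sort -u > links.txt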
