PHP
Bogdan, 2018-11-12 07:11:55

Why does packet loss occur when parsing links?

I wrote a small script that parses certain values from a site. Here it is:

<?php
    $link  = explode("\n", file_get_contents('link.txt'));
    $proxy = explode("\n", file_get_contents('proxy.txt'));

    $str = [/* array of regular expressions */];

    $i = 0; $range = [13, 30];
    $MoveList = [];

    // Build a stream context that sends the request through a random proxy.
    function SetProxy($mas){
        $config = array(
            'http' => array(
                'timeout' => 1.5,
                // rand() includes both bounds, so the last valid index is count - 1.
                'proxy' => trim($mas[rand(0, count($mas) - 1)]),
                'request_fulluri' => true,
            ),
        );

        return stream_context_create($config);
    }


    $crContext = SetProxy($proxy);

    for($set = $range[0]; $set < $range[1]; $set++){

        // New proxy on every even index; the label is also the retry target.
        // Note: the retry has no limit, so a link that never parses is retried forever.
        if ($set % 2 == 0){ ReSetProxy: $crContext = SetProxy($proxy); }

        if($get_page = @file_get_contents(trim($link[$set]), false, $crContext)){

            $encoding = iconv("cp1251", "UTF-8", $get_page);

            for($l=0; $l < count($str); $l++){
                preg_match_all('~'.$str[$l].'~si', $encoding, $result);
                // isset() avoids an undefined-index notice when the pattern did not match.
                if(isset($result[1][0]) && $result[1][0] != null){
                    if(count($result[1]) < 2) $MoveList[$i][] = preg_replace('~(<br[^>]*>|&nbsp;)~is', ' ', $result[1][0]);
                    else{
                        $temp = '';
                        for($g=0; $g < count($result[1]); $g++)
                            if($result[1][$g] != null) $temp .= ' '.$result[1][$g];
                        $MoveList[$i][] = str_replace(' ', ', ', trim($temp));
                    }
                }
            }

            // Nothing matched on this page: retry the same link with another proxy.
            if(isset($MoveList[$i])){ $i++; }else{ goto ReSetProxy; }

        }else{ goto ReSetProxy; } // request failed: retry with another proxy
    }

   print_r($MoveList);
?>

As a result, the script returns only 12 values, no more. Can anyone explain why this happens and whether it can be fixed?
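
One way to narrow this down might be to stop discarding the error behind `@` and to cap the number of retries per link, so that a bad link or dead proxy shows up in a log instead of silently stalling the loop. The sketch below is only an illustration: `fetchWithRetries`, its `$maxAttempts` parameter, and the returned array shape are hypothetical names, not part of the script above.

```php
<?php
// Hypothetical helper: fetch $url through randomly chosen proxies,
// giving up after $maxAttempts and keeping the reason for each failure.
function fetchWithRetries(string $url, array $proxies, int $maxAttempts = 3): array
{
    $errors = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $http = ['timeout' => 1.5];
        if ($proxies) {
            // array_rand() always returns a valid key, unlike rand(0, count(...)).
            $http['proxy'] = trim($proxies[array_rand($proxies)]);
            $http['request_fulluri'] = true;
        }
        $context = stream_context_create(['http' => $http]);

        error_clear_last();
        $body = @file_get_contents($url, false, $context);
        if ($body !== false && $body !== '') {
            return ['body' => $body, 'errors' => $errors];
        }

        // error_get_last() still reports warnings suppressed with @.
        $last = error_get_last();
        $errors[] = "attempt $attempt: " . ($last['message'] ?? 'empty response');
    }

    return ['body' => null, 'errors' => $errors];
}
```

Calling it once per entry of `$link` and printing `errors` whenever `body` is `null` should show which links fail and why, for example a timeout or a refused proxy connection.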


1 answer(s)
Alexey Sundukov, 2018-11-22
@alekciy

Try it with XPath.
Related material: https://youtu.be/id_MNxmdRvk
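
A minimal sketch of what the XPath approach could look like with PHP's built-in `DOMDocument` and `DOMXPath`. The helper name `extractByXPath`, the sample markup, and the query are assumptions for illustration; the real page structure is not shown in the question.

```php
<?php
// Hypothetical helper: return the trimmed text of every node matching $query.
// Expects $html to be UTF-8 already (e.g. after the iconv() call in the question).
function extractByXPath(string $html, string $query): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The encoding hint makes DOMDocument read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);

    $values = [];
    if ($nodes === false) {
        return $values; // malformed XPath expression
    }
    foreach ($nodes as $node) {
        $values[] = trim($node->textContent);
    }

    return $values;
}

// Sample markup standing in for a downloaded page (an assumption, not the real site).
$html = '<html><body><div class="title">First</div><div class="title"> Second </div></body></html>';
print_r(extractByXPath($html, '//div[@class="title"]'));
```

Compared with regular expressions, a query such as `//div[@class="title"]` does not depend on attribute order, whitespace, or line breaks in the markup, which tends to make missed matches less likely.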
