PHP script that downloads an image from a page to its server?


PHP

midarovrk, 2015-10-19 15:29:56

Hello! Please help tweak the script a bit.

<?php
get_img_in_dir("http://www.jurnalu.ru/online-reading/comicsonline/gothambymidnight2014/gothambymidnight2014004/5", "temp");
 
function get_img_in_dir($url, $dir) {
 
    $host = parse_url($url, PHP_URL_HOST); // Extract the host from the URL
 
    /* First, download the page source... */
    $curl = curl_init(); // Initialize cURL
    curl_setopt($curl, CURLOPT_HEADER, 0); // Exclude response headers from the output
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // Return the data instead of printing it
    curl_setopt($curl, CURLOPT_URL, $url); // Set the URL
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
    curl_setopt($curl, CURLOPT_REFERER, "http://www.jurnalu.ru/");
    $code = curl_exec($curl); // Fetch the data
    curl_close($curl); // Close the cURL session
     
     
 
    // The page source has been downloaded and is now in $code
    // Regular expressions can now be used
    // to pull the links out of it
    $arrayImg = array(); // Array for the image links
    $regex = '/<\s*img[^>]*src=[\"|\'](.*?)[\"|\'][^>]*\/*>/i';
    preg_match_all($regex, $code, $arrayImg);
 
    // $arrayImg[1] now holds an array of image URLs
 
    // Convert every link to an absolute URL and download it...
    for($i=0; $i<count($arrayImg[1]); $i++) {
         
        $path = parse_url($arrayImg[1][$i], PHP_URL_PATH); // Extract the path from the link
        $path2 = parse_url($arrayImg[1][$i], PHP_URL_QUERY); // Extract the query string from the link
        $absolute_url = 'http://comicsonline.ru'.$path.'?'.$path2; // Build the absolute URL
    
        // This is how I get the file name...
        $name = explode("/", $absolute_url);
        $name = $name[count($name)-1];
 
        // Download the image
        if (!copy($absolute_url, $dir.'/'.$name)) {
            echo '<p style="color:red;">Error copy - '.$name.'</p>';
            }      
         
    }
}
?>

What the script does:
It opens the specified page and finds an image link like this:

http://comicsonline.ru/1/gothambymidnight2014/004/5.png?st=lZTSjOiDOdU8KuQXYVuRqw&e=1445256843

It then downloads the file and saves it to the specified temp folder on the server.
As a result, a file named 5.png?st=lZTSjOiDOdU8KuQXYVuRqw&e=1445256843 appears on the server.
The image does not open in this form; once ?st=lZTSjOiDOdU8KuQXYVuRqw&e=1445256843 is removed from the name, everything works.
So how can I make the script itself save the file to the server without this hash, i.e. as 5.png?


1 answer(s)


Dimonchik, 2015-10-19
@midarovrk

Swiped the script from somewhere?
You extract links with regexps, but can't strip the tail with a regexp? ))
Anyway, it's even simpler here: in your $name, find the occurrence of "?" with strrchr (php.net/manual/en/function.strrchr.php)
and take the substring from the beginning up to that "?" with substr (php.net/manual/en/function.substr.php).
See the first link: right after the description there are examples, though they deal with directories.
But to do it properly, the URL should be processed with the parse_url function (php.net/manual/en/function.parse-url.php),
and to do it more properly still, the links should be extracted from the page with a library:
simplehtmldom.sourceforge.net/manual.htm
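
A minimal sketch of the two suggestions above, assuming PHP 7 or later. The helper names name_without_query_substr and name_from_url are made up for illustration and do not come from the original script. The first cuts the name at the "?" with strrchr and substr; the second lets parse_url isolate the path and takes its last segment with basename.

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

// Variant 1: drop everything from the last "?" onward using strrchr + substr.
function name_without_query_substr(string $name): string
{
    $tail = strrchr($name, '?'); // the "?..." tail, or false when there is no "?"
    if ($tail === false) {
        return $name;            // nothing to strip
    }
    return substr($name, 0, strlen($name) - strlen($tail));
}

// Variant 2: take only the path component of the URL, then its last segment.
function name_from_url(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH); // null or false when no path can be found
    return is_string($path) ? basename($path) : '';
}

$url = 'http://comicsonline.ru/1/gothambymidnight2014/004/5.png?st=lZTSjOiDOdU8KuQXYVuRqw&e=1445256843';

echo name_without_query_substr('5.png?st=lZTSjOiDOdU8KuQXYVuRqw&e=1445256843'), "\n"; // 5.png
echo name_from_url($url), "\n"; // 5.png
```

Since the script in the question already computes $path with parse_url inside the loop, taking $name = basename($path) there instead of splitting $absolute_url would probably be the smallest change.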