PHP
Alexey Grebenikov, 2014-04-09 17:09:14

How to download a file with Cyrillic in URL?

Good evening! I am writing a small news parser and ran into a problem downloading files whose URLs contain Cyrillic characters.
Examples:

http://blobproxy-cdn.skoda-auto.com/wwk2-sitecollectionimages/news/march/škoda_на_ралли_акрополис_(2)__201403302311.jpg
http://blobproxy-cdn.skoda-auto.com/wwk2-sitecollectionimages/news/march/в_млада-болеславе_произведено_11_000_000_автомобилей_škoda_201403302311.jpg

How can I download these files? file_get_contents doesn't work, and neither does cURL.

4 answer(s)
Andr'U Sender, 2014-04-09
@grealexti

Hello! Here is a solution:

// Percent-encode the whole URL, then restore the ":" and "/" delimiters.
$url = urlencode('http://blobproxy-cdn.skoda-auto.com/wwk2-sitecollectionimages/news/march/в_млада-болеславе_произведено_11_000_000_автомобилей_škoda_201403302311.jpg');
$url = str_replace(array('%3A','%2F'), array(':','/'), $url);
$data = file_get_contents($url);

DDanya, 2014-04-09
@DDanya

You need to use the rawurlencode function.
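
For illustration, a minimal sketch of one way to apply rawurlencode to such URLs. The helper name encode_url_path and the example.com URL are hypothetical and not from the thread. The sketch assumes an absolute URL with no query string or fragment whose path is not already percent-encoded:

```php
<?php
// Hypothetical helper (not from the thread): percent-encode each path
// segment with rawurlencode(), leaving the scheme, host and "/" untouched.
// Assumes no query string or fragment and a path that is not yet encoded.
function encode_url_path(string $url): string
{
    // Split into "scheme://host" and the path that follows it.
    if (!preg_match('#^([a-z][a-z0-9+.\-]*://[^/]+)(/.*)?$#i', $url, $m)) {
        return $url; // Not an absolute URL; return it unchanged.
    }
    $segments = explode('/', $m[2] ?? '');
    return $m[1] . implode('/', array_map('rawurlencode', $segments));
}

echo encode_url_path('http://example.com/news/на ралли (2).jpg'), "\n";
// http://example.com/news/%D0%BD%D0%B0%20%D1%80%D0%B0%D0%BB%D0%BB%D0%B8%20%282%29.jpg
```

Encoding segment by segment avoids having to restore "/" afterwards, at the cost of not handling query strings.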

Flamestorm, 2018-12-29
@FlameStorm

In my case, the target link was a cross between a bulldog and a rhinoceros: it had superfluous slashes, non-Latin characters, and a space already encoded as %20.
The following solution, inspired by Andr'U Sender, helped:

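// If the URL has a scheme and a host, convert the host to Punycode so that
// Cyrillic (and other national) domains work (requires the intl extension).
if (preg_match('#^([\w\d]+://)([^/]+)(.*)$#iu', $filenameSrc, $m)){
    $filenameSrc = $m[1] . idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46) . $m[3];
}
// Decode first so that existing escapes such as %20 are not encoded twice.
$filenameSrc = urldecode($filenameSrc);
$filenameSrc = rawurlencode($filenameSrc);
// Restore the ":" and "/" delimiters that rawurlencode escaped.
$filenameSrc = str_replace(array('%3A','%2F'), array(':', '/'), $filenameSrc);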

Note that if you use urlencode instead of rawurlencode, spaces are encoded as "+", and the link would not open in that form. With %20, which is what rawurlencode produces, it works fine.
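
A quick way to see the difference between the two functions (the file name here is just an illustrative value):

```php
<?php
// Illustrative file name containing a space (not from the thread).
$name = 'hero bg.gif';

echo urlencode($name), "\n";    // hero+bg.gif   (form-style encoding)
echo rawurlencode($name), "\n"; // hero%20bg.gif (RFC 3986 style)
```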
I hope it saves someone some hair :)
Update: I added another piece of code, the first three lines (the if {...} block, on a tip from Stack Overflow), to support Cyrillic (and other national) domains. Now even a URL like http://сайт.рус/app/img/hero-bg.gif is not a problem :)

Jook87, 2020-02-27
@Jook87

This solution helped me:

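// Split the URL on "/" and percent-encode every segment except the
// scheme ("http:") and the empty segment that follows it.
$tmpurl = explode("/", $url);
$url = '';
foreach ($tmpurl as $k => $e) {
    if (count($tmpurl) == $k + 1) {
        $url .= rawurlencode($e);       // last segment: no trailing slash
    } elseif ($k < 2) {
        $url .= $e . "/";               // "http:" and "" stay as they are
    } else {
        $url .= rawurlencode($e) . "/";
    }
}

curl_setopt($ch, CURLOPT_URL, $url);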
