PHP
Mihail_Mel, 2019-03-19 12:42:41

How to optimize PDF to PNG conversion?

Hello!
I'm converting a PDF to PNG on the server, page by page:

$location = "folder"; $location_file = "file name";
if ($_FILES['filename_less']['name'] != '') {
    if (!mkdir($location, 0777, true)) {
        echo "Failed to create the directory. The operation cannot be performed";
        exit();
    } else {
        if (move_uploaded_file($_FILES['filename_less']['tmp_name'], $location_file)) {
            // convert the PDF file to PNG or JPEG
            $im = new Imagick();
            $im->setResolution(300, 300);
            $im->readimage($location_file);
            $noOfPagesInPDF = $im->getNumberImages();
            if ($noOfPagesInPDF) {
                for ($i = 0; $i < $noOfPagesInPDF; $i++) {
                    $url = $location_file . '[' . $i . ']';
                    $image = new Imagick($url);
                    $image->setImageFormat("png");
                    // note: $way is not defined in this excerpt
                    $image->writeImage($location . "/" . ($i + 1) . '_' . $way . '.png');
                    echo "Page created";
                }
            } else {
                echo "The PDF contains no pages";
            }
        } else {
            echo "File not uploaded. Contact the portal administrator";
            exit();
        }
    }
}

Everything works, but there is one catch: a file with more than 10 pages returns Bad Gateway,
and the hosting provider says memory usage climbs to 400 MB, so, given the limits of virtual hosting, sorry....
Is there some way to optimize the script?
And is it possible to see some useful debugging information in the console,
other than the script's execution time?



5 answer(s)
Boris Korobkov, 2019-03-19
@Mihail_Mel

Add profiling. It probably crashes at readimage(), which reads the entire file into memory.
Try removing that call. Instead of determining the number of pages up front, you can go through the pages one after another until an error occurs, and catch it with try-catch.
You also need to optimize, but in any case, move to a VPS. It costs from 340 rubles/month. The time you've spent writing this question alone is worth more than a month's rent.
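A minimal sketch of the "iterate until an error" idea, assuming Imagick throws an exception (ImagickException extends Exception) when asked for a page index past the end of the PDF. The function names convertPagesUntilError and imagickPageConverter are hypothetical. There is no up-front readimage() of the whole file; each page gets its own Imagick object and is cleared afterwards:

```php
<?php
/**
 * Hypothetical helper: calls $convertPage(0), $convertPage(1), ... until it throws,
 * and returns how many pages were converted. $maxPages is a safety cap.
 */
function convertPagesUntilError(callable $convertPage, int $maxPages = 500): int
{
    for ($i = 0; $i < $maxPages; $i++) {
        try {
            $convertPage($i);
        } catch (Exception $e) {
            // Assumed to mean "page $i does not exist", i.e. we are past the last page.
            // Note: any other failure on a page also stops the loop here.
            break;
        }
    }
    return $i; // index of the first failing page == number of pages converted
}

// Hypothetical Imagick-based converter for use with the loop above.
function imagickPageConverter(string $pdfPath, string $outDir, int $dpi = 300): callable
{
    return function (int $i) use ($pdfPath, $outDir, $dpi): void {
        $page = new Imagick();
        $page->setResolution($dpi, $dpi);            // set before reading so the PDF is rasterised at this DPI
        $page->readImage($pdfPath . '[' . $i . ']'); // assumed to throw if page $i is missing
        $page->setImageFormat('png');
        $page->writeImage($outDir . '/' . ($i + 1) . '.png');
        $page->clear();                              // free this page's pixels before the next one
    };
}
```

With the variables from the question, usage would be something like `convertPagesUntilError(imagickPageConverter($location_file, $location));`. Passing the converter in as a callable keeps the loop logic independent of Imagick, so it can be checked without a real PDF.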

Adamos, 2019-03-19
@Adamos

Doesn't your shared hosting let you run arbitrary scripts from cron?
If it does, running console ImageMagick over the folder of PDFs could be much faster than doing all this in PHP. Then pick up the finished images wherever they're needed.
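A sketch of the cron variant, assuming the host allows exec() and has ImageMagick's `convert` binary (which relies on Ghostscript to read PDFs) on the PATH; on ImageMagick 7 the binary is usually called `magick`. The function names, directories and the `.done` renaming convention are hypothetical:

```php
<?php
// Builds the ImageMagick command for one PDF.
// -density must come before the input so the PDF is rasterised at that DPI;
// %d in the output name makes ImageMagick number the pages (0, 1, 2, ...).
function buildConvertCommand(string $pdfPath, string $outDir, int $dpi = 150): string
{
    $outPattern = $outDir . '/' . basename($pdfPath, '.pdf') . '-%d.png';
    return sprintf(
        'convert -density %d %s %s',
        $dpi,
        escapeshellarg($pdfPath),
        escapeshellarg($outPattern)
    );
}

// Converts every *.pdf in $inDir; returns [pdf path => error output] for failures.
function convertFolder(string $inDir, string $outDir, int $dpi = 150): array
{
    $failed = [];
    foreach (glob($inDir . '/*.pdf') ?: [] as $pdf) {
        $output = [];
        $status = 0;
        exec(buildConvertCommand($pdf, $outDir, $dpi) . ' 2>&1', $output, $status);
        if ($status === 0) {
            rename($pdf, $pdf . '.done'); // mark as processed so the next cron run skips it
        } else {
            $failed[$pdf] = implode("\n", $output); // keep ImageMagick's messages for logging
        }
    }
    return $failed;
}
```

A script calling convertFolder() with the real upload and output directories could then be scheduled with a crontab line such as `*/10 * * * * php /path/to/convert_pdfs.php` (path is a placeholder). Running under the CLI avoids the web server's timeout, though the hosting's memory limits still apply.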

ThunderCat, 2019-03-19
@ThunderCat

1) Why such a high resolution, 300 dpi? Is it critical? Set 100-150; that's more than enough for a screen.
2) Move the processing into a separate script and run each page from the console as a separate process: console applications have no execution time limit. On the other hand, the CLI may not be available on shared hosting.
3) Have you measured the timings? Which operation eats the resources? First determine what to optimize...
4) It wouldn't hurt to call clear() after processing.
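For point 3 (and the question about debug output in the console), a minimal sketch that logs the wall time and the process's peak memory after each step, combined with points 1 and 4. It assumes Linux, where getrusage()'s `ru_maxrss` is in kilobytes and includes memory allocated by ImageMagick itself, which memory_get_usage() generally does not see. All function names are hypothetical:

```php
<?php
// Formats one profiling line.
function profileLine(string $label, float $seconds, int $maxRssKb): string
{
    return sprintf('%s: %.3f s, peak RSS %d KB', $label, $seconds, $maxRssKb);
}

// Runs $step, logs its duration and the process's peak RSS so far, returns its result.
function timed(string $label, callable $step)
{
    $start = microtime(true);
    $result = $step();
    $usage = getrusage();
    // error_log() writes to stderr under the CLI and to the server error log otherwise.
    error_log(profileLine($label, microtime(true) - $start, (int) $usage['ru_maxrss']));
    return $result;
}

// Hypothetical per-page conversion: lower DPI (point 1), timed steps (point 3), clear() (point 4).
function convertPage(string $pdfPath, int $index, string $outPath, int $dpi = 150): void
{
    $page = new Imagick();
    $page->setResolution($dpi, $dpi); // must be set before reading the PDF
    timed("read page $index", fn() => $page->readImage($pdfPath . '[' . $index . ']'));
    $page->setImageFormat('png');
    timed("write page $index", fn() => $page->writeImage($outPath));
    $page->clear();
}
```

Called in a loop from the CLI, this prints one line per read and write, so you can see at which step the peak memory jumps. Since `ru_maxrss` is a peak, it only ever grows; a flat line across pages means memory is being released between them.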

Anton, 2019-03-19
@sHinE

What about running external commands through exec(), e.g. ImageMagick or Ghostscript? At one point I couldn't get ImageMagick to change the DPI of the output file, so I used Ghostscript. As for memory consumption, and whether this can run on your hosting at all, I can't say.
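For the Ghostscript route, a minimal sketch, assuming the `gs` binary is installed and exec() is allowed on the host. The flags are standard Ghostscript options; the function names and the `page-%d.png` naming are hypothetical. Ghostscript renders pages one after another, which tends to use far less memory than loading the whole PDF into Imagick:

```php
<?php
// Builds a Ghostscript command that renders every page of a PDF to PNG.
function buildGhostscriptCommand(string $pdfPath, string $outDir, int $dpi = 150): string
{
    return implode(' ', [
        'gs',
        '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dQUIET',
        '-sDEVICE=png16m', // 24-bit RGB PNG output
        '-r' . $dpi,       // rendering resolution in DPI
        // %d in the output name is replaced by the page number, starting at 1
        '-sOutputFile=' . escapeshellarg($outDir . '/page-%d.png'),
        escapeshellarg($pdfPath),
    ]);
}

// Runs the command; returns true on success and logs Ghostscript's output on failure.
function renderPdf(string $pdfPath, string $outDir, int $dpi = 150): bool
{
    $output = [];
    $status = 0;
    exec(buildGhostscriptCommand($pdfPath, $outDir, $dpi) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        error_log('gs failed: ' . implode("\n", $output));
    }
    return $status === 0;
}
```

The output DPI is controlled directly by `-r`, so there is no separate resampling step.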

Mihail_Mel, 2019-03-19
@Mihail_Mel

Thanks everyone for the ideas! Most likely I'll have to dig in the direction of an API, because everything else really does hang the virtual server, and there's no point in a dedicated one, since this isn't a PDF conversion service ((((
