How do I get the binary data of each page of a PDF file?

PHP

Vladislav Sofienko, 2018-02-27 16:16:43

Hello, comrades. I've run into a problem: I need to parse a PDF file page by page. How can I get the binary data of each page in PHP, the way file_get_contents() gives you the whole file? I thought PDF Parser would help, but I couldn't find a method in it that does this.

1 answer

ivankomolin, 2018-02-27

Essentially, tasks like this come down to the following:
1. Split the PDF into one image per page (for example, with ImageMagick).
2. Run the images through OCR (for example, Tesseract).
3. Parse the resulting text.
Why do you need the binary data of each PDF page?
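A minimal PHP sketch of steps 1 and 2 follows. It assumes a Unix-like system with the ImageMagick `convert` command (which needs Ghostscript to read PDFs) and the `tesseract` command on the PATH. The function names `buildRenderCommand`, `buildOcrCommand` and `ocrPdfPages` are hypothetical, and the 300 DPI default is an arbitrary choice. As a side effect, calling `file_get_contents()` on each rendered PNG gives you per-page binary data, although that data is a raster image, not a fragment of the original PDF.

```php
<?php
declare(strict_types=1);

// Sketch of the steps: PDF -> one PNG per page (ImageMagick CLI)
// -> text per page (Tesseract CLI) -> returned to the caller for parsing.
// Assumes a Unix-like system with `convert` (+ Ghostscript) and `tesseract`
// on PATH. All function names are hypothetical.

// Command that renders every page of $pdf to $outDir/page-0.png, page-1.png, ...
// ImageMagick replaces %d with the page index; -density sets the DPI used
// to rasterize the PDF and must come before the input file.
function buildRenderCommand(string $pdf, string $outDir, int $dpi = 300): string
{
    return sprintf(
        'convert -density %d %s %s',
        $dpi,
        escapeshellarg($pdf),
        escapeshellarg($outDir . '/page-%d.png')
    );
}

// Command that prints the recognized text of one page image to stdout.
// $lang is a Tesseract language code such as "eng" or "rus".
function buildOcrCommand(string $image, string $lang = 'eng'): string
{
    return sprintf('tesseract %s stdout -l %s', escapeshellarg($image), escapeshellarg($lang));
}

// Returns [pageIndex => ['png' => binary PNG data, 'text' => OCR text]].
function ocrPdfPages(string $pdf, string $lang = 'eng'): array
{
    $dir = sys_get_temp_dir() . '/pdfpages_' . bin2hex(random_bytes(4));
    if (!mkdir($dir)) {
        throw new RuntimeException("Cannot create $dir");
    }

    exec(buildRenderCommand($pdf, $dir), $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("ImageMagick exited with code $exitCode");
    }

    $pages = [];
    foreach (glob($dir . '/page-*.png') ?: [] as $file) {
        // "page-12.png" -> 12
        $index = (int) substr(basename($file, '.png'), strlen('page-'));
        $pages[$index] = [
            'png'  => file_get_contents($file), // binary data of this page's image
            'text' => (string) shell_exec(buildOcrCommand($file, $lang)),
        ];
        unlink($file);
    }
    rmdir($dir);

    // glob() sorts names lexically (page-10 before page-2), so sort by index.
    ksort($pages);
    return $pages;
}
```

For example, `ocrPdfPages('scan.pdf', 'rus')` would return one entry per page. Step 3 (parsing each `$page['text']`) depends on your document layout, so it is left out. Building the commands separately from `exec()`/`shell_exec()` keeps them easy to inspect, and `escapeshellarg()` protects against file names containing spaces or shell metacharacters. On ImageMagick 7 the main command is `magick` rather than `convert`, and some Linux distributions ship an ImageMagick security policy that blocks PDF input, which then has to be relaxed.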