How to extract words from a PDF file by a pattern and put them into a list?

PDF

tuccar, 2016-04-30 18:04:09

Good afternoon.
I have a PDF file of several hundred or several thousand pages. Almost every page contains the word "city", followed by the actual name of a city. How can I extract from the whole document all the city names that follow the word "city" (that is, without the word "city" itself) and output them as a single list, one name per line?
Are there special programs for this, or do people write scripts in a programming language? I will be glad of any useful information (and my gratitude need not be limited to gladness :)).
Thank you.

1 answer(s)

Alexey Ukolov, 2016-04-30
@alexey-m-ukolov

I can only help with links:
pdfbox.apache.org
How can I parse a PDF using PHP?
Automated transfer of PDF to SQL
There is quite a lot of information there. There are, of course, nuances that depend on what the source files look like, but going by what those links describe I could most likely do it myself, which means you can too :)
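
For illustration only, here is a rough Python sketch of the task from the question. It is not taken from the links above. It assumes the PDF has a real text layer (not scanned images) and that each city name occupies the rest of the line after the word "city", optionally after a colon. The names `CITY_PATTERN`, `extract_cities`, `read_pdf_pages` and `main` are made up for this example, and the third-party `pypdf` package is just one possible way to get the text out; PDFBox or any other extractor would do the same job.

```python
import re

# Assumption: the city name is the rest of the line after the word "city",
# optionally separated by a colon. Adjust the pattern to the real layout.
CITY_PATTERN = re.compile(r"\bcity\b[ \t]*:?[ \t]*([^\r\n]+)", re.IGNORECASE)


def extract_cities(page_texts):
    """Return every name found after the word "city", in document order."""
    cities = []
    for text in page_texts:
        for match in CITY_PATTERN.finditer(text):
            name = match.group(1).strip()
            if name:  # skip lines where nothing but whitespace follows "city"
                cities.append(name)
    return cities


def read_pdf_pages(path):
    """Yield the extracted text of each page (requires the pypdf package)."""
    # Imported lazily so that extract_cities can be used without pypdf.
    from pypdf import PdfReader

    for page in PdfReader(path).pages:
        # extract_text() may return nothing for image-only pages.
        yield page.extract_text() or ""


def main(path):
    """Print the city names from the PDF at `path`, one per line."""
    print("\n".join(extract_cities(read_pdf_pages(path))))
```

Calling `main("document.pdf")` (the file name is a placeholder) would print the list one name per line. If the extracted text puts the city name on the next line, or in a different order than on the page, the regular expression has to be adapted to what `extract_text()` actually returns for your file.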