Automated migration of PDF to SQL
Good day, everyone.
Developers' lives are full of odd situations, especially when customers let their imagination run. The latest one has landed on me.
There is a spare-parts catalog for construction equipment. In… PDF format.
It is 25 GB of files containing exploded-view diagrams, part numbers, names and other necessary information. This whole wonderful pile has to be moved into a usable database format, currently SQL.
I'm sure the data exists in a text format somewhere, but nobody will hand it over. The showrooms and the manufacturer have no interest in doing so, and any AutoCAD sources are locked away in a closed format.
Can anyone suggest the shortest path from PDF to SQL? So far the only thing that comes to mind is PDF->XLSX->Parser->SQL, but who knows. Maybe someone has already dealt with this.
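For the Parser->SQL end of that chain, here is a minimal sketch using only Python's standard `csv` and `sqlite3` modules. It assumes each spreadsheet sheet has been exported to CSV with hypothetical columns `part_number`, `name` and `diagram`; the table layout and the sample rows are illustrative placeholders, not taken from the actual catalog.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of one catalog sheet; column names and rows are placeholders.
SAMPLE_CSV = """part_number,name,diagram
P-0001,Example part A,Diagram 1
P-0002,Example part B,Diagram 1
"""


def load_parts(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create a (hypothetical) parts table if needed, insert every CSV row, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parts ("
        " part_number TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " diagram TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["part_number"].strip(), r["name"].strip(), r["diagram"].strip())
        for r in reader
    ]
    # Parameter placeholders keep odd characters in part names from breaking the SQL.
    conn.executemany("INSERT OR REPLACE INTO parts VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_parts(conn, SAMPLE_CSV))
    for row in conn.execute("SELECT part_number, name FROM parts ORDER BY part_number"):
        print(row)
```

SQLite is used here only because it ships with Python; the same `executemany` pattern works with other DB-API drivers, although their placeholder style may differ.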
Thanks in advance for your replies.
Take a look, this is close to your topic, especially the comments: habrahabr.ru/post/130601/
Here is another text-extraction utility: multivalent.sourceforge.net/Tools/. By the way, ABBYY also has a tool that might be useful.
Frankly, a PDF can be put together so convolutedly that you will get nothing machine-readable out of it.
I once implemented a simple search over PDFs: I converted them with pdf2xml and then just searched the XML.
In your case, I don't think this will help much, because the layout varies from page to page, and the XML stores each text block as its coordinates on the page plus the text itself. So structured data is hard to get out of it.
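To illustrate what "text blocks with coordinates" means in practice, here is a minimal sketch that rebuilds table rows by grouping blocks with nearly equal vertical positions. It assumes XML in the style of poppler's `pdftohtml -xml` output, where each `<text>` element carries `top` and `left` attributes; the sample fragment and the 3-unit `tolerance` are hypothetical. It only works when a page really is laid out as rows, which is exactly the limitation described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of `pdftohtml -xml` output:
# each <text> element stores its position on the page and its string.
SAMPLE_XML = """<pdf2xml>
<page number="1">
  <text top="100" left="40" width="60" height="12">P-0001</text>
  <text top="101" left="150" width="120" height="12">Example part A</text>
  <text top="120" left="40" width="60" height="12">P-0002</text>
  <text top="120" left="150" width="120" height="12">Example part B</text>
</page>
</pdf2xml>"""


def rows_from_page(page: ET.Element, tolerance: int = 3) -> list[list[str]]:
    """Group <text> blocks whose top coordinate lies within `tolerance` of a row's first
    block into one row, then order each row's cells left to right."""
    items = []
    for t in page.iter("text"):
        text = "".join(t.itertext()).strip()
        if text:
            items.append((int(t.get("top")), int(t.get("left")), text))
    items.sort()  # top-to-bottom, then left-to-right

    rows, current, row_top = [], [], None
    for top, left, text in items:
        if row_top is None or top - row_top > tolerance:
            if current:
                rows.append(current)
            current, row_top = [], top  # start a new row at this vertical position
        current.append((left, text))
    if current:
        rows.append(current)
    return [[text for _, text in sorted(row)] for row in rows]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    for page in root.iter("page"):
        print(rows_from_page(page))
```

On pages where part numbers sit next to callouts on a diagram rather than in a table, this grouping produces rows of unrelated fragments, so each page layout would still need its own handling.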