PDF
Ivan Ivanov, 2021-04-23 11:22:17

Is there a decent PDF converter that turns tables into HTML, CSV, or an object?

I need to extract accurate data from a table in a PDF. With https://github.com/mgufrone/pdf-to-html , table columns are sometimes merged into one when the text in the left cell sits very close to the text in the cell to its right.
For example, see table row 85 in the file rupoisk.pro/78.pdf.
During conversion, the text from the second and third columns is combined into a single tag, and the result is:

<p style="position:absolute;top:257px;left:102px;white-space:nowrap" class="ft00">85</p>
<p style="position:absolute;top:257px;left:140px;white-space:nowrap" class="ft00">Серия, номер и дача выдачи свидетельства 64 002369255</p>

However, "Series, number and date of issue of the certificate" and "64 002369255" should be in separate p tags. I need a converter that works on Linux, preferably Debian. Thank you.
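
To at least detect the affected rows, the converter's HTML can be post-processed. Below is a minimal sketch, not a fix for the conversion itself. It assumes the output always uses the absolutely positioned `<p>` format shown above and that the table has three columns (inferred from the description). `rows_from_html` and `find_short_rows` are hypothetical helper names and are not part of pdf-to-html.

```python
import re
from collections import defaultdict

# Matches the absolutely positioned <p> tags shown in the converter output above.
# Assumption: every cell is emitted in exactly this style format.
P_TAG = re.compile(
    r'<p style="position:absolute;top:(\d+)px;left:(\d+)px;[^"]*"[^>]*>(.*?)</p>',
    re.S,
)


def rows_from_html(html):
    """Group positioned <p> tags into rows keyed by `top`, cells ordered by `left`."""
    rows = defaultdict(list)
    for top, left, text in P_TAG.findall(html):
        rows[int(top)].append((int(left), text.strip()))
    return [[text for _, text in sorted(cells)] for _, cells in sorted(rows.items())]


def find_short_rows(rows, expected_columns):
    """Return rows with fewer cells than expected: candidates for merged cells."""
    return [row for row in rows if len(row) < expected_columns]


sample = '''
<p style="position:absolute;top:257px;left:102px;white-space:nowrap" class="ft00">85</p>
<p style="position:absolute;top:257px;left:140px;white-space:nowrap" class="ft00">Серия, номер и дача выдачи свидетельства 64 002369255</p>
'''

rows = rows_from_html(sample)
# expected_columns=3 is an assumption about this particular table.
print(find_short_rows(rows, expected_columns=3))
```

Rows flagged this way still need to be split manually or by a rule specific to the document, since the boundary between the two cells is already lost at this point.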

1 answer(s)
Denis Yuriev, 2021-04-23
@dyuriev

I'm tired of answering questions about PDF parsing.
There is no ideal option, and there never will be.
