Ambrose, 2012-02-21 09:30:16
HTML

Libraries for converting PDF to HTML in .NET

Hello everyone!
I've run into a problem: I need to download a large number of PDF files containing tables from a particular site and extract specific information from them.
In the past I used the Apache PDFBox library for .NET in such cases. It can convert PDF to HTML, which can then be parsed with regular expressions to pull out the necessary information.
This time it isn't so simple. Perhaps the PDFs are too sophisticated, or something else is going on, but the HTML generated from them is very strange, and in some cases it is almost impossible to parse.
Do you know of any alternatives to PDFBox that I could try in .NET for this task?
