Libraries for converting PDF to HTML in .NET
Hello everyone!
I have run into a problem: I need to download a large number of PDF files containing tables from a particular site and extract specific information from them.
In the past I used the Apache PDFBox for .NET library in such cases: it can convert PDF to HTML, which can then be parsed with regular expressions to pull out the necessary information.
This time, however, it has not been so easy. Either the PDFs are too elaborate or something else is going on, but the HTML generated from them comes out very strange, and in some cases it is almost impossible to parse.
Do you know of any alternatives to PDFBox that I could try in .NET for this task?