How to read separation names from a PDF file in a C# Windows Forms app?

C++ / C#

domanskiy, 2019-03-05 14:22:15

I have a C# Windows Forms project: a simple form with a button, a text field, and the Acrobat Reader COM component (axAcroPDF) that displays a PDF file on the form.

private void button2_Click(object sender, EventArgs e)
{
    string pFile = textBox1.Text;
    string filePath = @"\\TS\Obmen\Штампы\D\" + pFile + ".pdf";
    this.axAcroPDF1.LoadFile(filePath);
    this.axAcroPDF1.src = filePath;
    this.axAcroPDF1.setShowToolbar(true); // show/hide the toolbar
    this.axAcroPDF1.setView("FitH");
    this.axAcroPDF1.setLayoutMode("SinglePage");
    this.axAcroPDF1.Show();
}

I need to read the separation names from the PDF file's XMP metadata, store them in an array, and display the array in label1 as a comma-separated list.
How can this be implemented, and with which library?


2 answer(s)

rPman, 2019-03-05

In general, there's no way! A PDF is a picture with optional text information.
In your case, you could try converting the PDF file into images, cropping a specific area (with ImageMagick), and sending that crop for recognition with, say, Tesseract.

domanskiy, 2019-03-05

I managed to extract the entire XMP packet. It's essentially XML:

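// Assumes iTextSharp 5 (using iTextSharp.text.pdf;)
PdfReader pdf = new PdfReader(filePath);
byte[] xmp = pdf.Metadata; // raw XMP packet; null if the PDF has none
string metadataXml = xmp != null ? System.Text.Encoding.UTF8.GetString(xmp) : "";
pdf.Close();
label1.Text = metadataXml;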

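Since the packet is XML, it can be loaded into an `XmlDocument` and queried from there. This is a minimal sketch, assuming the bytes come from iTextSharp 5's `PdfReader.Metadata` and are UTF-8 encoded (the XMP specification also permits UTF-16 and UTF-32). `XmpLoader` and `LoadXmp` are hypothetical names used only for illustration.

```csharp
using System.Text;
using System.Xml;

public static class XmpLoader
{
    // Hypothetical helper: turns the raw XMP packet bytes into an XmlDocument.
    // Returns null when the PDF has no XMP metadata stream.
    public static XmlDocument LoadXmp(byte[] xmpBytes)
    {
        if (xmpBytes == null || xmpBytes.Length == 0)
            return null;

        // Assumes UTF-8. Encoding.Default is the ANSI code page on .NET Framework
        // and can garble non-ASCII names. A leading BOM character is stripped
        // because XmlDocument.LoadXml rejects it at the start of a string.
        string xml = Encoding.UTF8.GetString(xmpBytes).TrimStart('\uFEFF');

        var doc = new XmlDocument();
        doc.LoadXml(xml); // <?xpacket ...?> processing instructions are legal XML
        return doc;
    }
}
```

Usage, with `pdf` being the `PdfReader` from the code above: `XmlDocument xmpDoc = XmpLoader.LoadXmp(pdf.Metadata);`.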
Now I'm wondering how to use XPath on this XML to extract a value, for example:
//xmpmeta/RDF/Description/inks/Seq/li[1]/egname
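One way to run such a path with the built-in `System.Xml` classes is to match on `local-name()`, so the XMP namespace prefixes (`x:`, `rdf:`, and the ink schema's prefix) do not have to be registered in an `XmlNamespaceManager`. This is a sketch under assumptions: the element names `inks`, `Seq`, `li` and `egname` are copied from the path in the question and should be checked against the real XMP. Note that `local-name()` is the name without its prefix, so an element written as `prefix:name` must be matched as `name`. `XmpSeparations` and `GetSeparationNames` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmpSeparations
{
    // Hypothetical helper: returns the text of every egname element found
    // under inks/Seq/li, in document order, ignoring namespace prefixes.
    public static string[] GetSeparationNames(string xmpXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmpXml.TrimStart('\uFEFF')); // drop a leading BOM character, if any

        // The question's path without the [1] index, so all separations are returned.
        XmlNodeList nodes = doc.SelectNodes(
            "//*[local-name()='inks']/*[local-name()='Seq']" +
            "/*[local-name()='li']/*[local-name()='egname']");

        var names = new List<string>();
        if (nodes != null)
        {
            foreach (XmlNode node in nodes)
                names.Add(node.InnerText.Trim());
        }
        return names.ToArray();
    }
}
```

To show the result as required, join the array: `label1.Text = string.Join(", ", XmpSeparations.GetSeparationNames(metadataXml));`. Putting `[1]` back after `li` would select only the first separation.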