C++ / C#
CityzenUNDEAD, 2022-03-24 21:05:47

How do I handle HTML entities in an XML document read with XmlReader?

Good evening!
I receive a huge XML file and split it into several small files. The source file consists of a huge number of <organization>...</organization> elements, and I write each such element to a separate file.
To keep the application from crashing while processing that much data, I use the XmlReader class.
Source code:

using (XmlReader reader = XmlReader.Create(xmlFile))
{
    XElement organization = null;
    string exportPath = @"C:\Export..";

    reader.MoveToContent();

    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "organization")
        {
            // ReadFrom consumes the whole element and moves the reader past it
            organization = XElement.ReadFrom(reader) as XElement;
            string xmlInside = organization.ToString();
            ....
        }
        else
        {
            reader.Read();
        }
    }
}


The exception is thrown on this line:
organization = XElement.ReadFrom(reader) as XElement;

System.Xml.XmlException: "Reference to undeclared entity 'nbsp'. Line 9, position 70."
That is, the XML contains HTML entities such as &nbsp;, which XML does not predefine.
How can I handle them?
I have seen a solution based on the XmlDocument class, but it does not suit me: I have to read the data through XmlReader, because otherwise the file does not fit in memory. I do not know how to process the text in that case, and I have not found a solution.
