Antoxa Zimm, 2016-01-05 00:48:01
.NET

How to read data from a huge XML file?

I need to read data from a huge XML file, for example 50 GB in size. The file is a root element containing a collection of similar nodes (same name, almost always the same set of attributes and nested nodes). The structure of each node is not known before parsing: some fields must be read as they are, others are converted or calculated on the fly or after all other fields have been read. Everything goes into a database, and the list of tables and columns can change over the life of the application. There are rules by which we parse the XML, for example:

  1. from the attribute PrimaryKey, if present, we compute a hash value and put it in the KeyHash cell of each table
  2. for a cell XXXId, we look for parent and child nodes named XXX and take the value from there
  3. more complex rules that depend on the file are planned, so running SqlBulkCopy straight into the database is not an option
  4. etc.
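
Rules 1 and 2 above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, SHA-256 is an assumption (the question does not name the hash), and reading an `Id` attribute before falling back to the element text is also an assumption, since "the value" of a node is not defined in the question.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

public static class NodeRules
{
    // Rule 1: hash of the PrimaryKey attribute, or null when the attribute is absent.
    // Assumption: SHA-256 over the UTF-8 bytes, rendered as uppercase hex.
    public static string KeyHash(XElement node)
    {
        var key = (string)node.Attribute("PrimaryKey");
        if (key == null) return null;
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Rule 2: for a column named "XXXId", look at the parent and then at the
    // direct children for a node named "XXX" and take its value.
    public static string ResolveId(XElement node, string column)
    {
        if (column.Length <= 2 || !column.EndsWith("Id", StringComparison.Ordinal))
            return null;
        var name = column.Substring(0, column.Length - 2);

        if (node.Parent != null && node.Parent.Name.LocalName == name)
            return ValueOf(node.Parent);

        var child = node.Elements().FirstOrDefault(e => e.Name.LocalName == name);
        return child == null ? null : ValueOf(child);
    }

    // Assumption: prefer a hypothetical "Id" attribute, else the element text.
    private static string ValueOf(XElement e)
    {
        return (string)e.Attribute("Id") ?? e.Value;
    }
}
```

Note that the parent lookup only works while the node is still attached to its parent; a node materialized on its own from a stream has no `Parent`, so the parent's data would have to be carried along separately.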

The most logical solution is to read the file one node at a time with while (reader.Read()) {}, keep the results in memory in arrays, and dump them into the database once enough data has accumulated (via SqlBulkCopy.WriteToServer(IDataReader)). The problem is that parsing even a couple of gigabytes takes a very long time. There is a page describing a parser with impressive processing results, but I could not find anything more about it either.

Question: how can XML be parsed quickly piece by piece (get the parent element, a child, check whether there is a node whose name starts with XXX, or get an attribute by name)? Standard LINQ to XML is slow. Which libraries for .NET applications are fast, stable, and well documented?
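
For reference, the read-one-node-then-flush-in-batches approach described here might look like the sketch below. It combines `XmlReader` with `XNode.ReadFrom`, so only one record is held as an `XElement` at a time while the rest of the file stays on disk. The names `HugeXml`, `StreamChildren` and `Batch` are hypothetical, and the sketch assumes every record is a direct child of the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class HugeXml
{
    // Lazily yields each direct child of the root as a detached XElement.
    public static IEnumerable<XElement> StreamChildren(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };
        using (var reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // position on the root element
            reader.Read();          // step inside the root
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Groups the stream into lists of at most `size` items for bulk writes.
    public static IEnumerable<List<XElement>> Batch(IEnumerable<XElement> source, int size)
    {
        var batch = new List<XElement>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<XElement>(size);
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}
```

A caller would iterate `HugeXml.Batch(HugeXml.StreamChildren(File.OpenText(path)), 10000)` and, after applying the parsing rules to each batch, hand the resulting rows to `SqlBulkCopy.WriteToServer`. Calling `reader.Read()` unconditionally after `XNode.ReadFrom` is a common mistake that silently skips every second record when there is no whitespace between siblings.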


1 answer(s)
one pavel, 2016-01-05
@onepavel

SAX, that is, a streaming parser.
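
The .NET base class library does not ship a SAX parser; its closest standard counterpart is the forward-only `XmlReader`, which pulls nodes one by one instead of pushing events. A minimal sketch of that style, without building any `XElement` objects, could look like this. The names `PullScan` and `Scan` are hypothetical, and records are again assumed to be direct children of the root. For each record it picks up one attribute by name and checks whether any nested element name starts with a given prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullScan
{
    // For each record (depth 1) yields the requested attribute value and a flag
    // telling whether any nested element name starts with `prefix`.
    public static IEnumerable<(string Key, bool HasPrefixedChild)> Scan(
        TextReader input, string attribute, string prefix)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(input, settings))
        {
            string key = null;
            bool found = false;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.Depth == 1)
                    {
                        // Start of a new record: reset per-record state.
                        key = reader.GetAttribute(attribute);
                        found = false;
                        // <n/> produces no EndElement node, so emit right away.
                        if (reader.IsEmptyElement) yield return (key, false);
                    }
                    else if (reader.Depth > 1 &&
                             reader.LocalName.StartsWith(prefix, StringComparison.Ordinal))
                    {
                        found = true;
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Depth == 1)
                {
                    // End of the record: everything nested inside has been seen.
                    yield return (key, found);
                }
            }
        }
    }
}
```

The trade-off is that all state has to be tracked by hand, as with SAX handlers, but nothing is allocated per node beyond the strings that are actually read.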
