Victor Marquardt, 2017-03-26 12:56:59
C++ / C#

How to parse large xml using RapidXML?

I have an XML file that I am trying to parse with RapidXML. I looked at the examples on the website, but they are all simple and refer to nodes by name. My file is large (800+ lines) and its nodes are nested in different ways. For example, here is a piece of the file:

<?xml version="1.0"?>
<data grt_format="2.0" document_type="MySQL Workbench Model" version="1.4.4">
  <value type="object" struct-name="workbench.Document" id="{C1D1EED9-0895-4E88-B9A7-D71E1D4C4867}" struct-checksum="0x7131bf99">
    <value type="object" struct-name="workbench.logical.Model" id="{8AD965A6-D925-4D18-95FC-1616F0B7712F}" struct-checksum="0xf4220370" key="logicalModel">
      <value _ptr_="0000000022DAB1F0" type="list" content-type="object" content-struct-name="workbench.logical.Diagram" key="diagrams"/>
      <value _ptr_="0000000022D87820" type="dict" key="customData"/>
      <value _ptr_="0000000022DAB260" type="list" content-type="object" content-struct-name="model.Marker" key="markers"/>
      <value _ptr_="0000000022D85660" type="dict" key="options"/>
      <value type="string" key="name"></value>
      <link type="object" struct-name="GrtObject" key="owner">{C1D1EED9-0895-4E88-B9A7-D71E1D4C4867}</link>
    </value>
    <value _ptr_="0000000022DAB110" type="list" content-type="object" content-struct-name="workbench.OverviewPanel" key="overviewPanels"/>
    <value _ptr_="0000000022DAB180" type="list" content-type="object" content-struct-name="workbench.physical.Model" key="physicalModels">
      <value type="object" struct-name="workbench.physical.Model" id="{A756AA88-707D-4A97-B8B3-9035F7D93E14}" struct-checksum="0x5f896d18">
        <value type="object" struct-name="db.mysql.Catalog" id="{96A82A8F-0707-4C91-9959-F4E7CC4E61D7}" struct-checksum="0x82ad3466" key="catalog">
          <value _ptr_="0000000022DAB6C0" type="list" content-type="object" content-struct-name="db.mysql.LogFileGroup" key="logFileGroups"/>
          <value _ptr_="0000000022DAB7A0" type="list" content-type="object" content-struct-name="db.mysql.Schema" key="schemata">
            <value type="object" struct-name="db.mysql.Schema" id="{5CC0352E-3C93-4169-B4AE-99598891D758}" struct-checksum="0x20b94c22">
              <value _ptr_="0000000022DABAB0" type="list" content-type="object" content-struct-name="db.mysql.RoutineGroup" key="routineGroups"/>
              <value _ptr_="0000000022DABB20" type="list" content-type="object" content-struct-name="db.mysql.Routine" key="routines"/>
              <value _ptr_="0000000022DABB90" type="list" content-type="object" content-struct-name="db.mysql.Sequence" key="sequences"/>
              <value _ptr_="0000000022DABC00" type="list" content-type="object" content-struct-name="db.mysql.StructuredDatatype" key="structuredTypes"/>
              <value _ptr_="0000000022DABC70" type="list" content-type="object" content-struct-name="db.mysql.Synonym" key="synonyms"/>
              <value _ptr_="0000000022DABCE0" type="list" content-type="object" content-struct-name="db.mysql.Table" key="tables">
                <value type="object" struct-name="db.mysql.Table" id="{F92DDB7E-AC06-48B4-A765-C9D056C7E95B}" struct-checksum="0x4564421a">
                  <value type="string" key="avgRowLength"></value>
                  <value type="int" key="checksum">0</value>
                  <value _ptr_="0000000022DABDC0" type="list" content-type="object" content-struct-name="db.mysql.Column" key="columns">
                    <value type="object" struct-name="db.mysql.Column" id="{BEA3BA30-156C-4AC3-9D68-FFE4B0509B28}" struct-checksum="0xba88e21c">
                      <value type="int" key="autoIncrement">0</value>
                      <value type="string" key="expression"></value>

I can parse five nested nodes, that is, get as far as here:
<value _ptr_="0000000022D85660" type="dict" key="options"/>

and, accordingly, to every node at the same level:
<value type="object" struct-name="workbench.physical.Model"...

But it is not clear how to go any deeper. I can't get here:
<value type="object" struct-name="db.mysql.Catalog"

This is what I tried:
xml_node<> *rootNode = doc.first_node();

for (xml_node<>* node = rootNode->first_node(); node; node = node->next_sibling())
{
  parseNode(node);
}

for (xml_node<>* node = rootNode->first_node()->first_node(); node; node = node->next_sibling())
{
  parseNode(node);

  for (xml_node<>* nodeTwo = node->first_node(); nodeTwo; nodeTwo = nodeTwo->next_sibling())
  {
    parseNode(nodeTwo);
  }
}

But if I add more nested loops, I get a memory access error. Google did not help: the examples I found only cover simple files. I would be glad of any help.

1 answer(s)
Xano, 2017-03-27
@Cempl

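First:
Are you sure that rootNode->first_node()->first_node() is the correct code? It only descends into the first child, so the nested nodes of every other child are skipped. The chained call also dereferences a null pointer whenever first_node() returns 0, which would explain the memory access error.
Second:
Do not write a nested loop by hand for each level of nesting. Use recursion instead, like so: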

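// Visits every descendant of parent, depth-first.
void parseChildren( xml_node<> &parent )
{
    // first_node() returns 0 for a node without children, so the loop body
    // never runs for a leaf and the recursion stops there.
    for( xml_node<>* node = parent.first_node(); node; node = node->next_sibling() )
    {
        parseNode( node );
        parseChildren( *node );
    }
}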

void foo()
{
    xml_node<> *rootNode = doc.first_node();
    if ( rootNode )
        parseChildren( *rootNode );
}
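
To reach one particular nested node, such as the one with key="catalog", the same recursion can return as soon as it finds a match. Below is a minimal sketch, not RapidXML code: DemoNode, walk, findFirst and demoOrder are hypothetical names made up for illustration. DemoNode only mimics the two navigation calls used here, first_node() and next_sibling(), which return a null pointer when there is nothing more, as the RapidXML calls of the same name do.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for rapidxml::xml_node<>. It only mimics the two
// navigation calls used below, so the sketch compiles without the library.
struct DemoNode {
    std::string key;              // plays the role of the key="..." attribute
    DemoNode* child = nullptr;    // first child, or nullptr for a leaf
    DemoNode* sibling = nullptr;  // next node on the same level, or nullptr

    explicit DemoNode(std::string k) : key(std::move(k)) {}
    DemoNode* first_node() const { return child; }
    DemoNode* next_sibling() const { return sibling; }
};

// Calls visit(node, depth) for start, for each following sibling, and for
// all of their descendants, in document order.
template <class Node, class Visit>
void walk(Node* start, int depth, Visit&& visit) {
    for (Node* node = start; node; node = node->next_sibling()) {
        visit(*node, depth);
        // For a leaf, first_node() is null and the nested loop never runs.
        walk(node->first_node(), depth + 1, visit);
    }
}

// Returns the first node, in document order, for which pred(node) is true,
// or nullptr when no node matches.
template <class Node, class Pred>
Node* findFirst(Node* start, Pred&& pred) {
    for (Node* node = start; node; node = node->next_sibling()) {
        if (pred(*node)) return node;
        if (Node* hit = findFirst(node->first_node(), pred)) return hit;
    }
    return nullptr;
}

// Usage on a tiny hand-built tree shaped like a fragment of the model file.
// Returns the visiting order as text, one "depth:key" entry per node.
std::string demoOrder() {
    DemoNode document("Document"), logical("logicalModel"), options("options"),
             physical("physicalModels"), catalog("catalog");
    document.child = &logical;    // Document -> logicalModel
    logical.child = &options;     // logicalModel -> options
    logical.sibling = &physical;  // physicalModels follows logicalModel
    physical.child = &catalog;    // physicalModels -> catalog

    std::string order;
    auto record = [&order](DemoNode& n, int depth) {
        order += std::to_string(depth) + ":" + n.key + " ";
    };
    walk(&document, 0, record);

    // The search reaches "catalog" although it is not under the first child.
    DemoNode* hit = findFirst(&document,
                              [](DemoNode& n) { return n.key == "catalog"; });
    assert(hit == &catalog);
    (void)hit;
    return order;
}
```

With RapidXML itself, the two templates should accept an xml_node<>* unchanged, for example starting from doc.first_node(). The predicate could then compare node.first_attribute("struct-name")->value() with "db.mysql.Catalog", after checking that first_attribute() did not return 0. Keep in mind that first_node() also returns the text of an element as a child node, so the visitor may want to check node.type() first.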
