PHP
IvanN777, 2016-04-08 15:31:24

Memory leak when parsing XML with the SAX method: what did I do wrong?

class XmlSaxClass {
    
   private $file_path;
   private $encoding;
   
   private $output   = array();
   private $element   = null;
   
   private $full_path_elements = array();
   private $necessary_elements = array();
   private $branch_elements = array();
   private $deep;
   
   public function __construct($file_path, $encoding = 'UTF-8'){
      $this -> encoding = $encoding;
      $this-> file_path = $file_path;
   }
   
   public function execute($necessary_elements, $full_path_elements){

        $this -> necessary_elements = $necessary_elements;
        $this -> full_path_elements = $full_path_elements;
       
        $parser = xml_parser_create($this -> encoding); 
        xml_set_object($parser, $this);
        xml_set_element_handler($parser, 'startElements', 'endElements');
        xml_set_character_data_handler($parser, "characterData");
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        
        $handle = fopen($this-> file_path, "r");
        $base_memory_usage = memory_get_usage();
        while ($data = fread($handle, 4096)) {
           xml_parse($parser, $data, feof($handle));
           echo("<br>");
           print(memory_get_usage() - $base_memory_usage);
        }
        xml_parser_free($parser);
        
        return $this -> output;
   }
   
   private function startElements($parser, $name, $attrs){
      $this -> deep++;
       
      if(!empty($name)) {
            if($this -> deep == 1){
                if ($name == $this -> full_path_elements[0]){
                    $this -> branch_elements[0] = $name;
                }
            }
            else{
                $this -> cut_elements_in_branch_if_down();
                $this -> push_element_to_branch($name);
            }
            
            $branch_elements_count = count($this -> branch_elements);
            
            if($branch_elements_count == count($this -> full_path_elements)){
                if($this -> deep == $branch_elements_count){
                    $this -> output[] = array();
                }
                elseif (in_array($name, $this -> necessary_elements)){
                    $this -> element = $name;
                }
            }
      }
   }
   
   private function endElements($parser, $name){
      $this -> deep --;
      if(!empty($name)) {
         $this -> element = null;
      }
   }
   
   private function characterData($parser, $data){
      if(!empty($data)) {
         if (in_array($this -> element, $this -> necessary_elements)) {
            $this ->output[count($this -> output)-1][$this -> element] = trim($data);
         }
      }
   }
   
   private function cut_elements_in_branch_if_down(){
       if (count($this -> branch_elements) > $this -> deep -1 ){
           $this -> branch_elements = array_slice($this -> branch_elements, 0,  $this -> deep -1); 
       }
   }
   
   private function push_element_to_branch($name){
       if (count($this -> branch_elements) == (($this -> deep)-1)){
           if (isset($this -> full_path_elements[$this -> deep-1] ) && 
                   $name == ($this -> full_path_elements[($this -> deep)-1])){
               $this -> branch_elements[($this -> deep)-1] = $name;
           }
       }
   }

}

Here is a small, simple script. The execute method parses the XML document and collects the leaf elements of the given branch.
The first parameter is a list of leaf element names to collect; the second is the full path to the desired XML branch.
$sax_parser = new \XmlSaxClass($file_path);
$response = $sax_parser->execute(array(
    'learner',
    'teacher'
), array(
    'school',
    'class'
));

In theory I wrote this using the SAX method, but memory usage keeps growing and the script cannot process a 1 GB file.
Perhaps I misunderstand the SAX method?
As I understand it, the document is parsed in parts, without loading the entire DOM tree.
But apparently I did something wrong.
The script works fine on small files but fails on large ones.
In the real code I remove all the echo calls, of course; they are here only to show the memory growth.
Here is a very simple XML example:
<?xml version="1.0" encoding="UTF-8"?>
<school>
    <class>
        <learner>Ученик1</learner>
    </class>

    <class>
        <learner>Ученик2</learner>
    </class>

    <class>
        <learner>Ученик3</learner>
       <teacher>Училка</teacher>
    </class>
</school>


1 answer(s)
Igor Makarov, 2016-04-11
@IvanN777

Apparently the $output array grows very large and memory runs out: the parsing itself is streamed, but every parsed record still accumulates in $output until the end of the file. There are several ways to solve this:
1. If PHP >= 5.5, turn the loop in the execute method into a generator with yield. You will need to keep the $output array consistent, deleting the records that have already been yielded while new ones are added.
2. Wrap it in an ArrayIterator, but that also means rewriting other code.
3. Pass a callback to execute and call it in the endElements method when $name equals 'class', that is, when the wrapping element is closed.
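
Option 1 could be sketched roughly as follows. This is not the asker's class: read_records is a hypothetical function, it uses closures instead of xml_set_object, and for brevity it matches the wrapping element by name only rather than by full path. Finished records are yielded after each chunk, so only the records from the current chunk are held in memory.

```php
<?php
// Hypothetical generator variant (PHP >= 5.5); all names are illustrative.
function read_records($file_path, $wrapper, array $fields)
{
    $done = [];       // records completed during the current xml_parse() call
    $record = null;   // record being built while inside <$wrapper>
    $field = null;    // wanted child element that is currently open

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name) use (&$record, &$field, $wrapper, $fields) {
            if ($name === $wrapper) {
                $record = [];
            } elseif ($record !== null && in_array($name, $fields, true)) {
                $field = $name;
                $record[$field] = '';
            }
        },
        function ($p, $name) use (&$done, &$record, &$field, $wrapper) {
            $field = null;
            if ($name === $wrapper && $record !== null) {
                $done[] = array_map('trim', $record);
                $record = null;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$record, &$field) {
            if ($field !== null) {
                $record[$field] .= $data;   // text may arrive in several pieces
            }
        }
    );

    $handle = fopen($file_path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file_path");
    }
    while (!feof($handle)) {
        $chunk = fread($handle, 4096);
        if (!xml_parse($parser, $chunk, feof($handle))) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
        foreach ($done as $row) {
            yield $row;       // hand over what this chunk completed...
        }
        $done = [];           // ...and forget it
    }
    fclose($handle);
}
```

The caller would then iterate with foreach (read_records($file_path, 'class', array('learner', 'teacher')) as $row) and process each $row immediately instead of collecting them. For option 3, endElements could change like this: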

private function endElements($parser, $name){
        $this->deep--;
        if (!empty($name)) {
            $this->element = null;
        }

        // $this->wrapper holds the name of the wrapping element, here 'class';
        // $this->callback is a callable passed to execute() (both are new properties)
        if ($name === $this->wrapper && !empty($this->output)) {
            // pass the data to the callback
            call_user_func($this->callback, $this->output);
            // clear the buffer
            $this->output = [];
        }
    }
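
Put together, the callback idea might look like the sketch below. It is a standalone illustration, not a patch to XmlSaxClass: stream_records and its parameters are hypothetical names, and closures stand in for the class methods. Each record is handed to the callback when its wrapping element closes, so memory use should stay roughly flat regardless of file size, provided the callback itself does not accumulate the records.

```php
<?php
// Hypothetical helper: streams every record found at $path to $on_record
// and keeps only the record currently being built in memory.
function stream_records($file_path, array $path, array $fields, $on_record)
{
    $stack = [];      // names of the currently open elements
    $record = null;   // record being built, or null outside $path
    $field = null;    // wanted leaf element that is currently open

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name) use (&$stack, &$record, &$field, $path, $fields) {
            $stack[] = $name;
            if ($stack === $path) {
                $record = [];                     // a new record starts
            } elseif ($record !== null && in_array($name, $fields, true)) {
                $field = $name;
                $record[$field] = '';
            }
        },
        function ($p, $name) use (&$stack, &$record, &$field, $path, $on_record) {
            if ($stack === $path && $record !== null) {
                call_user_func($on_record, array_map('trim', $record));
                $record = null;                   // drop it once delivered
            }
            $field = null;
            array_pop($stack);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$record, &$field) {
            if ($field !== null) {
                $record[$field] .= $data;   // text may arrive in several pieces
            }
        }
    );

    $handle = fopen($file_path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file_path");
    }
    while (!feof($handle)) {
        $chunk = fread($handle, 4096);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($handle))) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    fclose($handle);
}
```

For the sample file it would be called as stream_records($file_path, array('school', 'class'), array('learner', 'teacher'), $callback), where $callback writes each record to a database or output file right away.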

P.S. There may also be a bug in the characterData method: the text of a single element can be delivered to this method in several calls. For example:
fread returns a chunk that ends in "<learner>Учен", so the parser passes "Учен" to the handler; "ик1" arrives on the next call and overwrites "Учен".
You need to append instead, something like:
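private function characterData($parser, $data){
        if ($data !== '' && in_array($this->element, $this->necessary_elements)) {
            // index of the record currently being filled
            $last = count($this->output) - 1;
            if (!isset($this->output[$last][$this->element])) {
                $this->output[$last][$this->element] = '';
            }
            // append, do not assign: one text node may arrive in several calls;
            // trim the accumulated value once the element closes, not per piece
            $this->output[$last][$this->element] .= $data;
        }
    }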
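
To see the difference between assigning and appending, here is a small self-contained sketch. collect_text is a hypothetical demo helper that feeds a document to the parser in small pieces. Whether and where the text is split depends on the underlying parser library and on the chunk size, so the script prints the number of handler calls instead of assuming it; the appended value should equal the original text in every case.

```php
<?php
// Hypothetical demo helper: parses $xml in pieces of $chunk_size bytes and
// returns the text collected by the character data handler plus the call count.
function collect_text($xml, $chunk_size, $append)
{
    $value = '';
    $calls = 0;

    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$value, &$calls, $append) {
            $calls++;
            // Assigning keeps only the last piece; appending keeps all of them.
            $value = $append ? $value . $data : $data;
        }
    );

    $chunks = str_split($xml, $chunk_size);
    foreach ($chunks as $i => $chunk) {
        xml_parse($parser, $chunk, $i === count($chunks) - 1);
    }
    return [$value, $calls];
}

$text = str_repeat('Student ', 200);
$xml = '<learner>' . $text . '</learner>';

list($appended, $calls) = collect_text($xml, 64, true);
list($assigned) = collect_text($xml, 64, false);

echo "handler calls: $calls\n";
echo "appended value complete: ";
var_dump($appended === $text);
echo "assigned value complete: ";
var_dump($assigned === $text);
```

If the handler is called more than once, the assigned value holds only the last piece, which is the overwrite described above.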
