Ruby on Rails
rda1902, 2015-05-07 18:53:01

How to create large xml files correctly?

Hello everyone! I need to generate a large XML file (about 40-50 MB) for Sphinx, and I'm using Nokogiri. Most of the time the task fails with the following error: NoMemoryError: failed to allocate memory

/usr/local/rvm/gems/ruby-2.2.2/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node.rb:686:in `write'
/usr/local/rvm/gems/ruby-2.2.2/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node.rb:686:in `native_write_to'
/usr/local/rvm/gems/ruby-2.2.2/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node.rb:686:in `write_to'
/usr/local/rvm/gems/ruby-2.2.2/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node.rb:618:in `serialize'
/usr/local/rvm/gems/ruby-2.2.2/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/builder.rb:346:in `to_xml'

The VPS has 2 GB of RAM, and while the XML is being built the process grows to as much as 1.5 GB. Here is the code:
builder = Nokogiri::XML::Builder.new(:encoding => 'utf-8')
builder['sphinx'].docset('xmlns:sphinx' => 'http://sphinxsearch.com/') {
  # Sphinx xmlpipe2 schema: full-text fields that are also stored as string attributes
  builder.schema {
    builder.field(name: :name, attr: :string)
    builder.field(name: :content, attr: :string)
    builder.field(name: :sender, attr: :string)
    builder.field(name: :recipient, attr: :string)
  }
  # find_each loads records in batches, but every node still goes into one in-memory DOM
  SpamMail.includes(:recipient).find_each do |sm|
    builder.document(id: sm.id) { |doc|
      doc.parent.default_namespace = ''
      doc.name sm.name
      doc.content sm.content
      doc.sender sm.sender_email
      doc.recipient sm.recipient.email
    }
  end
}
# to_xml serializes the whole tree into a second, equally large string (the NoMemoryError is raised here)
File.open(File.join(ENV['SPHINX_DIR'], "out_#{Rails.env}.xml"), "w") do |f|
  f.write builder.to_xml
end

What can I do to optimize this?

2 answers
Alexader Klyanchin, 2015-05-07
@jbmeerkat

I think libxml is better suited for this. Nokogiri is great for parsing files, but it's better not to use it to build large ones.

Viktor Vsk, 2015-05-07
@viktorvsk

There are two kinds of XML parsers/builders: DOM and SAX. DOM first builds a complete tree model of the document in memory and then processes it; SAX handles the document element by element as it streams through.
For example, look at the options here:
www.plugingeek.com/categories/xml-parsers-and-buil...
https://www.ruby-toolbox.com/categories/xml_mapping
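To make the SAX model concrete, here is a minimal sketch using the stream parser of REXML, which ships with Ruby (the choice of REXML is mine, not the answer's). The listener only receives callbacks as each tag streams past, so no tree of the whole document is built. The class name DocumentIdCollector and the sample input are hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# SAX-style listener: REXML calls these hooks while reading the input,
# so the parser never builds a tree of the whole document.
class DocumentIdCollector
  include REXML::StreamListener # provides no-op defaults for every other hook

  attr_reader :ids

  def initialize
    @ids = []
  end

  # Invoked once per opening tag, in document order.
  def tag_start(name, attrs)
    # to_h accepts both a Hash and an array of [key, value] pairs,
    # covering both shapes REXML has used for +attrs+
    @ids << attrs.to_h['id'] if name == 'sphinx:document'
  end
end

xml = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <sphinx:docset xmlns:sphinx="http://sphinxsearch.com/">
    <sphinx:document id="1"><name>first</name></sphinx:document>
    <sphinx:document id="2"><name>second</name></sphinx:document>
  </sphinx:docset>
XML

collector = DocumentIdCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.ids
```

The same idea applies to writing: a streaming writer emits each element as soon as it is ready instead of keeping the whole tree around.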
That said, your case seems simple enough that you could write the file by hand:
- Select the records
- Process them one at a time
- Append each record to the file as you go
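The three steps above could be sketched in plain Ruby like this. It is a minimal sketch, assuming the Sphinx xmlpipe2 layout the question's Builder code targets (sphinx:docset, sphinx:schema, sphinx:document); MailRecord, FIELDS and write_sphinx_xml are hypothetical names, and the struct stands in for the real SpamMail rows. Each record is escaped and written straight to the IO, so the process holds one record's strings at a time instead of a full DOM plus its serialized copy.

```ruby
require 'stringio'

# Hypothetical stand-in for a SpamMail row; field names follow the question's schema.
MailRecord = Struct.new(:id, :name, :content, :sender, :recipient)

FIELDS = %w[name content sender recipient].freeze

# Streams a Sphinx xmlpipe2 docset to +io+ one record at a time.
def write_sphinx_xml(io, records)
  io << %(<?xml version="1.0" encoding="utf-8"?>\n)
  io << %(<sphinx:docset xmlns:sphinx="http://sphinxsearch.com/">\n)
  io << "<sphinx:schema>\n"
  FIELDS.each do |field|
    # encode(xml: :attr) escapes the value and wraps it in double quotes
    io << %(<sphinx:field name=#{field.encode(xml: :attr)} attr="string"/>\n)
  end
  io << "</sphinx:schema>\n"

  records.each do |rec|
    io << %(<sphinx:document id=#{rec.id.to_s.encode(xml: :attr)}>\n)
    FIELDS.each do |field|
      # encode(xml: :text) escapes &, < and > so arbitrary mail text stays well-formed
      value = rec.public_send(field).to_s.encode(xml: :text)
      io << "<#{field}>#{value}</#{field}>\n"
    end
    io << "</sphinx:document>\n"
  end

  io << "</sphinx:docset>\n"
end

# Demo against an in-memory buffer. In the Rails app the same call could take a
# File and a lazily mapped relation, e.g. (untested, assumes Rails 4+):
#   File.open(path, 'w') do |f|
#     rows = SpamMail.includes(:recipient).find_each.lazy.map do |sm|
#       MailRecord.new(sm.id, sm.name, sm.content, sm.sender_email, sm.recipient.email)
#     end
#     write_sphinx_xml(f, rows)
#   end
out = StringIO.new
write_sphinx_xml(out, [
  MailRecord.new(1, 'Win', 'a < b & c', 'spam@example.com', 'me@example.com'),
  MailRecord.new(2, 'Offer', 'plain text', 'spam@example.com', 'you@example.com')
])
puts out.string
```

String#encode(xml: :text) only escapes markup characters; control characters that XML 1.0 forbids can appear in raw mail and would still need to be stripped before writing.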
