P
P
Pavel2017-08-19 07:48:13
Python
Pavel, 2017-08-19 07:48:13

How to import large xml file into PostgreSQL (sqlalchemy)?

Now I do like this:

class XmlHandler(ContentHandler):

    def startElement(self, name, attrs):
        if name == 'Objects':
            print('start')
        elif name == 'Object':
             self.obj = attrs['param]'

    def endElement(self, name):
        if name == 'Objects':
            print('end')
        elif name == 'Object':
             self.obj.save()

xml.sax.parse(xml_path, XmlHandler())

Which, of course, is very slow.
And related questions:
1. How to do it faster?
2. How to calculate which operations of this particular section of code take the most CPU time?

Answer the question

In order to leave comments, you need to log in

1 answer(s)
P
Pavel, 2017-08-19
@toobinks

At least I'm wrong that I did not show the entire implementation of the method.
At the endElement at the end I had gc.collect() The
profiler suggested that the first problem was
pix.toile-libre.org/upload/original/1503120966.png
After
pix.toile-libre.org/upload/original/1503121255. png
search for an object in the database

class Obj(Model):
    id = Column(UUID, primary_key=True)

class XmlHandler(ContentHandler):
    def __init__(self):
        pass

    def startElement(self, name, attrs):
        if name == 'Objects':
            print('start parse')
        elif name == 'Object':
            aoid = attrs['ID']
            self.obj = Obj.query.filter_by(id=id).first() # это 2-ое узкое место
            if not self.obj:
                self.obj = Obj()


    def endElement(self, name):
        if name == 'Objects':
            print('end')
        elif name == 'Object':
        	self.obj.save()
        gc.collect() # это было 1-ое узкое место

xml.sax.parse(xml_path, XmlHandler())

What can be done about it?

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question