MySQL
Mitya Kolesnikov, 2013-11-12 19:42:01

Parsing an XML file using MySQL

Hello, please help me understand how to import data from an XML file into MySQL. The file is quite large and has to be parsed every 20 minutes, so if possible we would like to avoid solutions that require reading the file several times for each import into the database.

Its structure is as follows (it cannot be changed, since it comes from a third-party service's API):

<data>
  <user id="..." login="..." realname="...">
    <items>
      <item_1 img="...">...</item_1>
      <item_2 img="...">...</item_2>
    </items>
  </user>
  <user id="..." login="..." realname="...">
    <items>
      <item_1 img="...">...</item_1>
      <item_2 img="...">...</item_2>
    </items>
  </user>
</data>


My db table is created like this:
CREATE TABLE user (
    id INT PRIMARY KEY NOT NULL,
    login VARCHAR(40) NOT NULL,
    realname VARCHAR(40) NOT NULL,
    item_1 VARCHAR(40) NOT NULL,
    item_2 VARCHAR(40) NOT NULL
);


The query I am running to import data into the database is:
LOAD XML INFILE 'file.xml'
REPLACE INTO TABLE user
ROWS IDENTIFIED BY '<user>';


The question is how to get the value of the img attribute of item_1. When I create a column named "img" in the table, it receives the attribute value from the last item tag. MySQL does not allow duplicate column names, and the attribute names cannot be changed. If anyone knows another way to access the attribute via LOAD XML, please tell me.

And another question: is it possible, as part of the same query, to delete from the database those records whose id is not present in the XML file?

Thank you!

3 answer(s)
edogs, 2013-11-12
@mittus

"It cannot be changed, this is an API from a third-party service"
LOAD XML reads a file from the local file system, so what prevents you from preprocessing the XML first, one way or another (even with an ordinary console program that strips out everything unnecessary)?
Fetch the file from the API, process it, then load it into MySQL.
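A minimal sketch of such a preprocessing step, not a definitive implementation. It assumes the file fits in memory and that LOAD XML maps child element names to column names; the `flatten` function, the sample data, and the `item_1_img` / `item_2_img` field names are hypothetical, so the table would need matching columns.

```python
import xml.etree.ElementTree as ET

# Sample data in the shape shown in the question (values are made up).
SAMPLE = """<data>
  <user id="1" login="alice" realname="Alice A.">
    <items>
      <item_1 img="a1.png">first</item_1>
      <item_2 img="a2.png">second</item_2>
    </items>
  </user>
</data>"""


def flatten(xml_text):
    """Return XML in which every <user> row has only uniquely named fields."""
    src = ET.fromstring(xml_text)
    out = ET.Element("data")
    for user in src.iter("user"):
        # Copy id, login and realname as attributes of the new row.
        row = ET.SubElement(out, "user", dict(user.attrib))
        for item in user.findall("./items/*"):
            # The element text becomes a field named after the tag (item_1, item_2).
            ET.SubElement(row, item.tag).text = item.text or ""
            # Hypothetical naming: each img attribute gets its own field name.
            ET.SubElement(row, item.tag + "_img").text = item.get("img", "")
    return ET.tostring(out, encoding="unicode")


flat = flatten(SAMPLE)
print(flat)
```

The output could then be written to a file and loaded with the same LOAD XML statement, since the img values no longer share one field name.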
"Is it possible, as part of the same query, to delete from the database those records whose id is not present in the XML file?"
Not directly. You can import the XML data into a temporary table and then use its ids to remove everything unnecessary from the main table (DELETE FROM main WHERE id NOT IN (SELECT id FROM tempforxml)).
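The staging-table idea can be sketched as follows. This uses an in-memory SQLite database only so that the example runs anywhere; it is a stand-in, not MySQL. The `user_import` table name and the sample rows are hypothetical, and in MySQL the staging table would be filled by LOAD XML rather than by INSERT.

```python
import sqlite3

# SQLite stand-in (assumption): used here only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, login TEXT NOT NULL);
    CREATE TABLE user_import (id INTEGER PRIMARY KEY, login TEXT NOT NULL);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- Pretend these rows were just loaded from the XML file.
    INSERT INTO user_import VALUES (1, 'alice'), (3, 'carol_renamed'), (4, 'dave');
""")

with conn:  # run both statements in one transaction
    # Remove rows whose id no longer appears in the imported data.
    conn.execute("DELETE FROM user WHERE id NOT IN (SELECT id FROM user_import)")
    # Insert new rows and overwrite existing ones with the imported values.
    conn.execute("REPLACE INTO user SELECT id, login FROM user_import")

rows = conn.execute("SELECT id, login FROM user ORDER BY id").fetchall()
print(rows)
```

With this pattern the XML file is read only once per run; the cleanup works entirely from the staging table.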
"The question is how to get the value of the img attribute of item_1. When I create a column named "img" in the table, it receives the attribute value from the last item tag. MySQL does not allow duplicate column names, and the attribute names cannot be changed. If anyone knows another way to access the attribute via LOAD XML, please tell me."
Not directly. People resort to contortions such as LOAD DATA INFILE followed by XML processing inside the database, like this: newtover.tumblr.com/post/14858246616/mysql-load-data-from-xml . But this is unlikely to be an efficient approach.

serverkon, 2013-11-12
@servekon

In my experience, MySQL plus XML is a very resource-hungry combination: once the file is larger than about 50 MB, problems begin even with LOAD XML.
There is such a thing as xml2json. So my advice is to convert the XML into another format and work with that instead.
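As one possible illustration of converting to another format, here is a sketch that streams the XML into CSV using only the Python standard library instead of xml2json. The `xml_to_csv` and `text_and_img` helpers, the column order, and the sample data are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data in the shape shown in the question (values are made up).
SAMPLE = b"""<data>
  <user id="1" login="alice" realname="Alice A.">
    <items><item_1 img="a1.png">first</item_1><item_2 img="a2.png">second</item_2></items>
  </user>
  <user id="2" login="bob" realname="Bob B.">
    <items><item_1 img="b1.png">third</item_1><item_2 img="b2.png">fourth</item_2></items>
  </user>
</data>"""


def text_and_img(user, tag):
    """Return (text, img attribute) of one item, or empty strings if it is missing."""
    item = user.find("./items/" + tag)
    if item is None:
        return "", ""
    return item.text or "", item.get("img", "")


def xml_to_csv(xml_file, csv_file):
    """Stream <user> elements from xml_file into CSV rows; return the row count."""
    writer = csv.writer(csv_file)
    count = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "user":
            continue
        row = [elem.get("id"), elem.get("login"), elem.get("realname")]
        row.extend(text_and_img(elem, "item_1"))
        row.extend(text_and_img(elem, "item_2"))
        writer.writerow(row)
        elem.clear()  # release the contents of the finished <user> element
        count += 1
    return count


out = io.StringIO(newline="")
print(xml_to_csv(io.BytesIO(SAMPLE), out))
print(out.getvalue())
```

The resulting file could then be imported with LOAD DATA INFILE using comma-separated fields, which sidesteps the img naming problem because each value gets its own column.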

Dmitry Skogorev, 2013-11-12
@EnterSandman

Search for "mysql xml ExtractValue attribute"
and you will be happy =)
