Parsing
Artyom Zubkov, 2011-02-22 09:07:50

Parsing XML with awk?

Hi!
Given:
1. An XML file:

<?xml version="1.0"?>
<file_events>
  <event date="1254728164000" author="Bin/.svn/entries" filename="f4d64c1a/497b733f81c2866d/81c2866da7e4d268.68" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="384bccff/ba9fc3f089695f6d/89695f6dea4210c1.c1" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="486c2459/24e0b8e2d1c311d8/d1c311d80290ed01.01" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="415eef3b/1c681c2b8a542c77/8a542c77cb1839ce.ce" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="b3008424/6da995605f28165c/5f28165c84475335.35" action="D" comment=""/>
  <event date="1254728164000" author="Bin/.svn/entries" filename="ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e" action="D" comment=""/>
</file_events>

Note that the attributes of the event node can appear in any order.
Task:
1. Convert the given file to the following format:
date|author|action|filename|comment
2. (Optional) Sort the data by the date attribute.
Currently I do this:
cat $1 | \
grep -e "event " | \
sed -e "s/^[[:space:]]*//" | \
awk '
  $2 ~ /date/ { p1=$2; }
  $2 ~ /author/ { p2=$2; }
  $2 ~ /action/ { p3=$2; }
  $2 ~ /filename/ { p4=$2; }
  $2 ~ /comment/ { p5=$2; }

  $3 ~ /date/ { p1=$3; }
  $3 ~ /author/ { p2=$3; }
  $3 ~ /action/ { p3=$3; }
  $3 ~ /filename/ { p4=$3; }
  $3 ~ /comment/ { p5=$3; }

  $4 ~ /date/ { p1=$4; }
  $4 ~ /author/ { p2=$4; }
  $4 ~ /action/ { p3=$4; }
  $4 ~ /filename/ { p4=$4; }
  $4 ~ /comment/ { p5=$4; }

  $5 ~ /date/ { p1=$5; }
  $5 ~ /author/ { p2=$5; }
  $5 ~ /action/ { p3=$5; }
  $5 ~ /filename/ { p4=$5; }
  $5 ~ /comment/ { p5=$5; }

  $6 ~ /date/ { p1=$6; }
  $6 ~ /author/ { p2=$6; }
  $6 ~ /action/ { p3=$6; }
  $6 ~ /filename/ { p4=$6; }
  $6 ~ /comment/ { p5=$6; }

  { print p1"|"p2"|"p3"|"p4"|"p5"\n"; } ' | \
sort -t "|" -k1 > $result

The output I get:
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="f4d64c1a/497b733f81c2866d/81c2866da7e4d268.68"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="384bccff/ba9fc3f089695f6d/89695f6dea4210c1.c1"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="486c2459/24e0b8e2d1c311d8/d1c311d80290ed01.01"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="415eef3b/1c681c2b8a542c77/8a542c77cb1839ce.ce"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="b3008424/6da995605f28165c/5f28165c84475335.35"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="a0c052d4/b0a0b0c0f70a7d29/f70a7d29231dacbd.bd"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="eabd8551/ccb2616f5be66fdb/5be66fdb0d4c9a77.77"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="25046ffa/0dfcd577c31d07d8/c31d07d855ade3e5.e5"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="cb86925a/bf4f23acb14c6c47/b14c6c474628ff82.82"|comment=""/>
date="1254728164000"|author="Bin/.svn/entries"|action="D"|filename="51d46ff1/fdb0cf112ec24d1e/2ec24d1e87c7a87a.7a"|comment=""/>

4 answer(s)
faust0, 2011-02-22
@faust0

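BEGIN {
    # One record per self-closing element (a multi-character RS needs gawk or mawk)
    RS = "/>"
}
{
    fields = 0
    split("", res)    # forget the attributes of the previous record
    for (i = 1; i <= NF; i++) {
        if ($i == "<event") {
            fields = 1    # attributes start right after the tag name
            continue
        }
        if (!fields) continue
        split($i, a, "[=\"]")    # date="123" gives a[1]="date", a[3]="123"
        res[a[1]] = a[3]
    }
    # Skip records without an <event> tag, such as the trailing </file_events>
    if (fields) print res["date"] "|" res["author"] "|" res["action"] "|" res["filename"] "|" res["comment"]
}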
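
The same idea can also be written for an awk that lacks multi-character RS support. What follows is only a sketch under a few assumptions: each event element sits on its own line, attribute values contain no spaces, and the function name events_to_pipe is made up for illustration. It picks attributes by name, so their order inside the tag does not matter, and it sorts numerically by date.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): events_to_pipe FILE
# prints date|author|action|filename|comment for every <event .../> line.
# Assumes one event per line and attribute values without spaces.
events_to_pipe() {
    awk '
    /<event / {
        split("", attr)                  # reset attributes for this line
        for (i = 1; i <= NF; i++) {
            # keep only name="value" tokens; a glued trailing "/>" is allowed
            if ($i !~ /^[A-Za-z_]+="[^"]*"(\/>)?$/) continue
            eq = index($i, "=")
            name = substr($i, 1, eq - 1)
            value = substr($i, eq + 1)
            sub(/\/>$/, "", value)       # drop the tag terminator, if present
            gsub(/"/, "", value)         # drop the surrounding quotes
            attr[name] = value
        }
        print attr["date"] "|" attr["author"] "|" attr["action"] "|" attr["filename"] "|" attr["comment"]
    }' "$1" | sort -t "|" -k1,1n         # numeric sort on the date column
}
```

Called as events_to_pipe events.xml > result.txt (both file names are placeholders), the first event from the question should come out as 1254728164000|Bin/.svn/entries|D|f4d64c1a/497b733f81c2866d/81c2866da7e4d268.68| with an empty comment column at the end.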

Konstantin Kitmanov, 2011-02-22
@k12th

Parsing XML with regular expressions is shameful.
Try xml-coreutils or XMLStarlet instead.

mitry, 2011-02-22
@mitry

Maybe use xmllint from libxml2 (I don't know whether it is included in msysgit, but in theory it should be) or msxsl on Windows, and write an XSLT transformation for this task?

@sledopit, 2011-02-22

sketched in haste:
xml2 < 1 | sed 's=/file_events/event[/]*[@]*==;' | awk '/^$/{s++}{printf "%05d %s\n",s,$0}' | sort -k1 -k2rn | sed 's/^[^ ]* //;s/[^=]*=//;s/^$/\&\&\&/' | tr '\n' '|' | sed 's/|&&&|/\n/g'
xml2 comes from the xml2 package. It turns XML into lines like this:
/file_events/event
/file_events/event/@date=1254728164000
/file_events/event/@author=Bin/.svn/entries
/file_events/event/@filename=ff4d0e6d/ea7152595adb7c97/5adb7c97bf59427e.7e
/file_events/event/@action=D
/file_events/event/@comment
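
Output in this flat form can also be folded back into one line per event with a single awk pass. This is a rough sketch, not a tested replacement for the pipeline above: it assumes the input looks exactly like the listing (a bare /file_events/event line between events, and attributes without a value printed with no "=" sign), and the function name xml2_to_pipe is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads xml2-style lines on stdin
# and prints date|author|action|filename|comment, sorted numerically by date.
xml2_to_pipe() {
    awk '
    function emit() {
        # print the collected event, if any, then start a new one
        if (seen) print a["date"] "|" a["author"] "|" a["action"] "|" a["filename"] "|" a["comment"]
        split("", a)
        seen = 0
    }
    BEGIN { prefix = "/file_events/event/@" }
    $0 == "/file_events/event" { emit(); next }   # bare path line separates events
    index($0, prefix) == 1 {
        rest = substr($0, length(prefix) + 1)     # e.g. "date=1254728164000"
        eq = index(rest, "=")
        if (eq) a[substr(rest, 1, eq - 1)] = substr(rest, eq + 1)
        else    a[rest] = ""                      # empty attribute, e.g. @comment
        seen = 1
    }
    END { emit() }                                # last event has no separator after it
    ' | sort -t "|" -k1,1n
}
```

With the xml2 package installed it would be used as xml2 < events.xml | xml2_to_pipe, where events.xml is a placeholder file name.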
