Command line
qugu, 2014-04-08 18:55:25

How to parse XML with shell tools and write the result as CSV?

Good afternoon.
I need to parse the following XML document using standard shell tools and output the result in a CSV-like format:

<ns2:OperatorDefinition xmlns="urn:swift:saa:xsd:operatorprofile" xmlns:ns2="urn:swift:saa:xsd:impex:operator" xmlns:ns3="urn:swift:saa:xsd:unit" xmlns:ns4="urn:swift:saa:xsd:licenseddestination" xmlns:ns5="urn:swift:saa:xsd:operator">
    <ns2:Operator>
        <ns5:Identifier>
            <ns5:Name>IIvanov</ns5:Name>
        </ns5:Identifier>
        <ns5:Description>Ivanov, Ivan</ns5:Description>
        <ns5:OperatorType>HUMAN</ns5:OperatorType>
        <ns5:AuthenticationType>LOCAL</ns5:AuthenticationType>
        <ns5:Profile>
            <Name>PROFILE_1</Name>
        </ns5:Profile>
        <ns5:Unit>
            <ns3:Name>None</ns3:Name>
        </ns5:Unit>
        <ns5:Unit>
            <ns3:Name>Custody</ns3:Name>
        </ns5:Unit>
    </ns2:Operator>
</ns2:OperatorDefinition>

<ns2:OperatorDefinition xmlns="urn:swift:saa:xsd:operatorprofile" xmlns:ns2="urn:swift:saa:xsd:impex:operator" xmlns:ns3="urn:swift:saa:xsd:unit" xmlns:ns4="urn:swift:saa:xsd:licenseddestination" xmlns:ns5="urn:swift:saa:xsd:operator">
    <ns2:Operator>
        <ns5:Identifier>
            <ns5:Name>PPetrov</ns5:Name>
        </ns5:Identifier>
        <ns5:Description>Petrov, Petr</ns5:Description>
        <ns5:OperatorType>HUMAN</ns5:OperatorType>
        <ns5:AuthenticationType>LOCAL</ns5:AuthenticationType>
        <ns5:Profile>
            <Name>PROFILE_2</Name>
        </ns5:Profile>
        <ns5:Unit>
            <ns3:Name>None</ns3:Name>
        </ns5:Unit>
    </ns2:Operator>
</ns2:OperatorDefinition>

What I expect to see:
IIvanov;Ivanov, Ivan;HUMAN;LOCAL;PROFILE_1
PPetrov;Petrov, Petr;HUMAN;LOCAL;PROFILE_2

To solve the parsing problem, I followed the advice from Stack Overflow: stackoverflow.com/questions/893585/how-to-parse-xm...
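# Split the input at every "<": ENTITY gets the tag name, CONTENT the text after ">".
read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}
while read_dom; do
    # The condition was lost from the post; matching ns5:Name is only an assumed example.
    if [[ $ENTITY = "ns5:Name" ]]; then
        echo "$CONTENT"
    fi
done < input.xml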

Question: how to modify the script to get the desired result?

1 answer(s)
axell1, 2014-04-14

Something like this (1 is the input file):
grep -E '<ns5:Name>|<ns5:Description>|<ns5:OperatorType>|<ns5:AuthenticationType>|<Name>' 1 | awk -F\> '{print $2}' | awk -F\< '{print $1}' | sed 's/$/;/' | sed ':a;N;0~5!ba;s/\n//g'
IIvanov;Ivanov, Ivan;HUMAN;LOCAL;PROFILE_1;
PPetrov;Petrov, Petr;HUMAN;LOCAL;PROFILE_2;
First we grab the necessary lines, then cut off the tags, then append a semicolon to every line, and finally join every 5 lines into one. If the trailing semicolon is not needed, cut it off. Note that the 0~5 address is a GNU sed extension.
Not pure bash, but usable.
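
If pure bash is required, the read_dom loop from the question could probably be adapted along these lines. This is only a sketch: the function name xml_to_csv is made up for illustration, and it assumes that the tags of interest carry no attributes, that every operator ends with </ns2:Operator>, and that the values never contain a semicolon.

```bash
#!/usr/bin/env bash
# Pure-bash sketch: split the input at every "<", as the read_dom helper
# from the question does, and print one semicolon-separated row per operator.

read_dom () {
    local IFS=\>
    # ENTITY gets the tag name, CONTENT the text that follows ">".
    read -r -d \< ENTITY CONTENT
}

# xml_to_csv is a hypothetical name; it reads XML on stdin, writes rows to stdout.
xml_to_csv () {
    local row=()
    while read_dom; do
        case $ENTITY in
            ns5:Name|ns5:Description|ns5:OperatorType|ns5:AuthenticationType|Name)
                # One of the five wanted fields; ns3:Name (the unit) is skipped.
                row+=("$CONTENT")
                ;;
            /ns2:Operator)
                # End of one operator: join the collected fields with ";".
                (IFS=';'; printf '%s\n' "${row[*]}")
                row=()
                ;;
        esac
    done
}
```

It would be called as xml_to_csv < input.xml. Because the fields are joined only when the operator closes, there is no trailing semicolon to cut off.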
