Data processing
MrQwerty, 2017-03-29 21:59:06

How do I remove formatting from XML?

Good afternoon. I have an XML document (a Wikipedia dump, to be exact).
head -n 30 showed that it is pretty-printed. How can I convert it from type A to type B, i.e. remove the line breaks and the tabs/spaces that are there only for human readability? The parser that will read it later does not care about them.
Googling turned up nothing.
To answer the inevitable "Why?": Wikipedia dumps are not small, and they contain a lot of these characters.

Type A
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Type B
<?xml version="1.0" encoding="UTF-8"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>


2 answers
Dimonchik, 2017-03-30
@dimonchik2013

You could go to the trouble of using SAX (as one does for digesting large XML files), which preserves the document's integrity exactly,
or you could use a simple regular expression (replace >\s+< with ><).
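
A minimal sketch of the SAX route, in Python. The thread does not name a language, so that choice is an assumption, and the names WhitespaceStripper and minify_xml are made up for illustration. It streams the file through the standard xml.sax parser and re-emits each event with xml.sax.saxutils.XMLGenerator, skipping text nodes that consist only of whitespace. Be aware that this setup also drops comments and any DOCTYPE, and that it removes whitespace-only text even inside elements where it might be meaningful.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


class WhitespaceStripper(XMLGenerator):
    """Re-emit SAX events, dropping text nodes that are only whitespace."""

    def __init__(self, out):
        super().__init__(out, encoding="utf-8")
        self._text = []  # chunks of the text node currently being read

    def _flush_text(self):
        # Join the buffered chunks and write them only if they hold real text.
        text = "".join(self._text)
        self._text.clear()
        if text.strip():
            super().characters(text)

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so buffer
        # them and decide once the whole node has been seen.
        self._text.append(content)

    def ignorableWhitespace(self, content):
        pass  # never write ignorable whitespace

    def startElement(self, name, attrs):
        self._flush_text()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush_text()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush_text()
        super().processingInstruction(target, data)


def minify_xml(source, out):
    """Parse `source` (a path or file object) and write compact XML to `out`."""
    xml.sax.parse(source, WhitespaceStripper(out))
```

It could be called as minify_xml("dump.xml", open("dump.min.xml", "w", encoding="utf-8")), where both file names are placeholders. XMLGenerator writes a single newline after the XML declaration, so the output is the declaration on one line followed by the whole document on the next. Memory use stays flat on a large dump because only the current text node is buffered.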

Vapaamies, 2017-03-31
@vapaamies

It seems there should be an XML Tidy, or else the latest versions of Tidy (on GitHub) can also work with XML.
