PowerShell
Alexey, 2020-04-21 04:32:11

How do I remove the text between two words in a txt/xml file?

Can problems like this be solved with regular expressions? If so, I would appreciate some help.

I have a multi-line text file encoded in UTF-8.

1. Delete everything from the beginning of the file up to the word
<xml_catalog

2. Delete everything between

<catalog>
...
<items>


I wrote the PowerShell command (gc source.xml -Encoding UTF8) -replace ' %regexp% ', '' >out.xml, but I can't come up with a regular expression that works.
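Part of the difficulty is the command itself. (gc source.xml -Encoding UTF8) returns an array of lines, and -replace applied to an array processes each line separately, so a pattern that has to span several lines never matches. Reading the file with Get-Content -Raw, or piping it through out-string as the answers below do, yields a single string instead. The following is a minimal sketch of the difference. The sample file source-demo.xml in the temp folder and its contents are made up for illustration.

```powershell
# Hypothetical sample file in the temp folder, written as UTF-8 without a BOM.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'source-demo.xml'
$sample = "junk line`r`n<xml_catalog>`r`n<catalog>`r`nremove me`r`n<items>`r`n</items>`r`n</catalog>`r`n</xml_catalog>"
[System.IO.File]::WriteAllText($path, $sample, (New-Object System.Text.UTF8Encoding $false))

# Remove everything between <catalog> and <items>, keeping both tags.
$pattern = '(?s)(?<=<catalog>).*?(?=<items>)'

# Without -Raw, Get-Content returns one string per line, so -replace sees
# each line on its own and the multi-line pattern cannot match.
$perLine = (Get-Content $path -Encoding UTF8) -replace $pattern, ''

# With -Raw, the whole file is one string with its line breaks kept,
# so the pattern can match across them.
$whole = (Get-Content $path -Raw -Encoding UTF8) -replace $pattern, ''

$perLineHit = ($perLine -join "`n") -match 'remove me'
$wholeHit = $whole -match 'remove me'
"Per-line result still contains 'remove me': $perLineHit"
"Whole-file result still contains 'remove me': $wholeHit"

Remove-Item $path
```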


3 answers
Alexey, 2020-04-21
@lxnvr

I solved the problem like this:
1. The file always starts with 4 junk lines, so I simply delete the first 4 lines:

(gc source.xml -Encoding UTF8 | select -Skip 4) >out1.xml

Using | sc out1.xml instead won't work, because it converts the line-break characters.
2. Then, thanks to dodo512:
(gc out1.xml -Encoding UTF8 | out-string) -replace '(?s)(?<=<catalog>).*?(?=<items>)', '' >out.xml

A 10 MB file is processed in a couple of seconds.
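The two steps above can also be combined into one pass without the intermediate out1.xml. This is only a sketch, not the author's code. It keeps select -Skip 4 and dodo512's pattern, but it joins the remaining lines with -join instead of out-string. It also writes UTF-8 explicitly, because in Windows PowerShell 5.1 the > redirection writes UTF-16LE (PowerShell 7 writes UTF-8 without a BOM). The file names source-demo.xml and out-demo.xml and the sample contents are hypothetical. Absolute paths are built because the .NET file APIs resolve relative paths against the process directory, not the PowerShell location.

```powershell
# Hypothetical file names in the temp folder (absolute paths for the .NET APIs).
$dir = [System.IO.Path]::GetTempPath()
$src = Join-Path $dir 'source-demo.xml'
$dst = Join-Path $dir 'out-demo.xml'

# Made-up input: 4 junk lines, then the catalog.
$sample = @(
    'junk 1', 'junk 2', 'junk 3', 'junk 4',
    '<xml_catalog date="2020-04-21">',
    '<catalog>',
    '<shop>remove me</shop>',
    '<items>',
    '</items>',
    '</catalog>',
    '</xml_catalog>'
) -join "`r`n"
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($src, $sample, $utf8NoBom)

# Step 1: skip the first 4 lines and join the rest into one string.
$text = (Get-Content $src -Encoding UTF8 | Select-Object -Skip 4) -join "`r`n"

# Step 2: dodo512's pattern removes everything between <catalog> and <items>.
$result = $text -replace '(?s)(?<=<catalog>).*?(?=<items>)', ''

# Write UTF-8 without a BOM explicitly instead of relying on the '>' default.
[System.IO.File]::WriteAllText($dst, $result, $utf8NoBom)
Get-Content $dst -Encoding UTF8
```

Select-Object -Skip 4 only works while the junk really is always exactly 4 lines. dodo512's lookahead pattern does not depend on that.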

MaxKozlov, 2020-04-21
@MaxKozlov

0. The file needs to be joined into a single string
1. ".*<xml_catalog"
2. "<catalog>.*?<items>"
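These patterns can be applied in PowerShell roughly as follows. The sketch rests on a few assumptions that go beyond the answer. The text is read as one string with -Raw, which keeps the line breaks, so (?s) is added to let . match them. The first pattern uses a lazy .*? and a ^ anchor so it stops at the first <xml_catalog. Because these patterns consume the tags themselves, the replacement strings write the tags back. The sample text is made up.

```powershell
# Made-up text standing in for the file; with a real file use:
#   $text = Get-Content source.xml -Raw -Encoding UTF8
$text = "garbage`r`nmore garbage`r`n<xml_catalog>`r`n<catalog>`r`nremove me`r`n<items>`r`n</items>`r`n</catalog>`r`n</xml_catalog>"

# 1. Everything up to <xml_catalog; the tag is part of the match, so it is written back.
$step1 = $text -replace '(?s)^.*?<xml_catalog', '<xml_catalog'

# 2. Everything from <catalog> to <items>, again restoring both tags.
$step2 = $step1 -replace '(?s)<catalog>.*?<items>', '<catalog><items>'

$step2
```

If step 0 is read literally and the lines are glued together with -join '', no line breaks remain. In that case (?s) is unnecessary, but the output is also a single line.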

dodo512, 2020-04-21
@dodo512

(gc source.xml -Encoding UTF8 | out-string) -replace '(?s).*?(?=<xml_catalog)', '' -replace '(?s)(?<=<catalog>).*?(?=<items>)', '' >out.xml
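This one-liner works for several reasons. out-string turns the array of lines into a single string, and (?s) lets . match line breaks. The lazy .*? makes each match stop at the first following tag. The lookbehind (?<=<catalog>) and the lookaheads (?=<xml_catalog) and (?=<items>) check for the tags without including them in the match, so the tags survive the replacement. The sketch below runs the same two replacements on a made-up string. For comparison, it also runs a version without lookarounds, which deletes the tags as well.

```powershell
# Made-up input; in practice it would come from (gc source.xml -Encoding UTF8 | out-string).
$text = "header`r`n<xml_catalog>`r`n<catalog>`r`n<shop/>`r`n<items>`r`n</items>`r`n</catalog>`r`n</xml_catalog>"

# dodo512's two replacements: lookarounds match the tags without consuming them.
$withLookarounds = $text -replace '(?s).*?(?=<xml_catalog)', '' -replace '(?s)(?<=<catalog>).*?(?=<items>)', ''

# The same idea without lookarounds: the tags are part of the match and are deleted too.
$withoutLookarounds = $text -replace '(?s)<catalog>.*?<items>', ''

$withLookarounds
'---'
$withoutLookarounds
```

One possible tweak, not part of the original, is to anchor the first pattern as '(?s)^.*?(?=<xml_catalog)'. Since -replace replaces every match, the unanchored pattern may otherwise be retried from each later position in a large file. With ^ (no multiline option), those retries fail immediately.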
