PowerShell
Alexey, 2021-10-27 19:33:33

PowerShell: how to do a regex replace inside text matched by another regex?

There is an XML file with roughly the following structure:

<product> ...
   <name> ....
      <desc> HTML code </desc>


There are about 1000 occurrences of the desc tag.
The task is to remove the HTML tags inside each desc tag while keeping the rest of the file unchanged, so that the result looks like this:
<product> ...
   <name> ....
      <desc> text only, without tags </desc>


So far I have only managed to save a file that contains just the contents of all the desc tags, with the HTML removed:

[regex]::Matches((Get-Content D:\test2.xml), "(?<=(<desc>)).*?(?=(</desc>))").Value -replace '<[^>]*>', '' | sc d:\3.xml


How do I keep the whole source file instead, with only the HTML inside each desc tag stripped?


2 answer(s)
dodo512, 2021-10-28
@lxnvr

[regex]::replace(
    (Get-Content D:\test2.xml),
    '(?<=<desc>).*?(?=</desc>)', 
    { $args[0].Value -replace '<[^>]*>', '' }
)
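A slightly fuller sketch of the same idea, for anyone who wants to try it without touching the real file. The sample string, the variable names and the `(?s)` flag (so that `.` also crosses line breaks when a desc spans several lines) are assumptions added for illustration, not part of the answer above. Reading the file with `Get-Content -Raw` and writing the result back with `Set-Content` is likewise only one possible way to do it.

```powershell
# Hypothetical sample input; for the real file one could use:
#   $text = Get-Content D:\test2.xml -Raw
$text = '<product><name>Widget</name><desc><p>Hello <b>world</b></p></desc></product>'

$result = [regex]::Replace(
    $text,
    # (?s) lets '.' match newlines; the lookarounds keep <desc> and </desc> out of the match.
    '(?s)(?<=<desc>).*?(?=</desc>)',
    # The script block is converted to a MatchEvaluator; $args[0] is the current Match.
    { $args[0].Value -replace '<[^>]*>', '' }
)

$result
# To save the whole document (output path is an assumption):
#   Set-Content -Path D:\3.xml -Value $result
```

Because the replacement runs over the whole text, everything outside the desc tags passes through untouched, which is what the one-liner in the question was missing.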

MaxKozlov, 2021-10-28
@MaxKozlov

Alexey, in its general form this is a question about regexps and has little to do with PowerShell; regex engines are nearly all alike these days.
But even in the general case, if you iterate only over the matches, it is no surprise that only the matches end up in the output.
So you need to move away from the one-liner and write a full loop that uses the position of each match, so that the gaps between matches are written out too.
Or (PowerShell only) you can, for example, use -split '(<desc>.*?</desc>)' and then clean the pieces in a loop. But converting to [xml] seems easier to me.
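The two PowerShell-specific routes mentioned here could look roughly like this. It is only a sketch: the sample string and variable names are made up, `(?s)` is added on the assumption that a desc may span several lines, and the [xml] variant assumes the file is well-formed XML (HTML such as an unclosed `<br>` inside desc would make the cast fail).

```powershell
# Hypothetical sample input; for the real file: $text = Get-Content D:\test2.xml -Raw
$text = '<product><name>Widget</name><desc><p>Hello <b>world</b></p></desc></product>'

# Variant 1: -split with a capturing group keeps each <desc>...</desc> block as its own piece.
$pieces = $text -split '(?s)(<desc>.*?</desc>)'
$cleaned = foreach ($piece in $pieces) {
    if ($piece -match '(?s)^<desc>(.*)</desc>$') {
        # Strip tags from the inner text only, then put the desc wrapper back.
        '<desc>' + ($Matches[1] -replace '<[^>]*>', '') + '</desc>'
    } else {
        # Text between desc blocks is passed through unchanged.
        $piece
    }
}
$splitResult = -join $cleaned

# Variant 2: parse as XML and overwrite each desc with its plain text.
[xml]$doc = $text
foreach ($node in $doc.SelectNodes('//desc')) {
    # InnerText drops child elements; the -replace also covers markup stored as text or CDATA.
    $node.InnerText = $node.InnerText -replace '<[^>]*>', ''
}
$xmlResult = $doc.OuterXml
# To save: $doc.Save('D:\3.xml')   (output path is an assumption)
```

The -split variant leaves the text outside desc byte-for-byte as it was, while the [xml] variant re-serializes the document, so whitespace and formatting between elements may change.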
