Why does sed handle occurrences (regexes) incorrectly?

linux

Ivan, 2020-05-21 22:20:31

Hello everybody!
Task:
There is a line with tags; I need to extract Comanda1 and Comanda2 from it.

</td><td class="tdteamname2">Comanda1</td><td class="tdteamname2">Comanda2</td>

I decided to do this with sed. That worked well for other tags, so I would like help figuring out why sed does not match this tag the way I expect.

Implementation (for clarity, a shortened version of the code, with --- as the replacement instead of an empty string).
Matching a single occurrence works:

$ cat 2str | sed 's/me2">/---/'
</td><td class="tdteamna---Comanda1</td><td class="tdteamname2">Comanda2</td>

The second occurrence works as well:

$ cat 2str | sed 's/me2">/---/2'
</td><td class="tdteamname2">Comanda1</td><td class="tdteamna---Comanda2</td>

The problem arises when I want to clear the line from the very beginning up to the tag, as I do for the other tags. sed simply skips the first occurrence, even when I explicitly specify which occurrence to act on.

$ cat 2str | sed 's/<.*me2">/---/'
---Comanda2</td>

Clarification: the other tags I successfully extracted information from each occurred only once in the line.

Unfortunately, I couldn't find anything on this issue. Please help me sort this out.
Thanks in advance!


vaut, 2020-05-21
@vaut

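What you are looking for is called "non-greedy matching" (or "lazy matching").
As far as I know, sed doesn't have it: its .* is always greedy, so <.*me2"> matches from the first < all the way to the last me2"> in the line.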
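
A minimal sketch of the difference, assuming the sample line from the question is held in a shell variable (the variable names here are only for illustration). Since sed has no lazy quantifier, a common workaround is a negated character class such as [^>]*, which cannot run past the end of a tag:

```shell
# Sample line from the question (variable names are hypothetical).
line='</td><td class="tdteamname2">Comanda1</td><td class="tdteamname2">Comanda2</td>'

# Greedy: '.*' takes as much as it can, so the match runs from the first '<'
# to the LAST 'me2">' on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<.*me2">/---/')

# Workaround: '[^>]*' cannot cross a '>', so each piece stays inside one tag.
# This consumes the leading '</td>' and then only the first '<td ...me2">'.
first=$(printf '%s\n' "$line" | sed 's/^<[^>]*><[^>]*me2">/---/')

# Both names at once, using two capture groups that stop at the next '<'.
both=$(printf '%s\n' "$line" | sed 's/^.*me2">\([^<]*\)<.*me2">\([^<]*\)<.*$/\1 \2/')

printf '%s\n' "$greedy" "$first" "$both"
# ---Comanda2</td>
# ---Comanda1</td><td class="tdteamname2">Comanda2</td>
# Comanda1 Comanda2
```

The last command relies on the line containing exactly two me2"> tags, as in the question; for other layouts the pattern would need adjusting.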

ky0, 2020-05-21
@ky0

If the number of characters (that is, tags) is always the same, it's easier to skip regular expressions and simply cut the needed parts out of the string, for example with tr.
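
One possible reading of this suggestion, as a rough sketch: tr turns every angle bracket into a newline, and the names are then picked by line number. The line positions (5 and 9) are an assumption that holds only for the exact layout shown in the question.

```shell
# Sample line from the question (variable names are hypothetical).
line='</td><td class="tdteamname2">Comanda1</td><td class="tdteamname2">Comanda2</td>'

# tr replaces each '<' and '>' with a newline, so tags and text end up
# on separate lines. With this fixed layout the names are lines 5 and 9.
teams=$(printf '%s\n' "$line" | tr '<>' '\n\n' | sed -n -e 5p -e 9p)

printf '%s\n' "$teams"
# Comanda1
# Comanda2
```

If the number of tags before the names changes, the line numbers change with it, which is why this only suits input with a fixed structure.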

Victor Taran, 2020-05-22
@shambler81

1. pup, a command-line HTML parser:
https://github.com/EricChiang/pup
2. If you need to parse something complex, but only occasionally, there is a browser extension that even a schoolchild can learn:
https://chrome.google.com/webstore/detail/web-scra...
3. Dedicated tools such as xmlstarlet, html-xml-utils, and so on are generally better suited for this.
4. With cut, one command per variable (the first extracts Comanda1, the second Comanda2):
cut -d '>' -f3 111.txt | sed 's/<\/td//g'
cut -d '>' -f5 111.txt | sed 's/<\/td//g'
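
As a sketch of how the two cut commands above could be captured into shell variables: the temporary directory and the names team1 and team2 are assumptions for illustration, and the sample file is created here only so the example runs on its own.

```shell
# Create a sample 111.txt in a temporary directory (hypothetical setup;
# the answer assumes the line is already stored in 111.txt).
tmpdir=$(mktemp -d)
printf '%s\n' '</td><td class="tdteamname2">Comanda1</td><td class="tdteamname2">Comanda2</td>' > "$tmpdir/111.txt"

# Splitting on '>' puts "Comanda1</td" in field 3 and "Comanda2</td" in field 5;
# sed then strips the trailing "</td".
team1=$(cut -d '>' -f3 "$tmpdir/111.txt" | sed 's/<\/td//g')
team2=$(cut -d '>' -f5 "$tmpdir/111.txt" | sed 's/<\/td//g')

printf '%s\n' "$team1" "$team2"
# Comanda1
# Comanda2

rm -r "$tmpdir"
```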