C++ / C#
Pavel Bogdanov, 2015-09-26 12:01:46

How do I extract all values of a given tag's attribute from a string containing HTML?

I have a string containing HTML code. I need to extract all values of the "action" attribute of the "form" tags. I've tried many different options and haven't found a solution yet.
Split and Html Agility Pack are not suitable.
I hope there is still a way to do it.
UPD:
Example.
The string consists of 10 containers built from DIVs, and each container has a FORM. I need to extract the attribute value from each FORM.


2 answer(s)
Aram Aramyan, 2015-09-26
@rebirther23

Why isn't Html Agility Pack suitable? Parsing HTML like this is exactly what it does well.
You can, of course, write a regular expression, but the problem is that real-world HTML is inconsistent. That is, an attribute value may be in double quotes, in single quotes, or have no quotes at all, and a tag may or may not be closed.
All of this has to be accounted for in the regular expression.

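using System;
using System.Text.RegularExpressions;
// The value may be in double quotes, in single quotes, or unquoted (up to whitespace or '>').
var regex = new Regex(@"<form\b[^>]*?\saction\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)'|(?<v>[^\s>]+))",
    RegexOptions.IgnoreCase);
foreach (Match m in regex.Matches("HTML CODE <form action=1><form action='2'><form action=\"3\">"))
{
    // All three alternatives fill the same named group "v", so no quote trimming is needed.
    var actionValue = m.Groups["v"].Value;
    Console.WriteLine(actionValue);
}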

For example: https://dotnetfiddle.net/Iuuy56
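For the 10-container string from the question's UPD, the same idea can be wrapped in a reusable helper. This is a minimal sketch, not the answerer's code: `FormActionExtractor` and `ExtractFormActions` are hypothetical names, and the pattern assumes each form's action appears inside its opening `<form ...>` tag, preceded by whitespace (so `data-action` is not picked up). It relies on .NET allowing the same group name `v` in several alternatives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormActionExtractor
{
    // Matches an opening <form> tag and captures its action value, whether it is
    // double-quoted, single-quoted, or unquoted (up to whitespace or '>').
    // [^>]*? keeps the search inside a single opening tag.
    private static readonly Regex FormAction = new Regex(
        @"<form\b[^>]*?\saction\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)'|(?<v>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: returns every form action value in document order.
    public static List<string> ExtractFormActions(string html) =>
        FormAction.Matches(html)
                  .Cast<Match>()
                  .Select(m => m.Groups["v"].Value)
                  .ToList();
}
```

Usage: `var actions = FormActionExtractor.ExtractFormActions(html);` returns one entry per `<form>` that has an action attribute, so a string with 10 containers, each holding one such form, yields 10 values.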

Tsiren Naimanov, 2015-09-26
@ImmortalCAT

Post an example string.
Split usually works fine for this.
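The answer above does not show what a Split-based solution would look like. A minimal sketch, assuming every value is written exactly as `action="..."` with double quotes; `SplitExtractor` and `ExtractActionsBySplit` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class SplitExtractor
{
    // Hypothetical helper illustrating the Split approach.
    // Assumes values are always written as action="..." (double quotes, no spaces around '=').
    public static List<string> ExtractActionsBySplit(string html)
    {
        var result = new List<string>();
        // Each piece after the first starts right at an action value.
        string[] parts = html.Split(new[] { "action=\"" }, StringSplitOptions.None);
        // parts[0] is the text before the first occurrence, so skip it.
        for (int i = 1; i < parts.Length; i++)
        {
            // The value runs up to the next double quote.
            int end = parts[i].IndexOf('"');
            if (end >= 0)
                result.Add(parts[i].Substring(0, end));
        }
        return result;
    }
}
```

Because this splits on the raw text `action="`, it also picks up attributes such as `formaction="..."` and misses single-quoted or unquoted values, which may be why Split did not work for the asker.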
