C++ / C#
iXelper, 2019-01-21 23:16:07

Parsing HTML with standard C# tools?

I have an HTML document that contains this element:

<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Hello word</div>

XPath: //*[@id="Text"]/div[2]/ol/li/div
Question: how can I extract the string "Hello word" using Regex?
Sorry if this is a stupid question, but I'm stuck.

3 answers
Sumor, 2019-01-21
@iXelper

If your HTML is well-formed (every tag is closed), you can try parsing it as XML with XDocument or XElement.
Roughly like this:

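// Requires: using System; using System.Xml.Linq;
var xEl = XElement.Parse("<div style=\"font-family: 'Courier New', Courier, monospace; font-weight: normal\">Hello word</div>");
// Casting an XElement to string returns its text content.
Console.WriteLine((string)xEl);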

You can also use XPath with it. Roughly like this:
// XPathSelectElements is an extension method; it requires: using System.Xml.XPath;
var xDoc = XDocument.Parse("<div><div class='c1'>c1</div><div class='c2'>c2</div><div class='c3'>c3</div></div>");

string xPath = "//div[@class='c1']";

foreach (var xElement in xDoc.XPathSelectElements(xPath))
{
  Console.WriteLine((string)xElement);
}

sergey kuzmin, 2019-01-22
@sergueik

For malformed HTML, there are HtmlAgilityPack, CsQuery, and Fizzler (I once tried the last one myself), and there are a lot of alternatives: https://stackoverflow.com/questions/1065031/is-the...
These can be installed through NuGet.
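
To make the suggestion above concrete, here is a minimal sketch using HtmlAgilityPack, assuming the NuGet package is installed. The class name AgilityPackSketch, the method ExtractText, and the sample markup wrapped around the question's div are hypothetical and only there so that the question's XPath has something to match; the real page will look different.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of the standard library

public static class AgilityPackSketch
{
    // Hypothetical markup built so that the XPath from the question matches the target div.
    public const string SampleHtml =
        "<div id=\"Text\"><div>intro</div><div><ol><li>" +
        "<div style=\"font-family: 'Courier New', Courier, monospace; font-weight: normal;\">Hello word</div>" +
        "</li></ol></div></div>";

    // The XPath from the question, with single quotes so it fits in a C# string.
    public const string SampleXPath = "//*[@id='Text']/div[2]/ol/li/div";

    // Returns the decoded, trimmed inner text of the first node matching xPath,
    // or null when nothing matches.
    public static string ExtractText(string html, string xPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: does not require well-formed XML

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xPath);
        if (node == null)
        {
            return null;
        }

        // InnerText still contains entities such as &amp;, so decode them.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Calling AgilityPackSketch.ExtractText(AgilityPackSketch.SampleHtml, AgilityPackSketch.SampleXPath) selects the div by the same XPath as in the question, so no regular expression is needed.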

Milton812, 2019-01-24
@Milton812

If you really need to parse HTML using regular expressions, you can do it like this:

// Requires: using System.Text.RegularExpressions;
string html = "<div style=\"font-family: 'Courier New', Courier, monospace; font-weight: normal;\">Hello word</div>";
// Match this exact opening tag and lazily capture the text up to the closing </div>.
Regex regex = new Regex("<div style=\"font-family: 'Courier New', Courier, monospace; font-weight: normal;\">(.*?)</div>");
string text = regex.Match(html).Groups[1].Value;
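
The pattern above is tied to one exact style attribute, so any change in the markup breaks it. A slightly more tolerant variant, as a minimal sketch: it accepts a div with any attributes and captures the text up to the first closing tag. The names DivTextExtractor and ExtractFirstDivText are made up for illustration, and the approach assumes the target div contains no nested div elements; for nested markup a real parser is the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class DivTextExtractor
{
    // <div, then any attributes up to '>', then a lazy capture up to the first </div>.
    // Singleline lets '.' match newlines; IgnoreCase also accepts <DIV>.
    private static readonly Regex DivPattern = new Regex(
        @"<div\b[^>]*>(?<text>.*?)</div>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, trimmed content of the first div, or null if there is none.
    public static string ExtractFirstDivText(string html)
    {
        Match match = DivPattern.Match(html);
        if (!match.Success)
        {
            return null;
        }

        // Decode entities such as &amp; and drop surrounding whitespace.
        return WebUtility.HtmlDecode(match.Groups["text"].Value).Trim();
    }
}
```

For the element from the question, DivTextExtractor.ExtractFirstDivText(html) returns "Hello word".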
