Regular expressions. How to extract text from a site?

C++ / C#

VirusesAnalystCoder, 2020-06-20 23:53:50

I just can't figure out how regular expressions work!
The site contains the following markup:

<span>Оператор:</span> 	*оператор*	</div>
<div class="***"><span>Регион:</span>  *регион*</div>

I need to extract the region and the operator.
How can I do that? Please post the correct regular expressions.
If you post code, please comment it, because I really can't work out how to pull these values out...

2 answer(s)

oleg_ods, 2020-06-21
@VirusesAnalystCoder

Regular expressions are not designed for parsing HTML. Use a specialized tool instead; for example, look into the AngleSharp library.
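For illustration, here is a rough sketch of how the two values might be read with AngleSharp. It assumes the AngleSharp NuGet package in version 0.10 or later (where `HtmlParser.ParseDocument` is available), and it assumes each label is a `<span>` that is a direct child of a `<div>`, with the value as the text right after it. The class and method names (`LabelReader`, `ReadLabels`) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using AngleSharp.Html.Parser; // from the "AngleSharp" NuGet package (assumed 0.10+)

public static class LabelReader
{
    // Hypothetical helper: maps each <span> label inside a <div>
    // to the text that immediately follows that <span>.
    public static Dictionary<string, string> ReadLabels(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        var result = new Dictionary<string, string>();

        foreach (var span in document.QuerySelectorAll("div > span"))
        {
            string label = span.TextContent.Trim();
            // The value is expected in the text node right after the <span>;
            // fall back to an empty string when there is no such node.
            string value = span.NextSibling?.TextContent.Trim() ?? string.Empty;
            result[label] = value;
        }
        return result;
    }
}
```

With this in place, `LabelReader.ReadLabels(html)["Регион:"]` would give the region, provided the page really has that structure. If the markup is different (extra wrappers, nested tags inside the value), the selector needs adjusting.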

Anton, 2020-06-21
Semenov

Code with comments:
https://www.geeksforgeeks.org/what-is-regular-expr...
You can test a regex here: regexstorm.net/tester
Examples are here: regexstorm.net/reference
Study the examples and try to compose what you need. As a fallback, if you are really stuck, simply strip the markup by replacing every tag with an empty string, so that you search only the text and not the HTML.

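// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string input)
{
   // Lazily match every "<...>" tag and replace it with an empty string.
   return Regex.Replace(input, "<.*?>", String.Empty);
}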
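If the values still have to be pulled out with a regex, something along these lines might work for the markup in the question. This is only a sketch: it assumes every value sits between `<span>label</span>` and the closing `</div>` with no nested tags, and the names `FieldExtractor` and `ExtractField` as well as the sample values are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Returns the text between <span>label</span> and the next </div>,
    // or null when the label is not found.
    public static string ExtractField(string html, string label)
    {
        // Regex.Escape protects against special characters in the label.
        // \s*   skips spaces and tabs around the value.
        // (.*?) lazily captures the value itself (group 1).
        string pattern = "<span>" + Regex.Escape(label) + @"</span>\s*(.*?)\s*</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample input shaped like the snippet in the question (values are placeholders).
        string html =
            "<div class=\"a\"><span>Оператор:</span> \tExampleTelecom\t</div>\n" +
            "<div class=\"a\"><span>Регион:</span>  ExampleRegion</div>";

        Console.WriteLine(ExtractField(html, "Оператор:"));
        Console.WriteLine(ExtractField(html, "Регион:"));
    }
}
```

The lazy `(.*?)` matters here: a greedy `(.*)` would run on to the last `</div>` in the input and swallow everything in between. The pattern breaks as soon as the site changes its markup, which is the main reason an HTML parser is usually the safer choice.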