Java
xne71247, 2015-04-02 16:02:36

What regular expression should I use to find a block of Russian text?

I have an HTML page and need to extract a block of Russian text from it. There is only one block of Russian text on the page. Which regular expression should I use, or is there a better way to do this?


4 answers
SagePtr, 2015-04-02

It depends on what you mean by a block. If there are no tags inside it, you could cut it out with something like this:
>([^<]*[А-Яа-я][^<]*)<
This captures everything between > and < that contains at least one Russian letter. I wrote it off the top of my head, so make sure the regex engine works with exactly the same encoding as the page.
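In Java, the pattern from this answer could be applied roughly as in the sketch below. The class and method names are hypothetical. The Cyrillic range is written with `\u` regex escapes so the pattern itself does not depend on the source file's encoding, and `Ё`/`ё` are added as an assumption because they fall outside `А-Я`/`а-я`. Java strings are UTF-16 internally, so the encoding concern mostly means decoding the page bytes with the charset the page actually uses before matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RussianTextExtractor {
    // Text between '>' and '<' (so no tags inside) that contains at least one
    // Russian letter: U+0410..U+044F covers А-Я and а-я; U+0401/U+0451 are Ё/ё.
    private static final Pattern RUSSIAN_BLOCK =
            Pattern.compile(">([^<]*[\\u0410-\\u044F\\u0401\\u0451][^<]*)<");

    // Returns every tag-free text block containing Russian letters, trimmed.
    public static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = RUSSIAN_BLOCK.matcher(html);
        while (m.find()) {
            blocks.add(m.group(1).trim());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p><div>Привет, мир!</div></body></html>";
        System.out.println(extract(html));
    }
}
```

Because `[^<]*` cannot cross a `<`, a block that contains inline tags such as `<b>` will come back in pieces, which matches the "no tags inside" caveat above.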

programmerjava, 2015-04-02

Use jsoup.
As for the regex, use a character class that includes Russian letters and punctuation marks.
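A regex built from Russian letters plus punctuation, as suggested here, might look like the following sketch. The class name and the exact punctuation set (digits, whitespace, `.,!?;:()"`, `«»`, em dash, hyphen) are assumptions; extend the class with whatever your page uses. The pattern finds runs of Russian text in a plain string, so in practice you would run it on text already pulled out of the HTML (for example with jsoup) rather than on raw markup. It uses only the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RussianRunFinder {
    // Characters allowed inside a run: Russian letters (А-я plus Ё/ё), digits,
    // ASCII whitespace, and some common punctuation including « », the em dash
    // and a hyphen. The punctuation set is an assumption; adjust as needed.
    private static final String BODY =
            "[\\u0410-\\u044F\\u0401\\u0451\\d\\s.,!?;:()\"\\u00AB\\u00BB\\u2014-]";

    // A run must start with a Russian letter, so stray punctuation or spaces
    // between non-Russian words are not reported on their own.
    private static final Pattern RUSSIAN_RUN =
            Pattern.compile("[\\u0410-\\u044F\\u0401\\u0451]" + BODY + "*");

    public static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUSSIAN_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group().trim()); // drop surrounding whitespace picked up by \s
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("text text текст text Текст теКСТ"));
    }
}
```

On the mixed sample string used later in this thread, this prints `[текст, Текст теКСТ]`: each maximal stretch of Russian words is reported separately, and Latin words break the run.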

ShamblerR, 2015-04-03

Could you give me an example of the page?

asd111, 2015-04-03

Test it on this site:
https://regex101.com/
text text текст text Текст теКСТ
