Why do regular expressions work this way?
In Python, regular expressions work correctly, for example:
import re
s = '''Если ты хочешь построить корабль, не надо созывать людей, планировать, делить работу, доставать инструменты.
Надо заразить людей стремлением к бесконечному морю. Тогда они сами построят корабль.'''
pattern = r'\w+'
match = re.findall(pattern, s)
if match:
    print(match)
pcre.org/original/doc/html/pcrepattern.html#SEC2
Unicode property support
Another special sequence that may appear at the start of a pattern is (*UCP). This has the same effect as setting the PCRE_UCP option: it causes sequences such as \d and \w to use Unicode properties to determine character types, instead of recognizing only characters with codes less than 128 via a lookup table.
For \w to match more than just the Latin alphabet, you need to add (*UCP) at the start of the pattern: (*UCP)\w+
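For comparison, here is a rough sketch of the same distinction on the Python side. For str patterns in Python 3, \w is Unicode-aware by default, which is why the example in the question works. The re.ASCII flag limits \w to [a-zA-Z0-9_], which approximates what PCRE does when (*UCP) is not set. The shortened sample string and the variable names are only illustrative.

```python
import re

# One sentence from the question's sample text.
s = 'Надо заразить людей стремлением к бесконечному морю.'

# Default for str patterns in Python 3: \w follows Unicode character properties.
unicode_words = re.findall(r'\w+', s)

# re.ASCII limits \w to [a-zA-Z0-9_], roughly like PCRE without (*UCP).
ascii_words = re.findall(r'\w+', s, flags=re.ASCII)

print(unicode_words)  # the Cyrillic words are matched
print(ascii_words)    # nothing matches: the text has no ASCII word characters
```

So the two engines differ only in their default: Python turns Unicode matching on unless you ask for ASCII, while PCRE keeps the ASCII lookup table unless you ask for Unicode properties.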