Python
Ywka, 2016-02-03 18:19:59

How to process Cyrillic hashtags?

Good day to all.
I have a piece of code that finds hashtags in text:

import re

pattern = re.compile(r'\#\w+')
hashtags = re.findall(pattern, text)

But it only matches Latin characters.
How can I extract all hashtags, including Cyrillic ones?
Thank you.

2 answers
sim3x, 2016-02-03
@Ywka

python3

In [2]: print(re.findall(re.compile(r'\#\w+', re.IGNORECASE), 
                                   "#1aaa sdfs #ввв2 dfsdf sdf s"))
['#1aaa', '#ввв2']

python2
In [2]: print(re.findall(re.compile(ur'\#[0-9a-zа-я_-]+', re.IGNORECASE), 
                                   u"#1aaa sdfs #ввв2 dfsdf sdf s"))
[u'#1aaa', u'#\u0432\u0432\u04322']

python2, the correct way (thanks to @aklim007 for pointing to the documentation): \w together with the re.U (re.UNICODE) flag
In [2]: print(re.findall(re.compile(r'\#\w+', re.IGNORECASE|re.U), 
                                   u"#1aaa sdfs #ввв2 dfsdf sdf s"))
[u'#1aaa', u'#\u0432\u0432\u04322']
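
For comparison, a minimal Python 3 sketch of the same idea: for str patterns \w is Unicode-aware by default, so no flag is needed, while re.ASCII restricts \w to [a-zA-Z0-9_], which is roughly the behaviour Python 2 shows without re.U. The helper find_hashtags and the sample string are hypothetical, made up for illustration.

```python
import re

# In Python 3, \w in a str pattern matches Unicode letters, digits and "_"
# by default, so Cyrillic hashtags are found without any extra flag.
HASHTAG = re.compile(r'#\w+')

# re.ASCII limits \w to [a-zA-Z0-9_], mimicking Python 2 without re.U.
HASHTAG_ASCII = re.compile(r'#\w+', re.ASCII)


def find_hashtags(text, pattern=HASHTAG):
    """Return every hashtag in text, in order of appearance (hypothetical helper)."""
    return pattern.findall(text)


sample = "#1aaa sdfs #ввв2 dfsdf #Привет_мир sdf s"
print(find_hashtags(sample))                 # ['#1aaa', '#ввв2', '#Привет_мир']
print(find_hashtags(sample, HASHTAG_ASCII))  # ['#1aaa']
```

With re.ASCII the Cyrillic tags are silently skipped, which is exactly the symptom described in the question.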

Oleg Krasnov, 2016-02-03
@OKrasnov

\#[а-яa-z]+
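
A caveat with an explicit range like this: in Unicode, ё (U+0451) lies outside а-я (U+0430 to U+044F), and the class also skips uppercase letters (unless re.IGNORECASE is used), digits and underscores. A small Python 3 check; the sample text is invented for illustration, and the widened class with ё plus re.IGNORECASE is one possible adjustment, not part of the original answer.

```python
import re

text = "#ёлка #Москва #тег2 #hello"

# The explicit ranges from the answer: lowercase Cyrillic and Latin only.
narrow = re.compile(r'#[а-яa-z]+')
# Adding ё explicitly and ignoring case also covers Ё and uppercase letters.
fixed = re.compile(r'#[а-яёa-z]+', re.IGNORECASE)
# Unicode-aware \w (the Python 3 default) also covers digits and "_".
wide = re.compile(r'#\w+')

print(narrow.findall(text))  # ['#тег', '#hello']
print(fixed.findall(text))   # ['#ёлка', '#Москва', '#тег', '#hello']
print(wide.findall(text))    # ['#ёлка', '#Москва', '#тег2', '#hello']
```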
