Sergey Zolotarev, 2021-04-17 08:56:16
Python

Does NLTK vectorize HTML special characters and other symbols in addition to plain text?

Good morning!
I know the basics of NLTK and its word_tokenize() method well, but I've run into a problem: NLTK has to turn the source text into vectors, and that text contains HTML special characters and other kinds of symbols (emoji, square brackets, and so on).
For example:


👐 Hi! How are you feeling?
[Region = Samara]
😇 OK, I found some interesting places for you in the district[moscow district = Krylatskoe]


Does word_tokenize() have options for vectorizing text that contains any kind of characters, not just plain text?
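Assuming "vectorizing" here starts with splitting such text into tokens, one possible preprocessing step is to decode HTML entities with the standard library's `html.unescape` and then split with a regular expression. This is a minimal stdlib sketch, not NLTK's `word_tokenize()`; the `tokenize` helper and the `TOKEN_RE` pattern are hypothetical, and its choices (bracketed tags kept whole, each emoji as its own token) are assumptions based on the example above.

```python
import html
import re

# Hypothetical pattern (an assumption, not NLTK's rules), tried left to right:
#   \[[^\]]*\]  a bracketed template tag such as "[Region = Samara]", kept whole
#   \w+         a run of word characters; Unicode-aware, so Cyrillic works too
#   [^\w\s]     any other single non-space character: emoji, punctuation, "&"
TOKEN_RE = re.compile(r"\[[^\]]*\]|\w+|[^\w\s]")


def tokenize(raw: str) -> list[str]:
    """Decode HTML entities (e.g. &amp;, &nbsp;, &#128519;), then split into tokens."""
    text = html.unescape(raw)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sample = "👐 Hi! How are you feeling?\n[Region = Samara]"
    # ['👐', 'Hi', '!', 'How', 'are', 'you', 'feeling', '?', '[Region = Samara]']
    print(tokenize(sample))
```

For instance, `tokenize("&#128519; OK, places in the district[moscow district = Krylatskoe]")` yields `['😇', 'OK', ',', 'places', 'in', 'the', 'district', '[moscow district = Krylatskoe]']`. Emoji built from several code points (for example, with skin-tone modifiers) would be split into pieces by this pattern.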

1 answer
Dimonchik, 2021-04-17
@dimonchik2013

Take a look at
https://stackoverflow.com/questions/9149709/extrac...
But I still don't understand what you need: should self-made words like '~toaster' and '!toaster' be treated as different from 'toaster', or should the special characters be cleaned out?
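The two interpretations in that question can be sketched with plain `re`. This is a minimal illustration, not NLTK code; the helper names `tokens_keep_symbols` and `tokens_strip_symbols` are hypothetical, and which option fits depends on what the vectors are later used for.

```python
import re


def tokens_keep_symbols(text: str) -> list[str]:
    """Option 1: '~toaster', '!toaster' and 'toaster' stay three different tokens.

    A token is an optional run of leading symbols glued to a run of word
    characters; punctuation not followed by a word character is dropped.
    """
    return re.findall(r"[^\w\s]*\w+", text)


def tokens_strip_symbols(text: str) -> list[str]:
    """Option 2: special characters are cleaned out, so all variants become 'toaster'."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    text = "~toaster !toaster toaster"
    print(tokens_keep_symbols(text))   # ['~toaster', '!toaster', 'toaster']
    print(tokens_strip_symbols(text))  # ['toaster', 'toaster', 'toaster']
```

With option 1 the vocabulary grows by one entry per symbol-prefixed variant; with option 2 all variants share one entry.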