Python
Alexander Ananchenko, 2021-10-24 08:07:38

I want to write a profanity filter. Any ideas on how to implement it?

I am building a profanity filter for my group. I have a first draft, but clever 12-to-15-year-olds keep trying to outwit it.

Here is my filter code:

import re
from fuzzywuzzy import fuzz

# Regular expressions that map look-alike letters and symbols to Russian letters.
# Each value is a character class, so the alternatives are listed without "|"
# (inside [...] a "|" is a literal pipe, not "or").
# Substitutions run in dictionary order, so an ambiguous Latin letter
# (for example "b") goes to the first key that lists it.
LOOKALIKES = {
    'а': r'[@аa]',
    'б': r'[б6b]',
    'в': r'[вbv]',
    'г': r'[гrg]',
    'д': r'[дd]',
    'е': r'[еeё]',
    'ж': r'[жz*]',
    'з': r'[з3z]',
    'и': r'[иui]',
    'й': r'[йui]',
    'к': r'[кk]',
    'л': r'[лl]',
    'м': r'[мm]',
    'н': r'[нhn]',
    'о': r'[оo0]',
    'п': r'[пnp]',
    'р': r'[рrp]',
    'с': r'[сcs5$]',
    'т': r'[тmt]',
    'у': r'[уyu]',
    'ф': r'[фf]',
    'х': r'[хxh]',
    'ц': r'[цcu]',
    'ч': r'[чch]',
    'ш': r'[шщ]',
    'ь': r'[ьb]',
    'ы': r'[ыi]',
    'ъ': r'[ъь]',
    'э': r'[эe]',
    'ю': r'[юyu]',
    'я': r'[яr]',
    ' ': r'[.,!?&)(\\/*\-_"\';®]',
}


# The word list is a single comma-separated line of words
with open("CurseWords.txt", "r", encoding="utf-8") as cwf:
    CurseWords = [w.strip() for w in cwf.read().split(",") if w.strip()]


def replace_letters(word):
    # Lowercase and drop the combining acute accent (U+0301) used as a stress mark
    word = word.lower().replace('\u0301', '')
    for key, pattern in LOOKALIKES.items():
        word = re.sub(pattern, key, word)
    return word

def filter_word(msg):
    for w in msg.split():
        w = w.lower()
        # Collapse runs of repeated characters: "Приииииивет" -> "Привет"
        w = ''.join(ch for i, ch in enumerate(w) if i == 0 or ch != w[i - 1])
        w = replace_letters(w)
        for word in CurseWords:
            b = fuzz.token_sort_ratio(word, w)  # Similarity to a word from the list, 0-100
            if b >= 85:
                print(f'{w} | {b}% matched curse word {word}')
                return True
    return False


There are sometimes false positives, but I can live with that. The biggest problem is when users write all the words together, like "Go all the way ***". I have no idea how to check such messages correctly without false positives. I tried comparing the swear words against the full sentence, but then words such as "Use" are flagged as obscene. If you have any ideas on how to improve this, I will be glad to hear them.

1 answer(s)
dollar, 2021-10-24
@Shurik24

This is a war of projectile versus armor, which means it is endless and cannot be won. As soon as you improve the armor, the enemy responds with a bigger caliber, and so on in a circle. The best solution is to try to stop the war or slow it down rather than to build more powerful weapons.
One way to do this is to stop fighting, that is, to surrender. Then the "game" ends. Those who tried to "beat the system" are declared the winners and lose interest in the struggle. There will still be people for whom swearing is the norm, and they will keep using it.
The second way is to create an illusion of victory. Someone who tries to bypass the filter sees his own swearing displayed unchanged, while everyone else in the chat sees asterisks or a substitute word. Of course, some will catch on and start checking their messages from a second account. But not everyone will, so overall there will be less profanity. Besides, maintaining a second account is a hassle, which filters out the lazy ones. Only stubborn lone warriors will remain, and they can simply be banned, for example.
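A rough sketch of this idea, under some assumptions: the word list `BLOCKED`, the helper `mask`, and the function `render_for` are hypothetical names made up for illustration, and the matching here is a plain exact lookup rather than the fuzzy check from the question.

```python
import re

# Hypothetical word list for illustration; a real filter would load its own.
BLOCKED = {"darn", "heck"}


def mask(text: str) -> str:
    """Replace every blocked word with asterisks of the same length."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in BLOCKED else word

    return re.sub(r"\w+", repl, text)


def render_for(viewer_id: int, author_id: int, text: str) -> str:
    """The author sees the original message; everyone else sees the masked one."""
    return text if viewer_id == author_id else mask(text)
```

The point of the design is that the stored message stays unchanged and the masking happens only at display time, so the author gets no signal that the filter fired.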
Another technique, in addition to the previous ones, is deferred punishment (by a moderator). The "player" does not get immediate reinforcement in the form of "well done, you bypassed the filter" but has to wait for the jury's verdict. Nobody likes to wait, and waiting badly undermines the motivation to keep playing resistance against the system. So many "partisans" will come over to the side of good simply because it gets boring.
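Deferred punishment could look roughly like the sketch below. The names `Report` and `ModerationQueue` are hypothetical, and keeping everything in memory is an assumption made for brevity; a real bot would persist the queue somewhere.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    user_id: int
    text: str


class ModerationQueue:
    """Collects flagged messages silently; sanctions happen only after review."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self.banned: set = set()

    def flag(self, user_id: int, text: str) -> None:
        # Called by the filter. The user gets no feedback at this point.
        self._pending.append(Report(user_id, text))

    def review_next(self, is_violation: bool) -> Optional[Report]:
        # A moderator handles the oldest report; is_violation is their verdict.
        if not self._pending:
            return None
        report = self._pending.popleft()
        if is_violation:
            self.banned.add(report.user_id)
        return report
```

Because `flag` only records the message, the filter can afford to be a little less precise: a false positive costs a moderator a few seconds instead of punishing an innocent user on the spot.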
