Fight against spam
bamboozle, 2018-07-04 20:52:59

Are there public tools for checking text for spam, or databases of "spam phrases" for self-checking?

There is a service with a feature that sends a generated email to any given address. For example, a user can invite others to join through the service, or simply register with an email address and wait for the confirmation email. Recently, spammers have started actively abusing this. The text of the email is generated by the service, so the only thing spammers control is the username, which they stuff with links and other garbage. The spam is mainly in Chinese and Russian. My first thought was: surely there are services or tools for automatic text checking and spam detection.
Alas, it was not that simple. Google readily listed SpamAssassin as the most popular tool, but it:

  • looks primarily at headers, sender data, etc., while we are only interested in the text (this can surely be configured)
  • doesn't seem to recognize cryptocurrency-related spam yet (this can surely be solved with plugins)
  • most importantly, only works with English

Further searches for tools, or at least ready-made text databases for the languages we need (primarily Russian), led nowhere, and it looks like we will have to reinvent the wheel. Still, it seems that a problem like this should have been solved long ago.
Paid solutions are also acceptable.


2 answer(s)
noma, 2018-07-04
@noma

SpamAssassin runs locally.
It is configurable.
That includes support for Russian and anything else you need.
There can be no ready-made databases.
That is, there is a base to start from, but it then has to be adjusted and customized for your own traffic.
All of these spam filters need to be trained regularly. So you need to provide a way to manually pass messages to SpamAssassin when it makes mistakes (training as spam or as "non-spam").
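
A rough sketch of what such manual training could look like, assuming the misclassified message has been saved to a file and that SpamAssassin's `sa-learn` utility is installed and on the PATH. The function names and the file path are made up for illustration; only the `--spam` and `--ham` flags come from `sa-learn` itself.

```python
import subprocess


def sa_learn_command(message_path, is_spam):
    """Build the sa-learn command line for one saved message.

    --spam teaches the Bayes filter that the message is spam,
    --ham teaches it that the message is legitimate mail.
    """
    flag = "--spam" if is_spam else "--ham"
    return ["sa-learn", flag, message_path]


def report_misclassified(message_path, is_spam):
    """Run sa-learn on the message (hypothetical helper).

    Raises CalledProcessError if sa-learn exits with an error.
    """
    subprocess.run(sa_learn_command(message_path, is_spam), check=True)
```

Something like `report_misclassified("/var/spool/review/msg.eml", is_spam=True)` could then be wired to a "this is spam" button in a moderation interface.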

Vladimir Dubrovin, 2018-07-04
@z3apa3a

Unfortunately, this won't fix the problem. The only reliable solution is to keep user-generated content (UGC) out of anything sent before the address has been validated. This is covered in enough detail, including a description of the solutions, in this article.
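
A minimal sketch of that idea, not taken from the article: the user-supplied name only goes into the email if the recipient address has already been confirmed, and otherwise a fixed placeholder is used. The function name, the placeholder, and the email wording are all hypothetical.

```python
# Fixed text used instead of the user-supplied name for unverified recipients.
GENERIC_SENDER = "A user"


def render_invitation(username, address_verified):
    """Build the invitation text for one recipient.

    The username is user-generated content, so it is included only when
    the recipient address has already been validated. For unverified
    addresses the email contains nothing a spammer can control.
    """
    shown_name = username.strip() if address_verified else GENERIC_SENDER
    return (
        f"{shown_name} has invited you to join the service.\n"
        "If you did not expect this message, you can ignore it."
    )
```

With this approach, `render_invitation("Buy now http://spam.example", address_verified=False)` produces the same generic text regardless of what was typed into the name field, so no text filtering is needed for the first email.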
