Machine learning
OlenaKarelina, 2020-07-01 17:34:58

How to choose or create a tool for extracting logins, passwords, and emails from texts of different structures?

I need to choose or build a tool that extracts logins, passwords, and email addresses from texts of various structures (for example, parsed Telegram channels, parsed Twitter pages, or databases leaked on hacker forums).
Regular expressions are the obvious option. However, texts from different sources need different regular expressions, so every new source means writing a new expression for it. The goal is to find or build an artificial intelligence method that recognizes and extracts these items from texts regardless of their structure.
How would you approach this problem? Or has it perhaps already been solved in an existing application?


3 answers
Stalker_RED, 2020-07-01
@Stalker_RED

Information extraction, a subfield of natural language processing, is a large and complex topic, with many textbooks and research papers written about it. Yandex has open-sourced part of its research in this area; you can take a look and decide which direction to dig in.
https://habr.com/ru/company/yandex/blog/219311/

Andrew Nodermann, 2020-08-15
@Lucian

Regular expressions are quite suitable; you just need to know how to use them.
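To illustrate the point, here is a minimal Python sketch of source-independent regular expressions. It is only an assumption about what the data looks like, since the question gives no samples: the email pattern is deliberately loose (not RFC 5322), and the pair pattern assumes one `login<separator>password` record per line with `:`, `;` or `|` as the separator. The names `EMAIL_RE`, `PAIR_RE`, `extract_emails` and `extract_pairs` are made up for this example.

```python
import re

# Loose email pattern: good enough for scanning free text, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed record layout: "<login or email><separator><password>" on one line,
# where the separator is ':', ';' or '|', optionally surrounded by spaces.
PAIR_RE = re.compile(r"(?P<login>[^\s:;|]+)\s*[:;|]\s*(?P<password>\S+)")


def extract_emails(text):
    """Return the unique email-like strings found anywhere in the text, sorted."""
    return sorted(set(EMAIL_RE.findall(text)))


def extract_pairs(text):
    """Return (login, password) tuples for lines that consist of exactly one pair."""
    pairs = []
    for line in text.splitlines():
        match = PAIR_RE.fullmatch(line.strip())
        if match:
            pairs.append((match.group("login"), match.group("password")))
    return pairs


if __name__ == "__main__":
    sample = "bob@example.org:hunter2\njust some chatter here\ncarol | s3cret"
    print(extract_emails(sample))
    print(extract_pairs(sample))
```

A heuristic like this will produce false positives: a line that contains only a URL, for instance, also has the shape `something:something` and would be returned as a pair. Some filtering per source is still likely to be needed, which is the limitation the question describes.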

Developer, 2020-07-01
@samodum

First, share links to examples of Telegram channels that contain this kind of information, so it is clear what data you need to work with.
