How to determine the probability that a domain name is random?
Many viruses query the DNS server for domain names that they generate with their own algorithms.
In the server logs you can see something like this:
www.8fa0816f.com
www.9849af14.com
www.c33fb23d.com
www.c9423e05.com
or
ascxdcnciz.net
aykjwdbhdx.ws
azsvhzicv.info
bbuiozzcnv.ws
bceuhtw.cc
beift.net
I conclude that the computer that makes such requests is most likely infected.
What algorithm would you recommend that I read up on and implement in order to programmatically estimate the probability that a requested domain is "viral"?
Should I check whether each domain exists? But the virus authors can register domains produced by the same algorithm in order to issue "orders" or serve a "payload" from them.
I would appreciate any advice and pointers. Please don't recommend "just install an antivirus": one is already installed on every computer, but my enterprise is simply huge in both area and number of machines, and, as the saying goes, even the most careful slip up sometimes.
1. Take a list of known domains and build a frequency table of every 2-character sequence that occurs in them.
2. Using that table, calculate the average probability of the 2-character sequences in the domain you are checking.
3. Compare this score with the scores of normal domains; you will see a correlation.
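The three steps above could be sketched roughly as follows. This is only an illustration, not a tested detector: the function names `train_bigrams` and `avg_log_prob` are made up for the example, the training list in the usage note is a toy stand-in for a real list of known domains, and the score is the average log-probability with add-one smoothing rather than the plain average probability, so that pairs never seen in training do not produce zeros.

```python
import math
from collections import Counter

def train_bigrams(known_names):
    """Count every 2-character sequence in a list of known-good names."""
    counts = Counter()
    for name in known_names:
        name = name.lower()
        counts.update(name[i:i + 2] for i in range(len(name) - 1))
    return counts

def avg_log_prob(name, counts, alphabet_size=37):
    """Average log-probability of the 2-character sequences in `name`.

    alphabet_size=37 assumes a-z, 0-9 and '-' (an assumption of this sketch);
    add-one smoothing gives unseen pairs a small non-zero probability.
    """
    name = name.lower()
    bigrams = [name[i:i + 2] for i in range(len(name) - 1)]
    if not bigrams:
        return 0.0
    total = sum(counts.values())
    vocab = alphabet_size ** 2
    logs = [math.log((counts[b] + 1) / (total + vocab)) for b in bigrams]
    return sum(logs) / len(logs)

# Toy training data; in practice this would be a large list of real domains.
known = ["google", "facebook", "wikipedia", "amazon", "youtube",
         "microsoft", "yandex", "mail", "weather", "booking"]
counts = train_bigrams(known)
for candidate in ["bookmail", "8fa0816f", "aykjwdbhdx"]:
    print(candidate, round(avg_log_prob(candidate, counts), 3))
```

A lower (more negative) score means the name looks less like the training domains. The cut-off would have to be chosen by comparing scores of known-good and known-bad names from your own logs.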
If you want to read the theory, look into the entropy of a character sequence. Your examples are the ones with the highest entropy.
Donald Ervin Knuth described criteria for deciding whether a sequence is random; take a look.
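As a rough illustration of the entropy idea, here is a minimal sketch that computes the Shannon entropy of the characters in a name. The function name `shannon_entropy` is chosen for this example. Keep in mind that domain names are short, so per-name character entropy is a weak signal by itself (an ordinary name made of distinct letters also scores high); it is probably best treated as one feature among several.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Shannon entropy of the character distribution of `s`, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

for name in ["google", "aykjwdbhdx", "c33fb23d"]:
    print(name, round(shannon_entropy(name), 3))
```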
I think dictionary search solutions will not be very efficient, since they take n time.
I would do this:
Query the Google search API with site:habrahabr.ru (substituting the domain being checked). What matters is the number of results: if it is large, the site is most likely real.
If you don't want to use Google, you can count the site's pages yourself, provided that the site actually exists.
The simplest solution is a dictionary.
It can be done with bells and whistles, such as searching the dictionary letter by letter for substrings of the domain name between digits.
If you need a dedicated anti-spam solution, might it be easier to buy a ready-made one? Cisco IronPort, for example...
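One possible reading of the dictionary idea, as a hedged sketch: split the name on digits, then greedily look for the longest dictionary word starting at each position and report what fraction of the letters is covered. The function name `dictionary_coverage`, the `min_len` parameter and the tiny word set are assumptions made for this example; a real word list would be much larger.

```python
import re

def dictionary_coverage(name, words, min_len=3):
    """Fraction of the letters in `name` covered by dictionary words.

    The name is split on runs of digits; each chunk is scanned left to right,
    taking the longest dictionary word that starts at the current position.
    """
    covered = 0
    letters = 0
    for chunk in re.split(r"\d+", name.lower()):
        letters += len(chunk)
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, down to min_len characters.
            for j in range(len(chunk), i + min_len - 1, -1):
                if chunk[i:j] in words:
                    covered += j - i
                    i = j
                    break
            else:
                i += 1  # no word starts here; skip one character
    return covered / letters if letters else 0.0

words = {"book", "mail", "shop"}  # toy dictionary for illustration only
for name in ["bookmail", "shop24mail", "ascxdcnciz", "8fa0816f"]:
    print(name, dictionary_coverage(name, words))
```

A coverage close to 1 suggests the name is built from real words, while a coverage close to 0 suggests it may be generated. Storing the words in a set keeps each lookup cheap, so the cost depends on the length of the name rather than on the size of the dictionary.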