How do I correctly feed an IP address to the input of a neural network?

Computer networks

MaxxZ, 2021-06-08 08:17:11

For some network traffic classification tasks, I want to feed an IP address to the input of a neural network.
However, the most obvious approach, taking the 32-bit (DWORD) representation of the IP address and normalizing it, is clearly wrong. In that case the neural network will compare addresses by value and will inevitably draw conclusions from the numeric distance between addresses in the training sample. That is fundamentally wrong, since the numeric distance between IP addresses means nothing. Addresses that differ by a few units may belong to different companies on different continents. And 192.167.255.255 and 192.168.0.0 are only one unit apart, yet there is a gulf between them in how they are used: the first is a public address, while the second belongs to the private 192.168.0.0/16 range.
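To make the last example concrete, here is a small sketch using only Python's standard `ipaddress` module. The variable names are mine, chosen for illustration.

```python
import ipaddress

a = ipaddress.ip_address("192.167.255.255")
b = ipaddress.ip_address("192.168.0.0")

# Numeric gap between the two 32-bit values.
numeric_gap = int(b) - int(a)

# Naive normalization to [0, 1): the two inputs become almost identical.
norm_a = int(a) / 2**32
norm_b = int(b) / 2**32

print(numeric_gap)    # 1
print(a.is_private)   # False
print(b.is_private)   # True
```

After normalization the two values differ by 1/2^32, so a network fed only this number has essentially no way to tell a public address from a private one.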
One idea is to build very large input vectors from networks, grouped for example by /24 or /16 mask, AS number, or other criteria, and to make each input a binary flag for membership in one group of addresses. This should fit the logic of the task better. But maybe there are other solutions? Surely someone has already described this problem, and there is no need to reinvent the wheel. I would be grateful for a link to an article on the subject.
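A minimal sketch of that binary group-membership idea, assuming the groups are just the /16 and /24 networks seen in the training addresses. The helper names (`prefix_keys`, `build_vocab`, `encode`) are hypothetical, and AS numbers are left out because they need an external lookup.

```python
import ipaddress

def prefix_keys(ip, prefix_lengths=(16, 24)):
    """Return the enclosing network of `ip` for each prefix length, as strings."""
    return [
        str(ipaddress.ip_network(f"{ip}/{p}", strict=False))
        for p in prefix_lengths
    ]

def build_vocab(ips, prefix_lengths=(16, 24)):
    """Assign one input index to every network seen in the training addresses."""
    vocab = {}
    for ip in ips:
        for key in prefix_keys(ip, prefix_lengths):
            vocab.setdefault(key, len(vocab))
    return vocab

def encode(ip, vocab, prefix_lengths=(16, 24)):
    """Binary vector: 1 where `ip` belongs to a known network, 0 elsewhere."""
    vec = [0] * len(vocab)
    for key in prefix_keys(ip, prefix_lengths):
        idx = vocab.get(key)
        if idx is not None:  # networks not seen in training stay all-zero
            vec[idx] = 1
    return vec

train = ["192.168.1.10", "192.168.2.20", "8.8.8.8"]
vocab = build_vocab(train)
print(encode("192.168.1.77", vocab))  # [1, 1, 0, 0, 0]
print(encode("192.168.9.9", vocab))   # [1, 0, 0, 0, 0]
```

The vector grows with the number of distinct networks, which is the "huge sizes" problem mentioned above; an address from an unseen network encodes as all zeros.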

3 answers

hint000, 2021-06-08
@MaxxxZ

the most obvious approach, taking the 32-bit (DWORD) representation of the IP address and normalizing it, is clearly wrong

It is not obvious, because a neural network can solve different problems, and we do not know which task yours will solve. Different tasks require different representations of the address.
In one case it is enough to determine the country. In another, the ASN will do. In a third, it turns out that neither the country nor the ASN is relevant, and the "obviously wrong" representation is a good fit. In a fourth, it takes some completely unusual idea that would not occur to anyone until the task is stated.
There is no single "correct" option for all cases.

Denis Yuriev, 2021-06-08
@dyuriev

Use the MaxMind database and feed the neural network not the IP address itself but information derived from it: geographic coordinates (do not forget that they are spherical, by the way, so they should be converted to a Cartesian system with the center of the Earth as the origin), country, region, city, ASN (organization/provider), and whatever else can be fished out.
By itself, the IP address is unlikely to tell the neural network anything.
PS: note that there are gaps in the database; for example, the region or city may be missing.
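A rough sketch of the coordinate conversion, assuming the latitude and longitude in degrees have already been looked up and treating the Earth as a unit sphere (a simplification; the function name is hypothetical). The database lookup itself is not shown.

```python
import math

def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)

# Two points on either side of the 180th meridian: far apart as raw
# longitudes (179.9 vs -179.9), but close together in Cartesian space.
east = latlon_to_unit_xyz(0.0, 179.9)
west = latlon_to_unit_xyz(0.0, -179.9)
print(math.dist(east, west))
```

The point of the conversion is that the three inputs stay continuous across the date line and the poles, which raw latitude and longitude do not.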

Vladimir Dubrovin, 2021-06-08
@z3apa3a

the distance between IP addresses in networks means nothing.

Actually, it means a lot, but the distance should be measured not as the absolute value of the difference but as the number of zero bits in the mask of the smallest network that contains both addresses (i.e., if the smallest network containing both addresses is a /28, the distance between them is 4).
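One way to compute this distance for IPv4, as a sketch: XOR the two 32-bit values and take the bit length of the result, which equals the number of host bits in the smallest common network. The function name `prefix_distance` is mine.

```python
import ipaddress

def prefix_distance(ip_a, ip_b):
    """Number of zero bits in the mask of the smallest network holding both."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # The highest differing bit decides how many low bits must be left free.
    return (a ^ b).bit_length()

print(prefix_distance("192.168.1.1", "192.168.1.14"))     # 4  (common /28)
print(prefix_distance("192.167.255.255", "192.168.0.0"))  # 20 (common /12)
print(prefix_distance("10.0.0.1", "10.0.0.1"))            # 0  (same address)
```

Note that the two addresses from the question, numerically one unit apart, come out at distance 20 under this measure.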
Sometimes an address is normalized to its network address using information from BGP or whois, or an ASN is added, but such network data is not always accurate. Another approach is to start with small subnets, for example /24; if adjacent subnets have the same characteristics, they are combined into a larger subnet, and so on iteratively.
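A sketch of the iterative merging, under two assumptions of mine: "the same characteristics" is reduced to an equal label per subnet, and only the two aligned halves of a valid CIDR supernet are merged. `merge_subnets` is a hypothetical helper, not an established algorithm name.

```python
import ipaddress

def merge_subnets(labels):
    """Merge sibling subnets with equal labels until nothing changes.

    `labels` maps ipaddress network objects to a label (any comparable value).
    """
    nets = dict(labels)
    changed = True
    while changed:
        changed = False
        # Visit the most specific networks first.
        order = sorted(
            nets,
            key=lambda n: (n.prefixlen, int(n.network_address)),
            reverse=True,
        )
        for net in order:
            if net not in nets or net.prefixlen == 0:
                continue  # already merged in this pass, or nothing above it
            parent = net.supernet()
            left, right = parent.subnets()  # the two halves of the parent
            if left in nets and right in nets and nets[left] == nets[right]:
                label = nets.pop(left)
                nets.pop(right)
                nets[parent] = label
                changed = True
    return nets

observed = {
    ipaddress.ip_network("10.0.0.0/24"): "a",
    ipaddress.ip_network("10.0.1.0/24"): "a",
    ipaddress.ip_network("10.0.2.0/24"): "a",
    ipaddress.ip_network("10.0.3.0/24"): "a",
    ipaddress.ip_network("10.0.4.0/24"): "b",
}
merged = merge_subnets(observed)
for net in sorted(merged, key=lambda n: int(n.network_address)):
    print(net, merged[net])
```

Here the four /24 networks labeled "a" collapse into 10.0.0.0/22, while 10.0.4.0/24 stays separate because its sibling 10.0.5.0/24 was never observed.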