Computer networks
Igor Belov, 2015-01-13 17:32:38

What can be done to avoid a unicast flood?

We have a virtualization server based on libvirt; the virtual interfaces are attached to a bridge. At times there is a flood of traffic toward one IP address, for example:

17:21:55.773491 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 44574+ A? 17FyFh1d.asus.com. (35)
17:21:55.773994 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 35700+ A? B5kNtUhz.asus.com. (35)
17:21:55.774435 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 48282+ A? 7IySjYHY.asus.com. (35)
17:21:55.774964 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 7106+ A? sN2c7Rsg.asus.com. (35)
17:21:55.775386 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 39958+ A? OMUj7mRD.asus.com. (35)
17:21:55.775917 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 30218+ A? mTgSmay5.asus.com. (35)
17:21:55.776378 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 29452+ A? CTTk9PUW.asus.com. (35)
17:21:55.776854 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 751+ A? T360Nv6T.asus.com. (35)
17:21:55.777246 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 39873+ A? YrFGRylE.asus.com. (35)
17:21:55.777747 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 31386+ A? IFYzDYVr.asus.com. (35)
17:21:55.778284 IP 94.41.XX.XX.3090 > 109.XX.XX.XX.53: 55326+ A? Otnb2tdw.asus.com. (35)

If the server is turned off at that moment, the flood develops into a unicast flood and spreads to all virtual interfaces on the server, in some cases to the entire VLAN, which naturally puts load on the server.
What can be done on the active-equipment side to avoid such problems in automatic mode?


2 answer(s)
throughtheether, 2015-01-13
@throughtheether

If the server is turned off at that moment, the flood develops into a unicast flood and spreads to all virtual interfaces on the server, in some cases to the entire VLAN,
If I understand you correctly, that is unknown-unicast flooding.
Is your host 109.XX.XX.XX.53? What kind of traffic is distributed to the entire VLAN: traffic similar to that in the dump, or perhaps ARP requests?
I haven't worked with libvirt, but here are my thoughts on your situation. There is some router (a device or a virtual entity) on which an IP address from the same prefix ("subnet") as 109.XX.XX.XX.53 is active. It has an ARP table in which the IPv4 address 109.XX.XX.XX.53 corresponds to a certain MAC address aaaa-bbbb-cccc (Ethernet is assumed). On a (virtual) switch/bridge, this MAC address is mapped to a virtual interface.
When you turn off the host, two options are possible:
1) the entry in the switch table that maps a certain virtual interface to the MAC address aaaa-bbbb-cccc disappears. In this case an unknown-unicast flood will be observed: traffic similar to that in the dump you provided will be sent out of all switch interfaces (although there may be nuances associated with virtualization). To solve the problem, you can either statically map the MAC address aaaa-bbbb-cccc to the desired interface; or, if possible, filter traffic destined for the MAC address aaaa-bbbb-cccc while the corresponding server is inactive; or filter traffic to 109.XX.XX.XX.53 on upstream equipment.
2) the entry in the router's ARP table mapping the IPv4 address 109.XX.XX.XX.53 to the MAC address aaaa-bbbb-cccc disappears. In this case the router will send a number of broadcast ARP requests. The quantitative characteristics depend on the implementation; how exactly this is done in libvirt is difficult to guess.
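For option 1), the static mapping and the filtering can be sketched with standard Linux bridge tooling. This is only a sketch: br0 and vnet0 are hypothetical names for the bridge and the VM's tap interface, and aa:bb:cc:dd:ee:ff stands in for aaaa-bbbb-cccc.

```shell
# Pin the VM's MAC to its tap interface so the bridge never "forgets" it
# (no unknown-unicast flooding for this address even while the VM is down).
# vnet0 and the MAC address are placeholders.
bridge fdb add aa:bb:cc:dd:ee:ff dev vnet0 master static

# Alternatively, while the VM is down, drop frames destined for its MAC
# so they are not flooded out to the other ports of the bridge:
ebtables -A FORWARD -d aa:bb:cc:dd:ee:ff -j DROP

# Remove the rule again once the VM is back up:
# ebtables -D FORWARD -d aa:bb:cc:dd:ee:ff -j DROP
```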
What can be done on the active equipment side to avoid such problems in automatic mode?
I do not understand your focus on the active equipment. In my opinion, it is more constructive to first try to solve the problem at the virtualization level; the virtualization environment seems to provide more convenient tools for automation (i.e. various scripts and/or APIs). If the measures above (a static entry in the table mapping MAC address <-> switch interface; a filtering entry in the same table, see the Cisco analogue "mac address-table static MACADDRESS vlan VLANID drop"; filtering based on L3 data) cannot be implemented on the virtualization server, you can try to filter traffic to 109.XX.XX.XX.53 on higher-level equipment, but I doubt that this can be conveniently automated.
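On Cisco IOS the filtering entry referred to above looks roughly like this (the MAC address and VLAN number are placeholders):

```
! Drop all frames destined for this MAC address in VLAN 100:
mac address-table static aaaa.bbbb.cccc vlan 100 drop
```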
It is difficult to give further advice without knowing the network topology.
In the example, 109.XX.XX.XX is the IP and 53 is the port.
Sorry, I overlooked that. This is exactly why I don't like tcpdump text output and prefer dumping to pcap files.
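Capturing straight to a pcap file instead of reading tcpdump's text output could look like this; the interface name and file name are hypothetical examples:

```shell
# Capture full frames (-s 65535) of DNS traffic into a pcap file for
# later analysis in Wireshark/tcpdump; eth0 and flood.pcap are examples.
tcpdump -i eth0 -s 65535 -w flood.pcap 'udp port 53'
```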
I observe two problems:
1) garbage traffic to one of your hosts
2) flood (multiplication) of this traffic to all your virtual servers
Regarding the first problem, there are questions. How do you use the DNS server? Is access to it filtered? If not, why not? Are recursive queries allowed? If so, it is advisable to block their processing, unless you know exactly what you are doing. I suspect that in this case there is an attempt to attack 94.41.XX.XX using your server (a DNS reflection attack; it will be possible to say more precisely if you provide a traffic dump in .pcap format, tcpdump -i -s 65535 -w ). If the hypothesis turns out to be correct, there is a chance that this junk traffic will stop some time after you turn off the processing of recursive requests.
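If the DNS server happens to be BIND, disabling recursion could look like the fragment below; this is a sketch under that assumption, to be adapted to whatever server software is actually in use:

```
// named.conf options fragment: answer authoritatively only,
// refuse recursive lookups from the outside world.
options {
    recursion no;
    allow-recursion { none; };
};
```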
Generally speaking, if I were you, I would deny all traffic except your service traffic and management traffic (ssh, snmp, etc.) using access lists (ACLs) on the active equipment (on the L3 devices closest to your equipment; most likely there are two of them).
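On Cisco IOS such an access list might be sketched as follows; the addresses, ACL name, and interface are all hypothetical placeholders:

```
! Permit only the service (DNS) and management (SSH) traffic
! to the server, drop everything else.
ip access-list extended CLIENT-IN
 permit udp any host 109.0.0.1 eq domain
 permit tcp any host 109.0.0.1 eq 22
 deny   ip any any
!
interface GigabitEthernet0/1
 ip access-group CLIENT-IN in
```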
Now for the second problem.
the network engineers of the data center say that they have set "mac-address-table aging-time = 14400" for the VLAN and switches in the DC
It is this, in my opinion, that makes the most significant contribution to the problem. When you turn off your server, the DC switch will keep sending you traffic addressed to that server for up to another 4 hours. The value 14400 was most likely chosen to avoid inconsistencies between the ARP table and the MAC address table when using ECMP and FHRP (a fairly common problem in a DC environment); it is the default lifetime of an ARP entry on Cisco devices, if my memory serves me.
One possible solution is to reduce the lifetime of MAC-address-table entries. This is possible if you are given a separate switch in the DC, or if you are given a separate VLAN (a frequent scheme in DCs) and the DC equipment supports changing the MAC-address-table entry lifetime per VLAN (Cisco Catalyst can do this, if I'm not mistaken, depending on the IOS version).
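On a Catalyst that supports per-VLAN aging, the change is a one-liner; the VLAN number and timeout are placeholder values:

```
! Reduce the MAC-address-table entry lifetime from 14400 s to 300 s,
! only for the VLAN in question:
mac address-table aging-time 300 vlan 100
```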
The second option is to automatically create an appropriate ebtables rule, or a static entry in the MAC address table of the virtual bridge, when the virtual server is turned off, prohibiting traffic to it (I think this is possible to some extent). The third option, deleting the entry in the MAC address table of the DC switch on request through some API, seems unrealistic at the moment. In addition, you can think about reorganizing the scheme for connecting your server to the provider's equipment, using two L3 links. In that case you are not affected by the L2 settings of the DC equipment (the lifetimes of ARP-table and MAC-address-table entries), and your (virtual) router will drop traffic to a host for which there is no ARP entry (i.e. a turned-off one).
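As a sketch of the second option: libvirt invokes /etc/libvirt/hooks/qemu on guest state changes, so a hook script could install and remove the ebtables rule automatically. This is a simplified assumption-laden sketch; in practice the guest's MAC would be extracted from its XML (virsh dumpxml) rather than hard-coded:

```shell
#!/bin/sh
# /etc/libvirt/hooks/qemu -- called by libvirtd as: qemu <guest> <operation> ...
# Sketch only: GUEST_MAC is a hard-coded placeholder here.
GUEST="$1"; OP="$2"
GUEST_MAC="aa:bb:cc:dd:ee:ff"

case "$OP" in
    stopped)
        # Guest went down: stop its traffic from being flooded to other ports.
        ebtables -A FORWARD -d "$GUEST_MAC" -j DROP
        ;;
    start)
        # Guest is starting: let its traffic through again.
        ebtables -D FORWARD -d "$GUEST_MAC" -j DROP 2>/dev/null
        ;;
esac
```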
You wrote about the option I described above:
The situation is saved by a request to block the attacked IP on the active equipment; alternatively, you can use ebtables to prohibit forwarding packets to the attacked IP address, and then the traffic simply goes to the main interface of the physical server without affecting the virtual servers. Neither option is really an option, since both require the constant presence of network engineers in the DC and cause problems on the physical server, which is why I would like to know whether such problems can be avoided automatically on the network-equipment side.
I have not yet figured out what problems filtering traffic with ebtables causes on the physical server. You need to understand that either this (malicious, junk) traffic is somehow filtered by the DC (for which it must first identify the malicious traffic by some criteria, and it is not a fact that its criteria will match yours), most likely for additional money (an anti-DDoS service), or this traffic reaches the equipment under your control, and then you filter it as you see fit.
Summarizing, if I were you, I would:
1) deal with the traffic shown in the dump, with the DNS server settings
2) allow only the traffic necessary for the project to work using ACL on DC equipment
3) set up a clear monitoring of your server
It is almost certain that after implementing points 1) and 2) the observed problem would disappear. Even if not, I have given the options above (changing the L2 settings of the DC equipment, changing the connection scheme).
No traffic can be filtered, because this is not our traffic but traffic toward the client who rented the VDS. For the same reason, you cannot block someone for incoming packets.
I don't understand why you can't filter client traffic. Almost everyone does this. When a server rented for $100 a month receives 10 gigabits/s of traffic or more, as a rule, the traffic to it is blocked for the simple reason that the traffic costs exceed the income from the client. In addition, often such traffic threatens the infrastructure itself. You asked a question regarding anti-DDoS. What is it, if not traffic filtering?
Next, I think you should redo the uplink scheme; this is the simplest and potentially most effective solution: connect the server to the router with an L3 interface and then route (rather than bridge) traffic to the client. I can imagine what this would look like with an L3 switch, but nuances of the Linux network stack, unknown to me, will certainly appear here. I recommend testing the scheme thoroughly before use.
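A routed (L3) connection to a client VM might be sketched like this on the host side; all interface names and addresses here are hypothetical:

```shell
# Give the host-side tap interface its own L3 configuration instead of
# putting it into a bridge; vnet0 and the addresses are placeholders.
sysctl -w net.ipv4.ip_forward=1
ip addr add 109.0.0.1/32 dev vnet0      # host side of the point-to-point link
ip route add 109.0.0.2/32 dev vnet0     # route to the client VM

# With no bridge involved there is no L2 flooding: when the VM is down,
# the host simply gets no ARP reply for 109.0.0.2 and drops the traffic.
```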

pavelsh, 2015-01-29
@pavelsh

If I understand this whole thread correctly, then the following happens:
1. You had traffic to a certain MAC address xxxx.yyyy.zzzz. Everything was OK.
2. At some point, the client turns off the machine with this MAC address.
3. Traffic arrives at the switch; the switch sees the entry "MAC address xxxx.yyyy.zzzz lives on such-and-such port" and sends the traffic to the server. Everything is legitimate (and it will keep sending this traffic until the MAC entry expires).
4. The traffic arriving at the server reaches the bridge. Apparently your bridge is configured so that unknown unicast traffic is flooded to all subinterfaces/interfaces in search of a receiver for this address. That is, this traffic begins to multiply.
5. If you have a dual connection to the switch, the traffic can fly back into the VLAN, considerably multiplied.
How to solve the problem: I can say for sure that this is not a problem with the L2 and L3 equipment. This is a Linux bridge issue. Deal with the bridge and do not touch the DC engineers.
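If it is indeed the bridge's unknown-unicast flooding that multiplies the traffic, iproute2 lets you inspect and disable it per port and shorten how long MACs are remembered; br0 and vnet0 are placeholder names:

```shell
# Show per-port bridge flags, including the unknown-unicast "flood" flag:
bridge -d link show

# Disable unknown-unicast flooding toward a particular VM port:
bridge link set dev vnet0 flood off

# Shorten how long the bridge remembers MAC addresses
# (the value is in centiseconds, so 30000 = 300 s):
ip link set br0 type bridge ageing_time 30000
```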
