Debian
fso, 2013-02-08 08:30:59

Does Debian crash with pve on board?

I came across a post about a "packet of death" affecting Intel network cards, and I suspect a similar bug on my server.
Just recently, on January 18, I rented an EX4 server at Hetzner.
Using installimage, I installed the ready-made image:

Debian + Proxmox, kernel 2.6.32-17-pve, pve-manager 2.2-32, network card RTL8111/8168B PCI-E (rev 09).

I sent traffic from the production application to it, about 60-80 Mbps, and the logs started complaining:
kernel: TCP: time wait bucket table overflow (CT0)
I set this in sysctl:
net.ipv4.tcp_max_tw_buckets=3800000
It had no effect.
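
To check whether the number of TIME_WAIT sockets actually comes anywhere near that limit, something like the following could be used. This is only a minimal sketch, assuming the usual Linux /proc layout; the function name count_time_wait is made up for illustration, and it only counts IPv4 sockets listed in /proc/net/tcp.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count sockets in TIME_WAIT.
# In /proc/net/tcp the 4th column is the socket state in hex; 06 is TIME_WAIT.
# An alternative file can be passed as the first argument.
count_time_wait() {
    file="${1:-/proc/net/tcp}"
    # Skip the header line, count rows whose state column is 06.
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' "$file"
}

# Usage on a live system: print the current count, then the configured limit.
#   count_time_wait
#   sysctl -n net.ipv4.tcp_max_tw_buckets
```

If the count stays far below the configured limit while the overflow message still appears, the limit is probably not what is being hit on the host.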
At seemingly random moments, 5-10 minutes after the traffic was sent, the server spontaneously hung (at least the network interface did). After a cold reset there was nothing in the logs: the normal working log is simply cut off and followed by the boot log from the reset.
After I removed the traffic load, with almost zero traffic, the server continued to freeze, but only once every few days. The symptoms are exactly the same, although there are no complaints in the logs at all (the bucket table overflow message also disappeared).
Has anyone encountered this behavior? What could it be?
PS: at the same Hetzner, another server with an RTL8111/8168B PCI-E (rev 02) handles this load without any trouble (in fact, I was planning to move the load off it).

12 answer(s)
fso, 2013-03-21
@fso

Let me summarize. After updating the BIOS, the crashes stopped completely. The server now has a month of uptime without a single incident.
If someone else runs into this problem: update the BIOS and install the r8168 driver (I installed it before the BIOS update; it will probably work without r8168 too).
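
To see which BIOS version is running before and after the update, the value the kernel exposes through DMI can be read directly. A minimal sketch, assuming the standard /sys/class/dmi/id path is available on the machine; the function name bios_version is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the BIOS version
# exposed by the kernel's DMI sysfs directory.
# The optional argument overrides the directory, mainly for testing.
bios_version() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    if [ -r "$dmi_dir/bios_version" ]; then
        cat "$dmi_dir/bios_version"
    else
        echo "unknown"
    fi
}

# Usage: bios_version
```

Where dmidecode is installed, `dmidecode -s bios-version` (run as root) should report the same string.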

LightFalcon, 2013-02-08
@LightFalcon

I had exactly the same problem and solved it by replacing the network cards with Intel PCI Express ones.
It could not be fixed in software.

Puma Thailand, 2013-02-08
@opium

I have a similar problem with EX4S servers.
It started with Proxmox 2.1; now I'm on the latest version, 2.2.
I've left only test environments there now.
It hangs the same way yours does:
a black screen, no reaction to the keyboard, and only a reset from the control panel helps.
It seems to me that the servers of the EX4 line differ in hardware. When I asked Proxmox support what was wrong, they answered that they have several EX4 servers at Hetzner with no such freezes and told me to test the hardware, and I have already ordered a memory test and a full system test twice.
Now only OpenVZ environments run there, and it freezes anywhere from several times a week to once a month. I noticed that if I start a Windows terminal server, it reliably freezes within three days.
The logs are clean as always, so perhaps the kernel freezes hard. I also think the problem may be related to the three-terabyte disks and alignment.
One user advised me in a private message:
In the BIOS, try:
1) Disable all unnecessary devices (USB, sound card)
2) Play around with the ACPI settings
3) Disable processor power saving
I'm thinking of ordering a LARA console and poking around in the BIOS.

script88, 2013-02-08
@script88

Does your network card run on r8169 with the r8168 driver installed over it, or is it r8168?
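
One way to answer this on a running system is to look at which kernel driver the interface is actually bound to. A minimal sketch, assuming the usual sysfs layout and an interface named eth0; the function name nic_driver and the second argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the kernel driver
# bound to a network interface, e.g. r8169 or r8168.
# The second argument (sysfs root) exists only to make the function testable.
nic_driver() {
    iface="${1:-eth0}"
    sysroot="${2:-/sys}"
    link="$sysroot/class/net/$iface/device/driver"
    # The driver entry is a symlink; its last path component is the driver name.
    [ -L "$link" ] || { echo "unknown"; return 1; }
    basename "$(readlink "$link")"
}

# Usage: nic_driver eth0
```

`ethtool -i eth0` reports the same name in its "driver:" line, along with the driver version.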

Vlad Zhivotnev, 2013-02-08
@inkvizitor68sl

echo "deb http://backports.debian.org/debian-backports squeeze-backports main contrib non-free" >> /etc/apt/sources.list
apt-get update
apt-get install -t squeeze-backports linux-image-3.2.0-0.bpo.2-amd64
Or you can look for something newer there.
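
After rebooting, it is worth confirming which kernel the machine actually came up on, since the old pve kernel may still be the default boot entry. A minimal sketch; the function name kernel_flavor is made up, and the name patterns are an assumption based only on the version strings mentioned in this thread (2.6.32-17-pve and 3.2.0-0.bpo.2-amd64).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): classify a kernel release string.
# With no argument it inspects the running kernel via uname -r.
kernel_flavor() {
    release="${1:-$(uname -r)}"
    case "$release" in
        *-pve)  echo "pve" ;;        # Proxmox kernel, e.g. 2.6.32-17-pve
        *bpo*)  echo "backports" ;;  # Debian backports kernel, e.g. 3.2.0-0.bpo.2-amd64
        *)      echo "other" ;;
    esac
}

# Usage: kernel_flavor
```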

fso, 2013-02-08
@fso

That's bad. It means they won't replace it for free: the two-week trial has already passed and I've already paid a double monthly fee.
But putting up with a server that suddenly disappears is not an option. The second server has two years of uptime, and I'm completely happy with it.
I'll try booting a stock kernel and applying load. If it doesn't crash, the problem is in the pve kernel and I'll dig in that direction (leaving only KVM).
If it keeps crashing, I'll try upgrading the stock kernel to 3.x and repeat the test.
And if nothing helps at all, I'll take another server or write to support to have it replaced. I've been struggling with this for a week.

van, 2013-04-30
@van

Thanks to everyone who contributed to this thread in one way or another. Our problem is 100% identical:
we also have an EX4S server at Hetzner, and from time to time it "disappears" without any logs, and
only a reboot through the panel helps.
I'm worn out; I've been fighting this for a month already. The most annoying thing is that on one forum the tables keep disappearing whenever it crashes.
In short, a complete disaster.
Many thanks for the instructions that script88 posted; thanks to them I updated the network card driver, and now I'm waiting to see the effect.
The BIOS is 1005 and the network card is the same as in the instructions. We'll keep watching and hope for the best.

van, 2013-05-08
@van

The server ran for a week and hung again ((((( I'm completely at a loss. I rebooted it again; let's see what happens now. But if it keeps hanging, a move will apparently be inevitable... it has really worn me out (((((

van, 2013-05-09
@van

People also advise updating the kernel to 3.2. I'll wait for the next crash and then try updating the kernel as described here:
unixforum.org/index.php?showtopic=133288&st=0&gopid=1240511&#entry1240511

van, 2013-05-12
@van

Updating the kernel did not help; the server still hangs. The BIOS is 1005. Now I'll try updating it to 1105, but I don't even know whether it's worth the effort.

van, 2013-05-12
@van

I've written to support five times already. The first time they took the server offline for 12 hours to test and check the hardware, the second time they replaced the power supply, then they tested the RAM for four hours, and now they are blaming software errors without specifying which ones.
Now I've ordered a BIOS update, although I already have 1005, so I don't even know what to do... maybe just ask for a new server. But here unixforum.org/index.php?s=&showtopic=133288&view=findpost&p=1228920 they write about the same problem and about Hetzner support's answers, and for one person even moving to another server did not help))

van, 2013-05-18
@van

It looks like the problem has been localized and solved. If this happens to you, flash the BIOS. There is a problem somewhere; I could not find out exactly where, and it doesn't really matter. Just get the firmware update, after which everything immediately returns to normal.
Interestingly, another "victim" and I both have other machines of the same EX4 line, rented a year ago with an older BIOS, and everything works fine there. That's how it is :) Good luck to everyone!
