linux
CityCat4, 2016-07-08 06:55:44

Kernel BUG - really a kernel bug?

CentOS 6.8, many packages manually rebuilt. Kernel 2.6.32-642.1.1.el6.centos.plus.x86_64 #1 SMP Wed Jun 1 03:11:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux.
For the last two or three weeks, something I can't find a proper word for has been going on. The computer keeps hanging hard, at times when it is not in use, and sometimes a kernel BUG message appears:

kernel: BUG: soft lockup - CPU#2 stuck for 67s! [thunderbird:6385]
kernel: Pid: 6385, comm: thunderbird Not tainted 2.6.32-642.1.1.el6.centos.plus.x86_64 #1 Gigabyte Technology Co., Ltd. H61M-D2-B3/H61M-D2-B3
Jul  7 20:30:18 sentry kernel: RIP: 0010:[<ffffffff812aea6d>]  [<ffffffff812aea6d>] copy_user_generic_string+0x2d/0x40
Jul  7 20:30:18 sentry kernel: RSP: 0018:ffff88020e147c70  EFLAGS: 00010246
Jul  7 20:30:18 sentry kernel: RAX: ffff880000000000 RBX: ffff88020e147c78 RCX: 0000000000000200
Jul  7 20:30:18 sentry kernel: RDX: 0000000000000000 RSI: ffff8800c0184000 RDI: 00007faac39d8000
Jul  7 20:30:18 sentry kernel: RBP: ffffffff8100bc0e R08: 0000000000000003 R09: ffffea0002a054e8
Jul  7 20:30:18 sentry kernel: R10: ffff88020e147fd8 R11: 0000000000000293 R12: 0000000000001000
Jul  7 20:30:18 sentry kernel: R13: 0000000000001000 R14: 0000000000001000 R15: ffff88020e144000
Jul  7 20:30:18 sentry kernel: FS:  00007fab0235b720(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
Jul  7 20:30:18 sentry kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  7 20:30:18 sentry kernel: CR2: ffff8800c0184000 CR3: 00000003c76c1000 CR4: 00000000000427e0
Jul  7 20:30:18 sentry kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  7 20:30:18 sentry kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  7 20:30:18 sentry kernel: Process thunderbird (pid: 6385, threadinfo ffff88020e144000, task ffff8801f0d1cab0)
Jul  7 20:30:18 sentry kernel: Stack:
Jul  7 20:30:18 sentry kernel: ffffffff81012c99 ffff88020e147cd8 ffffffff8112ddf3 ffff88020e147cd8
Jul  7 20:30:18 sentry kernel: <d> ffffffff811b9561 ffff8800c0184000 00001000070c7c94 00000000577e672c
Jul  7 20:30:18 sentry kernel: <d> ffffea0002a054e0 ffff880248a28dd8 0000000000000003 ffff88040799b280
Jul  7 20:30:18 sentry kernel: Call Trace:
Jul  7 20:30:18 sentry kernel: [<ffffffff81012c99>] ? copy_user_generic+0x9/0x10
Jul  7 20:30:18 sentry kernel: [<ffffffff8112ddf3>] ? file_read_actor+0x163/0x180
Jul  7 20:30:18 sentry kernel: [<ffffffff811b9561>] ? touch_atime+0x71/0x1a0
Jul  7 20:30:18 sentry kernel: [<ffffffff811301e6>] ? generic_file_aio_read+0x2d6/0x700
Jul  7 20:30:18 sentry kernel: [<ffffffff8119bf9a>] ? do_sync_read+0xfa/0x140
Jul  7 20:30:18 sentry kernel: [<ffffffff81160bf9>] ? mmap_region+0x269/0x5b0
Jul  7 20:30:18 sentry kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Jul  7 20:30:18 sentry kernel: [<ffffffff811a1b94>] ? cp_new_stat+0xe4/0x100
Jul  7 20:30:18 sentry kernel: [<ffffffff8123d066>] ? security_file_permission+0x16/0x20
Jul  7 20:30:18 sentry kernel: [<ffffffff8119c895>] ? vfs_read+0xb5/0x1a0
Jul  7 20:30:18 sentry kernel: [<ffffffff8119d66f>] ? fget_light_pos+0x3f/0x50
Jul  7 20:30:18 sentry kernel: [<ffffffff8119cbe1>] ? sys_read+0x51/0xb0
Jul  7 20:30:18 sentry kernel: [<ffffffff810ee59e>] ? __audit_syscall_exit+0x25e/0x290
Jul  7 20:30:18 sentry kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Jul  7 20:30:18 sentry kernel: Code: 74 30 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 c3 66 0f 1f 84 00 00 00 00 00 21 d2
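
To see how often the lockups happen and which process is on the CPU each time, the header lines can be pulled out of the log with a short script. This is only a minimal sketch: the regular expression is written against the one header line quoted above, and the function name `find_soft_lockups` is made up for the illustration.

```python
import re

# Matches the soft-lockup header as it appears in the log above, e.g.
# "kernel: BUG: soft lockup - CPU#2 stuck for 67s! [thunderbird:6385]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

def find_soft_lockups(lines):
    """Return one dict per soft-lockup header found in the given log lines."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            events.append({
                "cpu": int(m["cpu"]),
                "seconds": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            })
    return events

# Two lines taken from the log in this post; only the first is a header.
sample = [
    "kernel: BUG: soft lockup - CPU#2 stuck for 67s! [thunderbird:6385]",
    "Jul  7 20:30:18 sentry kernel: Call Trace:",
]
events = find_soft_lockups(sample)
print(events)
```

On a real system the lines would come from the syslog file (probably /var/log/messages on CentOS 6, but that path is an assumption). If the same CPU or the same process shows up every time, that narrows things down; if they are scattered, hardware looks more likely.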

The hardware itself is not to blame: the disks were moved to another computer with the same motherboard, and that computer was checked: Windows runs on it as if nothing had happened. Part of the kernel log can be viewed here: Part of the log.
Last week it hung constantly. After I disabled all the C-states I could find in the BIOS, disabled EIST, switched suspend from S3 to S1, and updated the BIOS itself (I don't know which of these made the difference), it seemed to stop hanging and I breathed a sigh of relief.
Yesterday there was a kernel BUG again, and when I came in this morning it had hung again. I still suspect the disk, because SMART shows the following:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    125207024
  3 Spin_Up_Time            PO----   097   097   000    -    0
  4 Start_Stop_Count        -O--CK   097   097   020    -    3781
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   083   060   030    -    214199643
  9 Power_On_Hours          -O--CK   067   067   000    -    29037
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   099   099   020    -    1763
183 Runtime_Bad_Block       -O--CK   001   001   000    -    233
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   092   000    -    4295032993
189 High_Fly_Writes         -O-RCK   096   096   000    -    4
190 Airflow_Temperature_Cel -O---K   058   050   045    -    42 (Min/Max 36/42)
194 Temperature_Celsius     -O---K   042   050   000    -    42 (0 18 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   051   036   000    -    125207024
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    109547435883024
241 Total_LBAs_Written      ------   100   253   000    -    3475782009
242 Total_LBAs_Read         ------   100   253   000    -    4167494488
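
One way to read this table without guessing at the vendor-specific raw values is to compare the normalized columns: an attribute counts as failing when VALUE drops to THRESH or below (for a non-zero THRESH). The sketch below parses a few rows copied from the table and ranks them by how close WORST has come to THRESH. The helper names are made up for the illustration, and attributes with THRESH 000 are informational, so a small margin there is a hint and not a verdict.

```python
# A few rows copied verbatim from the smartctl output above.
SMART_TABLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   083   060   030    -    214199643
183 Runtime_Bad_Block       -O--CK   001   001   000    -    233
190 Airflow_Temperature_Cel -O---K   058   050   045    -    42 (Min/Max 36/42)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
"""

def parse_attributes(text):
    """Parse rows of the form: ID NAME FLAGS VALUE WORST THRESH FAIL RAW..."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8 or not parts[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": " ".join(parts[7:]),
        })
    return rows

def failing_now(rows):
    """Attributes whose normalized VALUE is at or below a non-zero THRESH."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]

def closest_to_threshold(rows, limit=3):
    """Rows with the smallest WORST - THRESH margin first."""
    return sorted(rows, key=lambda r: r["worst"] - r["thresh"])[:limit]

rows = parse_attributes(SMART_TABLE)
failed = failing_now(rows)
ranked = closest_to_threshold(rows)
print("failing now:", [r["name"] for r in failed])
for r in ranked:
    print(r["id"], r["name"], "margin", r["worst"] - r["thresh"], "raw", r["raw"])
```

With these rows nothing is formally failing, but Runtime_Bad_Block sits at the very bottom of its normalized range with a non-zero raw count, so that is the attribute to keep an eye on.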

So the actual question: is this kernel BUG really a kernel bug, or can something be done about it, for example by replacing the disk?

3 answer(s)
Armenian Radio, 2016-07-08
@gbg

Stop dancing with a tambourine and upgrade at last.

Yuri Chudnovsky, 2016-07-08
@Frankenstine

If everything used to run more stably on the same software, you most likely have hardware problems, such as dried-out electrolytic capacitors on the motherboard or in the power supply. Check the condition of the capacitors and the voltages on the PSU rails under load.

Alexey Cheremisin, 2016-07-08
@leahch

This answer is not valid!
I meant the Thunderbolt interface.
