linux
Ilya T., 2020-09-22 08:15:13

Disk or motherboard problem?

Hello!
I have an Ubuntu server with a non-system disk connected via SATA; the disk is set up with cryptsetup and BTRFS on top. About once a month, sometimes more often, the following appears in the logs:

Sep 18 00:05:55 white kernel: [265985.854275] BTRFS error (device dm-1): invalid tree nritems, bytenr=4005273206784 nritems=0 expect >0
Sep 18 00:21:43 white kernel: [266934.470251] ata2.00: failed command: WRITE FPDMA QUEUED
Sep 18 00:21:43 white kernel: [266934.470259] ata2.00: cmd 61/80:00:40:db:1b/00:00:5b:01:00/40 tag 0 ncq dma 65536 out
Sep 18 00:21:43 white kernel: [266934.470259] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 18 00:21:43 white kernel: [266934.470267] ata2.00: status: { DRDY }
Sep 18 00:21:43 white kernel: [266934.470270] ata2.00: failed command: WRITE FPDMA QUEUED
Sep 18 00:21:43 white kernel: [266934.470277] ata2.00: cmd 61/20:08:20:dc:1b/00:00:5b:01:00/40 tag 1 ncq dma 16384 out
Sep 18 00:21:43 white kernel: [266934.470277] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 18 00:22:44 white kernel: [266995.066702] ata2: softreset failed (1st FIS failed)
Sep 18 00:22:44 white kernel: [266995.066714] ata2: limiting SATA link speed to 3.0 Gbps
Sep 18 00:22:44 white kernel: [266995.066716] ata2: hard resetting link
Sep 18 00:22:49 white kernel: [267000.067524] ata2: softreset failed (1st FIS failed)
Sep 18 00:22:49 white kernel: [267000.067549] ata2: reset failed, giving up
Sep 18 00:22:49 white kernel: [267000.067560] ata2.00: disabled
Sep 18 00:22:49 white kernel: [267000.067729] ata2: EH complete
Sep 18 00:22:49 white kernel: [267000.067828] sd 1:0:0:0: [sdb] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 18 00:22:49 white kernel: [267000.067837] sd 1:0:0:0: [sdb] tag#29 CDB: Write(16) 8a 00 00 00 00 01 5b 1b db 00 00 00 00 20 00 00
Sep 18 00:22:49 white kernel: [267000.067842] print_req_error: I/O error, dev sdb, sector 5823519488
Sep 18 00:22:49 white kernel: [267000.067885] BTRFS error (device dm-1): bdev /dev/mapper/private errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Sep 18 00:22:49 white kernel: [267000.067955] sd 1:0:0:0: [sdb] tag#30 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 18 00:22:49 white kernel: [267000.067961] sd 1:0:0:0: [sdb] tag#30 CDB: Write(16) 8a 00 00 00 00 01 5b 1b 97 a0 00 00 00 a0 00 00
Sep 18 00:22:49 white kernel: [267000.067964] print_req_error: I/O error, dev sdb, sector 5823502240
Sep 18 00:22:49 white kernel: [267000.067990] BTRFS error (device dm-1): bdev /dev/mapper/private errs: wr 2, rd 0, flush 0, corrupt 0, gen 0

and the partition becomes read-only.
The disk's SMART data shows nothing suspicious. I have replaced the SATA cable and updated the BIOS.
Should I replace the motherboard, or the disk?

3 answer(s)
Saiputdin Omarov, 2020-09-22
@generalx

Did you replace the connection cable?

Alexey Kharchenko, 2020-09-22
@AVX

print_req_error: I/O error, dev sdb,

This suggests the problem is most likely in the hardware. I don't know how deeply the crypt layer you mention (an encrypted partition?) is involved, but there are too many possible points of failure here: the encryption itself, and btrfs (still a buggy filesystem, whatever anyone says).
Post the drive's SMART output here, along with the full model name. By the way, SMART does not always catch write errors correctly, while the OS may well run into them. I have come across disks that read perfectly but have serious trouble writing, to the point that the disk itself switches to read-only mode (not the OS, not the driver, but the disk itself; this happens more often with SSDs). I also once had a disk where everything appeared to work: you could write to it and read from it, but after a power cycle it was back in its previous state, as if nothing had been written. I did not run proper tests on it at the time, there was no time for that; it was simply replaced and the client took it.
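
As a side note on reading the log in the question: the sector in each print_req_error line can be cross-checked against the CDB printed just above it. The sketch below assumes the standard SCSI WRITE(16) layout (opcode 0x8A in byte 0, a big-endian 64-bit LBA in bytes 2-9, a big-endian 32-bit transfer length in bytes 10-13); the function name decode_write16 is made up for this illustration.

```python
from typing import Tuple


def decode_write16(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a SCSI WRITE(16) CDB.

    cdb_hex is the space-separated hex dump as printed by the kernel,
    e.g. the part after "CDB: Write(16)" in the log.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: number of blocks
    return lba, length


# The two CDBs from the log in the question:
print(decode_write16("8a 00 00 00 00 01 5b 1b db 00 00 00 00 20 00 00"))
# -> (5823519488, 32)
print(decode_write16("8a 00 00 00 00 01 5b 1b 97 a0 00 00 00 a0 00 00"))
# -> (5823502240, 160)
```

Both decoded addresses match the sectors reported in the I/O error lines (5823519488 and 5823502240), so these are ordinary writes that failed because the link to the drive was lost, not writes to some nonsensical address.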

Ilya T., 2020-09-23
@Insaned

Replaced the hard drive. I'm monitoring it and will report back here if there is any news.
Update: after replacing the disk, something really weird started happening.
Once a day, the following shows up in the logs:


Sep 24 16:03:10 white kernel: [68836.536800] xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
Sep 24 16:03:10 white kernel: [68836.536803] xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
Sep 24 16:03:10 white kernel: [68836.668526] usb 3-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Sep 24 16:03:41 white kernel: [68867.259600] xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
Sep 24 16:03:41 white kernel: [68867.259603] xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
Sep 24 16:03:41 white kernel: [68867.388753] usb 3-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Sep 24 16:03:41 white kernel: [68867.411827] sd 9:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
Sep 24 16:03:41 white kernel: [68867.411833] sd 9:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
Sep 24 16:03:41 white kernel: [68867.411836] blk_update_request: I/O error, dev sdb, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Sep 24 16:04:12 white kernel: [68897.983997] xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
Sep 24 16:04:12 white kernel: [68897.984004] xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
Sep 24 16:04:12 white kernel: [68898.113010] usb 3-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Sep 24 16:04:42 white kernel: [68928.696367] xhci_hcd 0000:00:10.0: WARN Cannot submit Set TR Deq Ptr
Sep 24 16:04:42 white kernel: [68928.696369] xhci_hcd 0000:00:10.0: A Set TR Deq Ptr command is pending.
Sep 24 16:04:42 white kernel: [68928.825433] usb 3-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

After that, the load average starts climbing uncontrollably because of rising iowait.
Note that a USB flash drive is plugged in but not mounted (it is only used during boot).
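
In this second log the failing sdb sits on SCSI host 9 and its errors are interleaved with resets of usb 3-2, so it may be worth checking whether sdb now refers to the USB flash drive rather than the SATA disk. A minimal sketch, assuming the usual Linux sysfs layout in which the resolved /sys/block/<dev> path contains a usbN component for USB storage and an ataN component for SATA; the function names and the sample path are illustrative, not taken from this machine.

```python
import os
import re


def transport_from_sysfs_path(path: str) -> str:
    """Guess the transport from a resolved sysfs device path."""
    parts = path.split("/")
    if any(re.fullmatch(r"usb\d+", p) for p in parts):
        return "usb"
    if any(re.fullmatch(r"ata\d+", p) for p in parts):
        return "ata"
    return "unknown"


def transport_of(dev: str) -> str:
    """Resolve /sys/block/<dev> and classify it; 'unknown' if it does not exist."""
    return transport_from_sysfs_path(os.path.realpath(f"/sys/block/{dev}"))


# Illustrative path only (hypothetical, modelled on the identifiers in the log):
sample = ("/sys/devices/pci0000:00/0000:00:10.0/usb3/3-2/3-2:1.0/"
          "host9/target9:0:0/9:0:0:0/block/sdb")
print(transport_from_sysfs_path(sample))
# On the server itself one would call: transport_of("sdb")
```

If sdb turns out to be on USB, the read timeouts at sector 0 would point at the flash drive or the USB controller, not at the newly installed disk.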
