linux
47911, 2021-09-28 07:44:47

SSD usage over 100% - is it possible and why did it happen?

Hello. I am seeing a situation on my server that I do not fully understand.
There are 2 SSDs (manufacturer and model: ADATA SX8200PNP), connected via the PCIe ("video card") slot.

/dev/nvme0n1

Location: NVME SSD 0 drive 1
Disk size: 976.72 GiB
Manufacturer and model: ADATA SX8200PNP
SMART supported? Yes
SMART enabled? Yes
Check passed? Yes
===========================
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-17-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: ADATA SX8200PNP
Serial Number: 0000000000000
Firmware Version: S0118C
PCI Vendor/Subsystem ID: 0x1cc1
IEEE OUI Identifier: 0x000000
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Sep 28 11:39:33 2021 +07
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.0040W - - 4 4 4 4 6000 8000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 67,898,083 [34.7 TB]
Data Units Written: 23,510,229 [12.0 TB]
Host Read Commands: 3,664,943,069
Host Write Commands: 967,519,731
Controller Busy Time: 57,686
Power Cycles: 38
Power On Hours: 15,120
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 18
Thermal Temp. 2 Transition Count: 14
Thermal Temp. 1 Total Time: 135
Thermal Temp. 2 Total Time: 210

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

/dev/nvme1n1

Location: NVME SSD 1 drive 1
Disk size: 976.72 GiB
Manufacturer and model: ADATA SX8200PNP
SMART supported? Yes
SMART enabled? Yes
Check passed? No
============================
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-17-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: ADATA SX8200PNP
Serial Number: ХХХХХХХХХХХХХХ
Firmware Version: S0118C
PCI Vendor/Subsystem ID: 0x1cc1
IEEE OUI Identifier: 0x000000
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Sep 28 11:29:33 2021 +07
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.0040W - - 4 4 4 4 6000 8000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x04
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 134%
Data Units Read: 473,472,097 [242 TB]
Data Units Written: 1,602,371,132 [820 TB]
Host Read Commands: 26,657,848,436
Host Write Commands: 25,830,242,567
Controller Busy Time: 255,528
Power Cycles: 49
Power On Hours: 16,047
Unsafe Shutdowns: 14
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 18
Thermal Temp. 2 Transition Count: 15
Thermal Temp. 1 Total Time: 120
Thermal Temp. 2 Total Time: 167

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged


*The disk showing "Percentage Used: 134%" hosts Debian + nginx + php-fpm + mariadb + postfix (i.e. the main disk is almost dead. How? It has been in service for 1.5 years at most), and the disk showing 2% holds ~250 sites (only their files and pictures).

My questions:
1) At one point the data of "/dev/nvme0n1" and "/dev/nvme1n1" swapped places. Is this normal? (Everything was fine after a reboot.)
2) I see "Percentage Used: 134%". How is this possible, and how can I find out what is causing it?

1 answer(s)
Rsa97, 2021-09-28
@Rsa97

Percentage Used is the estimated wear of the drive. It is derived from the amount of data written, with some calculated write volume taken as 100%. The system disk is written to constantly; most likely it holds the database and swap. The disk with the site files is hardly written to at all, and most likely a significant share of the frequently used files is served from the cache in RAM.
However, the spare blocks have not been touched yet (Available Spare: 100%), which means the disk does not need to be replaced yet.
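
To see why smartctl reports FAILED while Available Spare is still 100%, it can help to decode the "Critical Warning" byte. The sketch below is only an illustration: the bit meanings are my reading of the NVMe SMART/Health log layout, and the names `CRITICAL_WARNING_BITS`, `decode_critical_warning` and `spare_exhausted` are made up for this example, not part of smartmontools.

```python
from typing import List

# Assumed bit layout of the NVMe SMART "Critical Warning" byte (bits 0-5).
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below the threshold",
    1: "temperature is outside the configured thresholds",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
    5: "persistent memory region is read-only or unreliable",
}


def decode_critical_warning(value: int) -> List[str]:
    """Return the description of every warning bit set in `value`."""
    return [text for bit, text in CRITICAL_WARNING_BITS.items() if value & (1 << bit)]


def spare_exhausted(available_spare: int, threshold: int) -> bool:
    """True once the spare percentage has dropped below the drive's threshold."""
    return available_spare < threshold


# Values copied from the report for the disk showing 134%.
print(decode_critical_warning(0x04))
print(spare_exhausted(available_spare=100, threshold=10))
```

With the values from the report, only the reliability bit is set and the spare check is negative, which matches the "reliability has been degraded" message rather than a shortage of spare blocks.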

Percentage Used: 2%
Data Units Written: 23,510,229 [12.0 TB]

Percentage Used: 134%
Data Units Written: 1,602,371,132 [820 TB]
134 / 820 * 12 = 1.96 ≈ 2
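
The same proportion can be checked from the raw counters. This is a rough sketch, assuming that wear grows linearly with the volume written and that one NVMe data unit is 1000 blocks of 512 bytes (which matches the TB figures smartctl prints). The helper names `units_to_tb` and `predicted_wear` are invented for this example, and the "100%" volume it prints is only what these two reports imply, not a vendor specification.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def units_to_tb(data_units: int) -> float:
    """Convert an NVMe 'Data Units Written' counter to decimal terabytes."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12


def predicted_wear(ref_percent: float, ref_tb: float, tb: float) -> float:
    """Scale a reference disk's wear linearly by the volume written."""
    return ref_percent / ref_tb * tb


# Counters copied from the two smartctl reports.
system_tb = units_to_tb(1_602_371_132)  # disk reporting Percentage Used: 134%
files_tb = units_to_tb(23_510_229)      # disk reporting Percentage Used: 2%

print(f"system disk written: {system_tb:.1f} TB")
print(f"files disk written:  {files_tb:.1f} TB")
print(f"predicted wear of files disk: {predicted_wear(134, system_tb, files_tb):.2f}%")
print(f"write volume implied as 100%: {system_tb / 1.34:.0f} TB")
```

The predicted value rounds to the 2% that the second disk reports, so both counters are consistent with wear being estimated from write volume alone.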
