linux
metallix, 2019-09-26 14:36:21

Why does my SSD keep dying?

Greetings!
My machine:
CPU - Intel® Core™ i7-3770 Processor (8 MB Cache, 3.40 GHz)
Motherboard - Dell Optiplex 9010 0KV62T LGA 1155
RAM - Samsung DDR3 M378B5273DH0-CH9 x4
SSD - Kingston SSD SATA 2.5" 480GB TLC SA400S37/480G
A couple of months after the first OS installation (Ubuntu 18.04), the system started having short freezes. Over time they became more frequent and longer. Eventually it all ended in errors like "Read-only file system". As a temporary fix, running `fsck` and rebooting helped. A little later the system sometimes would not boot at all because GRUB was broken. In the end I decided to stop suffering and bought a new SSD. (UPD: the second disk is exactly the same model.)
With the new SSD, the problem started again after a couple of months. In both cases, reinstalling the OS helped for 3-4 weeks, and then everything started over. What could be the problem? I've dug through a lot of forums and tried many solutions. Nothing helped. Could the cause be some other component rather than the SSD?
Below are the outputs of several commands:
-----

fdisk -l

Disk /dev/sda: 447,1 GiB, 480103981056 bytes, 937703088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcf2bfa08
Device    Boot   Start       End   Sectors   Size Id Type
/dev/sda1 *       2048   1050623   1048576   512M ef EFI (FAT-12/16/32)
/dev/sda2      1052670 937701375 936648706 446,6G  5 Extended
/dev/sda5      1052672 937701375 936648704 446,6G 83 Linux

smartctl -i /dev/sda5

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-29-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37480G
Serial Number: 50026B76826371CA
LU WWN Device Id: 5 0026b7 6826371ca
Firmware Version: SBFKB1C2
User Capacity: 480 103 981 056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Sep 26 14:28:58 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

smartctl -t short -a /dev/sda5

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-29-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37480G
Serial Number: 50026B76826371CA
LU WWN Device Id: 5 0026b7 6826371ca
Firmware Version: SBFKB1C2
User Capacity: 480 103 981 056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Sep 26 14:30:06 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (65535) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   000   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2537
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       237
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       13
170 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       9
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       3407935
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       55
194 Temperature_Celsius     0x0022   075   062   000    Old_age   Always       -       25 (Min/Max 17/38)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0000   006   006   000    Old_age   Offline      -       94
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       13037
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       2911
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       1702
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       52
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       63
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       821280
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             2174  -
# 2  Short offline       Completed without error       00%             1575  -
# 3  Short offline       Completed without error       00%              581  -
# 4  Short offline       Aborted by host               00%              581  -
# 5  Extended offline    Completed without error       00%              385  -
# 6  Short offline       Completed without error       00%              102  -
Selective Self-tests/Logging not supported
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Sep 26 14:32:06 2019


9 answer(s)
RickNRoll, 2019-09-26
@RickNRoll

First of all, check the PSU. Any voltage mismatch, or a strong deviation under load, can directly affect the drive controller and how long it lasts. For hard drives and SSDs a deviation of ±5% is usually acceptable. Depending on the PSU, though, the deviations can be even larger because of voltage sags or surges.
There are many articles online on "voltage tolerance during hard drive operation", and they also apply to SSDs. It might be worth looking in that direction.
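To put numbers on the ±5% figure from this answer, here is a minimal Python sketch. It computes the allowed window for each rail and flags readings that fall outside it. The nominal rails (12 V, 5 V, 3.3 V), the function names and the sample readings are assumptions for illustration. Real measurements would come from a multimeter or PSU tester taken under load.

```python
# Minimal sketch (assumption: nominal rails of 12 V, 5 V and 3.3 V, and the
# +/-5% tolerance quoted in the answer above).
NOMINAL_RAILS = {"12V": 12.0, "5V": 5.0, "3.3V": 3.3}


def tolerance_window(nominal: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Return the (low, high) voltage bounds allowed for one rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_tolerance(readings: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """Return the names of rails whose measured voltage falls outside its window."""
    bad = []
    for rail, measured in readings.items():
        low, high = tolerance_window(NOMINAL_RAILS[rail], tolerance)
        if not low <= measured <= high:
            bad.append(rail)
    return bad


if __name__ == "__main__":
    for rail, nominal in NOMINAL_RAILS.items():
        low, high = tolerance_window(nominal)
        print(f"{rail}: {low:.2f} V .. {high:.2f} V")
    # Hypothetical readings taken under load: the 12 V rail has sagged too far
    print("out of tolerance:", out_of_tolerance({"12V": 11.31, "5V": 5.02, "3.3V": 3.29}))
```

For example, ±5% on the 12 V rail allows 11.4 to 12.6 V. On the 5 V rail it allows 4.75 to 5.25 V.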

Alexey Dmitriev, 2019-09-26
@SignFinder

1. Look at the SMART data from the disk.
2. Check which file system is used and whether TRIM is supported.
3. Check how the file system journal is configured, since journaling increases the load on the SSD.
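For point 2, here is a minimal Python sketch of one way to check TRIM support on Linux. The kernel exposes the discard limit in `/sys/block/<dev>/queue/discard_max_bytes`, which is the same data `lsblk --discard` shows. A value of 0 means the device does not support discard. The function names are hypothetical. The sketch only reads the attribute and changes nothing.

```python
# Minimal sketch: does the kernel report discard (TRIM) support for a disk?
# Reads /sys/block/<dev>/queue/discard_max_bytes; 0 means "no discard support".
from pathlib import Path


def parse_discard_max_bytes(text: str) -> bool:
    """Interpret the sysfs attribute: any value above zero means TRIM is supported."""
    return int(text.strip()) > 0


def supports_discard(device: str, sysfs_root: str = "/sys") -> bool:
    """Return True if /sys/block/<device> advertises a non-zero discard limit."""
    attr = Path(sysfs_root) / "block" / device / "queue" / "discard_max_bytes"
    try:
        return parse_discard_max_bytes(attr.read_text())
    except (OSError, ValueError):
        # Device not present or attribute unreadable: report no TRIM support
        return False


if __name__ == "__main__":
    print("sda supports TRIM:", supports_discard("sda"))
```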

Alexander Semenenko, 2019-09-26
@semenenko88

Also check that AHCI mode for SATA is enabled in the BIOS.
Most likely you have ext4, which supports TRIM. If the disk supports it, you will find something like this:
If both the disk and the file system support TRIM, you can add the discard option in /etc/fstab:
It could also be a bad SATA cable or a faulty SATA port on the motherboard. The power supply may also be the problem.
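To check the fstab advice, here is a minimal Python sketch that parses fstab-style text and reports which mount points carry the `discard` option. The sample entries and UUIDs are made up. On Ubuntu the root file system is usually mounted with `errors=remount-ro`. That option is why ext4 switches to read-only after an error, which matches the symptom in the question.

```python
# Minimal sketch: which fstab entries mount with the "discard" option?
# Assumes the usual whitespace-separated fstab fields:
# <device> <mount point> <type> <options> <dump> <pass>

# Hypothetical example in the style of an Ubuntu 18.04 install
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=1234-ABCD  /boot/efi  vfat  umask=0077                 0  1
UUID=0000-1111  /          ext4  errors=remount-ro,discard  0  1
"""


def mounts_with_discard(fstab_text: str) -> dict[str, bool]:
    """Map each mount point to whether its options include 'discard'."""
    result = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry without an options field
        mount_point, options = fields[1], fields[3].split(",")
        result[mount_point] = "discard" in options
    return result


if __name__ == "__main__":
    print(mounts_with_discard(SAMPLE_FSTAB))
```

The membership test matches whole options only, so a `nodiscard` entry is correctly reported as not using discard.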

d22b, 2019-09-26
@d22b

You can also run `iostat 60` in a terminal to see whether there is a really large volume of writes while idle or from some application. The write counter in the SMART data is unclear. If Total_LBAs_Written is in GB, that gives 2911 GB written versus 1702 GB read.
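The arithmetic in this answer can be reproduced with a minimal Python sketch. It pulls raw values out of a `smartctl -A` style attribute table, where fields are whitespace-separated and the raw value starts in the tenth column, and then estimates the write load. It assumes, as the answer does, that attributes 241 and 242 count gigabytes on this drive. The output in the question does not confirm that unit.

```python
# Minimal sketch: read raw values from a `smartctl -A` attribute table and
# estimate the write load. ASSUMPTION: attributes 241/242 count gigabytes on
# this drive, as the answer above guesses; the unit is not confirmed.

# Three rows copied from the question's output
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 2537
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 2911
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 1702
"""


def parse_smart_raw(table_text: str) -> dict[int, str]:
    """Map attribute ID -> raw value (everything after the WHEN_FAILED column)."""
    raw = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or unrelated line
        raw[int(fields[0])] = " ".join(fields[9:])
    return raw


def write_load(raw: dict[int, str]) -> tuple[float, float]:
    """Return (written/read ratio, GB written per 24 power-on hours)."""
    hours = int(raw[9])
    written_gb = int(raw[241])
    read_gb = int(raw[242])
    return written_gb / read_gb, written_gb / hours * 24


if __name__ == "__main__":
    ratio, per_day = write_load(parse_smart_raw(SAMPLE_TABLE))
    print(f"write/read ratio: {ratio:.2f}, ~{per_day:.1f} GB per 24 power-on hours")
```

The question's numbers are 2537 power-on hours, 2911 written and 1702 read. With those, the write/read ratio is about 1.7, and roughly 27.5 GB is written per 24 power-on hours, provided the GB assumption holds.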
Elsewhere I saw advice to leave some unallocated space on the SSD that does not belong to any partition. I always do this, and so far my drives are all alive, with comparable wear.
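Here is a minimal Python sketch for deciding how much space to leave unpartitioned. It picks the last sector of the final partition so that a chosen percentage of the disk stays outside all partitions, rounded down to the 2048-sector alignment seen in the `fdisk -l` output. The 10% figure and the function name are assumptions. The current layout in the question ends at sector 937701375 of 937703088, so almost nothing is left free. Unpartitioned space only helps the controller if those blocks have never been written or have been trimmed, for example after a secure erase.

```python
# Minimal sketch: last sector for the final partition so that a given share of the
# disk stays unpartitioned. ASSUMPTIONS: 512-byte sectors and 2048-sector (1 MiB)
# alignment, as in the question's `fdisk -l` output; the 10% reserve is illustrative.

def last_sector_leaving_free(total_sectors: int, free_percent: int, align: int = 2048) -> int:
    """Return the last partition sector (inclusive), leaving >= free_percent% unallocated."""
    usable = total_sectors * (100 - free_percent) // 100  # integer math, no float rounding
    usable -= usable % align  # round the partition end down to an alignment boundary
    return usable - 1  # sectors are numbered from 0, so the last one is usable - 1


if __name__ == "__main__":
    total = 937703088  # "937703088 sectors" from `fdisk -l` in the question
    end = last_sector_leaving_free(total, 10)
    free_gib = (total - end - 1) * 512 / 2**30
    print(f"end partition at sector {end}; {free_gib:.1f} GiB stays unallocated")
```

For the question's disk, reserving 10% ends the last partition at sector 843931647, which leaves about 44.7 GiB unallocated.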

grabbee, 2019-09-26
@grabbee

For me the problem was the motherboard, and it was replaced entirely under warranty. The symptoms were very similar. The machine also hung for about a minute at power-on and sometimes, at random, failed to shut down. Likewise, the file system went read-only, root would not come up, the system did not see the disk, there were freezes, and I kept reinstalling. I immediately suspected the disk, but the service center said the disk was fine. I've been using that disk for over a year now.

Ruslan, 2019-09-27
@msHack

Check the PSU (power supply unit).

Maxim Yaroshevich, 2019-10-03
@YMax

I had a similar situation under Windows 10: a SanDisk SSD started dropping out at system startup. Updating the BIOS and replacing the PSU did not help, so I suspect the disk itself. In general, desktop SSDs can bring surprises in terms of reliability. Not long ago, two AData SSDs stopped being detected anywhere without any warning. They simply disappeared from the system, and that was it.

Vladimir Bobylev, 2019-10-03
@ShturmN

On Ubuntu there was a bug in the laptop-mode-tools package: it often spun the disk down when idle. As a result, the HDD's start/stop counter went through the roof. I don't remember it ever being fixed. I solved it by setting the config explicitly.

Andrey Dugin, 2019-10-04
@adugin

Show the result of the command:
The problem may be the size of the swap file. On my laptop with 8 GB of RAM, Ubuntu 18.04 automatically created a 2 GB swap file during installation, and I saw regular freezes of up to 5 minutes. After I increased the swap file to 16 GB, everything started working fine. Manual is here.
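To check the setup this answer describes, here is a minimal Python sketch that parses `/proc/meminfo`-style text and compares SwapTotal with MemTotal. The function names and the sample values are illustrative; the samples mirror the answer's 8 GB of RAM and 2 GB of swap. The sketch only measures. The 16 GB figure is the answerer's own fix, not a general rule.

```python
# Minimal sketch: compare swap size with RAM from /proc/meminfo-style text.
from pathlib import Path

# Hypothetical values, roughly "8 GB RAM, 2 GB swap" as in the answer
SAMPLE_MEMINFO = """\
MemTotal:        8048524 kB
MemFree:         1203456 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""


def meminfo_kib(text: str) -> dict[str, int]:
    """Parse 'Key:   value kB' lines into {key: value in KiB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_to_ram_ratio(text: str) -> float:
    """Return SwapTotal / MemTotal."""
    info = meminfo_kib(text)
    return info["SwapTotal"] / info["MemTotal"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    # Fall back to the sample text on systems without /proc
    text = meminfo.read_text() if meminfo.exists() else SAMPLE_MEMINFO
    print(f"swap/RAM ratio: {swap_to_ram_ratio(text):.2f}")
```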
