How can I tell when a hard drive is about to die?

linux

Nikolai Vasilchuk, 2013-08-28 15:09:33

For several years, a WD Green 1 TB has been running in my "home server".
Recently the disk has become suspiciously noisy and slow to respond.
How can I tell whether it is on its way out, and how much longer it will last?

smartctl output

smartctl 5.43 2012-06-30 r3573 [i686-linux-3.7.0-7-generic] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD10EARS-00Y5B1
Serial Number:    WD-WCAV57323576
LU WWN Device Id: 5 0014ee 2ae9cc0a5
Firmware Version: 80.00A80
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug 28 16:06:11 2013 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
          was completed without error.
          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
          without error or no self-test has ever 
          been run.
Total time to complete Offline 
data collection: 		(20100) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
          General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 231) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
          SCT Feature Control supported.
          SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   130   122   021    Pre-fail  Always       -       6491
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1878
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15935
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       918
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       96
193 Load_Cycle_Count        0x0032   101   101   000    Old_age   Always       -       299420
194 Temperature_Celsius     0x0022   106   095   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

6 answers

script88, 2013-08-28
@Anonym

The most critical parameters to watch are these:
Raw Read Error Rate - the rate of errors when reading data from the disk that are caused by the drive hardware.
Spin Up Time - the time it takes the platter stack to spin up from rest to operating speed. The normalized value (VALUE) is calculated by comparing the measured time with a reference value set at the factory. A VALUE that is below the maximum but not getting worse, combined with a Spin Retry Count VALUE at its maximum (RAW_VALUE equal to 0), does not indicate anything bad. The difference from the reference time can have several causes, for example a voltage sag in the power supply.
Spin Retry Count - the number of repeated attempts to spin the platters up to operating speed after the first attempt failed. A non-zero RAW_VALUE (and, accordingly, a VALUE below the maximum) indicates problems in the mechanical part of the drive.
Seek Error Rate - the rate of errors when positioning the head assembly. A high RAW_VALUE indicates problems, which may be damaged servo marks, excessive thermal expansion of the platters, mechanical problems in the positioner, and so on. A consistently high VALUE means everything is fine.
Reallocated Sector Count - the number of sector remapping operations. SMART in modern drives can assess a sector's stability on the fly and, if the sector is judged to be failing, remap it.
In your case, everything looks quite good so far.
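
The checks described above can be automated roughly as follows. This is a minimal Python sketch, not a definitive health check: the sample rows are copied from the smartctl output in the question, while the function names and the rule "RAW_VALUE should be zero for attributes 1, 5, 7 and 10" are assumptions made for illustration. The zero rule fits this WD table; other vendors encode raw values differently.

```python
import re

# Rows copied from the attribute table in the question.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   130   122   021    Pre-fail  Always       -       6491
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
"""

# Assumption: for these attribute IDs a non-zero RAW_VALUE is suspicious.
# Spin_Up_Time (3) is excluded because its raw value is a time, not an error count.
ZERO_EXPECTED = {1, 5, 7, 10}

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: fields} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attr_id, name, value, worst, thresh, raw = match.groups()
            attrs[int(attr_id)] = {
                "name": name,
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def find_warnings(attrs):
    """List human-readable warnings for the parsed attributes."""
    found = []
    for attr_id, a in sorted(attrs.items()):
        # A normalized VALUE at or below a non-zero THRESH means the attribute failed.
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            found.append(f"{a['name']}: VALUE {a['value']} <= THRESH {a['thresh']}")
        if attr_id in ZERO_EXPECTED and a["raw"] != 0:
            found.append(f"{a['name']}: RAW_VALUE is {a['raw']}, expected 0")
    return found
```

Calling `find_warnings(parse_attributes(SAMPLE))` on the rows from the question returns an empty list, which matches the conclusion above.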

Evgeny Elizarov, 2013-08-28
@KorP

I would start by replacing the cable.

Shizoid, 2013-08-28
@Shizoid

It is worth checking the disk periodically with Victoria (a tool similar to MHDD); it is meant for low-level disk scanning.
It has several test modes, as well as useful utilities for the hard drive (sector-by-sector copying, password management, erasing data, noise level control, and of course you can view SMART there too).
One of the modes scans the whole disk, reading data directly sector by sector and recording the response time of each sector.
As it runs, it builds a map of sector response times in milliseconds.
Each sector on the map is colored according to the response-time band it falls into:
gray sectors are the fastest, green ones are slower, brown and red ones are very slow, and crosses mark bad, unreadable sectors.
Here is a link to the forum thread with the program and other useful information:
forum.ru-board.com/topic.cgi?forum=5&topic=35147&start=1160#lt
When everything is gray, everything is fine.
The more green, and especially the more red, the more alarming the sign.
I do not trust SMART; I trust this method of checking more.
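
The idea of such a scan (read every block, time it, sort the timings into bands) can be sketched in Python. This is only an illustration and not how Victoria itself works: the band limits and names below are hypothetical, and the reads go through the operating system, so cached blocks will look faster than the disk really is.

```python
import time
from collections import Counter

# Hypothetical response-time bands in milliseconds (not Victoria's thresholds).
BANDS = [(5.0, "fast"), (50.0, "slower")]


def classify(elapsed_ms):
    """Map a response time to a band name; anything above the last limit is 'very slow'."""
    for limit, label in BANDS:
        if elapsed_ms < limit:
            return label
    return "very slow"


def scan(path, block_size=1024 * 1024, max_blocks=None):
    """Read path block by block and count how many blocks fall into each band."""
    counts = Counter()
    with open(path, "rb", buffering=0) as f:
        index = 0
        while max_blocks is None or index < max_blocks:
            start = time.perf_counter()
            try:
                data = f.read(block_size)
            except OSError:
                # The block could not be read: count it and skip past it.
                counts["unreadable"] += 1
                index += 1
                f.seek(index * block_size)
                continue
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if not data:
                break  # end of file or device
            counts[classify(elapsed_ms)] += 1
            index += 1
    return counts
```

On Linux this could be pointed at a whole disk, for example `scan("/dev/sdX", max_blocks=1000)` run as root, where `/dev/sdX` is a placeholder for the real device. A healthy disk should land almost entirely in the fastest band.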

@xave, 2013-08-28
_

Run the long read test through the same smartctl (smartctl -t long) and, if you can afford it, a write test with a separate tool, since the SMART self-tests only read (the write test will erase all data on the disk). Since the test goes through every block, if there is a bad block the drive will remap it and the Reallocated Sector Count will increase. In that case, return the drive under warranty, if it still has one.
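
To see whether anything was remapped, the attribute table can be saved before and after the test and the two snapshots compared. A minimal Python sketch, assuming the snapshots are text in the same table format as the output in the question (for example saved from `smartctl -A`); the helper names are made up for illustration.

```python
import re


def raw_value(smartctl_output, attribute_id):
    """Return RAW_VALUE of one attribute from the attribute table, or None if absent."""
    # After the FLAG column come VALUE, WORST, THRESH, TYPE, UPDATED and
    # WHEN_FAILED (six columns), followed by RAW_VALUE.
    pattern = re.compile(
        rf"^\s*{int(attribute_id)}\s+\S+\s+0x[0-9a-fA-F]+\s+(?:\S+\s+){{6}}(\d+)",
        re.MULTILINE,
    )
    match = pattern.search(smartctl_output)
    return int(match.group(1)) if match else None


def remapped_during_test(before, after, attribute_id=5):
    """Return how much the raw counter grew between two snapshots (ID 5 = Reallocated_Sector_Ct)."""
    old = raw_value(before, attribute_id)
    new = raw_value(after, attribute_id)
    if old is None or new is None:
        raise ValueError(f"attribute {attribute_id} not found in both snapshots")
    return new - old
```

A result greater than zero means sectors were remapped while the test ran. According to the output in the question, the extended self-test on this drive takes about 231 minutes, so the second snapshot should be taken only after that.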

goldena, 2013-08-29
@goldena

If it is noisy and slow to respond, even if only "subjectively", replace it, as advised above.
SMART and diagnostic programs are all well and good, but they are not a panacea (backups do not always save you either, and not completely). A WD Green 1 TB, if memory serves, costs about 60-70 dollars. It is hard for me to judge how much your time is worth and how much your work depends on this disk, but you must admit that this is not a lot of money. In most cases the value of the data is far greater than that. Data recovery from a "dead" disk is also painfully expensive. Is it really worth the risk?

Alexey T, 2013-09-02
@Alexeyslav

The problem may not be in the hard drive but in the server. Overheating, viruses, file fragmentation?
To be honest, the "speed" of the disk should be tested not by how the operating system feels but with a dedicated program. The same goes for access time: keep in mind that what you perceive also includes the time it takes to reach the server itself, whatever the operating system is busy with in the background, and so on. All of these factors have to be excluded so that they do not skew the final verdict.
And fixed, if necessary.
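
As an illustration of measuring instead of judging by feel, here is a minimal Python sketch that times random reads directly on the machine that holds the disk, which takes the network out of the picture. The function names and defaults are assumptions, `os.pread` is available on Unix-like systems only, and the operating system's cache can still make repeated reads look faster than the disk really is.

```python
import os
import random
import statistics
import time


def access_times_ms(path, samples=100, block_size=4096, seed=0):
    """Time `samples` random block reads from path and return the timings in ms."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # size of the file or block device
        blocks = size // block_size
        if blocks == 0:
            raise ValueError("target is smaller than one block")
        times = []
        for _ in range(samples):
            offset = rng.randrange(blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            times.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return times


def summarize(times):
    """Reduce the timings to a median and a worst case."""
    return {"median_ms": statistics.median(times), "max_ms": max(times)}
```

For example, `summarize(access_times_ms("/dev/sdX"))` run as root, where `/dev/sdX` is a placeholder for the real device. Comparing the result with the same measurement on a known-good disk in the same machine helps separate disk problems from server problems.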