Hard disks
susnake, 2016-01-12 09:49:48

Why does the OS switch a disk to read-only (RO) on a running server?

Good afternoon.
I have an Ubuntu 14.04.3 LTS server (GNU/Linux 3.19.0-43-generic x86_64) with two SATA HDDs: the first holds the / partition, the second holds only /home. The /home drive is from 2014.
The server runs various small services (web, backup) around the clock. Recently, around the end of November, I began to notice that the OS switches this disk to RO mode. After a reboot everything works until the next switch to RO. Today, during a reboot, the system could not detect the HDD and therefore could not mount /home; it offered to continue without it by pressing S. I waited for it to boot, shut the server down, and replaced the PSU and cables just in case. The BIOS detected the disk correctly, but when I started the system it took about 20 minutes to boot. Once it was up, the disk was detected and mounted correctly.
Just in case, I fully updated the system. The first pass updated "grub-common, grub-pc, grub-pc-bin, grub2-common, libgnutls-openssl27, libgnutls26, libpng12-0, owncloud, owncloud-config-apache, owncloud-server"; I then ran another dist-upgrade, which additionally updated "linux-generic-lts-vivid, linux-headers-generic-lts-vivid, linux-image-generic-lts-vivid".
After the update I ran:

:~$ sudo smartctl -i /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-25-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-1ER162
Serial Number: Z4Y87TP3
LU WWN Device Id: 5 000c50 07b966c9a
Firmware Version: CC46
User Capacity: 1 000 204 886 016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Jan 11 16:25:45 2016 NOVT
==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
knowledge.seagate.com/articles/en_US/FAQ/207931en
knowledge.seagate.com/articles/en_US/FAQ/223651en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
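
One detail in this output may deserve a second look: the `SATA Version is:` line reports a drive capable of 6.0 Gb/s on a link currently running at 3.0 Gb/s. That can be entirely normal (for example, a 3.0 Gb/s port on the motherboard), but the kernel can also lower the link speed after repeated link errors, so it is worth comparing against the kernel log. A minimal Python sketch that extracts both speeds from `smartctl -i` text; the function names `sata_link_speeds` and `link_is_downgraded` are made up for illustration:

```python
import re

# Matches e.g. "SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)"
SATA_RE = re.compile(
    r"SATA Version is:\s*.*?,\s*(?P<max>[\d.]+) Gb/s\s*"
    r"\(current:\s*(?P<cur>[\d.]+) Gb/s\)"
)


def sata_link_speeds(smartctl_info_text):
    """Return (max_gbps, current_gbps) or None if the line is absent."""
    m = SATA_RE.search(smartctl_info_text)
    if m is None:
        return None
    return float(m.group("max")), float(m.group("cur"))


def link_is_downgraded(smartctl_info_text):
    """True if the negotiated speed is below what the drive supports."""
    speeds = sata_link_speeds(smartctl_info_text)
    return speeds is not None and speeds[1] < speeds[0]
```

A `True` result here is only a hint to look further, not evidence of a fault by itself.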

Then I started a long self-test:
:~$ sudo smartctl -t long /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-25-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 106 minutes for test to complete.
Test will complete after Mon Jan 11 18:12:02 2016
Use smartctl -X to abort test.

After waiting for it to finish, I ran the following (if I understood the man page correctly, this shows the results of the self-test):
:~$ sudo smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-25-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1595 -

That is, no errors were found on the disk.
I then opened the extended view:
705d37f21b1a453181192cdeef0465e1.png
Everything seems to be fine there too.
Just in case, I also checked the time (there had been a problem with it once):
$ timedatectl
Local time: Mon. 2016-01-11 18:56:48 NOVT
Universal time: Mon. 2016-01-11 12:56:48 UTC
Timezone: Asia/Novosibirsk (NOVT, +0600)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: n/a

The time is correct and synchronized.
Then I checked the file system:
$ sudo fsck -f /dev/sda1
fsck from util-linux 2.20.1
e2fsck 1.42.9 (4-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
data: 760314/61054976 files (0.1% non-contiguous), 67586650/244190208 blocks

Here I did not understand what the 0.1% means. Did it find something and fix it, and if so, what?
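
For reference, the last line of the e2fsck output is a usage summary rather than a list of repairs: `non-contiguous` is the share of files stored in more than one fragment, and the two ratios are used/total inodes and used/total blocks. When e2fsck changes anything it says so explicitly (for example with a `FILE SYSTEM WAS MODIFIED` banner), and nothing of the kind appears in the output above. A minimal Python sketch that reads such a summary line; the function name `parse_e2fsck_summary` and the returned keys are made up for illustration:

```python
import re

# Matches the e2fsck summary line, e.g.
# "data: 760314/61054976 files (0.1% non-contiguous), 67586650/244190208 blocks"
SUMMARY_RE = re.compile(
    r"(?P<label>[^:]+):\s+(?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<noncontig>[\d.]+)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def parse_e2fsck_summary(line):
    """Return usage figures from an e2fsck summary line, or None."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    files_used = int(m.group("files_used"))
    files_total = int(m.group("files_total"))
    blocks_used = int(m.group("blocks_used"))
    blocks_total = int(m.group("blocks_total"))
    return {
        "label": m.group("label").strip(),
        # Share of files split into more than one fragment.
        "non_contiguous_pct": float(m.group("noncontig")),
        # Share of inodes and of blocks currently in use.
        "inodes_used_pct": 100.0 * files_used / files_total,
        "blocks_used_pct": 100.0 * blocks_used / blocks_total,
    }
```

Applied to the line above, this reports fragmentation and space usage only; it says nothing about disk health.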
More generally, is there a way to see why the OS suddenly switches the disk to RO mode? There have been no power surges and no power cuts (as far as I know). I am rather confused.
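
One place to look is the kernel log, since the kernel normally logs the underlying error before a file system is remounted read-only. A minimal Python sketch that filters log lines for messages commonly seen around such an event. The pattern list is illustrative and not exhaustive, and the function name `find_ro_clues` is made up for illustration:

```python
import re

# Messages that commonly accompany a read-only remount.
# Illustrative only; real logs may use other wording.
PATTERNS = [
    re.compile(r"Remounting filesystem read-only", re.I),
    re.compile(r"EXT4-fs error", re.I),
    re.compile(r"Aborting journal", re.I),
    re.compile(r"I/O error", re.I),
    re.compile(
        r"\bata\d+(?:\.\d+)?: "
        r"(?:exception|failed command|hard resetting link)",
        re.I,
    ),
]


def find_ro_clues(lines):
    """Return the log lines that match any of the patterns above."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if any(p.search(text) for p in PATTERNS):
            hits.append(text)
    return hits
```

The input could be the lines of `dmesg` output or of a saved kernel log file (on Ubuntu this is usually `/var/log/kern.log`, which is an assumption about this particular setup). The lines just before the remount message are usually the interesting ones.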

2 answer(s)
Nikolay45, 2016-01-13
@Nikolay45

Not as an advertisement, but because it may help: geektimes.ru/post/258160. Good luck.

Slava Kryvel, 2016-01-12
@kryvel

It looks like the disk is dying after all.
You need a write/read test to be sure, but for that you first have to move the data somewhere else.
It is still better to do it, otherwise this may end badly.
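
A destructive whole-disk test (for example `badblocks -w`) is the kind that requires moving the data off first. As a much weaker, non-destructive approximation, one could write a scratch file on the suspect file system and verify it on read-back. The sketch below is a file-level check, not a surface scan, and it only exercises whatever free blocks the file system happens to allocate; the function name `write_read_check` and its parameters are made up for illustration:

```python
import hashlib
import os


def write_read_check(path, blocks=64, block_size=1024 * 1024):
    """Write random blocks to `path`, read them back, compare digests.

    Returns the indexes of blocks whose contents differ (empty if none).
    The scratch file is removed afterwards.
    """
    digests = []
    try:
        with open(path, "wb") as f:
            for _ in range(blocks):
                data = os.urandom(block_size)
                digests.append(hashlib.sha256(data).digest())
                f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
            # Hint to the kernel to drop cached pages so that the read
            # below is more likely to hit the disk. Advisory only, and
            # not available on every platform.
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        bad = []
        with open(path, "rb") as f:
            for i, expected in enumerate(digests):
                if hashlib.sha256(f.read(block_size)).digest() != expected:
                    bad.append(i)
        return bad
    finally:
        if os.path.exists(path):
            os.remove(path)
```

An `OSError` raised during the write or read is itself a useful result, and it is worth checking the kernel log right after such a failure.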
