VMware
KEKSOV, 2013-09-01 18:01:45

Problems (probably) with a disk under ESXi 5.1 at Hetzner?

“Suddenly,” the virtual machines under ESXi 5.1 stopped starting on a Hetzner server “with cheap non-server hardware.” The host itself boots, and I can log in to it via ssh. I can also cd into the datastore on the failing drive, run ls, and read some files, but most attempts to read other files end like this:

/vmfs/volumes/5060d5c3-875cbeb8-a8d3-406186e9d73d/Gentoo# tail -f vmware.log
tail: can't open 'vmware.log': Input/output error
tail: no files
I have a faint hope that the situation can be fixed by running some command that checks and repairs the file system. Perhaps someone has run into a similar problem and knows the magic words.

Here is a snippet from /var/log/vmkernel.log that I think is relevant to this issue.

2013-09-01T13:24:44.370Z cpu0:4942)FSS: 6750: Mounting fs visorfs (410007af2850) with -o 0,249,0,0,0755,hostdstats on file descriptor 410007684470
2013-09-01T13:24:45 cpu4:4970)HBX: 5056: Marking HB [HB state abcdef04 offset 3567616 gen 165 stampUS 150540149 uuid 50699ef0-01e6207d-f9b9-406186e9d73d jrnl <FB 405000> drv 14.58] on vol 'datastore1'
2013-09-01T13:24:45.131Z cpu4:4970)HBX: 5134: Marked HB [HB state abcdef04 offset 3567616 gen 165 stampUS 44333239 uuid 50699ef0-01e6207d-f9b9-406186e9d73d jrnl <FB 405000> drv 14.58] on vol 'datastore1'
2013-09-01T13:24:45.131Z cpu4:4970)J3: 3726: Replaying journal at <FB 405000>, gen 165
2013-09-01T13:24:50.208Z cpu6:4802)Tcpip: 2062: soaccept failed with 53
2013-09-01T13:24:56.134Z cpu5:4239)<6>ahci_scsi_abort: cmd 0x2a (0x412401844540), entering...
2013-09-01T13:24:58.374Z cpu5:4239)<6>ata3: ahci_port_reset, hard reseting port
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3: failed to read log page 10h (errno=-5)
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3.00: exception Emask 0x1 SAct 0x3 SErr 0x0 action 0x2 frozen
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3.00: irq_stat 0x40000008
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3.00: cmd 60/40:00:00:e7:0c/00:00:32:00:00/40 tag 0 ncq 32768 in
res 40/00:04:00:e7:0c/00:00:32:00:00/40 Emask 0x1 (device error)
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3.00: status: { DRDY }
... 01:08:39:bb:9c/00:00:00:00:00/40 tag 1 ncq 512 out
res 40/00:04:00:e7:0c/00:00:32:00:00/40 Emask 0x1 (device error)
2013-09-01T13:24:58.374Z cpu2:4630)<3>ata3.00: status: { DRDY }
2013-09-01T13:24:58.696Z cpu2:4630)<6>ata3: soft resetting link
2013-09-01T13:24:58.696Z cpu2:4630)<6>ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
2013-09-01T13:24:58.707Z cpu2:4630)<6
<3>ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
2013-09-01T13:24:58.707Z cpu2:4630)<3>ata3: irq_stat 0x00400000, PHY RDY changed
2013-09-01T13:24:58.719Z cpu2:4630)<6>ata3.00: configured for UDMA/133
2013-09-01T13:24:58.719Z cpu3:5031)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x28 (0x4124007b5640, 4970) to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense dat$
2013-09-01T13:24:58.719Z cpu3:5031)ScsiDeviceIO: 2303: Cmd(0x4124007b5640) 0x28, CmdSN 0x39 from world 4970 to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.
2013-09-01T13:24:58.880Z cpu5:4239)<6>ata3: ahci_port_reset: SUCCEEDED
2013-09-01T13:24:58.880Z cpu5:4239)<6>ahci_scsi_abort: cmd 0x2a (0x412401844540), succeeded
2013-09-01T13:24:58.880Z cpu3:5031)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x4124007b5440, 4104) to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense dat$
2013-09-01T13:24:58.919Z cpu6:5103) WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1056 Unknown status vmhba33:0:0:0 (driver name: ahci) - Message repeated 1 time
2013-09-01T13:25:00.132Z cpu5:4239)<6>ahci_scsi_abort: cmd 0x2a (0x412401813000), entering...
2013-09-01T13:25:00.132Z cpu5:4239)<7>ata3: ahci_port_reset, entering...
2013-09-01T13:25:00.523Z cpu2:4630)<3>ata3.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
2013-09-01T13:25:00.523Z cpu2:4630)<3>ata3.00: irq_stat 0x40000008
2013-09-01T13:25:00.523Z cpu2:4630)<3>ata3.00: cmd 60/40:10:00:e7:0c/00:00:32:00:00/40 tag 2 ncq 32768 in
res 41/40:00:30:e7:0c/2b:00:32:00:00/40 Emask 0x409 (media error) <F>
2013-09-01T13:25:00.523Z cpu2:4630)<3>ata3.00: status: { DRDY ERR }
2013-09-01T13:25:00.523Z cpu2:4630)<3>ata3.00: error: { UNC }
2013-09-01T13:25:00.535Z cpu2:4630)<6>ata3.00: configured for UDMA/133
2013-09-01T13:25:00.535Z cpu2:4630)<6>ata3: EH complete
2013-09-01T13:25:00.535Z cpu7:5621)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237:NMP device "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" state in doubt; requested fast path state update...
2013-09-01T13:25:00.535Z cpu7:5621)ScsiDeviceIO: 2316: Cmd(0x4124007f4600) 0x28, CmdSN 0xe from world 5184 to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" failed H:... D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2013-09-01T13:25:00.535Z cpu7:5621)ScsiDeviceIO: 2316: Cmd(0x4124007b5140) 0x28, CmdSN 0x1bd from world 5605 to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2013-09-01T13:25:00.535Z cpu7:5621)ScsiDeviceIO: 2316: Cmd(0x4124007b5640) 0x28, CmdSN 0x39 from world 4970 to dev "t10.ATA_____SAMSUNG_HD754JJ_________________________S281JX0BA00347______" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x4.
2013-09-01T13:25:00.535Z cpu1:4970)WARNING: HBX: 4497: Replay of journal <FB 405000> on vol 'datastore1' failed: I/O error
2013-09-01T13:25:00.535Z cpu1:4970)HBX: 2441: Waiting for timed out [HB state abcdef02 offset 3568128 gen 57 stampUS 58153836 uuid 5223400b-e3d782f4-4688-406186e9d73d jrnl <FB 0> ... 'datastore1'
2013-09-01T13:25:00.535Z cpu7:5605)User: 2742: wantCoreDump : esxcfg-dumppart -enabled : 0

Thanks

3 answer(s)
Puma Thailand, 2013-09-01
@opium

Well, just replace the disks and deploy the virtual machines from the backup.

script88, 2013-09-01
@script88

This information is not enough; look into the drive's SMART data and check it for bad blocks.
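
The "res" lines in the question's log look like the taskfile dumps printed by the Linux libata layer, so the sector the drive could not read can probably be recovered from them. The following is only a sketch under that assumption: the field order (status/error:nsect:lbal:lbam:lbah/hob fields/device) is libata's, and the function name failing_lba is made up for this illustration. Whether the ESXi ahci driver prints exactly the same layout is not confirmed by the thread.

```python
import re

# Twelve hex bytes separated by "/" or ":", as printed after "res".
TF_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})", re.I)

def failing_lba(res_line):
    """Return the 48-bit LBA from a libata-style 'res' line, or None.

    None is returned when the line does not match or when the ERR bit
    (bit 0 of the status byte) is clear, because the LBA fields are then
    not an error location.
    """
    m = TF_RE.search(res_line)
    if m is None:
        return None
    b = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (status, error, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = b
    if not status & 0x01:
        return None
    return ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal)

if __name__ == "__main__":
    line = "res 41/40:00:30:e7:0c/2b:00:32:00:00/40 Emask 0x409 (media error) <F>"
    lba = failing_lba(line)
    print(hex(lba), lba)
```

For the "media error" line from the log this gives LBA 0x320ce730, which falls inside the 64-sector read that started at 0x320ce700 (cmd 60/40:10:00:e7:0c/...). That is consistent with a single unreadable sector in the middle of the request, which is what a bad-block check would be expected to confirm.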

joneleth, 2013-09-04
@joneleth

Judging by the log, the disk is in really bad shape. It has to be replaced.
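
One way to read the log the same way is to decode the sense data on the ScsiDeviceIO lines. The sketch below is an illustration, not an official tool: the tables hold only the few standard SCSI sense keys and ASC/ASCQ pairs relevant here, the function name decode_sense is hypothetical, and it assumes that "Possible sense data" marks sense bytes that should not be trusted.

```python
import re

# Small subset of the standard SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0xB: "ABORTED COMMAND",
}

# Small subset of standard additional sense code / qualifier pairs.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED",
}

SENSE_RE = re.compile(
    r"(Valid|Possible) sense data: 0x([0-9a-f]+) 0x([0-9a-f]+) 0x([0-9a-f]+)",
    re.I,
)

def decode_sense(line):
    """Return (sense key name, ASC/ASCQ name) for a 'Valid sense data' line.

    Lines without sense data, or with only 'Possible sense data', give None.
    Codes missing from the tables above are reported as 'UNKNOWN'.
    """
    m = SENSE_RE.search(line)
    if m is None or m.group(1).lower() != "valid":
        return None
    key, asc, ascq = (int(g, 16) for g in m.groups()[1:])
    return (SENSE_KEYS.get(key, "UNKNOWN"),
            ASC_ASCQ.get((asc, ascq), "UNKNOWN"))

if __name__ == "__main__":
    print(decode_sense("failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x4."))
    print(decode_sense("failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0."))
```

Run against the log in the question, the last ScsiDeviceIO line decodes to a medium error with an unrecovered read where automatic reallocation failed. Together with the "error: { UNC }" line from the ata layer, that points to the drive itself rather than to the file system.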
