
#1 2013-09-20 02:17:59

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

/dev/sdc errors, is my hard drive failing

I am seeing this in journalctl:

smartd[446]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 50 to 51
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 50 to 51
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdc [SAT], 79 Currently unreadable (pending) sectors
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdc [SAT], 79 Offline uncorrectable sectors
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdc [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 102 to 101
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 98 to 96
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdd [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 49 to 48
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 51 to 52
Sep 19 11:13:12 falcon smartd[446]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 57 to 56

Should I suspect that the hard drive is failing based on:
1) unreadable and uncorrectable sectors,
2) SMART Prefailure Attributes,
or both? Or should I be looking elsewhere?

I read that Raw_Read_Error_Rate changes are not a sign of a failing hard drive, but I find it peculiar that they appear right next to the errors in the log.
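
One way to separate the two kinds of smartd messages in the log above is sketched below. It assumes the exact line format shown in this post, and the names `classify` and `sector_warnings` are made up for illustration. By default smartd's "changed from X to Y" lines track normalized attribute values, which move up and down during normal operation, whereas the pending and offline-uncorrectable lines report actual sector counts. The sketch therefore flags only the latter.

```python
import re

# Matches e.g. "Device: /dev/sdc [SAT], 79 Currently unreadable (pending) sectors"
SECTOR_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+) \[SAT\], (?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)
# Matches e.g. "... SMART Usage Attribute: 194 Temperature_Celsius changed from 51 to 52"
ATTR_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+) \[SAT\], SMART (?P<type>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def classify(line):
    """Return a tuple describing one smartd line, or None if it is unrelated."""
    m = SECTOR_RE.search(line)
    if m:
        return ("sectors", m["dev"], m["kind"], int(m["count"]))
    m = ATTR_RE.search(line)
    if m:
        # Difference between the new and old normalized value.
        return ("attribute", m["dev"], m["name"], int(m["new"]) - int(m["old"]))
    return None


def sector_warnings(lines):
    """Collect nonzero sector counts as {device: {kind: count}}."""
    out = {}
    for line in lines:
        c = classify(line)
        if c and c[0] == "sectors" and c[3] > 0:
            out.setdefault(c[1], {})[c[2]] = c[3]
    return out
```

Feeding it the output of `journalctl -u smartd` (or just the lines pasted above) would leave only the devices that report bad sectors, which is the part of the log worth acting on first.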


Node.js, PHP Software Architect and Engineer (Full-Stack/DevOps)

#2 2013-09-20 03:10:42

r0b0t
Member
From: /tmp
Registered: 2009-05-24
Posts: 505

Re: /dev/sdc errors, is my hard drive failing

For me, seeing read/write errors in dmesg is enough. That said, it depends: sometimes the HDD keeps going, sometimes it dies very quickly. Make sure to back up.


#3 2013-09-20 03:14:21

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

Thanks. Now that I look, I have a handful of those:

[17905.667484] systemd-journald[124]: Failed to write entry (25 items, 2720322 bytes) despite vacuuming, ignoring: Argument list too long
[17906.997296] systemd-journald[124]: Deleted empty journal /var/log/journal/a32a8aa2b4901b9a1f51ec8900000251/user-1000@ac1a54c8b7774108bcb0808c6fe44ec2-0000000000000000-0000000000000000.journal (3739648 bytes).
[17907.000424] systemd-journald[124]: Vacuuming done, freed 7528448 bytes
[17907.005566] systemd-journald[124]: Failed to write entry (25 items, 2712130 bytes) despite vacuuming, ignoring: Argument list too long
[17908.222014] systemd-journald[124]: Deleted empty journal /var/log/journal/a32a8aa2b4901b9a1f51ec8900000251/user-1000@ac1a54c8b7774108bcb0808c6fe44ec2-0000000000000000-0000000000000000.journal (3739648 bytes).
[17908.226755] systemd-journald[124]: Vacuuming done, freed 7528448 bytes
[17908.231934] systemd-journald[124]: Failed to write entry (25 items, 2708034 bytes) despite vacuuming, ignoring: Argument list too long

That is how I was alerted in the first place: they were showing up in the terminal.
I guess I'll ghost it, but should I toss it? I was thinking of using it as a dedicated cache drive for my ZFS pool, unless you all think I should scrap it.

Last edited by wolfdogg (2013-09-20 03:15:18)



#4 2013-09-20 03:20:06

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

[root@falcon samba]# smartctl -t short /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Sep 19 12:25:43 2013

Use smartctl -X to abort test.
[root@falcon samba]# smartctl -H /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   050   030   045    Old_age   Always   In_the_past 50 (17 216 50 44 0)
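
The "marginal" row above can be read mechanically. The following is a minimal sketch, assuming the column layout of the header line shown in this output; `SmartAttr`, `parse_attr_row` and `status` are illustrative names, not part of smartmontools. It applies the rule from the smartctl documentation that an attribute counts as failed when its normalized value is less than or equal to its threshold.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    id: int
    name: str
    value: int        # current normalized value
    worst: int        # lowest normalized value ever recorded
    thresh: int       # vendor threshold
    type: str         # "Pre-fail" or "Old_age"
    when_failed: str
    raw: str


def parse_attr_row(row):
    """Split one smartctl attribute row into its ten columns."""
    f = row.split(None, 9)  # the raw value may contain spaces, so cap the split
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[6], f[8], f[9])


def status(attr):
    """Compare the normalized values against the threshold."""
    if attr.value <= attr.thresh:
        return "failing now"
    if attr.worst <= attr.thresh:
        return "failed in the past"
    return "ok"
```

For the row above, VALUE is 50 and THRESH is 45, so the attribute is not failing now, but WORST is 30, which is why smartctl prints In_the_past. Because the attribute type is Old_age rather than Pre-fail, this points at a past over-temperature event and not at a predicted failure.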


#5 2013-09-20 03:32:51

r0b0t
Member
From: /tmp
Registered: 2009-05-24
Posts: 505

Re: /dev/sdc errors, is my hard drive failing

The errors you posted look more like filesystem (journaling) issues. A read/write error looks more like "cannot read from sector xxxx on hdd" or "failed to write on sector xxxxxx", and it shows up when you run a program or an operation that touches the damaged sectors. If I don't see those errors, then the HDD is fine. Let's also wait for other members' opinions :)
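
The distinction drawn here can be checked mechanically. The following is a minimal sketch, assuming dmesg-style lines in the format quoted in post #3 (`[timestamp] source[pid]: message`). The function names are made up, and the substring `I/O error` is only an assumption about how kernel block-layer failures are usually worded, not something shown in this thread.

```python
import re
from collections import Counter

# "[17905.667484] systemd-journald[124]: Failed to write entry ..."
DMESG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<src>[^:\[]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)


def tally_sources(lines):
    """Count messages per emitting source (text before the first colon)."""
    counts = Counter()
    for line in lines:
        m = DMESG_RE.match(line)
        if m:
            counts[m["src"].strip()] += 1
    return counts


def io_error_lines(lines):
    """Return lines that look like block-layer read/write failures.

    Assumption: such kernel messages contain the text "I/O error".
    """
    return [line for line in lines if "I/O error" in line]
```

Run against the lines in post #3, every message is attributed to systemd-journald and none contains an I/O error, which matches the reading that those particular messages come from the journal and not from the disk driver.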


#6 2013-09-20 19:34:00

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

r0b0t wrote:

If I don't see those errors, then the HDD is fine.

Am I to assume you meant the inverse is true, then? The errors are apparent, so the HDD is not fine?

Can anybody else shed some light on this?

I'll give you some symptoms that my system has been exhibiting:
1) I will come to the console and tap the space bar, and the screen won't turn on. At that point the Num Lock key is stuck on and won't toggle, which is usually a bad sign and forces me to reset the computer.
2) When I'm shelled in from another machine, I lose access randomly after at least a few minutes, possibly more than 15 minutes. It is even more likely to happen once I do something system-intensive, or walk away for a while and come back. At that point I cannot reconnect until the system is reset (rebooted).
3) I am running a backup over the network to this machine (to a different hard drive array), which takes hours. At some random time in the middle the backup fails, probably because the system locks up as described in the previous two symptoms.
4) Just now, for the first time, I heard the system give one long beep, pause for about 15 seconds, give another long beep, and repeat a couple of times. Once I hit the space bar to see whether the system was alive, the beeping stopped, but again the screen isn't coming up and Num Lock is stuck on, prompting another reset.

I can reset it now and view the logs; I'll post back once that's done.

Are there any logs I should be checking specifically, or would journalctl suffice? Should I suspect that it's the hard drive?



#7 2013-09-20 19:40:14

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

I can add some more information. On boot I see a message similar to this:

/***********************************************/
*** The root device cannot be mounted ***
/**********************************************/

And here is the journalctl output:

-- Logs begin at Fri 2013-09-20 04:41:41 PDT, end at Fri 2013-09-20 04:43:11 PDT. --
Sep 20 04:41:41 falcon kernel: powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (2 cpu cores) (version 2.20.00)
Sep 20 04:41:41 falcon kernel: wmi: Mapper loaded
Sep 20 04:41:41 falcon systemd[1]: Found device MCP61 Ethernet.
Sep 20 04:41:41 falcon systemd-udevd[205]: renamed network interface eth1 to net0
Sep 20 04:41:41 falcon kernel: ACPI: PCI Interrupt Link [APC8] enabled at IRQ 16
Sep 20 04:41:41 falcon kernel: nouveau  [  DEVICE][0000:02:00.0] BOOT0  : 0x04b200b1
Sep 20 04:41:41 falcon kernel: nouveau  [  DEVICE][0000:02:00.0] Chipset: G73 (NV4B)
Sep 20 04:41:41 falcon kernel: nouveau  [  DEVICE][0000:02:00.0] Family : NV40
Sep 20 04:41:41 falcon kernel: nouveau  [   VBIOS][0000:02:00.0] checking PRAMIN for image...
Sep 20 04:41:41 falcon kernel: nouveau  [   VBIOS][0000:02:00.0] ... appears to be valid
Sep 20 04:41:41 falcon kernel: nouveau  [   VBIOS][0000:02:00.0] using image from PRAMIN
Sep 20 04:41:41 falcon kernel: nouveau  [   VBIOS][0000:02:00.0] BIT signature found
Sep 20 04:41:41 falcon kernel: nouveau  [   VBIOS][0000:02:00.0] version 05.73.22.50.02
Sep 20 04:41:41 falcon kernel: nouveau  [     PFB][0000:02:00.0] RAM type: DDR2
Sep 20 04:41:41 falcon kernel: nouveau  [     PFB][0000:02:00.0] RAM size: 512 MiB
Sep 20 04:41:41 falcon kernel: nouveau  [     PFB][0000:02:00.0]    ZCOMP: 379904 tags
Sep 20 04:41:41 falcon kernel: nouveau  [  PTHERM][0000:02:00.0] FAN control: none / external
Sep 20 04:41:41 falcon kernel: nouveau  [  PTHERM][0000:02:00.0] fan management: disabled
Sep 20 04:41:41 falcon kernel: nouveau  [  PTHERM][0000:02:00.0] internal sensor: yes
Sep 20 04:41:41 falcon kernel: [TTM] Zone  kernel: Available graphics memory: 2026604 kiB
Sep 20 04:41:41 falcon kernel: [TTM] Initializing pool allocator
Sep 20 04:41:41 falcon kernel: [TTM] Initializing DMA pool allocator
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] VRAM: 507 MiB
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] GART: 512 MiB
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] TMDS table version 1.1
Sep 20 04:41:41 falcon kernel: nouveau W[     DRM] TMDS table script pointers not stubbed
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB version 3.0
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB outp 00: 01000300 00000028
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB outp 01: 0c011322 00000000
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB outp 02: 04011320 00000028
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB outp 03: 020223f1 00c0c080
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB conn 00: 1030
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB conn 01: 0100
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB conn 02: 0210
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB conn 03: 0211
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] DCB conn 04: 0213
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] Saving VGA fonts
Sep 20 04:41:41 falcon kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Sep 20 04:41:41 falcon kernel: [drm] No driver support for vblank timestamp query.
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] 0xC28A: Parsing digital output script table
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] 1 available performance level(s)
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] 0: core 500MHz shader 500MHz memory 400MHz voltage 1150mV fanspeed 100%
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] c: core 400MHz shader 400MHz memory 405MHz
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] MM: using M2MF for buffer copies
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] Setting dpms mode 3 on TV encoder (output 3)
Sep 20 04:41:41 falcon kernel: nouveau  [     DRM] allocated 1280x1024 fb: 0x9000, bo ffff8801182fc000
Sep 20 04:41:41 falcon kernel: fbcon: nouveaufb (fb0) is primary device
Sep 20 04:41:41 falcon kernel: Console: switching to colour frame buffer device 160x64
Sep 20 04:41:41 falcon kernel: nouveau 0000:02:00.0: fb0: nouveaufb frame buffer device
Sep 20 04:41:41 falcon kernel: nouveau 0000:02:00.0: registered panic notifier
Sep 20 04:41:41 falcon kernel: [drm] Initialized nouveau 1.1.1 20120801 for 0000:02:00.0 on minor 0
Sep 20 04:41:41 falcon systemd[1]: Found device ST3250823AS.
Sep 20 04:41:41 falcon systemd[1]: Activating swap /dev/disk/by-uuid/47279c0b-a92f-43c7-90c6-62f1cee8a154...
Sep 20 04:41:41 falcon systemd[1]: Found device ST3250823AS.
Sep 20 04:41:41 falcon systemd[1]: Starting File System Check on /dev/disk/by-uuid/63e4be21-1d4d-493f-8b39-5c5abe5faf06...
Sep 20 04:41:41 falcon systemd[1]: Activated swap /dev/disk/by-uuid/47279c0b-a92f-43c7-90c6-62f1cee8a154.
Sep 20 04:41:41 falcon systemd[1]: Starting Swap.
Sep 20 04:41:41 falcon systemd[1]: Reached target Swap.
Sep 20 04:41:41 falcon kernel: input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:05.0/input/input5
Sep 20 04:41:41 falcon kernel: Adding 4200992k swap on /dev/sdb2.  Priority:-1 extents:1 across:4200992k FS
Sep 20 04:41:42 falcon systemd[1]: Found device ST3250823AS.
Sep 20 04:41:42 falcon systemd[1]: Starting File System Check on /dev/disk/by-uuid/cfcb22da-6f8c-40f6-85bc-658446d68496...
Sep 20 04:41:42 falcon systemd-fsck[406]: /dev/sdb4: recovering journal
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Front Headphone as /devices/pci0000:00/0000:00:05.0/sound/card0/input6
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line Out Side as /devices/pci0000:00/0000:00:05.0/sound/card0/input7
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line Out CLFE as /devices/pci0000:00/0000:00:05.0/sound/card0/input8
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line Out Surround as /devices/pci0000:00/0000:00:05.0/sound/card0/input9
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line Out Front as /devices/pci0000:00/0000:00:05.0/sound/card0/input10
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line Out Front as /devices/pci0000:00/0000:00:05.0/sound/card0/input10
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Line as /devices/pci0000:00/0000:00:05.0/sound/card0/input11
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Front Mic as /devices/pci0000:00/0000:00:05.0/sound/card0/input12
Sep 20 04:41:43 falcon kernel: input: HDA NVidia Rear Mic as /devices/pci0000:00/0000:00:05.0/sound/card0/input13
Sep 20 04:41:43 falcon systemd[1]: Starting Sound Card.
Sep 20 04:41:43 falcon systemd[1]: Reached target Sound Card.
Sep 20 04:41:44 falcon kernel: SPL: using hostid 0x007f0100
Sep 20 04:41:44 falcon zpool[194]: no pools available to import
Sep 20 04:41:44 falcon systemd-fsck[406]: /dev/sdb4: clean, 60395/11739136 files, 4339848/46925865 blocks (check after next mount)
Sep 20 04:41:44 falcon systemd[1]: Started File System Check on /dev/disk/by-uuid/63e4be21-1d4d-493f-8b39-5c5abe5faf06.
Sep 20 04:41:44 falcon systemd[1]: Mounting /home...
Sep 20 04:41:44 falcon zfs[415]: cannot mount '/backup': directory is not empty
Sep 20 04:41:44 falcon zfs[415]: cannot mount '/backup/falcon': directory is not empty
Sep 20 04:41:44 falcon kernel: EXT4-fs (sdb4): mounting ext3 file system using the ext4 subsystem
Sep 20 04:41:44 falcon systemd-fsck[410]: /dev/sdb1 was not cleanly unmounted, check forced.
Sep 20 04:41:45 falcon kernel: EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: data=ordered
Sep 20 04:41:45 falcon systemd[1]: Mounted /home.
Sep 20 04:41:45 falcon systemd[1]: zfs.service: main process exited, code=exited, status=1/FAILURE
Sep 20 04:41:45 falcon systemd[1]: Failed to start Zettabyte File System (ZFS).
Sep 20 04:41:45 falcon systemd[1]: Unit zfs.service entered failed state.
Sep 20 04:41:45 falcon systemd-fsck[410]: /dev/sdb1: 38/26104 files (23.7% non-contiguous), 27567/104388 blocks
Sep 20 04:41:45 falcon systemd[1]: Started File System Check on /dev/disk/by-uuid/cfcb22da-6f8c-40f6-85bc-658446d68496.
Sep 20 04:41:45 falcon systemd[1]: Mounting /boot...
Sep 20 04:41:45 falcon kernel: EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
Sep 20 04:41:45 falcon systemd[1]: Mounted /boot.
Sep 20 04:41:45 falcon systemd[1]: Starting Local File Systems.
Sep 20 04:41:45 falcon systemd[1]: Reached target Local File Systems.
Sep 20 04:41:45 falcon systemd[1]: Starting Recreate Volatile Files and Directories...
Sep 20 04:41:45 falcon kernel: EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
Sep 20 04:41:45 falcon systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Sep 20 04:41:45 falcon systemd[1]: Started Recreate Volatile Files and Directories.
Sep 20 04:41:45 falcon systemd[1]: Starting Update UTMP about System Reboot/Shutdown...
Sep 20 04:41:45 falcon systemd[1]: Started Update UTMP about System Reboot/Shutdown.
Sep 20 04:41:45 falcon systemd[1]: Starting System Initialization.
Sep 20 04:41:45 falcon systemd[1]: Reached target System Initialization.
Sep 20 04:41:45 falcon systemd[1]: Starting SWAT Samba Web Admin Tool.
Sep 20 04:41:45 falcon systemd[1]: swat.socket failed to listen on sockets: Cannot assign requested address
Sep 20 04:41:45 falcon systemd[1]: Failed to listen on SWAT Samba Web Admin Tool.
Sep 20 04:41:45 falcon systemd[1]: Unit swat.socket entered failed state.
Sep 20 04:41:45 falcon systemd[1]: Starting D-Bus System Message Bus Socket.
Sep 20 04:41:45 falcon systemd[1]: Listening on D-Bus System Message Bus Socket.
Sep 20 04:41:45 falcon systemd[1]: Starting Sockets.
Sep 20 04:41:45 falcon systemd[1]: Reached target Sockets.
Sep 20 04:41:45 falcon systemd[1]: Starting Daily Cleanup of Temporary Directories.
Sep 20 04:41:45 falcon systemd[1]: Started Daily Cleanup of Temporary Directories.
Sep 20 04:41:45 falcon systemd[1]: Starting Timers.
Sep 20 04:41:45 falcon systemd[1]: Reached target Timers.
Sep 20 04:41:45 falcon systemd[1]: Starting Basic System.
Sep 20 04:41:45 falcon systemd[1]: Reached target Basic System.
Sep 20 04:41:45 falcon systemd[1]: Starting Networking for netctl profile wolfnet...
Sep 20 04:41:45 falcon systemd[1]: Started SSH Key Generation.
Sep 20 04:41:45 falcon systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Sep 20 04:41:45 falcon systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Sep 20 04:41:45 falcon systemd[1]: Starting Networking for netctl profile hispeednet...
Sep 20 04:41:45 falcon systemd[1]: Starting Periodic Command Scheduler...
Sep 20 04:41:45 falcon systemd[1]: Started Periodic Command Scheduler.
Sep 20 04:41:45 falcon systemd[1]: Starting MySQL database server...
Sep 20 04:41:46 falcon crond[444]: (CRON) INFO (running with inotify support)
Sep 20 04:41:46 falcon systemd[1]: Starting Login Service...
Sep 20 04:41:46 falcon systemd[1]: Starting D-Bus System Message Bus...
Sep 20 04:41:46 falcon systemd[1]: Started D-Bus System Message Bus.
Sep 20 04:41:49 falcon systemd-journal[122]: Permanent journal is using 10.8M (max 4.0G, leaving 4.0G of free 0B, current limit 0B).
Sep 20 04:41:46 falcon systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Sep 20 04:41:46 falcon systemd[1]: Starting Permit User Sessions...
Sep 20 04:41:46 falcon systemd[1]: Started Permit User Sessions.
Sep 20 04:41:46 falcon systemd[1]: Starting Getty on tty1...
Sep 20 04:41:46 falcon systemd[1]: Started Getty on tty1.
Sep 20 04:41:46 falcon systemd[1]: Starting Login Prompts.
Sep 20 04:41:46 falcon systemd[1]: Reached target Login Prompts.
Sep 20 04:41:47 falcon systemd-logind[453]: New seat seat0.
Sep 20 04:41:47 falcon systemd[1]: Started Login Service.
Sep 20 04:41:47 falcon systemd-logind[453]: Watching system buttons on /dev/input/event3 (Power Button)
Sep 20 04:41:47 falcon systemd-logind[453]: Watching system buttons on /dev/input/event2 (Power Button)
Sep 20 04:41:51 falcon systemd[1]: Started Networking for netctl profile hispeednet.
Sep 20 04:41:51 falcon systemd[1]: Started Networking for netctl profile wolfnet.
Sep 20 04:41:51 falcon systemd[1]: Starting Network.
Sep 20 04:41:51 falcon systemd[1]: Reached target Network.
Sep 20 04:41:51 falcon systemd[1]: Starting OpenSSH Daemon...
Sep 20 04:41:51 falcon systemd[1]: Started OpenSSH Daemon.
Sep 20 04:41:51 falcon systemd[1]: Starting Samba NetBIOS name server...
Sep 20 04:41:51 falcon systemd[1]: Starting Apache Web Server...
Sep 20 04:41:47 falcon smartd[442]: smartd 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Sep 20 04:41:47 falcon smartd[442]: Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
Sep 20 04:41:47 falcon smartd[442]: Opened configuration file /etc/smartd.conf
Sep 20 04:41:47 falcon smartd[442]: Drive: DEVICESCAN, implied '-a' Directive on line 23 of file /etc/smartd.conf
Sep 20 04:41:47 falcon smartd[442]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Sep 20 04:41:47 falcon smartd[442]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Sep 20 04:41:47 falcon smartd[442]: Device: /dev/sda [SAT], opened
Sep 20 04:41:47 falcon smartd[442]: Device: /dev/sda [SAT], ST2000DM001-9YN164, S/N:W1E07E0G, WWN:5-000c50-045406de0, FW:CC4C, 2.00 TB
Sep 20 04:41:47 falcon smartd[442]: Device: /dev/sda [SAT], found in smartd database: Seagate Barracuda 7200.14 (AF)
Sep 20 04:41:47 falcon smartd[442]: Device: /dev/sda [SAT], WARNING: A firmware update for this drive is available,
Sep 20 04:41:47 falcon smartd[442]: see the following Seagate web pages:
Sep 20 04:41:51 falcon smartd[442]: http://knowledge.seagate.com/articles/en_US/FAQ/207931en
Sep 20 04:41:51 falcon smartd[442]: http://knowledge.seagate.com/articles/en_US/FAQ/223651en
Sep 20 04:41:51 falcon network[443]: Starting network profile 'hispeednet'...
Sep 20 04:41:51 falcon network[443]: Started network profile 'hispeednet'
Sep 20 04:41:51 falcon network[441]: Starting network profile 'wolfnet'...
Sep 20 04:41:51 falcon network[441]: Started network profile 'wolfnet'
Sep 20 04:41:51 falcon kernel: forcedeth 0000:00:07.0: irq 41 for MSI/MSI-X
Sep 20 04:41:51 falcon kernel: forcedeth 0000:00:07.0 net0: MSI enabled
Sep 20 04:41:51 falcon kernel: IPv6: ADDRCONF(NETDEV_UP): internet0: link is not ready
Sep 20 04:41:51 falcon kernel: e100 0000:01:07.0 internet0: NIC Link is Up 100 Mbps Full Duplex
Sep 20 04:41:51 falcon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): internet0: link becomes ready
Sep 20 04:41:51 falcon smartd[442]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Sep 20 04:41:51 falcon smartd[442]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Sep 20 04:41:51 falcon smartd[442]: Device: /dev/sdb [SAT], opened
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdb [SAT], ST3250823AS, S/N:5ND0MS6K, FW:3.03, 250 GB
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdb [SAT], found in smartd database: Seagate Barracuda 7200.8
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdc, type changed from 'scsi' to 'sat'
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdc [SAT], opened
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdc [SAT], WDC WD5000AADS-00S9B0, S/N:WD-WCAV93917591, WWN:5-0014ee-1ad3cc907, FW:01.00A01, 500 GB
Sep 20 04:41:52 falcon smartd[442]: Device: /dev/sdc [SAT], found in smartd database: Western Digital Caviar Green
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdd, type changed from 'scsi' to 'sat'
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdd [SAT], opened
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdd [SAT], ST3750640AS, S/N:5QD03NB9, FW:3.AAE, 750 GB
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdd [SAT], found in smartd database: Seagate Barracuda 7200.10
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sdd [SAT], is SMART capable. Adding to "monitor" list.
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sde, type changed from 'scsi' to 'sat'
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sde [SAT], opened
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sde [SAT], ST3750640AS, S/N:3QD0AD6E, FW:3.AAE, 750 GB
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sde [SAT], found in smartd database: Seagate Barracuda 7200.10
Sep 20 04:41:53 falcon smartd[442]: Device: /dev/sde [SAT], is SMART capable. Adding to "monitor" list.
Sep 20 04:41:53 falcon smartd[442]: Monitoring 5 ATA and 0 SCSI devices
Sep 20 04:41:54 falcon smartd[442]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 106 to 105
Sep 20 04:41:54 falcon smartd[442]: Device: /dev/sdd [SAT], 79 Currently unreadable (pending) sectors
Sep 20 04:41:54 falcon smartd[442]: Device: /dev/sdd [SAT], 79 Offline uncorrectable sectors
Sep 20 04:41:55 falcon sshd[513]: Server listening on :: port 22.
Sep 20 04:41:55 falcon sshd[513]: Server listening on 0.0.0.0 port 22.
Sep 20 04:41:56 falcon mysqld[451]: 130920  4:41:56 InnoDB: The InnoDB memory heap is disabled
Sep 20 04:41:56 falcon mysqld[451]: 130920  4:41:56 InnoDB: Mutexes and rw_locks use GCC atomic builtins
Sep 20 04:41:56 falcon mysqld[451]: 130920  4:41:56 InnoDB: Compressed tables use zlib 1.2.7
Sep 20 04:41:56 falcon mysqld[451]: 130920  4:41:56 InnoDB: Initializing buffer pool, size = 128.0M
Sep 20 04:41:56 falcon mysqld[451]: 130920  4:41:56 InnoDB: Completed initialization of buffer pool
Sep 20 04:41:57 falcon mysqld[451]: 130920  4:41:57 InnoDB: highest supported file format is Barracuda.
Sep 20 04:41:57 falcon mysqld[451]: InnoDB: The log sequence number in ibdata files does not match
Sep 20 04:41:57 falcon mysqld[451]: InnoDB: the log sequence number in the ib_logfiles!
Sep 20 04:41:57 falcon mysqld[451]: 130920  4:41:57  InnoDB: Database was not shut down normally!
Sep 20 04:41:57 falcon mysqld[451]: InnoDB: Starting crash recovery.
Sep 20 04:41:57 falcon mysqld[451]: InnoDB: Reading tablespace information from the .ibd files...
Sep 20 04:41:58 falcon mysqld[451]: InnoDB: Restoring possible half-written data pages from the doublewrite
Sep 20 04:41:58 falcon mysqld[451]: InnoDB: buffer...
Sep 20 04:41:59 falcon systemd[1]: Started Samba NetBIOS name server.
Sep 20 04:41:59 falcon systemd[1]: Starting Samba Winbind daemon...
Sep 20 04:42:00 falcon systemd[1]: PID file /run/httpd/httpd.pid not readable (yet?) after start.
Sep 20 04:42:01 falcon systemd[1]: PID file /var/run/winbindd.pid not readable (yet?) after start.
Sep 20 04:42:01 falcon systemd[1]: Started Samba Winbind daemon.
Sep 20 04:42:01 falcon systemd[1]: Starting Samba SMB/CIFS server...
Sep 20 04:42:01 falcon winbindd[604]: [2013/09/20 04:42:01.175084,  0] ../source3/winbindd/winbindd_cache.c:3179(initialize_winbindd_cache)
Sep 20 04:42:01 falcon winbindd[604]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Sep 20 04:42:02 falcon systemd[1]: Started Samba SMB/CIFS server.
Sep 20 04:42:04 falcon systemd[1]: Started Apache Web Server.
Sep 20 04:42:06 falcon mysqld[451]: InnoDB: Last MySQL binlog file position 0 405, file name ./mysql-bin.000089
Sep 20 04:42:06 falcon mysqld[451]: 130920  4:42:06  InnoDB: Waiting for the background threads to start
Sep 20 04:42:06 falcon dbus-daemon[454]: dbus[454]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service'
Sep 20 04:42:06 falcon dbus[454]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service'
Sep 20 04:42:06 falcon dbus-daemon[454]: dbus[454]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service failed to load: No such file or directory. See system logs
Sep 20 04:42:06 falcon dbus[454]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service failed to load: No such file or directory. See system logs and 'systemctl sta
Sep 20 04:42:07 falcon mysqld[451]: 130920  4:42:07 InnoDB: 5.5.30 started; log sequence number 39996170
Sep 20 04:42:07 falcon mysqld[451]: 130920  4:42:07 [Note] Recovering after a crash using mysql-bin
Sep 20 04:42:07 falcon mysqld[451]: 130920  4:42:07 [Note] Starting crash recovery...
Sep 20 04:42:07 falcon mysqld[451]: 130920  4:42:07 [Note] Crash recovery finished.
Sep 20 04:42:07 falcon mysqld[451]: 11:42:07 UTC - mysqld got signal 11 ;
Sep 20 04:42:07 falcon mysqld[451]: This could be because you hit a bug. It is also possible that this binary
Sep 20 04:42:07 falcon mysqld[451]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 20 04:42:07 falcon mysqld[451]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 20 04:42:07 falcon mysqld[451]: We will try our best to scrape up some info that will hopefully help
Sep 20 04:42:07 falcon mysqld[451]: diagnose the problem, but since we have already crashed,
Sep 20 04:42:07 falcon mysqld[451]: something is definitely wrong and this may fail.
Sep 20 04:42:07 falcon mysqld[451]: key_buffer_size=16777216
Sep 20 04:42:07 falcon mysqld[451]: read_buffer_size=262144
Sep 20 04:42:07 falcon mysqld[451]: max_used_connections=0
Sep 20 04:42:07 falcon mysqld[451]: max_threads=151
Sep 20 04:42:07 falcon mysqld[451]: thread_count=0
Sep 20 04:42:07 falcon mysqld[451]: connection_count=0
Sep 20 04:42:07 falcon mysqld[451]: It is possible that mysqld could use up to
Sep 20 04:42:07 falcon mysqld[451]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 134074 K  bytes of memory
Sep 20 04:42:07 falcon mysqld[451]: Hope that's ok; if not, decrease some variables in the equation.
Sep 20 04:42:07 falcon mysqld[451]: Thread pointer: 0x0
Sep 20 04:42:07 falcon mysqld[451]: Attempting backtrace. You can use the following information to find out
Sep 20 04:42:07 falcon mysqld[451]: where mysqld died. If you see no messages after this, something went
Sep 20 04:42:07 falcon mysqld[451]: terribly wrong...
Sep 20 04:42:07 falcon mysqld[451]: stack_bottom = 0 thread_stack 0x40000
Sep 20 04:42:07 falcon mysqld[451]: /usr/bin/mysqld(my_print_stacktrace+0x29)[0x78ae99]
Sep 20 04:42:07 falcon mysqld[451]: /usr/bin/mysqld(handle_fatal_signal+0x471)[0x67ad71]
Sep 20 04:42:07 falcon mysqld[451]: /usr/lib/libpthread.so.0(+0xf870)[0x7fc7476e1870]
Sep 20 04:42:07 falcon mysqld[451]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Sep 20 04:42:07 falcon mysqld[451]: information that should help you find out what is causing the crash.
Sep 20 04:42:07 falcon systemd[1]: mysqld.service: main process exited, code=exited, status=1/FAILURE

My fstab is:

# blkid
#/dev/sda: LABEL="pool" UUID="2882362132597796888" UUID_SUB="13904923197739712060" TYPE="zfs_member"
#/dev/sdb1: UUID="cfcb22da-6f8c-40f6-85bc-658446d68496" TYPE="ext2"
#/dev/sdb2: UUID="47279c0b-a92f-43c7-90c6-62f1cee8a154" TYPE="swap"
#/dev/sdb3: UUID="31bc96fe-23e2-4585-8b04-8f95d84aa78b" TYPE="ext3"
#/dev/sdb4: UUID="63e4be21-1d4d-493f-8b39-5c5abe5faf06" TYPE="ext3"
#/dev/sdc: LABEL="pool" UUID="2882362132597796888" UUID_SUB="17298518391707716875" TYPE="zfs_member"
#/dev/sdd: LABEL="pool" UUID="2882362132597796888" UUID_SUB="13741145581555081656" TYPE="zfs_member"
#/dev/sde: LABEL="pool" UUID="2882362132597796888" UUID_SUB="14112044456289249974" TYPE="zfs_member"
tmpfs                                           /tmp            tmpfs   nodev,nosuid            0       0
UUID=63e4be21-1d4d-493f-8b39-5c5abe5faf06       /home           ext3    defaults,data=ordered   0       1
UUID=31bc96fe-23e2-4585-8b04-8f95d84aa78b       /               ext3    defaults                0       1
UUID=47279c0b-a92f-43c7-90c6-62f1cee8a154       swap            swap    defaults                0       0
UUID=cfcb22da-6f8c-40f6-85bc-658446d68496       /boot           ext2    defaults                0       1
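
One detail in the boot log above is worth checking: smartd now reports the 79 pending sectors on /dev/sdd, while the first post showed them on /dev/sdc. Kernel sdX names are not guaranteed to stay the same across boots, so it may be safer to identify the affected disk by model and serial number. The following is a minimal sketch, assuming the smartd line formats shown in this log; `disks_with_pending` is an illustrative name, and whether both reports refer to the same physical disk is an assumption the log alone does not prove.

```python
import re

# "Device: /dev/sdd [SAT], ST3750640AS, S/N:5QD03NB9, FW:3.AAE, 750 GB"
IDENT_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+) \[SAT\], (?P<model>[^,]+), S/N:(?P<serial>[^,]+),"
)
# "Device: /dev/sdd [SAT], 79 Currently unreadable (pending) sectors"
PENDING_RE = re.compile(
    r"Device: (?P<dev>/dev/\w+) \[SAT\], (?P<count>\d+) "
    r"Currently unreadable \(pending\) sectors"
)


def disks_with_pending(lines):
    """Map (model, serial) to the pending-sector count reported in one boot log."""
    ident = {}    # device node -> (model, serial), valid for this boot only
    pending = {}  # device node -> pending sector count
    for line in lines:
        m = IDENT_RE.search(line)
        if m:
            ident[m["dev"]] = (m["model"], m["serial"])
            continue
        m = PENDING_RE.search(line)
        if m and int(m["count"]) > 0:
            pending[m["dev"]] = int(m["count"])
    unknown = ("unknown", "unknown")
    return {ident.get(dev, unknown): n for dev, n in pending.items()}
```

Applied to this boot's log, the pending sectors belong to the ST3750640AS with serial 5QD03NB9, not to the Western Digital drive that is /dev/sdc in the same boot. Running smartctl against a /dev/disk/by-id/ path instead of /dev/sdX avoids the ambiguity.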

Last edited by wolfdogg (2013-09-20 19:41:40)



#8 2013-09-20 20:48:19

WorMzy
Forum Moderator
From: Scotland
Registered: 2010-06-16
Posts: 11,784

Re: /dev/sdc errors, is my hard drive failing

If your root device couldn't be mounted, you would be dropped to the initrd shell. I think you're seeing the !rw warning that tells you that your root partition may be fscked twice. It's not something to lose sleep over, but you may want to make the recommended changes to your bootloader.

Post the output of 'smartctl -a /dev/sdc'. I'm not sure what -H is meant to show, but your HDD's airflow temperature isn't all that interesting, even if it is in the past.


Sakura:-
Mobo: MSI MAG X570S TORPEDO MAX // Processor: AMD Ryzen 9 5950X @4.9GHz // GFX: AMD Radeon RX 5700 XT // RAM: 32GB (4x 8GB) Corsair DDR4 (@ 3000MHz) // Storage: 1x 3TB HDD, 6x 1TB SSD, 2x 120GB SSD, 1x 275GB M2 SSD

Making lemonade from lemons since 2015.


#9 2013-09-20 22:11:50

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

Yeah, it might have been. I couldn't read it in time, and after that message I saw it was running fsck, but that may just be because I had to reset it.

# smartctl -a /dev/sdc | less
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD5000AADS-00S9B0
Serial Number:    WD-WCAV93917591
LU WWN Device Id: 5 0014ee 1ad3cc907
Firmware Version: 01.00A01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Sep 20 07:17:05 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (11580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 136) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   144   021    Pre-fail  Always       -       3741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       383
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       26592
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       347
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       271
193 Load_Cycle_Count        0x0032   131   131   000    Old_age   Always       -       209740
194 Temperature_Celsius     0x0022   093   087   000    Old_age   Always       -       50
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     26573         -
# 2  Short offline       Completed without error       00%     25123         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Looking again at journalctl, I see the same errors on /dev/sdd. Both are the same model of drive and were purchased at the same time.

Sep 20 09:36:12 falcon smartd[1192]: Device: /dev/sdd [SAT], 79 Currently unreadable (pending) sectors
Sep 20 09:36:12 falcon smartd[1192]: Device: /dev/sdd [SAT], 79 Offline uncorrectable sectors
Sep 20 09:36:12 falcon smartd[1192]: Device: /dev/sdd [SAT], Failed SMART usage Attribute: 190 Airflow_Temperature_Cel.

Looking further, I see problems on sdd: FAILING_NOW in the temperature section.

# smartctl -a /dev/sdd | less
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    5QD03NB9
Firmware Version: 3.AAE
User Capacity:    750,155,292,160 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Sep 20 09:52:20 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 244) Self-test routine in progress...
                                        40% of test remaining.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 202) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   101   074   006    Pre-fail  Always       -       122060634
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1476
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       103517819818
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       24002
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       373
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       112
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   036   030   045    Old_age   Always   FAILING_NOW 64 (18 189 65 44 0)
194 Temperature_Celsius     0x0022   064   070   000    Old_age   Always       -       64 (0 13 0 0 0)
195 Hardware_ECC_Recovered  0x001a   061   046   000    Old_age   Always       -       196915025
197 Current_Pending_Sector  0x0012   097   095   000    Old_age   Always       -       79
198 Offline_Uncorrectable   0x0010   097   095   000    Old_age   Offline      -       79
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 167 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 167 occurred at disk power-on lifetime: 10255 hours (427 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      00:23:22.395  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      00:23:22.330  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      00:23:20.251  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ae 5e 54 e0 00      00:23:20.191  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 ae 5e 54 e7 00      00:23:19.967  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 166 occurred at disk power-on lifetime: 10255 hours (427 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      00:23:17.311  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      00:23:17.311  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      00:23:20.251  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ef 66 54 e0 00      00:23:20.191  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 ef 66 54 e0 00      00:23:19.967  READ MULTIPLE EXT

Error 165 occurred at disk power-on lifetime: 10255 hours (427 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      00:16:21.310  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      00:16:21.246  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      00:16:19.166  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ae 5e 54 e0 00      00:16:19.107  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 ae 5e 54 e7 00      00:16:18.882  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 164 occurred at disk power-on lifetime: 10255 hours (427 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      00:16:16.220  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      00:16:16.220  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      00:16:19.166  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ef 66 54 e0 00      00:16:19.107  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 ef 66 54 e0 00      00:16:18.882  READ MULTIPLE EXT

Error 163 occurred at disk power-on lifetime: 10255 hours (427 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      00:06:05.586  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      00:06:05.586  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      00:06:10.967  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ae 5e 54 e0 00      00:06:10.907  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 ae 5e 54 e7 00      00:06:10.683  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     23984         -
# 2  Short offline       Completed without error       00%     23983         -
# 3  Short offline       Completed without error       00%     22708         -
# 4  Extended offline    Self-test routine in progress 40%     24002         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

it looks like this drive may be causing problems. 
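
For anyone reading along, here is a rough sketch of how I understand the WHEN_FAILED column to be derived: the normalized VALUE (and WORST) is compared against THRESH, and the raw number plays no part. The three rows are copied from the sdd table above. The names classify and parse are made up for illustration, and the rule is my reading of the output, so treat it as an assumption and not as the smartmontools implementation.

```python
# Three rows copied from the `smartctl -a /dev/sdd` attribute table above.
ROWS = """\
  1 Raw_Read_Error_Rate     0x000f   101   074   006    Pre-fail  Always       -       122060634
190 Airflow_Temperature_Cel 0x0022   036   030   045    Old_age   Always   FAILING_NOW 64 (18 189 65 44 0)
197 Current_Pending_Sector  0x0012   097   095   000    Old_age   Always       -       79
"""


def classify(value, worst, thresh):
    """Assumed rule: compare normalized values (not raw ones) to the threshold."""
    if thresh == 0:
        return "-"              # a zero threshold can never be tripped
    if value <= thresh:
        return "FAILING_NOW"    # current normalized value is at or below threshold
    if worst <= thresh:
        return "In_the_past"    # it dipped to the threshold at some earlier point
    return "-"


def parse(rows):
    """Map attribute name -> recomputed WHEN_FAILED state."""
    result = {}
    for line in rows.splitlines():
        fields = line.split()
        name = fields[1]
        value, worst, thresh = (int(f) for f in fields[3:6])
        result[name] = classify(value, worst, thresh)
    return result


status = parse(ROWS)
for name, state in status.items():
    print(f"{name:24} {state}")
```

On these three rows only Airflow_Temperature_Cel is flagged. Current_Pending_Sector has a threshold of 000, so under this rule it can never show up in WHEN_FAILED, even with a raw count of 79.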

What I don't understand is this: sdc and sdd are only members of the ZFS pool, which is not the system drive, so I'm wondering whether they could even be the cause of the system locking up. I do believe they could be the cause of the backup failing, but I don't think they could cause the system to lock up on me, since the system drive is "sdb".

A smartmontools long test is underway right now on all drives.

Last edited by wolfdogg (2013-09-21 00:50:05)



#10 2013-09-21 03:51:14

Pse
Member
Registered: 2008-03-15
Posts: 413

Re: /dev/sdc errors, is my hard drive failing

/dev/sdd is certainly failing in your setup. The error rates are way too high. Plus there are unreadable sectors. I'd take my data off that drive ASAP and put it to rest. Also note that the load cycle count for /dev/sdc is getting high. 200k+ is a lot. There probably is some power saving feature kicking in and loading/unloading the heads a lot in your setup. After 24000+ hours, it's easy to reach 200k+ if that's the case.
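
To put the load cycle number in perspective, here is a quick back-of-the-envelope calculation using the raw values of attributes 193 and 9 from the /dev/sdc output above. The variable names are only for illustration.

```python
# Raw values taken from the /dev/sdc attribute table posted above.
load_cycle_count = 209740   # attribute 193, Load_Cycle_Count
power_on_hours = 26592      # attribute 9, Power_On_Hours

# Average number of head load/unload cycles per hour the drive was powered on.
cycles_per_hour = load_cycle_count / power_on_hours

# Average gap between two cycles, in minutes.
minutes_between_cycles = 60 / cycles_per_hour

print(f"{cycles_per_hour:.1f} load cycles per power-on hour")
print(f"one cycle every {minutes_between_cycles:.1f} minutes on average")
```

That works out to roughly eight load/unload cycles per power-on hour, or one about every seven to eight minutes on average.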

Edit: typo.

Last edited by Pse (2013-09-21 21:24:32)


#11 2013-09-21 07:05:11

WorMzy
Forum Moderator
From: Scotland
Registered: 2010-06-16
Posts: 11,784
Website

Re: /dev/sdc errors, is my hard drive failing

I have to agree with Pse's assessment there, sdd is not a healthy disk. You should replace it asap.



#12 2013-09-21 09:51:22

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: /dev/sdc errors, is my hard drive failing

"and looking further, i see problems in sdd, FAILING_NOW in the temperature section"

Sure you do. What do you think it means?


#13 2013-09-21 18:04:58

t0m5k1
Member
From: overthere
Registered: 2012-02-10
Posts: 324

Re: /dev/sdc errors, is my hard drive failing

If that was my drive I would:
connect a working drive with enough free space to house my data
boot up Hiren's Boot CD (or similar)
use one of the many tools to copy/back up the data on /dev/sdc to the newly added drive
reboot into Hiren's Boot CD
low-level format the failing drive using a tool from the manufacturer (many of these are present on Hiren's Boot CD).

I would low-level format to reallocate the bad sectors. This will result in a small loss of space but will give you a workable drive until you get a new one.

In layman's terms:
S.M.A.R.T. errors are not related to filesystem errors. SMART is your drive talking to you, and smartctl gives you a way to talk directly to the drive.
A low-level format will access, test, and blank (fill with 0) all sectors on the drive, and if it encounters a bad, unreadable sector it will re-address it to the very end of the drive, where the bad sectors will be flagged and ignored.
The end result is a minor speed drop and less space due to the bad sectors, but since you are also seeing temperature errors, this is a big indicator of old age.
If you cannot afford a new drive yet and need to save up, this will buy you some time. But if you have funds now, go get a new drive, copy the data to it, and skip the low-level format, as it can take ages to finish.

good luck

Last edited by t0m5k1 (2013-09-21 18:06:21)


ROG Strix (GD30CI) - Intel Core i5-7400 CPU - 32Gb 2400Mhz - GTX1070 8GB - AwesomeWM (occasionally XFCE, i3)

If everything in life was easy, we would learn nothing!
Linux User: 401820  Steam-HearThis.at-Last FM-Reddit


#14 2013-09-22 04:52:09

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

Thanks for the great responses, well most of them are great, not so sure how to take @mich41's yet, lol.   

Well, the data is not in jeopardy, so I don't really need to worry about offloading it, unless we're talking about a problem with /dev/sda, in which case my Arch installation would get hosed. As for the other 4 drives, I think I mentioned already that the ZFS filesystem (sdb, sdc, sdd, sde) is just an incremental backup array whose original files are still intact on their own drives, so I'm going to simply replace 1, 2, or 3 bad drives with a single 2TB, recreate the array, and just do another backup onto it. I am just trying to make sure I understand the smartmontools output and get some opinions, which I definitely appreciate.

Thanks for the clarifications. Yeah, now that I know it's just the drive talking to me, I was hoping to get some experience interpreting it. I was kind of hoping someone would say that for the drives with bad sectors (first sdc, now sdd too) I could just do what @t0m5k1 suggested: low-level format to flag those sectors and get on with it, then watch carefully that no new bad sectors appear any time soon. But I'm hearing otherwise, the obvious: that they actually need replacing as soon as possible.

All the advice is well taken...

So continuing on.....

The drives that I haven't posted yet are the two other ZFS array drives (sda, sde) and the system drive (sdb). I still haven't figured out from any comments whether the system drive is having a bad day and causing the system lockups, or if it's something else.

I'll post those 3 here and would appreciate any advice on them, since I think there's a lot more going on here than I'm able to see.

sda (system drive)

# smartctl -a /dev/sda | less
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.8
Device Model:     ST3250823AS
Serial Number:    5ND0MS6K
Firmware Version: 3.03
User Capacity:    250,058,268,160 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Sat Sep 21 13:54:03 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  84) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   054   049   006    Pre-fail  Always       -       159666020
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       891
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       761102208
  9 Power_On_Hours          0x0032   055   055   000    Old_age   Always       -       39641
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       224
194 Temperature_Celsius     0x0022   043   054   000    Old_age   Always       -       43 (0 12 0 0 0)
195 Hardware_ECC_Recovered  0x001a   054   049   000    Old_age   Always       -       159666020
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 154 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 154 occurred at disk power-on lifetime: 39614 hours (1650 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 2e 51 1c ed

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 2e 51 1c ed 00      00:21:57.306  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 2e 51 1c e0 00      00:21:57.306  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 2e 51 1c ed 00      00:21:57.180  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 2e 51 1c e0 00      00:21:57.180  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 2e 51 1c ed 00      00:21:57.009  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 153 occurred at disk power-on lifetime: 39614 hours (1650 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 2e 51 1c ed

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 2e 51 1c ed 00      00:21:57.306  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 2e 51 1c e0 00      00:21:57.306  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 2e 51 1c ed 00      00:21:57.180  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 6f 59 1c e0 00      00:21:57.180  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 6f 59 1c e0 00      00:21:57.009  READ MULTIPLE EXT

Error 152 occurred at disk power-on lifetime: 39609 hours (1650 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 2e 51 1c ed

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 2e 51 1c ed 00   1d+08:10:08.268  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 2e 51 1c e0 00   1d+08:10:08.216  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 2e 51 1c ed 00   1d+08:10:08.215  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 2e 51 1c e0 00   1d+08:10:08.215  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 2e 51 1c ed 00   1d+08:10:22.475  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 151 occurred at disk power-on lifetime: 39609 hours (1650 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 2e 51 1c ed

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 2e 51 1c ed 00   1d+08:10:08.268  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 2e 51 1c e0 00   1d+08:10:08.216  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 2e 51 1c ed 00   1d+08:10:08.215  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 6f 59 1c e0 00   1d+08:10:08.215  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 6f 59 1c e0 00   1d+08:10:08.215  READ MULTIPLE EXT

Error 150 occurred at disk power-on lifetime: 39587 hours (1649 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 2e 51 1c ed

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 2e 51 1c ed 00      09:18:20.811  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 2e 51 1c e0 00      09:18:20.811  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 2e 51 1c ed 00      09:18:20.685  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 2e 51 1c e0 00      09:18:20.685  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 2e 51 1c ed 00      09:18:18.548  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39635         -
# 2  Extended offline    Completed without error       00%     39618         -
# 3  Extended offline    Completed without error       00%     39614         -
# 4  Short offline       Completed without error       00%     39593         -
# 5  Short offline       Completed without error       00%     38157         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sdb  (old array drive)

# smartctl -a /dev/sdb | less
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD5000AADS-00S9B0
Serial Number:    WD-WCAV93917591
LU WWN Device Id: 5 0014ee 1ad3cc907
Firmware Version: 01.00A01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Sep 21 13:56:18 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (11580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 136) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   144   021    Pre-fail  Always       -       3741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       383
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       26622
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       347
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       271
193 Load_Cycle_Count        0x0032   131   131   000    Old_age   Always       -       209806
194 Temperature_Celsius     0x0022   104   087   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     26616         -
# 2  Extended offline    Completed without error       00%     26599         -
# 3  Extended offline    Interrupted (host reset)      10%     26595         -
# 4  Short offline       Completed without error       00%     26573         -
# 5  Short offline       Completed without error       00%     25123         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sde (old array drive)

# smartctl -a /dev/sde | less
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.1-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3750640AS
Serial Number:    3QD0AD6E
Firmware Version: 3.AAE
User Capacity:    750,155,292,160 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Sat Sep 21 13:57:27 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 202) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   097   077   006    Pre-fail  Always       -       42652052
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1417
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       265767243
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23088
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       357
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   052   032   045    Old_age   Always   In_the_past 48 (13 118 68 45 0)
194 Temperature_Celsius     0x0022   048   068   000    Old_age   Always       -       48 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   054   047   000    Old_age   Always       -       7441529
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 5
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 4986 hours (207 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      02:05:42.264  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      02:05:40.179  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      02:05:40.120  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ae 5e 54 e0 00      02:05:39.895  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 ae 5e 54 e7 00      02:05:39.822  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 4 occurred at disk power-on lifetime: 4986 hours (207 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      02:05:26.187  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      02:05:40.179  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      02:05:40.120  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ef 66 54 e0 00      02:05:39.895  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 ef 66 54 e0 00      02:05:39.822  READ MULTIPLE EXT

Error 3 occurred at disk power-on lifetime: 4986 hours (207 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      01:39:10.418  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      01:39:20.356  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      01:39:20.296  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ae 5e 54 e0 00      01:39:20.072  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 01 ae 5e 54 e7 00      01:39:20.013  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 2 occurred at disk power-on lifetime: 4986 hours (207 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      01:39:10.418  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      01:39:10.417  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      01:39:09.976  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ef 66 54 e0 00      01:39:09.976  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 ef 66 54 e0 00      01:39:09.976  READ MULTIPLE EXT

Error 1 occurred at disk power-on lifetime: 4986 hours (207 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 01 ae 5e 54 e7

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  37 ff 01 ae 5e 54 e7 00      01:36:24.332  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 00 ae 5e 54 e0 00      01:36:24.332  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  37 ff 00 ae 5e 54 e7 00      01:36:24.332  SET NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  27 ff 01 ef 66 54 e0 00      01:36:24.331  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  29 00 01 ef 66 54 e0 00      01:36:24.331  READ MULTIPLE EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     23082         -
# 2  Extended offline    Completed without error       00%     23067         -
# 3  Extended offline    Interrupted (host reset)      40%     23063         -
# 4  Short offline       Completed without error       00%     23044         -
# 5  Short offline       Completed without error       00%     21768         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
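
With this many dumps it is easy to lose the handful of rows that actually speak to failing sectors. A minimal sketch, assuming the attribute table layout shown above (ID in column 1, name in column 2, flag in column 3, raw value in column 10): it keeps only attributes 5, 197, 198 and 199 and marks any non-zero raw value. The function name `smart_sector_summary` is made up for illustration and is not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: reads smartctl attribute output on stdin and prints
# only the rows for reallocated, pending, uncorrectable and CRC-error counts.
smart_sector_summary() {
    awk '
        # Attribute rows: numeric ID, a 0x... flag in column 3, raw value in column 10.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-f]+$/ && NF >= 10 &&
        ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) {
            # Any raw count above zero is worth a closer look.
            flag = ($10 + 0 > 0) ? "CHECK" : "ok"
            printf "%s raw=%s %s\n", $2, $10, flag
        }
    '
}

# Example input: two rows copied from the /dev/sde dump above.
smart_sector_summary <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
EOF
```

For those two rows it prints `Reallocated_Sector_Ct raw=0 ok` and `UDMA_CRC_Error_Count raw=1 CHECK`. On a live system the input would presumably come from `smartctl -A /dev/sdX` run as root, piped into the function.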

Last edited by wolfdogg (2013-09-22 05:28:20)



#15 2013-09-24 21:25:41

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

OK, so if I edit the subject to "bad array" instead of "bad /dev/sdc", would that prompt more replies? lol.

The drive was reported at 15% health by one of the tools (not sure which one), but it's the STS3750, serial 5QD03NB9, so it looks like it's definitely /dev/sdd. I want to work on this one first to see if it's salvageable; then I plan to follow the same procedure on the other 3 drives in the ZFS pool.

So I'm currently stuck. I downloaded Hiren's and ran HDAT on all drives, and its drive test reports no bad sectors. Then I tried the file system test, which took two days just to get through 10 percent of the drive. I already doubted this was the proper approach before I cancelled it, since I don't care about the files at this point; I just want to reformat the drive. Then I tried to run Seagate Disk Wizard through Hiren's, and it couldn't get past the 'looking for network' step, which is absurd. Then I tried SeaTools in Hiren's (text mode). It didn't have a format option, which makes me doubt that it will flag bad sectors, which is what I want. It only had a 'clean' option, and a zero-out option I think. Before this I tried the Linux rescue environment on Hiren's (Parted Magic), and again I didn't see anything that guaranteed it would replace sectors; it just created partitions. Can I trust that this will format and look for bad sectors, or do I have to use the 'check' feature in there to check sectors?

I have all the tools; can somebody recommend how I can have a tool scan the drive and repair bad sectors by doing a low-level format?
 
In Hiren's, the "Hard Disk Tools" I haven't tried are:
DRevitalize, ViVARD, Hard Disk Sentinel, SmartUDM, MHDD, Victoria, HDD Erase, Darik's Boot and Nuke, CopyWipe, Active KillDisk, WDClear, WD Lifeguard, Maxtor PowerMax, Fujitsu HDD diag tools, Samsung?, IBM Hitachi Drive Fitness, Gateway GwScan, ExcelStor's ESTest, Toshiba HD diag, WD WDIDLE3.
Can I run any of these HD tools to do the job?

Last edited by wolfdogg (2013-09-24 21:39:25)



#16 2013-09-27 20:36:19

t0m5k1
Member
From: overthere
Registered: 2012-02-10
Posts: 324

Re: /dev/sdc errors, is my hard drive failing

The "clean" option is what you want for this, together with the fill with zeros.

It may help to read the notes for the SeaTools DOS version:
http://origin-www.seagate.com/files/sup … Sguide.pdf

What I did was let it test the drive to see if it picks up the bad sectors (LONG TEST).
This will create a log file in the ramdrive SeaTools boots from.
Then, after it finishes, choose:
Zero ALL
from the ERASE option.

These are mentioned in section G of the PDF linked above.
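
For anyone who would rather do the same "long test, then zero everything" sequence from a Linux shell instead of SeaTools, here is a rough sketch of what it might look like with smartmontools and `dd`. This is an assumption about an equivalent workflow, not what SeaTools does internally. The helper name `print_test_and_zero_plan` is hypothetical, and it deliberately only prints the commands rather than running them, because the `dd` line wipes the whole device.

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT run) a long-test-then-zero sequence.
# The printed dd command is DESTRUCTIVE: it overwrites every sector on the device.
print_test_and_zero_plan() {
    dev=$1
    case $dev in
        /dev/*) ;;   # accept only device paths
        *) echo "usage: print_test_and_zero_plan /dev/sdX" >&2; return 1 ;;
    esac
    echo "smartctl -t long $dev"          # start the drive's extended self-test
    echo "smartctl -l selftest $dev"      # read the self-test log after it finishes
    echo "dd if=/dev/zero of=$dev bs=1M"  # write zeros across the whole device
    echo "smartctl -A $dev"               # re-check attributes 5, 197 and 198 afterwards
}

# /dev/sdX is a placeholder; substitute the real device only after double-checking it.
print_test_and_zero_plan /dev/sdX
```

The reason to re-read the attributes at the end is that a full write pass gives the drive a chance to remap sectors it could not read, so the pending and reallocated counts before and after are what tell you whether the drive is worth keeping.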

Last edited by t0m5k1 (2013-09-27 20:36:50)


ROG Strix (GD30CI) - Intel Core i5-7400 CPU - 32Gb 2400Mhz - GTX1070 8GB - AwesomeWM (occasionally XFCE, i3)

If everything in life was easy, we would learn nothing!
Linux User: 401820  Steam-HearThis.at-Last FM-Reddit


#17 2013-09-30 03:33:07

wolfdogg
Member
From: Portland, OR, USA
Registered: 2011-05-21
Posts: 545

Re: /dev/sdc errors, is my hard drive failing

Thanks for that tip, it's what I was looking for, and that documentation explained what low-level formatting is as well, so now I know. One problem: I ran the long test, this time using the GUI, and it passed. Then I ran the erase, which is under 'advanced features', 'full erase', but I'm not seeing a Zero All in there.

Do I need to run it from text mode, or do you think that's the same option?

EDIT: OK, I ran the text mode, and there are indeed still 4 options in it. The first two and the last of the 4 are exactly the same, and the 3rd is worded the way you stated it, Zero All. So I have to assume this is the same operation as Full Erase, or whatever it said in GUI mode; maybe it was Full Erase, but it didn't give any indication of zeroing.

Running Zero All now; that looks like it will take a while.

Last edited by wolfdogg (2013-09-30 04:03:39)

