Hi all!
I have just configured the smartd daemon, but I'm having trouble limiting the mail function. I want smartd to send an e-mail (in fact I execute a script instead) only when an attribute, pre-fail or otherwise, reaches its threshold, or when the health check fails. Basically, the options -p, -f and -H. Here is my smartd.conf:
/dev/sda -H -p -f -s S/../../7/01 -m root -M exec /usr/local/bin/smartnotify
/dev/sdb -H -p -f -s S/../../7/02 -m root -M exec /usr/local/bin/smartnotify
However, /dev/sda keeps raising an "8 Currently unreadable (pending) sectors" warning, which triggers the mail (well, the smartnotify script). Why is that? From my understanding of the smartd.conf manual, the -p option should report only attributes of the "Pre-fail" type, while -f should report when a non-"Pre-fail" attribute reaches its threshold, not just when it changes (that should be -u, right?). So why does smartd trigger the notification even though the pending sector count (ID 197) is far from its threshold? Below is the (partial) output of smartctl -a:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always   -           0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline  -           0
  3 Spin_Up_Time            0x0007   186   186   033    Pre-fail  Always   -           2
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -           797
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always   -           0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline  -           0
  9 Power_On_Hours          0x0012   091   091   000    Old_age   Always   -           3955
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           796
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           33
193 Load_Cycle_Count        0x0012   045   045   000    Old_age   Always   -           556861
194 Temperature_Celsius     0x0002   206   206   000    Old_age   Always   -           29 (Min/Max 14/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always   -           8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always   -           0
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always   -           0
Thanks a lot for any help, this is driving me crazy!
P.S. I know having bad sectors is not a good thing, so please don't just tell me to change the drive, that's not my question!
Last edited by palmaway (2014-10-12 00:35:06)
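For anyone who wants to check the "far from the threshold" claim mechanically, here is a minimal Python sketch that parses rows of the attribute table above and applies the usual rule that an attribute has failed when its normalized VALUE is at or below THRESH. The names Attr, parse_attributes and failed are made up for this example, the sample rows are copied from the table, and the guard that treats a threshold of 000 as "never fails" is an assumption of the sketch, not something taken from smartd's source.

```python
from typing import List, NamedTuple

# A few rows copied from the smartctl -a output in the post above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
194 Temperature_Celsius 0x0002 206 206 000 Old_age Always - 29 (Min/Max 14/47)
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 8
"""


class Attr(NamedTuple):
    ident: int
    name: str
    value: int   # normalized value
    worst: int
    thresh: int
    kind: str    # "Pre-fail" or "Old_age"
    raw: str     # raw value, kept as text because it may contain extra fields


def parse_attributes(text: str) -> List[Attr]:
    """Parse attribute rows; lines that do not start with a numeric ID are skipped."""
    attrs = []
    for line in text.splitlines():
        parts = line.split(None, 9)  # the 10th field (RAW_VALUE) may contain spaces
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs.append(Attr(int(parts[0]), parts[1], int(parts[3]),
                          int(parts[4]), int(parts[5]), parts[6], parts[9]))
    return attrs


def failed(attr: Attr) -> bool:
    """True when VALUE <= THRESH. Assumption: a threshold of 0 never fails."""
    return attr.thresh > 0 and attr.value <= attr.thresh


if __name__ == "__main__":
    for attr in parse_attributes(SAMPLE):
        print(attr.ident, attr.name, attr.kind, "failed" if failed(attr) else "ok",
              "raw=" + attr.raw)
```

With these rows, attribute 197 comes out as Old_age and not failed even though its raw value is 8. That suggests the warning is driven by the raw pending sector count rather than by the VALUE/THRESH comparison.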
What does `journalctl -u smartd.service` say? When it starts, it'll tell you whether it parsed the config file successfully; for example:
smartd[342]: Opened configuration file /etc/smartd.conf
smartd[342]: Drive: DEVICESCAN, implied '-a' Directive on line 23 of file /etc/smartd.conf
smartd[342]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
The configuration seems to be fine... It says Configuration file /etc/smartd.conf parsed, but it still triggers the notification for some reason:
ott 12 01:40:08 liberty systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
ott 12 01:40:08 liberty systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
ott 12 01:40:08 liberty smartd[10336]: smartd 6.3 2014-07-26 r3976 [x86_64-linux-3.16.4-1-ARCH] (local build)
ott 12 01:40:08 liberty smartd[10336]: Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
ott 12 01:40:08 liberty smartd[10336]: Opened configuration file /etc/smartd.conf
ott 12 01:40:08 liberty smartd[10336]: Configuration file /etc/smartd.conf parsed.
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda, type changed from 'scsi' to 'sat'
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda [SAT], opened
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda [SAT], Hitachi HTS727575A9E364, S/N:J3740084H9K3PE, WWN:5-000cca-68cd26f1b, FW:JF4OA0D0, 750 GB
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda [SAT], found in smartd database: Hitachi/HGST Travelstar 7K750
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], opened
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], SanDisk SSD i100 16GB, S/N:123600107147, WWN:5-001b44-7d2b8f28b, FW:11.56.04, 16.0 GB
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], found in smartd database: SanDisk based SSDs
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
ott 12 01:40:08 liberty smartd[10336]: Monitoring 2 ATA and 0 SCSI devices
ott 12 01:40:08 liberty smartd[10336]: Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
ott 12 01:40:08 liberty smartd[10336]: Sending warning via /usr/local/bin/smartnotify to root ...
ott 12 01:40:08 liberty smartd[10336]: Warning via /usr/local/bin/smartnotify to root: successful
Last edited by palmaway (2014-10-11 23:43:24)
Just a quick note: I modified the configuration by adding -d ata in order to avoid the type changed from 'scsi' to 'sat' message in the log. No changes in behavior...
I found out that the smartd daemon thinks attribute 197 is pre-fail, even though the drive manufacturer disagrees (see the table above). I wonder why... This means that both -H and -f would trigger the notification. I solved it by using
/dev/sda -d ata -H -p -f -C 197+ -s S/../../7/01 -m root -M exec /usr/local/bin/smartnotify
which warns me only if the number of unreadable (pending) sectors increases.
Last edited by palmaway (2014-10-12 00:36:26)
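A note on the likely cause, to be checked against smartd.conf(5): the manual documents the pending sector check as its own directive, -C ID, which defaults to -C 197 when it is not given and reports whenever the raw count is non-zero. That would explain the warning independently of -p and -f. Since the notification already goes through a script run via -M exec, another option is to filter inside the script. Below is a minimal Python sketch of such a hook. It assumes smartd exports SMARTD_FAILTYPE, SMARTD_DEVICE and SMARTD_MESSAGE to the script, as the manual describes, and that the pending sector warning arrives with the type CurrentPendingSector. The function names and the contents of SUPPRESSED_TYPES are choices made for this example; this is not the poster's actual smartnotify script.

```python
#!/usr/bin/env python3
import os
from typing import Mapping

# Assumption: failure types this hook silently drops. Adjust to taste.
SUPPRESSED_TYPES = frozenset({"CurrentPendingSector"})


def should_notify(env: Mapping[str, str]) -> bool:
    """Return False for warning types listed in SUPPRESSED_TYPES."""
    return env.get("SMARTD_FAILTYPE", "") not in SUPPRESSED_TYPES


def format_notice(env: Mapping[str, str]) -> str:
    """Build a one-line notice from the variables smartd passes to the script."""
    failtype = env.get("SMARTD_FAILTYPE", "unknown")
    device = env.get("SMARTD_DEVICE", "unknown device")
    message = env.get("SMARTD_MESSAGE", "")
    return f"[{failtype}] {device}: {message}"


def main() -> None:
    # Printing stands in for whatever the real script does (mail, desktop popup, ...).
    if should_notify(os.environ):
        print(format_notice(os.environ))


if __name__ == "__main__":
    main()
```

Note the trade-off: dropping the whole type also hides later increases in the pending sector count, which -C 197+ in the configuration above still reports. The configuration fix is therefore the more informative of the two.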