Dear Community,
Yesterday, while I was writing a document, saving failed and I was told that my whole filesystem was read-only, which it had not been before. Looking at journalctl -xe, I concluded that something had gone wrong with the hard disk.
The next step I took was to reboot the laptop (T440p). During boot, as it tried to recover the file system journal, I was prompted to run fsck manually; it found some errors but was able to correct them. After that, the computer booted again and I immediately ran a backup of my files.
Thinking that this might have been a one-time event, I continued using the machine, but today the same error occurred. This time I took a photo of it, which I will attach to the post. No text version is available, though, because Chrome crashed when I tried to open pastebin and I obviously could not write to disk anymore.
After rebooting and running fsck again, I ran lsblk and cat /proc/scsi/scsi to see which drive was the failing one, and what I gather from the output is that the 1 TB HDD is the culprit.
╭─andreas@Dagger ~
╰─➤ lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                             8:0    0 931,5G  0 disk
├─sda1                          8:1    0   512M  0 part
├─sda2                          8:2    0     1G  0 part  /boot
└─sda3                          8:3    0   930G  0 part
  └─bcache0                   254:0    0   930G  0 disk
    └─lvm                     253:0    0   930G  0 crypt
      ├─DaggerStorage-swapvol 253:1    0     8G  0 lvm   [SWAP]
      ├─DaggerStorage-rootvol 253:2    0    30G  0 lvm   /
      └─DaggerStorage-homevol 253:3    0   892G  0 lvm   /home
sdb                             8:16   0 119,2G  0 disk
└─bcache0                     254:0    0   930G  0 disk
  └─lvm                       253:0    0   930G  0 crypt
    ├─DaggerStorage-swapvol   253:1    0     8G  0 lvm   [SWAP]
    ├─DaggerStorage-rootvol   253:2    0    30G  0 lvm   /
    └─DaggerStorage-homevol   253:3    0   892G  0 lvm   /home
sr0                            11:0    1  1024M  0 rom
╭─andreas@Dagger ~
╰─➤ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST1000LM024 HN-M Rev: 0001
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: TS128GMTS400 Rev: 6I
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
Vendor: HL-DT-ST Model: DVDRAM GU70N Rev: LS20
Type: CD-ROM ANSI SCSI revision: 05
╭─andreas@Dagger ~
╰─➤
Am I right in what I have assumed and done so far, or should I do something different, now or when this happens the next time?
How should I proceed from here? I have ordered a new 1 TB drive. Will I be fine just dd'ing the current HDD to the new one, or is there something I should be careful about?
Thank you very much for your input and help.
Image of journalctl -xe today: http://i.imgur.com/qQyiO9b.jpg
Post the output of 'smartctl -a /dev/sdX' for both sda and sdb.
R00KIE
I obviously could not write to disk
Pendrives come in handy in such cases.
It looks like the disk is simply disappearing for some reason. My first thought was an SSD firmware bug, but you say it's spinning rust? Well, besides 'smartctl -a', also try 'smartctl -t short' and see what comes out of it. Check the disk's power cable.
A replacement will probably fix it (unless it's the motherboard's fault, which is unlikely IMO).
You can dd one disk onto another provided that they are exactly the same size or the new one is larger. Otherwise you will wait a few hours only to see "write error: out of space".
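To avoid finding that out the hard way, the sizes can be compared before starting the copy. Below is a minimal sketch, not a tested recipe for this particular machine: `can_clone` is a hypothetical helper name, and /dev/sdX and /dev/sdY are placeholders for the old and the new disk.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the target can hold the whole source.
# Both arguments are sizes in bytes, e.g. from `blockdev --getsize64 /dev/sdX`.
can_clone() {
    src_bytes=$1
    dst_bytes=$2
    if [ "$dst_bytes" -ge "$src_bytes" ]; then
        echo "ok: target is large enough"
        return 0
    fi
    echo "too small: target is $(( $src_bytes - $dst_bytes )) bytes short"
    return 1
}

# Example usage (as root, with neither disk mounted; device names are placeholders):
#   can_clone "$(blockdev --getsize64 /dev/sdX)" "$(blockdev --getsize64 /dev/sdY)" &&
#       dd if=/dev/sdX of=/dev/sdY bs=4M conv=fsync status=progress
```

Double-check which device is which with lsblk before running dd, since swapping if= and of= overwrites the source.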
Thanks, R00KIE and mich41, for your answers. I ran the commands you asked for. smartctl -a /dev/sda:
╭─andreas@Dagger ~
╰─➤ sudo smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Samsung SpinPoint M8 (AF)
Device Model: ST1000LM024 HN-M101MBB
Serial Number: <removed>
LU WWN Device Id: <removed>
Firmware Version: 2BA30001
User Capacity: 1.000.204.886.016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Apr 3 11:48:04 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (12660) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 211) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 1
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 092 090 025 Pre-fail Always - 2459
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 551
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 2158
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 38
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 560
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 27
192 Power-Off_Retract_Count 0x0022 100 100 000 Old_age Always - 31
194 Temperature_Celsius 0x0002 064 055 000 Old_age Always - 21 (Min/Max 17/49)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 5553
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 38
225 Load_Cycle_Count 0x0032 078 078 000 Old_age Always - 230509
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
smartctl -a /dev/sdb:
╭─andreas@Dagger ~
╰─➤ sudo smartctl -a /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: TS128GMTS400
Serial Number: <removed>
Firmware Version: N1126I
User Capacity: 128.035.676.160 bytes [128 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Apr 3 11:48:34 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 1) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 454
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 842
160 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
161 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 44
163 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 25
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 110522
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 165
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 62
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 108
168 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 3000
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 97
175 Program_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
176 Erase_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 367
178 Used_Rsvd_Blk_Cnt_Chip 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 47
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 11
195 Hardware_ECC_Recovered 0x0000 100 100 000 Old_age Offline - 64449
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 0
197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0000 100 100 050 Old_age Offline - 0
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Total_LBAs_Written 0x0000 100 100 000 Old_age Offline - 56400
242 Total_LBAs_Read 0x0000 100 100 000 Old_age Offline - 46096
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 442088
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
6 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
self-test of sda:
╭─andreas@Dagger ~
╰─➤ sudo smartctl -t short /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Apr 3 11:54:53 2017
Use smartctl -X to abort test.
╭─andreas@Dagger ~
╰─➤ sudo smartctl -l selftest /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2158 -
and of sdb:
╭─andreas@Dagger ~
╰─➤ sudo smartctl -t short /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Mon Apr 3 11:56:48 2017
Use smartctl -X to abort test.
╭─andreas@Dagger ~
╰─➤ sudo smartctl -l selftest /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 198 -
I cannot really check its power cable, since the disk sits in a tiny frame attached to a SATA and power connector, but that was still screwed in place.
Since you mention faulty firmware, mich41: I did upgrade the firmware of the laptop, but that was on March 25th and it worked after that.
Also, it sometimes does not wake from sleep, but restarting by holding down the power button has worked.
After a restart, the logs never showed any reason for the crash (which now makes me wonder whether this might be caused by the disappearing HDD, though that is probably unlikely too).
Last edited by AndreasGB (2017-04-03 15:11:53)
From your 'smartctl -a' output I'd say neither disk is going to fail right away, but you have to do a long test: short SMART tests do not scan the whole disk surface and can miss some problems.
Your sdb is an SSD, so the long test should be quite fast there; your sda is another matter. I have no experience with Transcend SSDs (which seems to be the brand of your SSD, going by the TS128GMTS400 model number), but so far I've had only bad experiences with Samsung hard disks (even if it says Seagate on the sticker, it seems it was bought from Samsung - I've lost track of who bought who, so I suppose Seagate may have bought Samsung's spinning rust division and is selling some leftover Samsung designs as Seagate's).
My advice is to back up all important data. Then run a long test on your sda; after it finishes, get all the SMART parameters with 'smartctl -a'. Then, for good measure, run an offline test and get all SMART parameters again after it finishes. Given your problems, I would expect to see values above zero for SMART parameters 197 and/or 198 after either or both tests. You can also do this for sdb, just for peace of mind and to be thorough with your tests.
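To make the two attributes easier to spot in the long attribute table, something like the following could be used. It is only a sketch: `pending_sectors` is a made-up helper name, and it assumes the usual ten-column attribute layout that 'smartctl -A' prints, with RAW_VALUE in the last column.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# values of attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable).
pending_sectors() {
    # Column 1 is the attribute ID, column 2 its name, column 10 the raw value.
    awk '$1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Example usage (as root):
#   smartctl -t long /dev/sda       # start the extended self-test
#   smartctl -l selftest /dev/sda   # check the result once the test has finished
#   smartctl -A /dev/sda | pending_sectors
#   smartctl -t offline /dev/sda    # then the offline test, and read the values again
```

Any value above zero in that output would be worth posting here together with the self-test log.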
Regarding firmware, make sure the firmware for all disks is up-to-date, as some newer revisions might include important fixes. Also make sure nothing is periodically reading the SMART parameters from the disks: at least some Samsung spinning rust disks have firmware bugs that can lead to data corruption when SMART parameters are probed while there is still data in buffers waiting to be flushed to disk.
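For comparing against whatever the manufacturer lists, the model and firmware revision can be pulled out of 'smartctl -i'. This is a rough sketch; `firmware_info` is a hypothetical helper name, and the smartd check in the comments assumes the periodic polling would come from the smartd service shipped with smartmontools, which may not be the only candidate on your system.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -i` output on stdin and print only the
# device model and firmware revision lines, with the padding collapsed.
firmware_info() {
    awk -F': *' '/^(Device Model|Firmware Version):/ { print $1 ": " $2 }'
}

# Example usage (as root):
#   smartctl -i /dev/sda | firmware_info
#   smartctl -i /dev/sdb | firmware_info
#
# To see whether smartd is running and polling the disks (assumes systemd):
#   systemctl is-active smartd
```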
R00KIE
Regarding firmware, make sure the firmware for all disks is up-to-date
To make things clear, we mean disk firmware, the thing running on the tiny CPU inside the disk, not laptop firmware (BIOS/UEFI/whatever). You get this from the disk manufacturer's website. But I have to say I don't think it's the firmware's fault if the problem started only after 2000+ hours.
Does the disk keep spinning afterwards? Any other sound from it?