My hard disk drive often drops to a read-only filesystem after I have used the computer for anywhere from a few minutes to a few hours. The temporary fix is to reboot; on boot the system runs fsck -y /dev/sda3 (sda3 is my Linux filesystem), which solves the problem for a few hours. Below is the information about my device while it is working "fine". I previously used Windows on this computer, then switched to Ubuntu (no dual boot), then to Arch Linux (no dual boot, but I have a custom-made recovery archiso). I am still a newbie at all this, but the problem is about a year old.
uname -a
Linux ArchLinux 5.15.52-1-lts #1 SMP Sat, 02 Jul 2022 20:04:03 +0000 x86_64 GNU/Linux
sudo journalctl -p 3 -xb
Jul 04 15:44:56 archlinux kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
Jul 04 15:44:56 archlinux kernel: acpi MSFT0101:00: platform device creation failed: -16
Jul 04 15:46:11 ArchLinux kernel: ATPX version 1, functions 0x00000033
Jul 04 15:46:11 ArchLinux kernel: ATPX Hybrid Graphics
Jul 04 15:46:12 ArchLinux kernel: ATPX version 1, functions 0x00000033
Jul 04 15:46:12 ArchLinux kernel: ATPX Hybrid Graphics
Jul 04 15:50:05 ArchLinux systemd[443]: Failed to start Kite Updater.
sudo smartctl --all /dev/sda3
. . .
Error 1393 occurred at disk power-on lifetime: 18445 hours (768 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 00 d8 3a 2b 40 Error: UNC at LBA = 0x002b3ad8 = 2833112
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 40 d0 5c 92 40 00 03:25:48.224 READ FPDMA QUEUED
60 08 38 58 48 2b 40 00 03:25:44.476 READ FPDMA QUEUED
60 08 30 a0 6b 2b 40 00 03:25:44.473 READ FPDMA QUEUED
60 08 28 40 3c 2b 40 00 03:25:44.472 READ FPDMA QUEUED
60 08 20 f0 5c 2b 40 00 03:25:44.472 READ FPDMA QUEUED
Error 1392 occurred at disk power-on lifetime: 18444 hours (768 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 18 d8 3a 2b 40 Error: UNC at LBA = 0x002b3ad8 = 2833112
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 68 50 38 d1 40 00 02:49:22.718 READ FPDMA QUEUED
60 48 38 28 c7 4d 40 00 02:49:21.876 READ FPDMA QUEUED
60 18 30 00 c7 4d 40 00 02:49:21.876 READ FPDMA QUEUED
60 20 28 d8 c6 4d 40 00 02:49:21.876 READ FPDMA QUEUED
60 50 20 80 c6 4d 40 00 02:49:21.876 READ FPDMA QUEUED
Error 1391 occurred at disk power-on lifetime: 18444 hours (768 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 80 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 48 c8 c8 8d 9a 40 00 02:49:17.894 WRITE FPDMA QUEUED
60 08 c0 10 44 2b 40 00 02:49:14.180 READ FPDMA QUEUED
60 08 b8 e0 42 2b 40 00 02:49:14.172 READ FPDMA QUEUED
60 08 b0 d8 4b 2b 40 00 02:49:14.148 READ FPDMA QUEUED
60 18 a8 80 3b 2b 40 00 02:49:14.148 READ FPDMA QUEUED
Error 1390 occurred at disk power-on lifetime: 18439 hours (768 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 a0 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 10 78 d3 54 40 00 09:24:30.197 WRITE FPDMA QUEUED
61 08 08 d8 ce 54 40 00 09:24:30.196 WRITE FPDMA QUEUED
61 08 00 48 cc 54 40 00 09:24:30.196 WRITE FPDMA QUEUED
61 08 f8 80 ca 54 40 00 09:24:30.196 WRITE FPDMA QUEUED
61 08 b8 a0 c6 54 40 00 09:24:30.196 WRITE FPDMA QUEUED
Error 1389 occurred at disk power-on lifetime: 18439 hours (768 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 40 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 48 58 b7 54 40 00 09:24:30.046 WRITE FPDMA QUEUED
60 10 40 d0 3a 2b 40 00 09:24:26.317 READ FPDMA QUEUED
60 08 38 a8 3a 2b 40 00 09:24:26.317 READ FPDMA QUEUED
60 10 30 78 3a 2b 40 00 09:24:26.317 READ FPDMA QUEUED
60 08 28 68 3a 2b 40 00 09:24:26.317 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 30% 17938 3950688
# 2 Extended offline Completed: read failure 90% 17934 3950688
# 3 Short offline Completed: read failure 90% 17934 3950688
. . .
sudo dmesg | grep -i sda3
[ 0.963978] sda: sda1 sda2 sda3
[ 61.867824] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 70.642173] EXT4-fs (sda3): re-mounted. Opts: (null). Quota mode: none.
Last edited by nkhatiwada (2022-07-20 11:39:30)
Offline
Sounds like your device is dying. Make sure to back up all data to another drive.
Then run a long SMART test and show the results.
Offline
Also post the actual data that you skipped in the smartctl output, but
# 1 Short offline Completed: read failure 30% 17938 3950688
# 2 Extended offline Completed: read failure 90% 17934 3950688
# 3 Short offline Completed: read failure 90% 17934 3950688
is bad enough (and the more recent errors are all on LBA = 0x002b3ad8 = 2833112)
To stress what JoeyCorleone said: MAKE THE BACKUP ***FIRST***!
Do so from a live system boot. Read and write the compromised drive as little as possible.
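The observation that the recent errors all hit the same LBA can be checked mechanically. Below is a minimal Python sketch that tallies the LBAs from the five "Error: ... at LBA" lines quoted in the first post; the helper name count_error_lbas and the regular expression are illustrative assumptions, not part of smartctl.

```python
import re
from collections import Counter

# The five error lines from the smartctl output quoted in the first post.
ERROR_LOG = """\
40 41 00 d8 3a 2b 40 Error: UNC at LBA = 0x002b3ad8 = 2833112
40 41 18 d8 3a 2b 40 Error: UNC at LBA = 0x002b3ad8 = 2833112
40 41 80 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
40 41 a0 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
40 41 40 d8 3a 2b 40 Error: WP at LBA = 0x002b3ad8 = 2833112
"""

# Hypothetical pattern for this line format: error kind, hex LBA, decimal LBA.
LBA_RE = re.compile(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+) = (\d+)")


def count_error_lbas(text):
    """Return a Counter mapping each failing LBA to its number of errors."""
    counts = Counter()
    for _kind, hex_lba, dec_lba in LBA_RE.findall(text):
        lba = int(hex_lba, 16)
        # The hex and decimal forms printed on one line must agree.
        if lba != int(dec_lba):
            raise ValueError(f"inconsistent LBA: {hex_lba} vs {dec_lba}")
        counts[lba] += 1
    return counts


lba_counts = count_error_lbas(ERROR_LOG)
for lba, n in lba_counts.most_common():
    print(f"LBA {lba} (0x{lba:08x}): {n} error(s)")
```

A single LBA collecting every logged error suggests the same spot being retried, though the device log only keeps the most recent five errors, so this says nothing about the earlier ones.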
Offline
I made a backup of my important files. After running the long SMART test, I got this:
sudo smartctl -l selftest /dev/sda3
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.52-1-lts] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 18458 3950688
# 2 Short offline Completed: read failure 30% 17938 3950688
# 3 Extended offline Completed: read failure 90% 17934 3950688
# 4 Short offline Completed: read failure 90% 17934 3950688
This is the only device I have for study and work, but I have started doing most of my work online (Google applications). Thank you for the suggestion.
Offline
The table above the errors in "smartctl -a" would provide very good information about the condition of the disk (though chances are it's dead)
Offline
I think this is critical information I missed.
sudo smartctl -a /dev/sda3
. . .
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 200) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0027 100 100 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0023 100 100 002 Pre-fail Always - 1760
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 183625
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 054 054 000 Old_age Always - 18470
10 Spin_Retry_Count 0x0033 253 100 030 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 5731
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
185 Unknown_Attribute 0x0032 100 100 001 Old_age Always - 65535
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 1558
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 2
189 High_Fly_Writes 0x003a 100 100 001 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 064 053 040 Old_age Always - 36 (Min/Max 30/36)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 6026
192 Power-Off_Retract_Count 0x0022 100 100 000 Old_age Always - 5242960
193 Load_Cycle_Count 0x0032 071 071 000 Old_age Always - 293571
194 Temperature_Celsius 0x0022 064 053 040 Old_age Always - 36 (Min/Max 30/36)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 816
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 1558 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
. . .
Offline
9 Power_On_Hours 0x0032 054 054 000 Old_age Always - 18470
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 183625
192 Power-Off_Retract_Count 0x0022 100 100 000 Old_age Always - 5242960
There are 5242960 power-off retracts, ~284/h, roughly one every 12-13 seconds - there's some issue w/ the power supply of the disk.
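The rate quoted above follows directly from the two raw values. Here is a minimal Python sketch of the arithmetic, assuming the raw value of attribute 192 is a plain event counter (raw values are vendor-specific, so that is an assumption rather than a given):

```python
# Raw values copied from the smartctl attribute table in this thread.
power_on_hours = 18470        # attribute 9, Power_On_Hours
power_off_retracts = 5242960  # attribute 192, Power-Off_Retract_Count

# Assumption: the raw value counts individual retract events.
retracts_per_hour = power_off_retracts / power_on_hours
seconds_between = 3600 / retracts_per_hour

print(f"{retracts_per_hour:.0f} retracts per hour")
print(f"one retract every {seconds_between:.1f} seconds on average")
```

This gives about 284 retracts per hour, i.e. one every 12.7 seconds or so averaged over the whole lifetime of the disk.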
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 816
There are MANY pending sectors (which is bad) but apparently no re-allocation.
Either the controller is fried or this is fallout from the power supply issue.
Check the power cable and inspect the plugs, make sure that's ok (switch it, if you can) and then see what happens.
If you change the supply but Power-Off_Retract_Count keeps running up, the connector is probably broken inside the disk.
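The "many pending sectors, no re-allocation" reading can be reproduced from the attribute rows. The following is a minimal Python sketch, assuming the ten-column layout of the attribute table shown in this thread; the function name parse_raw_values and the verdict strings are made up for illustration.

```python
# Attribute rows copied from the smartctl -a output in this thread.
ATTRIBUTE_ROWS = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 816
"""


def parse_raw_values(text):
    """Map ATTRIBUTE_NAME to the integer in the RAW_VALUE column.

    Assumes the columns ID#, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
    UPDATED, WHEN_FAILED, RAW_VALUE, separated by whitespace.
    """
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw[fields[1]] = int(fields[9])
    return raw


raw = parse_raw_values(ATTRIBUTE_ROWS)
pending = raw["Current_Pending_Sector"]
reallocated = raw["Reallocated_Sector_Ct"]

if pending > 0 and reallocated == 0:
    verdict = "pending sectors but no reallocation"
elif pending > 0:
    verdict = "pending and reallocated sectors"
else:
    verdict = "no pending sectors"

print(f"pending={pending} reallocated={reallocated}: {verdict}")
```

To run this against a live disk, the text would come from the output of smartctl -A instead of the pasted rows.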
Offline
I switched off the power supply and
sudo smartctl -a /dev/sda3 | grep 'Power-Off_Retract_Count'
192 Power-Off_Retract_Count 0x0022 100 100 000 Old_age Always - 5242960
I switched the power supply off and on again, but nothing changed. The Power-Off_Retract_Count stays the same. Why did this happen? What is the solution?
Offline
I switched off the power supply
How? And was the head active at this time?
Is this a desktop or a notebook?
Offline
Is this a desktop or a notebook?
It is a notebook (laptop). I don't understand what the "head" means. Could you please clarify?
How?
I mean that the laptop was connected to the power supply, which I turned off and on again. Did I misunderstand something?
Offline
I don't understand what the "head" means.
https://en.wikipedia.org/wiki/Disk_read-and-write_head
I mean that the laptop was connected to the power supply, which I turned off and on again.
Did I misunderstand something?
Yes.
The laptop still has a battery. The problem here is that the disk gets cut from power at an insane frequency (perhaps a wonky connector)
This has nothing to do w/ the external AC adapter. The problem will be inside the notebook (unless the battery is dead, too)
Offline
That means I possibly have to replace the power cable (connector) inside the notebook. Any further suggestions are appreciated.
Offline
HDDs in notebooks typically have no power cable - there's one connection.
You could see whether that's loose.
Otherwise the system cannot provide enough voltage to keep the HDD up - this could either be a HW defect or maybe too aggressive power saving.
The thing you need to be aware of is that if you replace the HDD, the new one might suffer from the same environment and show the same symptoms.
Offline
Thanks a lot. Your suggestion fixed my issue. Here is what I did:
I disconnected the connector and the hard drive, then reconnected them.
The computer is now up for more than a week without a single issue.
Thanks again for saving my computer's life.
Offline
Glad to hear.
Please always remember to mark resolved threads by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
Offline