I am unsure what causes the issues I am having, so I'd like some help troubleshooting.
My system:
Kernel 4.14.14-1-ARCH running on a ThinkPad X250
I'm using i3 as my WM
The problem:
About once a day, my filesystem becomes read-only and most bash commands are "not found", except basics like 'cd', 'mv', etc. i3 and Xorg still seem to be running, as I can use i3 keybinds.
I usually notice the issue when I lose my network connection and the application I am working in freezes. The i3 status bar that contains my workspace icons disappears, but Polybar (status bar) stays. It no longer updates, though, since its scripts stop working.
As far as I can tell, all packages on the machine stop working or aren't detected, and the filesystem is read-only. The only fix right now is to force the machine off by holding down the power button for 5 seconds. ('shutdown -h now' is not recognized as a command, and I cannot exit i3 or end my session because none of the commands to do that are recognized.)
Where do I start troubleshooting this? Are there any log files I can provide that might help solve this issue? I have looked at my Xorg logs, but they seem fine; Xorg just complains that the filesystem is read-only.
Thank you
Last edited by vegarab (2018-02-02 19:10:59)
Make sure you have a backup of everything that's important
Check dmesg when this happens, and run a SMART test. This sounds like your drive is about to die, or like a loose connector cable.
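To make the dmesg check a bit easier, here is a rough Python sketch (assuming Python 3.9+ is installed) that filters kernel log text down to the lines that usually matter for this kind of failure. The patterns are my own guesses at typical ext4 and libata messages, not an exhaustive list, and the function name suspicious_lines is just illustrative.

```python
import re
import subprocess
import sys

# Illustrative patterns (an assumption, not an exhaustive list): ext4 logs
# "Remounting filesystem read-only" after an error, and a failing drive or
# SATA link typically produces "I/O error", "failed command" or
# "hard resetting link" lines from the block layer / libata.
SUSPICIOUS = re.compile(
    r"remounting filesystem read-only|i/o error|failed command|hard resetting link",
    re.IGNORECASE,
)


def suspicious_lines(log_text: str) -> list[str]:
    """Return only the lines of a kernel log that match SUSPICIOUS."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


def main() -> None:
    try:
        # dmesg may require root when kernel.dmesg_restrict=1.
        result = subprocess.run(["dmesg"], capture_output=True, text=True)
    except OSError as err:
        print(f"could not run dmesg: {err}", file=sys.stderr)
        return
    if result.returncode != 0:
        print(result.stderr.strip(), file=sys.stderr)
        return
    for line in suspicious_lines(result.stdout):
        print(line)


if __name__ == "__main__":
    main()
```

Since dmesg is gone after a forced reboot, suspicious_lines() can also be fed saved output of `journalctl -k -b -1` (kernel messages from the previous boot, if the journal is persistent).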
Can you describe your partitioning scheme? Saying that the filesystem is read-only and saying that many commands are not found are two very different things: if /usr/bin/whatever were readable, it would be found.
I'm wondering whether one partition might be lost entirely while another is read-only. Or perhaps disk access is lost entirely, and RAM-cached files can still be read while any attempt to write to them fails.
On that train of thought, please post the exact commands and error messages that led to your inferences: what specific symptom led you to conclude that the filesystem is read-only?
What is the output of `mount` when this happens? If `mount` is one of the commands not found, run `mount` regularly while everything is still working fine and then try again (in case only commands already in memory can still run, `mount` might then still work after the problem shows up).
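Since `mount` without arguments reports roughly the same information the kernel exposes in /proc/mounts, a small script could check the mount options there directly. Rough Python sketch, assuming Python 3.9+; read_only_mounts is a hypothetical helper name. Note that it only helps if the interpreter can still be started (or is already running) once the disk misbehaves, so treat it like the suggestion above: run it periodically while things still work.

```python
import sys
from pathlib import Path


def read_only_mounts(mounts_text: str) -> list[tuple[str, str]]:
    """Return (device, mount point) pairs whose options include 'ro'.

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    Spaces in mount points appear escaped as \\040 and are left as-is here.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found


def main() -> None:
    try:
        text = Path("/proc/mounts").read_text()
    except OSError as err:
        print(f"could not read /proc/mounts: {err}", file=sys.stderr)
        return
    for device, mount_point in read_only_mounts(text):
        print(f"{device} is mounted read-only on {mount_point}")


if __name__ == "__main__":
    main()
```

Some pseudo-filesystems are legitimately read-only, so the interesting case is /dev/sda2 or /dev/sda3 showing up in the output.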
"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" - Richard Stallman
SMART overall-health self-assessment test result: PASSED
Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 244E20C9-9239-4D6E-BDC7-FDB21DBC1F05
Device        Start       End   Sectors  Size Type
/dev/sda1      2048    393215    391168  191M EFI System
/dev/sda2    393216  58986495  58593280   28G Linux filesystem
/dev/sda3  58986496 234768383 175781888 83.8G Linux filesystem
sda1 is /boot, sda2 is /, sda3 is /home
I know that the /home filesystem is read-only because I usually have vim open, and :w fails, telling me that the filesystem is read-only. I haven't tried running 'mount' when this happens.
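Rather than waiting for vim's :w to fail, the read-only flag can be queried directly with statvfs(3). A minimal Python sketch, assuming Python 3.9+ on Linux; is_read_only is just an illustrative name, and the default paths are the / and /home partitions from the layout above.

```python
import os
import sys


def is_read_only(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only."""
    # os.ST_RDONLY mirrors the ST_RDONLY flag of statvfs(3); Unix only.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def main(paths: list[str]) -> None:
    for path in paths:
        try:
            state = "read-only" if is_read_only(path) else "read-write"
        except OSError as err:
            state = f"error: {err}"
        print(f"{path}: {state}")


if __name__ == "__main__":
    # Defaults to the / and /home partitions mentioned above.
    main(sys.argv[1:] or ["/", "/home"])
```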
I'd love to tell you which commands make this happen, but it varies: an hour ago it happened while I was writing a document in vim, and other times it has happened while browsing in Firefox. The only thing I can say for sure is that Firefox Quantum has been running during every crash.
Edit:
It's worth noting that I once managed to quit i3 just as I noticed the problem occurring. I was then left in a bash shell with Xorg status messages saying that the filesystem was read-only. I wasn't able to do anything from there, since 'exit' (to end my session) was not recognized, and Xorg kept spamming the console. Ctrl-C didn't stop it either.
Last edited by vegarab (2018-01-25 12:37:34)
What does your free space look like? What do you get from
df -H
Regarding SMART, you should still run a proper full test and then post the output of
smartctl -a /dev/sda
An overall assessment of PASSED might not mean much.
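For reference, the Use% column that `df` reports can be approximated with Python's shutil.disk_usage. A small sketch, assuming Python 3.9+; the mount points are the ones from the partition layout posted above, and use_percent is a hypothetical helper that only roughly mirrors df's rounding (df rounds up).

```python
import math
import shutil


def use_percent(path: str) -> int:
    """Approximate df's Use% column: used / (used + available), rounded up."""
    usage = shutil.disk_usage(path)
    # In CPython's implementation, "free" is the space available to
    # unprivileged users (f_bavail), which is also what df shows as Avail.
    denominator = usage.used + usage.free
    return math.ceil(100 * usage.used / denominator) if denominator else 0


def main() -> None:
    for mount_point in ["/", "/boot", "/home"]:
        try:
            print(f"{mount_point}: {use_percent(mount_point)}% used")
        except OSError as err:
            print(f"{mount_point}: {err}")


if __name__ == "__main__":
    main()
```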
Filesystem  Size  Used Avail Use% Mounted on
dev         4.2G     0  4.2G   0% /dev
run         4.2G  877k  4.2G   1% /run
/dev/sda2    30G   12G   17G  41% /
tmpfs       4.2G   31M  4.1G   1% /dev/shm
tmpfs       4.2G     0  4.2G   0% /sys/fs/cgroup
tmpfs       4.2G   13k  4.2G   1% /tmp
/dev/sda1   201M   53M  149M  27% /boot
/dev/sda3    89G  4.1G   80G   5% /home
tmpfs       825M   17k  825M   1% /run/user/1000
So none of the partitions are full.
S.M.A.R.T. results: https://pastebin.com/235JahTM
Last edited by vegarab (2018-01-25 13:05:36)
Please edit your topic title to remove the superfluous cry for help.
https://wiki.archlinux.org/index.php/Co … ow_to_post
SMART attributes aren't always the easiest things to read, but that power loss count attribute seems like it would be worth investigating. Can you make sure the power cable is properly connected and/or use a different one?
Sakura:-
Mobo: MSI MAG X570S TORPEDO MAX // Processor: AMD Ryzen 9 5950X @4.9GHz // GFX: AMD Radeon RX 5700 XT // RAM: 32GB (4x 8GB) Corsair DDR4 (@ 3000MHz) // Storage: 1x 3TB HDD, 6x 1TB SSD, 2x 120GB SSD, 1x 275GB M2 SSD
Making lemonade from lemons since 2015.
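Since SMART attribute tables aren't the easiest to read, here is a rough Python sketch (assuming Python 3.9+) that turns the attribute table printed by `smartctl -A` into a name-to-raw-value mapping and prints the power-related entries. Attribute names vary between vendors, so matching on the substring "power" is only a guess, and smart_raw_values is a hypothetical helper name. smartctl usually needs root.

```python
import subprocess
import sys


def smart_raw_values(smartctl_output: str) -> dict[str, str]:
    """Map ATTRIBUTE_NAME -> RAW_VALUE from the attribute table of `smartctl -A`."""
    values = {}
    in_table = False
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True  # header row: the attribute rows follow
            continue
        if in_table:
            if len(fields) < 10 or not fields[0].isdigit():
                break  # a blank or non-numeric line ends the table
            # RAW_VALUE is the 10th column and may contain spaces,
            # e.g. "35 (Min/Max 20/45)".
            values[fields[1]] = " ".join(fields[9:])
    return values


def main(device: str = "/dev/sda") -> None:
    try:
        # smartctl's exit status is a bit mask, so a non-zero status is not
        # treated as a hard failure here.
        result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    except OSError as err:
        print(f"could not run smartctl: {err}", file=sys.stderr)
        return
    for name, raw in smart_raw_values(result.stdout).items():
        if "power" in name.lower():
            print(f"{name}: {raw}")


if __name__ == "__main__":
    main()
```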
You still haven't posted a dmesg, but Samsung SSDs are quite notorious for this and similar issues. One thing that might be contributing here: if you are using TLP, laptop-mode-tools or similar, try disabling the SATA power-saving modes (set them to max performance); see also this thread: https://github.com/linrunner/TLP/issues/84. Also check whether there is a UEFI/SSD firmware update from your vendor.
Last edited by V1del (2018-01-25 13:16:44)
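To see which SATA link power management policy is actually in effect (the kernel knob that TLP's SATA link power settings control), the current value can be read per SCSI host from sysfs. A rough Python sketch, assuming Python 3.9+ and the usual /sys/class/scsi_host/host*/link_power_management_policy files, which only exist for AHCI/SATA hosts; link_power_policies is a hypothetical helper name.

```python
import sys
from pathlib import Path

SCSI_HOSTS = Path("/sys/class/scsi_host")


def link_power_policies(root: Path = SCSI_HOSTS) -> dict[str, str]:
    """Return {host name: current link_power_management_policy} for each SATA host."""
    policies = {}
    for policy_file in sorted(root.glob("host*/link_power_management_policy")):
        try:
            policies[policy_file.parent.name] = policy_file.read_text().strip()
        except OSError:
            continue  # host disappeared or is not readable; skip it
    return policies


if __name__ == "__main__":
    found = link_power_policies()
    if not found:
        print("no link_power_management_policy files found", file=sys.stderr)
    for host, policy in found.items():
        # e.g. max_performance, medium_power or min_power
        print(f"{host}: {policy}")
```

As far as I know, writing max_performance to these files as root only lasts until TLP re-applies its own settings or the next reboot, so the persistent place to change it is the TLP configuration.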
You still haven't posted a dmesg...
To be fair, you asked for dmesg output from when the problem occurs, and the OP says it happens about once a day, so we should probably wait at least 24 hours before being concerned about the absence of that information.
Hmm, true. Depending on what the drive is able to choke out during the issue, the information might be present in the journal; however, if we can get a dmesg, that would probably be the safest bet.
Regarding the SATA power-saving modes: this might be the issue. As far as I can remember, these issues started a few days after I installed powertop and set it to automatically save as much power as possible while running on battery. I'll disable the SATA power-saving modes and see whether the issue occurs again.
Regarding the power loss count: I ran the SMART test while on battery power. I won't be able to run a test with the power cable connected until tomorrow; I'll post new results then.
Edit:
I also use TLP, tp_smapi and acpi_call on my ThinkPad: https://wiki.archlinux.org/index.php/TLP
Last edited by vegarab (2018-01-25 13:29:36)
You shouldn't combine powertop's autotune with TLP's power savings; the two will fight and conflict with each other. But yeah, try setting the corresponding TLP option for the SATA link power mode.
Just to make sure I am doing this right.
tlp-stat output: https://pastebin.com/zvdBvR6X
I will change (line 22)
SATA_LINKPWR_ON_BAT=min_power
to
SATA_LINKPWR_ON_BAT=max_performance
and I have removed the --auto-tune option from /etc/systemd/system/powertop.service.
Is there anything else I should do to give the SATA link all the power it needs?
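If you'd rather script the change than edit the file by hand, here is a minimal Python sketch (assuming Python 3.9+) of setting a KEY=value line in a shell-style config file such as TLP's. The file location is an assumption: TLP 1.x used /etc/default/tlp, while newer releases use /etc/tlp.conf. Because the file is sourced like a shell script, where the last assignment wins, the sketch rewrites every active assignment of the key, falls back to uncommenting a commented-out one, and otherwise appends the line. set_config_value and the script name set_tlp.py are hypothetical.

```python
import re
import sys
from pathlib import Path


def set_config_value(text: str, key: str, value: str) -> str:
    """Return `text` with KEY=value set, keeping every other line untouched."""
    new_line = f"{key}={value}"
    name = re.escape(key)
    active = re.compile(rf"^[ \t]*{name}=.*$", re.MULTILINE)
    commented = re.compile(rf"^[ \t]*#[ \t]*{name}=.*$", re.MULTILINE)
    # A function replacement keeps re from interpreting backslashes in `value`.
    if active.search(text):
        return active.sub(lambda _m: new_line, text)
    if commented.search(text):
        return commented.sub(lambda _m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print("usage: set_tlp.py CONFIG_FILE", file=sys.stderr)
        return
    config = Path(argv[1])  # e.g. /etc/default/tlp (needs root to write)
    updated = set_config_value(config.read_text(), "SATA_LINKPWR_ON_BAT", "max_performance")
    config.write_text(updated)


if __name__ == "__main__":
    main(sys.argv)
```

After writing the file, TLP still has to re-apply its settings, e.g. with `sudo tlp start` or simply a reboot.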
Since powertop's autotune and TLP shouldn't be combined: alright, so should I remove powertop altogether?
No, just disable its service that applies the autotune. You can still use powertop to look at the device stats; you simply shouldn't set the tuning options manually, since TLP already covers them.
From what I understand, yes, changing that line should be enough.
Last edited by V1del (2018-01-25 14:05:25)
It seems the issue was resolved by setting the SATA link power mode to max_performance and removing --auto-tune from powertop. Marking the topic as solved.