Hello all,
Quite recently, I installed Arch on a J5005-ITX system to use as a file server.
I chose a 120GB SATA SSD as the main drive for the OS, and installed Arch on a LUKS-encrypted btrfs subvolume.
This all went fine.
Then I started a first backup to this server (and to other hard disks that are connected and that I use for the backups).
This backup had been running for about 6 days, and today, after logging into the system via ssh, I noticed that it was behaving in a weird way:
- zsh was informing me that it cannot write to its history file
- zsh was giving me "input/output errors" for some commands like dmesg
- mount showed that all my btrfs subvolumes that are mounted (home / root / snapshots) became READ ONLY (where /etc/fstab clearly lists them as "rw")
After a hard reboot (needed because Arch claimed it could not unmount the filesystems), everything was back to normal.
I do not know where to start looking for the root cause of this.
Some side info
- linux-hardened is used
- zfs on linux is there for the backup drives
- barely 4GB of the 120GB "OS drive" is in use
- btrfs check reports no errors
Any ideas what might have gone wrong that I ended up with read-only btrfs filesystems?
Thank you very much in advance.
Offline
bump...
Can anyone point me in the right direction to understand WHEN the kernel decides to make a mounted BTRFS filesystem read-only?
Thank you very much in advance!
Offline
Don't do this: https://wiki.archlinux.org/index.php/Co … ct#Bumping
Check your dmesg/journal logs from when it happened, run a long SMART test, and post the output of
smartctl -a $device
after the mentioned testing time has elapsed.
If you can't actively reproduce this, and the filesystem went read-only while the journal was being written so that the in-memory dmesg or journal from that time was never saved, there's unlikely to be much to be done here until you run into it again.
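For reference, a rough shell sketch of the checks suggested above. The function names and the /dev/sdX device in the usage note are placeholders, not anything from this thread, and it assumes systemd-journald and smartmontools are in use.

```shell
# Succeeds if journald keeps logs across reboots. With the default
# Storage=auto setting this is the case only when /var/log/journal exists.
# $1 overrides the directory (hypothetical parameter, handy for testing).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Kernel messages from the boot before the current one.
# Only useful if the journal from that boot actually reached the disk.
prev_boot_kernel_log() {
    journalctl -k -b -1 --no-pager
}

# Start a long SMART self-test on device $1.
# smartctl prints the estimated time the test will take.
smart_long_test() {
    smartctl -t long "$1"
}

# Full SMART report for device $1, to be posted once the test has finished.
smart_report() {
    smartctl -a "$1"
}
```

Typical use as root would be `smart_long_test /dev/sdX`, waiting for the announced duration, then `smart_report /dev/sdX`. If the root filesystem itself went read-only, the journal of that boot may simply be missing, which is the situation described above.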
Offline
>Any ideas what might have gone wrong that I ended up with read-only btrfs filesystems?
There's not enough information. This needs a complete dmesg that includes the events leading up to the switch to read-only. Btrfs goes read-only when it becomes confused due to file system corruption, to avoid corrupting the file system further. Has this Btrfs ever been written to with any kernel version 5.2.0 through 5.2.14? There's a known corruption bug there, and the newer tree checker catches this problem in later kernels, so it could be a case of an old Btrfs bug causing it. Or it could be a case of hardware-induced corruption. But dmesg is needed. Chances are you'll get a faster response about such problems from the linux-btrfs@ list.
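A small sketch of how the relevant lines could be pulled out of a saved log. The helper names are made up for illustration, and the "forced readonly" wording is an assumption based on the message the btrfs kernel code normally prints; a complete dmesg is still what should be posted.

```shell
# Print only the Btrfs-related lines of a kernel log read from stdin.
btrfs_messages() {
    grep -i 'btrfs'
}

# Succeeds if the log on stdin shows Btrfs switching a filesystem to
# read-only. The exact "forced readonly" wording is an assumption.
went_readonly() {
    grep -qi 'forced readonly'
}
```

For example, `journalctl -k -b -1 --no-pager | btrfs_messages` would show the Btrfs messages of the previous boot, assuming that boot's journal was saved, and `uname -r` shows the running kernel version for comparison with the 5.2.0 through 5.2.14 range.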
Last edited by cmurf (2020-04-05 04:40:58)
Offline
I ran a similar setup on my laptop (LUKS + btrfs + SSD) and had my drive go read-only randomly with nothing unusual in the logs. After months of troubleshooting, multiple OS reinstallations, trying a different OS, and multiple crashes a day, the conclusion was a drive/driver incompatibility. Dell sent me a new SSD (Toshiba instead of Samsung), and my system has been rock solid ever since.
If you have a spare SSD at hand, maybe this is something to test.
Offline
I have also had the same issue with a combination of LUKS, btrfs and an SSD, and I never figured out why. However, I had this issue twice in 2018, when the Linux kernel was still at version 4. After the file system went read-only once or twice, file system corruption would soon follow.
The Arch Wiki says that the ‘tlp’ package may cause file system corruption with btrfs; see https://wiki.archlinux.org/index.php/TLP#Btrfs. I can’t tell for sure whether I had this package installed, since I did a complete reinstall back then. Anyway, that’s something to keep in mind.
Last edited by zutruth (2020-04-06 14:27:11)
Offline
TLP would only be tangentially related here. However, it is indeed the case that Samsung SSDs are notorious for issues with low power modes. You'd definitely want to set the link power management policy to max_performance if this happens to be a Samsung drive, regardless of which method you use to do so.
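One way to do that by hand, as a sketch only: it assumes the drive sits on an AHCI SATA port that exposes the usual link_power_management_policy file in sysfs, and the function names are made up. The change needs root and does not survive a reboot.

```shell
# Show the current SATA link power management policy of every host.
# $1 overrides the sysfs directory (hypothetical parameter, for testing).
show_alpm() {
    for f in "${1:-/sys/class/scsi_host}"/host*/link_power_management_policy; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    return 0
}

# Set every host that exposes the policy file to max_performance.
# Hosts without a writable policy file are skipped silently.
set_alpm_max_performance() {
    for f in "${1:-/sys/class/scsi_host}"/host*/link_power_management_policy; do
        [ -w "$f" ] && echo max_performance > "$f"
    done
    return 0
}
```

Running `show_alpm` before and after `set_alpm_max_performance` shows whether the setting took effect. To keep it across reboots you would still need TLP, a udev rule, or a similar mechanism.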
Offline