I'm new to Arch and Linux in general, so I may have done something stupid... Anyway, my setup goes like this: I have 2 NVMe SSDs (identical, Samsung EVO 970 1TB) on my mboard plus 2 SATA SSDs. The first NVMe has EFI and boot physical partitions, and the rest of the drive is a partition holding an LVM physical volume (PV). The other NVMe is one big partition with a PV. Those 2 PVs are in the same volume group, which in turn is divided into lv_root and lv_home for the / and /home/ directories. For the desktop I use KDE Plasma on Wayland, and I recently installed Hyprland to tinker with it.
So the problem started about a week ago. Around that time I did a few things:
first and foremost I did pacman -Syu;
also I've connected another two drives (HDD) to my system;
installed KZones for KWin (and also was playing with plasmoids);
installed Ratbag/Piper and configured my Logitech mouse.
So it all started with Stellaris. I started getting random crashes with GLib-GObject-CRITICAL **: time: g_object_unref: assertion 'G_IS_OBJECT (object)' failed, after which the whole system collapsed: any attempt to run any command ended with Input/Output error, and all apps that were running during the crash became unresponsive, sometimes showing just a black screen with a cursor. Then it got worse: first, launching Brave browser also started to cause this collapse, and yesterday it became so bad that even launching KDE now ends up with just a black screen and a cursor. The problem is also present on Hyprland and X11 Plasma sessions. The only way to get out of this state is to reboot the PC with the button, which also somehow corrupts my lv_home and makes me run fsck from a bootable ISO every time.
Here's what I've tried so far:
removed all plasmoids and KWin scripts
removed recently added HDD drives
reinstalled all the packages with pacman -S $(pacman -Qnq) - which includes glib
checked my NVMes with SMART and with Gigabyte mboard tool - no errors
checked lv_home with time -p dd if=/dev/[My VG]/lv_home of=/dev/null bs=4M - nothing
added a kernel parameter nvme_core.default_ps_max_latency_us=0, which probably made things even worse, although it was already very bad, so that might be just my impression
So the question is - WTF?? What possibly can it be? And are there any fixes?
Last edited by Heretic12 (Yesterday 13:32:35)
Offline
Start by posting a system journal that covers at least some of the issues, eg.
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
for the previous ("-1") boot.
Avoid rebooting w/ the power button; set up and use https://wiki.archlinux.org/title/Keyboa … el_(SysRq) instead.
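A minimal sketch for checking whether the SysRq functions needed for a clean emergency reboot are enabled. The bit values (16 = sync, 32 = remount read-only, 128 = reboot/poweroff, 1 = everything) come from the kernel's sysrq documentation; the function name `sysrq_allows_safe_reboot` is made up for this example.

```shell
# Succeeds if the given kernel.sysrq value permits sync (16),
# remount read-only (32) and reboot (128). A value of 1 enables everything.
sysrq_allows_safe_reboot() {
    value=$1
    needed=$((16 | 32 | 128))
    [ "$value" -eq 1 ] || [ $((value & needed)) -eq "$needed" ]
}

# Usage on a running system:
#   sysrq_allows_safe_reboot "$(cat /proc/sys/kernel/sysrq)" \
#       && echo "sync/remount/reboot keys are available"
```

If the check fails, setting kernel.sysrq = 1 in a drop-in file under /etc/sysctl.d/ should enable all functions on the next boot.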
"nvme_core.default_ps_max_latency_us=0" and "iommu=soft" are strong contenders, but rn. it's not even remotely clear what your problem is, beyond some vague "Input/Output error … crash … unresponsive… black screen" (the latter likely being the compositor).
Offline
Hi, thanks for the reply, and for SysRq advice!
So the journal for the previous boot is here - https://0x0.st/XDZo.txt
UPD
Now I think I'll launch Stellaris right now to trigger that error once again and reboot to capture it in its full glory
Last edited by Heretic12 (Yesterday 14:37:00)
Offline
That journal wasn't sync'd to disk after the root switch (initramfs phase)
Because of the SSDs, https://wiki.archlinux.org/title/Power_ … Management has been more often an issue because several drives were over-optimistically "upgraded" to med_power_with_dipm - check your value and set it to max_performance (or eventually medium_power, but check max_performance first)
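To check the current value, something like the following could be used. This is only a sketch: `show_lpm_policies` is a hypothetical helper name, and the optional directory argument exists purely so the function can be pointed somewhere other than the standard sysfs location.

```shell
# Print "<host>: <policy>" for every SATA host under the given sysfs directory.
# Defaults to the standard sysfs path when no argument is given.
show_lpm_policies() {
    root=${1:-/sys/class/scsi_host}
    for f in "$root"/host*/link_power_management_policy; do
        # Skip hosts that do not expose the attribute (or an unmatched glob).
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Usage: show_lpm_policies
```

Any host reporting med_power_with_dipm would be a candidate for switching to max_performance. Note that this attribute only exists for SATA/AHCI hosts.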
Offline
So the problem was with SATA drives? And should I change that value for all hosts?
Offline
I do not know what the problem was/is - it's a guess based on "things that were problems for other people"
The journal doesn't record any issues because that boot ended w/ a hard reset.
Offline
OK, I had to battle with my system for a bit, so here's the log:
System started fine - and I caught the Input/Output Error with Stellaris. Btw, Steam is now refusing to even launch, so I used the .sh file. Then, after rebooting with SysRq, the system was unable to mount drives, so I had to hard reboot 3 times and fixed the issue with fsck /dev/Array/lv_home from a bootable USB. So here are the journals for the last 5 boots. The 4th is most likely the one that contains the boot with the I/O Error - at least it reads so, although I'm not yet experienced enough to decipher its content.
UPD
It seems that even though I ran fsck, there are still a lot of corrupted inodes. And, what's even more interesting, there are corrupted inodes in /boot/efi. At least that was my understanding of those logs...
Last edited by Heretic12 (Yesterday 17:22:34)
Offline
4 of them end early
Nov 06 17:16:31 archlinux systemd-journald[469]: Time spent on flushing to /var/log/journal/689dd85299b14c96aac4b7be644764be is 4.566ms for 1091 entries.
https://0x0.st/XDNY.txt contains some actual runtime journal.
Nov 06 17:18:01 archlinux mount[787]: WARNING: blksize option is ignored because ntfs-3g must calculate it.
Nov 06 17:18:01 archlinux mount[791]: WARNING: blksize option is ignored because ntfs-3g must calculate it.
Nov 06 17:18:01 archlinux kernel: EXT4-fs (nvme0n1p2): recovery complete
Nov 06 17:18:01 archlinux kernel: EXT4-fs (nvme0n1p2): mounted filesystem a8d527f2-71bc-4760-a7ea-b01c99d9ee5c r/w with ordered data mode. Quota mode: none.
Nov 06 17:18:01 archlinux systemd[1]: Mounted /boot.
Nov 06 17:18:01 archlinux systemd[1]: Mounting /boot/efi...
Nov 06 17:18:01 archlinux kernel: EXT4-fs error (device dm-1): ext4_orphan_get:1421: comm mount: bad orphan inode 88891646
Nov 06 17:18:01 archlinux kernel: ext4_test_bit(bit=253, block=355467283) = 0
There's an fsck because of the unclean previous shutdown - and an ntfs partition.
==> IS THERE A PARALLEL WINDOWS INSTALLATION?
=> 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot windows and linux twice for voodoo reasons.
Later on
Nov 06 17:18:05 archlinux kernel: EXT4-fs error (device dm-1): ext4_mb_generate_buddy:1217: group 10945, block bitmap and bg descriptor inconsistent: 313 vs 307 free clusters
Nov 06 17:18:05 archlinux kernel: EXT4-fs error (device dm-1): ext4_mb_generate_buddy:1217: group 10929, block bitmap and bg descriptor inconsistent: 8828 vs 8773 free clusters
there're still some ext4 inconsistencies.
Nov 06 17:18:20 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm kwin_wayland: deleted inode referenced: 89275064
Nov 06 17:18:21 192.168.1.17 kernel: usb 1-6: reset high-speed USB device number 5 using xhci_hcd
Nov 06 17:18:21 192.168.1.17 kcminit_startup[1179]: Initializing "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_mouse.so"
Nov 06 17:18:21 192.168.1.17 kcminit_startup[1179]: Initializing "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_style.so"
Nov 06 17:18:21 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm ksplashqml: deleted inode referenced: 89275064
Nov 06 17:18:21 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm Xwayland: deleted inode referenced: 89275064
and more
And finally =======================================================
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Nov 06 17:19:20 192.168.1.17 udisksd[1184]: Error probing device: NVMe Identify Controller command error: Interrupted system call (g-bd-nvme-error-quark, 1)
Nov 06 17:19:20 192.168.1.17 kernel: nvme 0000:03:00.0: enabling device (0000 -> 0002)
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Disabling device after reset failure: -19
Nov 06 17:19:20 192.168.1.17 udisksd[1184]: Error probing device: NVMe Identify Namespace command error: Input/output error (g-bd-nvme-error-quark, 1)
Nov 06 17:19:21 192.168.1.17 kernel: nvme nvme1: Identify namespace failed (-5)
You already have "nvme_core.default_ps_max_latency_us=0" so add the others and also "iommu=soft" and run a complete fsck.
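The kind of lines quoted above can be pulled out of a long journal with a small filter. A minimal sketch: the function name `storage_errors` and the pattern list are assumptions for illustration, built only from the message prefixes visible in these excerpts, so it may miss other relevant messages.

```shell
# Keep only lines that look like the ext4 and NVMe failures quoted in this thread.
# Reads journal text on stdin and writes the matching lines to stdout.
storage_errors() {
    grep -E 'EXT4-fs error|nvme nvme[0-9]+: |Input/output error'
}

# Usage: sudo journalctl -b -1 --no-pager | storage_errors
```

If the NVMe "controller is down" lines appear before the ext4 errors in a given boot, the drive dropping off the bus is the more likely cause; in the excerpt above the ext4 errors come first, which suggests the filesystem was already damaged by earlier crashes.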
Offline
==> IS THERE A PARALLEL WINDOWS INSTALLATION?
No, I have 2 SATA SSDs that contain files from my previous setup which was Windows, and I've decided to leave them on NTFS and not to do any additional manipulations. Is it OK to have NTFS partitions in my system? Because at least one of them needs to be available to Windows and Mac as it is a data storage and could be used in different scenarios
So, I've created a file /etc/udev/rules.d/hd_power_save.rules with
ACTION=="add", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="max_performance"
And also added "iommu=soft" to both GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX
Now going to reboot and do a complete fsck from a bootable usb. Hope it will fix the issue and I didn't miss anything
Last edited by Heretic12 (Yesterday 21:27:59)
Offline
And also added "iommu=soft" to both GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX
1. editing /etc/default/grub doesn't do anything, you also have to run grub-mkconfig
2. don't forget "pcie_aspm=off pcie_port_pm=off"
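Whether the parameters actually reached the kernel can be checked after the reboot. The sketch below assumes a hypothetical helper `missing_params`; the grub.cfg path in the usage comment is the usual default and may differ on other setups.

```shell
# Print each wanted parameter that is absent from the given kernel command line.
# $1 is the command line string, the remaining arguments are the parameters.
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        # Pad with spaces so only whole, exact parameters match.
        case " $cmdline " in
            *" $p "*) ;;
            *) printf '%s\n' "$p" ;;
        esac
    done
}

# Usage after editing /etc/default/grub:
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
#   (reboot)
#   missing_params "$(cat /proc/cmdline)" nvme_core.default_ps_max_latency_us=0 \
#       iommu=soft pcie_aspm=off pcie_port_pm=off
```

No output means every listed parameter is present on the running kernel's command line.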
Offline
Good thing I waited for your reply before rebooting! I know about generating the config, and I did it. But I totally forgot "pcie_aspm=off pcie_port_pm=off". So if it solves the issue, would that mean power management was to blame all along? And btw, would this affect graphics card since it adjusts all pcie?
So the total list of added kernel parameters would be:
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet nvidia-drm.modeset=1 nvme_core.default_ps_max_latency_us=0 iommu=soft pcie_aspm=off pcie_port_pm=off"
GRUB_CMDLINE_LINUX="nvidia-drm.modeset=1 nvme_core.default_ps_max_latency_us=0 iommu=soft pcie_aspm=off pcie_port_pm=off"
Last edited by Heretic12 (Yesterday 21:45:39)
Offline
And btw, would this affect graphics card since it adjusts all pcie?
"pcie_aspm=off" prevents ASPM on the entire bus, but since you've only one GPU, it will likely not have any effect on that.
Also nb. that the general strategy is to first see whether you can stabilize the system at all.
If yes, you'd try to narrow down on the critical parameters to limit the impact as much as possible.
Offline
Unfortunately it didn't help. Well... at least Stellaris managed to work a few minutes before the crash. Again, journal entries:
https://0x0.st/XDAs.txt - this is that boot
https://0x0.st/XDT7.txt - and here I couldn't even enter the Plasma session - it just hung, with the greeting animation completely stopped and no ability to switch to a TTY
Offline