Constant Input/Output Error [Solved]

Heretic12 · 2024-11-06 13:28:07

So, a bit of a background

I'm new to Arch and Linux in general, so I may have done something stupid... Anyways, my setup goes like this: I have 2 identical NVMe SSDs (Samsung 970 EVO Plus 1TB) on my motherboard plus 2 SATA SSDs. The first NVMe has EFI and boot partitions, and the rest of the drive is a partition holding an LVM physical volume (PV). The other NVMe is one big partition with a PV. Those 2 PVs are in the same volume group, which in turn is divided into lv_root and lv_home for / and /home. For the desktop I use KDE Plasma on Wayland and recently installed Hyprland to tinker with it.

Pretext

So the problem started about a week ago. Around that time I did a few things:

first and foremost I did pacman -Syu;
also I've connected another two drives (HDD) to my system;
installed KZones for KWin (and also was playing with plasmoids);
installed Ratbag/Piper and configured my Logitech mouse.

The problem itself

So it all started with Stellaris. I started getting random crashes with GLib-GObject-CRITICAL **: time: g_object_unref: assertion 'G_IS_OBJECT (object)' failed, after which the whole system collapsed: any attempt to run a command ended with Input/Output error, and all apps that were running during the crash became unresponsive, sometimes showing just a black screen with a cursor. Then it got worse: first launching Brave browser also started to cause this collapse, and yesterday it became so bad that even launching KDE now ends up with just a black screen and a cursor. The problem is also present on Hyprland and X11 Plasma sessions. The only way to get out of this state is to reboot the PC with the button, which also somehow corrupts my lv_home and makes me run fsck from a bootable ISO every time.

Things I've already tried:

removed all plasmoids and KWin scripts
removed recently added HDD drives
reinstalled all the packages with pacman -S $(pacman -Qnq) - which includes glib
checked my NVMes with SMART and with Gigabyte mboard tool - no errors
checked lv_home with time -p dd if=/dev/[My VG]/lv_home of=/dev/null bs=4M - nothing
added a kernel parameter nvme_core.default_ps_max_latency_us=0, which probably made things even worse, although it was already very bad so it might be just my impression
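
A sequential read like the dd check in the list above only exercises reads, so a clean pass does not by itself rule the drive out. A minimal sketch of wrapping that check so its exit status can be tested; `read_test` is a hypothetical helper name, not an existing tool, and the example path is the LV path that appears later in this thread:

```shell
#!/usr/bin/env bash
# read_test PATH: read PATH sequentially and discard the data.
# The exit status is dd's: non-zero means the read failed (e.g. an I/O error).
read_test() {
    local src=$1
    # bs=4M matches the command in the post; dd's transfer statistics go to
    # stderr and are discarded here.
    dd if="$src" of=/dev/null bs=4M 2>/dev/null
}

# Example (as root):
#   read_test /dev/Array/lv_home && echo "read OK" || echo "read FAILED"
```

Because it never writes, this cannot reproduce a failure that only shows up on writes or under combined load.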

So the question is - WTF?? What possibly can it be? And are there any fixes?

Cause and the solution

It was a hardware problem, specifically the power connectors. The solution was found accidentally: I just detached all power cables (24-pin MB cable, 2 8-pin CPU cables and 2 8-pin GPU cables), blew on them a bit and reattached them. Yes, that was it! Glory to the Omnissiah I guess...)

Last edited by Heretic12 (2025-02-13 17:47:33)

seth · 2024-11-06 14:12:05

Start by posting a system journal that covers at least some of the issues, eg.

sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st

for the previous ("-1") boot.
Avoid rebooting w/ the power button, set up and use https://wiki.archlinux.org/title/Keyboa … el_(SysRq) instead.
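
Whether the SysRq sync / remount / reboot keys do anything depends on the value in /proc/sys/kernel/sysrq. A small sketch for decoding that value, assuming the bitmask meanings from the kernel's sysrq documentation (1 enables everything, 16 = sync, 32 = remount read-only, 128 = reboot/poweroff); the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# sysrq_allows VALUE BIT: succeed if the kernel.sysrq VALUE permits the
# function group BIT. A value of 1 enables everything; any other value is
# treated as a bitmask.
sysrq_allows() {
    local value=$1 bit=$2
    [ "$value" -eq 1 ] && return 0
    [ $(( value & bit )) -ne 0 ]
}

# sysrq_report VALUE: say whether VALUE covers sync, remount-ro and reboot.
sysrq_report() {
    local value=$1 entry bit
    for entry in sync:16 remount-ro:32 reboot:128; do
        bit=${entry#*:}
        if sysrq_allows "$value" "$bit"; then
            echo "${entry%%:*}: allowed"
        else
            echo "${entry%%:*}: blocked"
        fi
    done
}

# Example: sysrq_report "$(cat /proc/sys/kernel/sysrq)"
```

If any of the three comes back blocked, the wiki page linked above describes how to change the value.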

"nvme_core.default_ps_max_latency_us=0" and "iommu=soft" are strong contenders, but rn. it's not even remotely clear what your problem is, beyond some vague "Input/Output error … crash … unresponsive… black screen" (the latter likely being the compositor).

Heretic12 · 2024-11-06 14:30:38

Hi, thanks for the reply, and for SysRq advice!

So the journal for the previous boot is here - https://0x0.st/XDZo.txt

UPD
Now I think I'll launch Stellaris right now to trigger that error once again and reboot to capture it in its full glory

Last edited by Heretic12 (2024-11-06 14:37:00)

seth · 2024-11-06 14:38:56

That journal wasn't sync'd to disk after the root switch (initramfs phase)

Because of the SSDs, https://wiki.archlinux.org/title/Power_ … Management has been more often an issue because several drives were over-optimistically "upgraded" to med_power_with_dipm - check your value and set it to max_performance (or eventually medium_power, but check max_performance first)
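
To check the current value, something along these lines could work. It is only a sketch: it assumes the policy is exposed per host as /sys/class/scsi_host/host*/link_power_management_policy, and `list_lpm_policies` is a hypothetical name:

```shell
#!/usr/bin/env bash
# list_lpm_policies [BASE]: print "hostN: policy" for every SCSI host under
# BASE that exposes link_power_management_policy. Hosts without the
# attribute (e.g. USB storage hosts) are skipped.
list_lpm_policies() {
    local base=${1:-/sys/class/scsi_host} f host
    for f in "$base"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue   # also covers a glob that matched nothing
        host=${f%/link_power_management_policy}
        echo "${host##*/}: $(cat "$f")"
    done
}

# Example: list_lpm_policies
```

Only hosts that show up in the output have a policy that can be changed.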

Heretic12 · 2024-11-06 14:54:13

So the problem was with SATA drives? And should I change that value for all hosts?

seth · 2024-11-06 14:56:09

I do not know what the problem was/is - it's a guess based on "things that were problems for other people"
The journal doesn't record any issues because that boot ended w/ a hard reset.

Heretic12 · 2024-11-06 16:50:12

OK, I had to battle with my system for a bit, so here's the log:

System started fine - and I caught Input/Output Error with Stellaris. Btw, Steam now is refusing to even launch so I used the .sh file. Then after rebooting with SysRq I ran into the system being unable to mount drives, so I had to hard reboot 3 times and fixed the issue with fsck /dev/Array/lv_home from a bootable usb. So here are journals for the 5 last boots. The 4th is most likely the one that contains that boot with the I/O Error - at least it reads so, although I am not that experienced yet to decipher its contents(

UPD
It seems like even though I've done fsck, there are still a lot of corrupted inodes. And - which is even more interesting - there are corrupted inodes in /boot/efi. That at least was my understanding of those logs...

Last edited by Heretic12 (2024-11-06 17:22:34)

seth · 2024-11-06 20:33:44

4 of them end early

Nov 06 17:16:31 archlinux systemd-journald[469]: Time spent on flushing to /var/log/journal/689dd85299b14c96aac4b7be644764be is 4.566ms for 1091 entries.

https://0x0.st/XDNY.txt contains some actual runtime journal.

Nov 06 17:18:01 archlinux mount[787]: WARNING: blksize option is ignored because ntfs-3g must calculate it.
Nov 06 17:18:01 archlinux mount[791]: WARNING: blksize option is ignored because ntfs-3g must calculate it.
Nov 06 17:18:01 archlinux kernel: EXT4-fs (nvme0n1p2): recovery complete
Nov 06 17:18:01 archlinux kernel: EXT4-fs (nvme0n1p2): mounted filesystem a8d527f2-71bc-4760-a7ea-b01c99d9ee5c r/w with ordered data mode. Quota mode: none.
Nov 06 17:18:01 archlinux systemd[1]: Mounted /boot.
Nov 06 17:18:01 archlinux systemd[1]: Mounting /boot/efi...
Nov 06 17:18:01 archlinux kernel: EXT4-fs error (device dm-1): ext4_orphan_get:1421: comm mount: bad orphan inode 88891646
Nov 06 17:18:01 archlinux kernel: ext4_test_bit(bit=253, block=355467283) = 0

There's an fsck because of the unclean previous shutdown - and an ntfs partition.

==> IS THERE A PARALLEL WINDOWS INSTALLATION?
=> 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot windows and linux twice for voodoo reasons.

Later on

Nov 06 17:18:05 archlinux kernel: EXT4-fs error (device dm-1): ext4_mb_generate_buddy:1217: group 10945, block bitmap and bg descriptor inconsistent: 313 vs 307 free clusters
Nov 06 17:18:05 archlinux kernel: EXT4-fs error (device dm-1): ext4_mb_generate_buddy:1217: group 10929, block bitmap and bg descriptor inconsistent: 8828 vs 8773 free clusters

there're still some ext4 inconsistencies.
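
The errors name the filesystem only as dm-1. One way to see which logical volume that is: read the device-mapper name from sysfs. This is a sketch that assumes the usual /sys/block/<dev>/dm/name attribute; `dm_name` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# dm_name DEV [SYSFS_BLOCK]: print the device-mapper name behind DEV
# (e.g. "dm-1"), read from <SYSFS_BLOCK>/<DEV>/dm/name.
dm_name() {
    local dev=$1 base=${2:-/sys/block}
    local f="$base/$dev/dm/name"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "no device-mapper entry for $dev" >&2
        return 1
    fi
}

# Example: dm_name dm-1
```

For an LVM volume the name is typically of the form <vg>-<lv>, which tells you what to pass to fsck.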

Nov 06 17:18:20 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm kwin_wayland: deleted inode referenced: 89275064
Nov 06 17:18:21 192.168.1.17 kernel: usb 1-6: reset high-speed USB device number 5 using xhci_hcd
Nov 06 17:18:21 192.168.1.17 kcminit_startup[1179]: Initializing  "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_mouse.so"
Nov 06 17:18:21 192.168.1.17 kcminit_startup[1179]: Initializing  "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_style.so"
Nov 06 17:18:21 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm ksplashqml: deleted inode referenced: 89275064
Nov 06 17:18:21 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_lookup:1815: inode #89269276: comm Xwayland: deleted inode referenced: 89275064

and more

And finally =======================================================

Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Nov 06 17:19:20 192.168.1.17 udisksd[1184]: Error probing device: NVMe Identify Controller command error: Interrupted system call (g-bd-nvme-error-quark, 1)
Nov 06 17:19:20 192.168.1.17 kernel: nvme 0000:03:00.0: enabling device (0000 -> 0002)
Nov 06 17:19:20 192.168.1.17 kernel: nvme nvme1: Disabling device after reset failure: -19
Nov 06 17:19:20 192.168.1.17 udisksd[1184]: Error probing device: NVMe Identify Namespace command error: Input/output error (g-bd-nvme-error-quark, 1)
Nov 06 17:19:21 192.168.1.17 kernel: nvme nvme1: Identify namespace failed (-5)

You already have "nvme_core.default_ps_max_latency_us=0" so add the others and also "iommu=soft" and run a complete fsck.
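
For digging the storage-related lines out of a saved journal like the ones linked in this thread, a filter along these lines may help. It is a rough sketch: the patterns are taken from the messages quoted above and are not exhaustive, and both function names are made up:

```shell
#!/usr/bin/env bash
# storage_errors: read journal text on stdin and keep only kernel lines that
# look like NVMe messages, block-layer I/O errors, or ext4/JBD2 trouble.
storage_errors() {
    grep -E 'kernel: (nvme |EXT4-fs (error|warning)|EXT4-fs \(.*\): (I/O error|Remounting)|Buffer I/O error|I/O error, dev|JBD2:|Aborting journal)'
}

# nvme_resets: read journal text on stdin and print "controller count" for
# every controller that logged "controller is down".
nvme_resets() {
    grep -oE 'nvme nvme[0-9]+: controller is down' |
        awk '{ sub(":", "", $2); count[$2]++ }
             END { for (c in count) print c, count[c] }' |
        sort
}

# Example:
#   journalctl -b -1 | storage_errors
#   journalctl -b -1 | nvme_resets
```

Controller names can change between boots, so per-controller counts are best compared within a single boot.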

Heretic12 · 2024-11-06 21:24:45

==> IS THERE A PARALLEL WINDOWS INSTALLATION?
No, I have 2 SATA SSDs that contain files from my previous setup which was Windows, and I've decided to leave them on NTFS and not to do any additional manipulations. Is it OK to have NTFS partitions in my system? Because at least one of them needs to be available to Windows and Mac as it is a data storage and could be used in different scenarios

So, I've created a file /etc/udev/rules.d/hd_power_save.rules with

ACTION=="add", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="max_performance"

And also added "iommu=soft" to both GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX

Now going to reboot and do a complete fsck from a bootable usb. Hope it will fix the issue and I didn't miss anything

Last edited by Heretic12 (2024-11-06 21:27:59)

seth · 2024-11-06 21:34:29

And also added "iommu=soft" to both GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX

1. editing /etc/default/grub doesn't do anything, you also have to run grub-mkconfig
2. don't forget "pcie_aspm=off pcie_port_pm=off"

Heretic12 · 2024-11-06 21:43:45

Good thing I waited for your reply before rebooting! I know about generating the config and I did it. But I totally forgot "pcie_aspm=off pcie_port_pm=off") So if it's gonna solve the issue - it would mean that power management was to blame all along? And btw, would this affect the graphics card since it adjusts all pcie?
So the total list of added kernel parameters would be:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet nvidia-drm.modeset=1 nvme_core.default_ps_max_latency_us=0 iommu=soft pcie_aspm=off pcie_port_pm=off"
GRUB_CMDLINE_LINUX="nvidia-drm.modeset=1 nvme_core.default_ps_max_latency_us=0 iommu=soft pcie_aspm=off pcie_port_pm=off"
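
After regenerating the GRUB config and rebooting, it may be worth confirming that the parameters actually reached the running kernel, since a misspelled name will not match the one the kernel looks for. A minimal sketch that compares a command line string against a list of expected parameters; `missing_params` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# missing_params CMDLINE PARAM...: print each PARAM that does not appear as a
# whole space-separated word in CMDLINE. Exit status is 0 only if none is
# missing.
missing_params() {
    local cmdline=" $1 " p rc=0
    shift
    for p in "$@"; do
        case "$cmdline" in
            *" $p "*) ;;                      # found as a whole word
            *) echo "missing: $p"; rc=1 ;;
        esac
    done
    return "$rc"
}

# Example:
#   missing_params "$(cat /proc/cmdline)" \
#       nvme_core.default_ps_max_latency_us=0 iommu=soft \
#       pcie_aspm=off pcie_port_pm=off
```

No output and exit status 0 mean every listed parameter is present exactly as written.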

Last edited by Heretic12 (2024-11-06 21:45:39)

seth · 2024-11-06 21:57:38

And btw, would this affect the graphics card since it adjusts all pcie?

"pcie_aspm=off" prevents ASPM on the entire bus, but since you've only one GPU, it will likely not have any effect on that.

Also nb. that the general strategy is to first see whether you can stabilize the system at all.
If yes, you'd try to narrow down on the critical parameters to limit the impact as much as possible.

Heretic12 · 2024-11-06 23:10:50

Unfortunately it didn't help. Well... at least Stellaris managed to work a few minutes before the crash. Again, journal entries:

https://0x0.st/XDAs.txt - that boot when I've managed to play a few minutes of Stellaris
https://0x0.st/XDT7.txt - and here I couldn't even enter the Plasma session after the reboot from the previous crash - it just hung with the greeting animation completely stopped and no ability to switch to a TTY

Last edited by Heretic12 (2024-11-07 01:57:35)

Heretic12 · 2024-11-19 15:13:26

Ok, I've tried many things, reinstalled the system a few times even - removed SATA HDDs - the problem still persists. After analyzing logs, I've located a few interesting things that might cause this problem:

Nov 18 23:32:42 archlinux kernel: nvme nvme0: missing or invalid SUBNQN field.
Nov 18 23:32:42 archlinux kernel: nvme nvme2: missing or invalid SUBNQN field.
Nov 18 23:32:42 archlinux kernel: nvme nvme1: missing or invalid SUBNQN field.

and:

Nov 18 23:32:40 archlinux (udev-worker)[472]: host4: /etc/udev/rules.d/hd_power_save.rules:1 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:14.0/usb2/2-3/2-3:1.0/host4/scsi_host/host4/link_power_management_policy}="max_performance", ignoring: No such file or directory

and finally the crash:

Nov 18 23:48:59 192.168.1.17 kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Nov 18 23:48:59 192.168.1.17 kernel: nvme nvme2: Does your device have a faulty power saving mode enabled?
Nov 18 23:48:59 192.168.1.17 kernel: nvme nvme2: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Nov 18 23:48:59 192.168.1.17 kernel: nvme 0000:3f:00.0: enabling device (0000 -> 0002)
Nov 18 23:48:59 192.168.1.17 kernel: nvme nvme2: Disabling device after reset failure: -19
Nov 18 23:48:59 192.168.1.17 brave[1317]: [1317:1323:1118/234859.704654:ERROR:simple_index_file.cc(327)] Failed to write the temporary index file
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 183, lost async page write
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 96075777 starting block 4001538)
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 9653, lost async page write
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 96075777 starting block 4005888)
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 3670020, lost async page write
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 9961480, lost async page write
Nov 18 23:48:59 192.168.1.17 kernel: Aborting journal on device dm-1-8.
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95714855 starting block 10425451)
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 10425451
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:84: comm ThreadPoolForeg: Detected aborted journal
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 239632384, lost sync page write
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs error (device dm-1) in ext4_reserve_inode_write:5813: Journal has aborted
Nov 18 23:48:59 192.168.1.17 kernel: JBD2: I/O error when updating journal superblock for dm-1-8.
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_convert_unwritten_extents:4870: inode #96075777: block 0: len 512: ext4_ext_map_blocks returned -5
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001538
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 0, lost sync page write
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001539
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs (dm-1): I/O error while writing superblock
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001540
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001541
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs (dm-1): Remounting filesystem read-only
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001542
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001543
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001544
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001545
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 4001546
Nov 18 23:48:59 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 0, lost sync page write
Nov 18 23:48:59 192.168.1.17 kernel: EXT4-fs (dm-1): I/O error while writing superblock
Nov 18 23:49:09 192.168.1.17 systemd-coredump[3335]: Process 3196 (stellaris) of user 1000 terminated abnormally with signal 7/BUS, processing...
Nov 18 23:49:09 192.168.1.17 systemd[1]: Created slice Slice /system/drkonqi-coredump-processor.
Nov 18 23:49:09 192.168.1.17 systemd[1]: Created slice Slice /system/systemd-coredump.
Nov 18 23:49:09 192.168.1.17 systemd[1]: Started Process Core Dump (PID 3335/UID 0).
Nov 18 23:49:09 192.168.1.17 systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
Nov 18 23:49:19 192.168.1.17 systemd-coredump[3336]: Process 3196 (stellaris) of user 1000 dumped core.
                                                     
                                                     Stack trace of thread 3196:
                                                     #0  0x000079205cae3d58 n/a (/run/host/usr/lib/libgcc_s.so.1 + 0x25d58)
                                                     #1  0x000079205cadeff3 n/a (/run/host/usr/lib/libgcc_s.so.1 + 0x20ff3)
                                                     #2  0x000079205cae133e n/a (/run/host/usr/lib/libgcc_s.so.1 + 0x2333e)
                                                     #3  0x000079205af339a1 n/a (/run/host/usr/lib/libc.so.6 + 0x1249a1)
                                                     #4  0x00000000035022e8 n/a (/home/Heretic/.local/share/Steam/steamapps/common/Stellaris/stellaris + 0x31022e8)
                                                     ELF object binary architecture: AMD x86-64
Nov 18 23:49:19 192.168.1.17 drkonqi-coredump-processor[3337]: "/home/Heretic/.local/share/Steam/steamapps/common/Stellaris/stellaris" 3196 "/var/lib/systemd/coredump/core.stellaris.1000.ba52bc2ff283450a842b4009102f7c20.3196.1731973749000000.zst"

seth · 2024-11-19 15:30:06

The crashes likely happen because your root partition drops out.
Are there any BIOS/UEFI updates or firmware updates for the nvme available?

Otherwise it's increasingly likely that one of the nvme's is faulty, but the error seems to have shifted from 03:00.0 to 3f:00.0 - did you swap the nvme's around?
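
Since nvmeN names and PCI addresses both move around in the logs here, matching controllers by serial number is probably the safer way to tell the drives apart. A sketch that lists each controller with its PCI address, serial and model, assuming the address, serial and model attributes under /sys/class/nvme/nvmeN; `nvme_map` is a made-up name:

```shell
#!/usr/bin/env bash
# nvme_map [BASE]: print "controller pci-address serial model" for each NVMe
# controller directory under BASE (default /sys/class/nvme).
nvme_map() {
    local base=${1:-/sys/class/nvme} c
    for c in "$base"/nvme*; do
        [ -d "$c" ] || continue   # skip if the glob matched nothing
        # serial and model are space-padded in sysfs, so trailing blanks
        # are stripped before printing.
        printf '%s %s %s %s\n' \
            "${c##*/}" \
            "$(cat "$c/address" 2>/dev/null)" \
            "$(sed 's/ *$//' "$c/serial" 2>/dev/null)" \
            "$(sed 's/ *$//' "$c/model" 2>/dev/null)"
    done
}

# Example: nvme_map
```

Running it once per boot and comparing serials shows whether the same physical drive is the one dropping out each time.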

Heretic12 · 2024-11-19 15:38:09

No, I've removed them from the M.2 sockets to check if there are any physical problems, but put them back into the same sockets. The mboard BIOS is already updated to the latest version. Now I'll try to update the SSD firmware using fwupd. Right now the system is relatively usable - at least since I use SysRq instead of the power button when a crash happens) The best way to reproduce the error is to play some game - namely Stellaris - which quickly throws

Nov 18 23:45:54 192.168.1.17 steam[2481]: (process:3123): GLib-GObject-CRITICAL **: 23:45:54.938: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

UPD
fwupd did not find any new updates

Successfully downloaded new metadata: Updates have been published for 0 of 4 local devices

Last edited by Heretic12 (2024-11-19 15:45:42)

seth · 2024-11-19 15:45:46

is to play some game - namely Stellaris - which quickly throws

That's a generic glib error.

If it happens w/ a game, but not eg. a dd stresstest you might be facing a general power issue.
Is the system somehow overclocked? Can you downclock it?

Heretic12 · 2024-11-19 15:55:12

Still, drive error starts promptly after glib error, so I suspect they are somehow connected. The system is using XMP (standard profile for the sticks that I use). Also graphics card is faulty - it crashes during peak loads (did so back on Windows as well) - so I cap the boost clocks using

nvidia-smi --lock-gpu-clocks=0,1695 --mode=1

Otherwise there are no overclocks.

Last edited by Heretic12 (2024-11-19 16:03:16)

seth · 2024-11-19 16:02:58

Still, drive error starts promptly after glib error

Yeah, but it's more likely that the HW failure drives the SW one.

Disable XMP!

Also graphics card is faulty - it crashed during peak loads (did so back on Windows as well) - so I cap the boost clocks using

Maybe cap them a bit more…

Heretic12 · 2024-11-19 16:28:21

Ok, I've played Stellaris for several minutes already - no crashes... yet. I think I'll need to play at least for an hour to make a conclusion, but it already shows - before disabling XMP I couldn't play even for a minute. And so the question is - if it's a power issue - are there ways to fix it? Or is it purely hardware stuff like faulty PSU (even though it's freaking Seasonic)?

seth · 2024-11-19 16:30:49

Run memtest86+ over night or until errors to see whether the XMP config is stable if there's no other load on the system.

Heretic12 · 2024-11-19 16:49:42

Nope! But it's interesting - now it took longer to catch this error!

Nov 19 16:37:05 192.168.1.17 kwin_wayland[871]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Nov 19 16:37:05 192.168.1.17 kwin_wayland[871]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Nov 19 16:37:05 192.168.1.17 kwin_wayland[871]: kwin_scene_opengl: 0x502: GL_INVALID_OPERATION error generated.
Nov 19 16:37:10 192.168.1.17 wireplumber[950]: wp-event-dispatcher: <WpAsyncEventHook:0x5763827754c0> failed: <WpSiStandardLink:0x5763829ac7c0> link failed: some node was destroyed before the link was created
Nov 19 16:37:16 192.168.1.17 kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Nov 19 16:37:16 192.168.1.17 kernel: nvme nvme2: Does your device have a faulty power saving mode enabled?
Nov 19 16:37:16 192.168.1.17 kernel: nvme nvme2: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Nov 19 16:37:16 192.168.1.17 kernel: nvme 0000:3f:00.0: enabling device (0000 -> 0002)
Nov 19 16:37:16 192.168.1.17 kernel: nvme nvme2: Disabling device after reset failure: -19
Nov 19 16:37:16 192.168.1.17 wireplumber[950]: wp-state: <WpState:0x576382791cd0> could not save stream-properties: Failed to create file “/home/Heretic/.local/state/wireplumber/stream-properties.ER3GX2”: Read-only file system
Nov 19 16:37:16 192.168.1.17 kwin_killer_helper[3703]: org.kde.kwin.killer: Failed to create ApplicationNotResponding path "/home/Heretic/.cache/drkonqi/application-not-responding/"
Nov 19 16:37:16 192.168.1.17 kernel: I/O error, dev nvme2n1, sector 79966216 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Nov 19 16:37:16 192.168.1.17 kernel: I/O error, dev nvme2n1, sector 82014208 op 0x1:(WRITE) flags 0x4800 phys_seg 108 prio class 0
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995265)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995265
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 96075877 starting block 10251264)
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 96075877 starting block 4000256)
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995391)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995391
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995392
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995394)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995394
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995395
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995417)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995417
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995268)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995418
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995424)
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995268
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 9995424
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on device dm-1, logical block 10251265
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95682606 starting block 4136389)
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs warning (device dm-1): ext4_end_bio:342: I/O error 10 writing to inode 95689170 starting block 9995427)
Nov 19 16:37:16 192.168.1.17 kernel: Aborting journal on device dm-1-8.
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs error (device dm-1) in ext4_reserve_inode_write:5813: Journal has aborted
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs error (device dm-1) in add_dirent_to_buf:2149: Journal has aborted
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_journal_check_start:84: comm ThreadPoolForeg: Detected aborted journal
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 239632384, lost sync page write
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs error (device dm-1) in ext4_reserve_inode_write:5813: Journal has aborted
Nov 19 16:37:16 192.168.1.17 kernel: JBD2: I/O error when updating journal superblock for dm-1-8.
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs error (device dm-1): ext4_dirty_inode:6017: inode #95689170: comm ThreadPoolForeg: mark_inode_dirty error
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 0, lost sync page write
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs (dm-1): I/O error while writing superblock
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs (dm-1): Remounting filesystem read-only
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 183, lost async page write
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 9510, lost async page write
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 9679, lost async page write
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 9961473, lost async page write
Nov 19 16:37:16 192.168.1.17 kernel: Buffer I/O error on dev dm-1, logical block 0, lost sync page write
Nov 19 16:37:16 192.168.1.17 kernel: EXT4-fs (dm-1): I/O error while writing superblock

Heretic12 · 2024-11-19 17:52:39

Another error caught while opening the browser

Nov 19 17:43:14 192.168.1.17 systemd[849]: Starting Virtual filesystem service...
Nov 19 17:43:14 192.168.1.17 systemd[849]: Started Virtual filesystem service.
Nov 19 17:43:16 192.168.1.17 baloo_file[895]: kf.solid.backends.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.NoReply" 
                                               "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the net>
Nov 19 17:43:19 192.168.1.17 plasmashell[1050]: kf.solid.backends.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.NoReply" 
                                                 "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the n>
Nov 19 17:43:28 192.168.1.17 kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Nov 19 17:43:28 192.168.1.17 kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Nov 19 17:43:28 192.168.1.17 kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Nov 19 17:43:28 192.168.1.17 udisksd[905]: Error probing device: NVMe Identify Controller command error: Interrupted system call (g-bd-nvme-error-quark, 1)
Nov 19 17:43:28 192.168.1.17 kernel: nvme 0000:02:00.0: enabling device (0000 -> 0002)
Nov 19 17:43:28 192.168.1.17 kernel: nvme nvme1: Disabling device after reset failure: -19
Nov 19 17:43:28 192.168.1.17 udisksd[905]: Error probing device: NVMe Identify Namespace command error: Input/output error (g-bd-nvme-error-quark, 1)
Nov 19 17:43:28 192.168.1.17 kernel: nvme nvme1: Identify namespace failed (-5)
Nov 19 17:43:28 192.168.1.17 systemd[1]: Started Disk Manager.
Nov 19 17:43:28 192.168.1.17 udisksd[905]: Acquired the name org.freedesktop.UDisks2 on the system message bus
Nov 19 17:43:28 192.168.1.17 udisksd[905]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Samsung_SSD_970_EVO_Plus_1TB_S4EWNMFN810703J: Error updating Health Information: No probed controller info available>
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/opt/brave-bin/brave" 67418 "/var/lib/systemd/coredump/core.brave.1000.c6e3e981cdbb494090b21109b36dba99.67418.1731963956000000.zst"
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/home/Heretic/.local/share/Steam/steamapps/common/Stellaris/stellaris" 3196 "/var/lib/systemd/coredump/core.stellaris.1000.ba52bc2ff283450a842b4009102f7c20.3196.173197>
Nov 19 17:43:28 192.168.1.17 systemd[849]: Started Launch DrKonqi for a systemd-coredump crash (PID 1079/UID 1000).
Nov 19 17:43:28 192.168.1.17 systemd[849]: Started Launch DrKonqi for a systemd-coredump crash (PID 1079/UID 1000).
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/home/Heretic/.local/share/Steam/ubuntu12_64/steamwebhelper" 2747 "/var/lib/systemd/coredump/core.steamwebhelper.1000.ba52bc2ff283450a842b4009102f7c20.2747.17319738620>
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-launcher[1308]: Unable to find file for pid 67418 expected at "kcrash-metadata/brave.c6e3e981cdbb494090b21109b36dba99.67418.ini"
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/home/Heretic/.local/share/Steam/ubuntu12_64/steamwebhelper" 2697 "/var/lib/systemd/coredump/core.steamwebhelper.1000.ba52bc2ff283450a842b4009102f7c20.2697.17319738620>
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-launcher[1310]: Unable to find file for pid 3196 expected at "kcrash-metadata/stellaris.ba52bc2ff283450a842b4009102f7c20.3196.ini"
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/home/Heretic/.local/share/Steam/ubuntu12_64/steamwebhelper" 2700 "/var/lib/systemd/coredump/core.steamwebhelper.1000.ba52bc2ff283450a842b4009102f7c20.2700.17319738630>
Nov 19 17:43:28 192.168.1.17 drkonqi-coredump-processor[1079]: "/home/Heretic/.local/share/Steam/steamapps/common/Stellaris/stellaris" 3540 "/var/lib/systemd/coredump/core.stellaris.1000.c6c6b7340b0e4804be8565d95169590e.3540.173203>

seth · 2024-11-19 19:25:37

As long as the nvme drops out anything else will die.

now it took longer to catch this error!

Is this still the case?
Can you downclock the RAM more (more conservative timings, clockrate, …)?

Try to memtest86+ it w/ XMP and see whether and how fast you get errors and then maybe whether you still get (but much later) errors w/o XMP.

Heretic12 · 2024-11-19 22:21:43

Did 1 full run with memtest86+ with XMP on - passed, no errors
