#1 2024-06-10 13:12:20

Markus00000
Member
Registered: 2011-03-27
Posts: 323

Data loss & broken system after kernel panic. But why?

Context
  • After suspending, my system sometimes freezes with a kernel panic (blinking Caps Lock). I suspect it is related to “nvidia” and “nvidia-persistenced”.

  • Kernel panics might occur right after resuming or when shutting the system down. In this incident, the kernel panic occurred during a later shutdown via “systemctl poweroff”.

Post-panic reboot
  • “NetworkManager.service” failed to start.

  • i3status reported that “/usr/lib/libogg.so.0” was too short.

  • Firefox could not launch due to some other file being too short.

  • My “.histfile” was empty.

  • (There were likely other issues that I did not notice before trying to fix the system.)

Recovery

The NVMe SSD did not report any errors.

“pacman -Qkk” showed 16 packages with the following error:

error: error while reading file /var/lib/pacman/local/<package>/mtree: Unrecognized archive format

Among these packages was, for example, “libogg”, which prevented i3status from running.

Reinstalling these packages (mostly) unbroke the system.
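
A minimal sketch of how the affected package names could be pulled out of that error output. It assumes the directories under /var/lib/pacman/local follow the usual <name>-<version>-<release> layout and that pacman prints English messages (e.g. LC_ALL=C). The function name broken_mtree_packages is hypothetical; it only parses text on stdin and never calls pacman itself.

```shell
#!/bin/sh
# Hypothetical helper: read `pacman -Qkk` error output on stdin and print
# the names of packages whose mtree file could not be read.
# Assumes local database directories are named <name>-<version>-<release>.
broken_mtree_packages() {
    sed -n -E 's#^error: error while reading file /var/lib/pacman/local/(.+)-[^-/]+-[^-/]+/mtree: .*$#\1#p' | sort -u
}

# Usage (pacman writes these errors to stderr, hence 2>&1):
#   LC_ALL=C pacman -Qkk 2>&1 | broken_mtree_packages
```

The resulting list could then be reviewed and passed to a reinstall from a working environment such as the live ISO.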

I searched for files that had been changed between a known-good time before the panic and after mounting the filesystem for recovery. I also diffed the system to a backup. I did not spot any unexpected changes or corrupted files.

Still broken

Twice the system slowed to a crawl. Opening a terminal to run btop and iotop took half a minute. Disk was almost idle. kworker processes topped CPU usage, but the total was only about 20% and dropped quickly back to idle. One time everything returned to normal after 2–3 minutes. The other time the wireless interface disappeared (ip did not show it anymore). This required a reboot. I will check the journal if it happens again.

But why?

Losing “.histfile” did not surprise me as I initiated the shutdown from a terminal. I assume the file was being written as the kernel panic occurred.

What I do not understand is how the other files could be corrupted. I thought files that had not been written within one minute of the kernel panic were safe:

# /etc/sysctl.d/vm.conf
[...]
vm.dirty_writeback_centisecs = 6000
[...]

One broken package was “inkscape”, which was installed a month earlier. It did not even run before the kernel panic (or in the week before).
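
For reference, the value above is in centiseconds, so 6000 means the flusher threads wake up every 60 seconds. Dirty data is only written on such a wake-up once it is older than vm.dirty_expire_centisecs (kernel default 3000), so the window for unwritten data is roughly the sum of both. A small sketch of that arithmetic, with hypothetical function names:

```shell
#!/bin/sh
# Convert a value in centiseconds (as used by vm.dirty_*_centisecs) to seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# Rough upper bound (in seconds) on how long dirty data may sit in the page
# cache: it must first age past the expire threshold and then wait for the
# next flusher wake-up. Arguments: <expire_centisecs> <writeback_centisecs>.
dirty_window_secs() {
    echo $(( ($1 + $2) / 100 ))
}

# Usage on a running system (reads the live values from /proc):
#   dirty_window_secs "$(cat /proc/sys/vm/dirty_expire_centisecs)" \
#                     "$(cat /proc/sys/vm/dirty_writeback_centisecs)"
```

This is only an estimate of the page-cache window; it says nothing about what the filesystem or the drive does with the data afterwards.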

Questions
  1. How could files like libraries be affected?

  2. What can I do to restore (or ensure) my system’s integrity as it is still partially broken?

  3. Any tips on how to move forward?

  4. Is your backup up to date?

#2 2024-06-10 16:15:04

seth
Member
Registered: 2012-09-03
Posts: 60,378

Re: Data loss & broken system after kernel panic. But why?

1. The filesystem - the files themselves aren't necessarily broken, but the references to them are
2. 

sudo LC_ALL=C pacman -Qkk | grep -v ', 0 altered files'

3. https://wiki.archlinux.org/title/Kdump (or for the shutdown, do you have a screenshot of the panic?) + https://aur.archlinux.org/packages/nvidia-535xx-dkms + sleep hook to "sync" the filesystem (though no help for the shutdown) + https://wiki.archlinux.org/title/Solid_ … leshooting
4. I've multiple copies of my porn collection, it's so I can access it anywhere, but of course … wait, that's really none of your business!
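
The sleep hook mentioned in point 3 could look roughly like the sketch below. It assumes the script is saved as an executable file in /usr/lib/systemd/system-sleep/, where systemd-sleep runs hooks with "pre" or "post" as the first argument and the sleep action as the second. The function name sync_before_sleep is hypothetical.

```shell
#!/bin/sh
# Sketch of a systemd sleep hook, e.g. an executable file placed in
# /usr/lib/systemd/system-sleep/ (the file name is arbitrary).
# systemd-sleep calls it with $1 = pre|post and $2 = the sleep action.
sync_before_sleep() {
    case "$1" in
        pre)
            # Flush dirty pages to disk before the machine goes to sleep,
            # so a panic on resume loses as little as possible.
            sync
            ;;
    esac
}

sync_before_sleep "$@"
```

As noted above, this does nothing for a panic during shutdown; it only narrows the window around suspend.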

#3 2024-06-10 17:04:38

Markus00000
Member
Registered: 2011-03-27
Posts: 323

Re: Data loss & broken system after kernel panic. But why?

0. Thanks! (Also for your other 53K posts, which helped me more than once.)

1. Can you roughly explain how you think that happened? The journal (XFS) cannot prevent broken references?

2. In case that was a request for the output (no more errors):

backup file: acpid: /etc/acpi/handler.sh (Modification time mismatch)
backup file: acpid: /etc/acpi/handler.sh (Size mismatch)
backup file: acpid: /etc/acpi/handler.sh (SHA256 checksum mismatch)
backup file: apache: /etc/httpd/conf/httpd.conf (Modification time mismatch)
backup file: apache: /etc/httpd/conf/httpd.conf (Size mismatch)
backup file: apache: /etc/httpd/conf/httpd.conf (SHA256 checksum mismatch)
backup file: at: /var/spool/atd/.SEQ (Modification time mismatch)
backup file: at: /var/spool/atd/.SEQ (Size mismatch)
backup file: at: /var/spool/atd/.SEQ (SHA256 checksum mismatch)
backup file: bogofilter: /etc/bogofilter/bogofilter.cf (Size mismatch)
backup file: bogofilter: /etc/bogofilter/bogofilter.cf (SHA256 checksum mismatch)
warning: brscan4: /opt/brother/scanner/brscan4/brsanenetdevice4.cfg (Modification time mismatch)
warning: brscan4: /opt/brother/scanner/brscan4/brsanenetdevice4.cfg (Size mismatch)
warning: brscan4: /opt/brother/scanner/brscan4/brsanenetdevice4.cfg (SHA256 checksum mismatch)
warning: cups: /etc/cups/classes.conf (Permissions mismatch)
warning: cups: /etc/cups/printers.conf (Permissions mismatch)
brscan4: 66 total files, 1 altered file
backup file: cronie: /etc/anacrontab (Modification time mismatch)
backup file: cronie: /etc/anacrontab (SHA256 checksum mismatch)
backup file: cups: /etc/cups/classes.conf (Modification time mismatch)
backup file: cups: /etc/cups/classes.conf (Size mismatch)
backup file: cups: /etc/cups/classes.conf (SHA256 checksum mismatch)
backup file: cups: /etc/cups/printers.conf (Modification time mismatch)
backup file: cups: /etc/cups/printers.conf (Size mismatch)
backup file: cups: /etc/cups/printers.conf (SHA256 checksum mismatch)
cups: 946 total files, 2 altered files
backup file: dnscrypt-proxy: /etc/dnscrypt-proxy/dnscrypt-proxy.toml (Modification time mismatch)
backup file: dnscrypt-proxy: /etc/dnscrypt-proxy/dnscrypt-proxy.toml (Size mismatch)
backup file: dnscrypt-proxy: /etc/dnscrypt-proxy/dnscrypt-proxy.toml (SHA256 checksum mismatch)
backup file: dnsmasq: /etc/dnsmasq.conf (Modification time mismatch)
backup file: dnsmasq: /etc/dnsmasq.conf (Size mismatch)
backup file: dnsmasq: /etc/dnsmasq.conf (SHA256 checksum mismatch)
warning: filesystem: /etc/gshadow (Permissions mismatch)
warning: filesystem: /etc/shadow (Permissions mismatch)
backup file: filesystem: /etc/crypttab (Modification time mismatch)
backup file: filesystem: /etc/crypttab (Size mismatch)
backup file: filesystem: /etc/crypttab (SHA256 checksum mismatch)
backup file: filesystem: /etc/fstab (Modification time mismatch)
backup file: filesystem: /etc/fstab (Size mismatch)
backup file: filesystem: /etc/fstab (SHA256 checksum mismatch)
backup file: filesystem: /etc/group (Modification time mismatch)
backup file: filesystem: /etc/group (Size mismatch)
backup file: filesystem: /etc/group (SHA256 checksum mismatch)
backup file: filesystem: /etc/gshadow (Modification time mismatch)
backup file: filesystem: /etc/gshadow (Size mismatch)
backup file: filesystem: /etc/gshadow (SHA256 checksum mismatch)
backup file: filesystem: /etc/hosts (Modification time mismatch)
backup file: filesystem: /etc/hosts (Size mismatch)
backup file: filesystem: /etc/hosts (SHA256 checksum mismatch)
backup file: filesystem: /etc/passwd (Modification time mismatch)
backup file: filesystem: /etc/passwd (Size mismatch)
backup file: filesystem: /etc/passwd (SHA256 checksum mismatch)
backup file: filesystem: /etc/resolv.conf (Modification time mismatch)
backup file: filesystem: /etc/resolv.conf (Size mismatch)
backup file: filesystem: /etc/resolv.conf (SHA256 checksum mismatch)
backup file: filesystem: /etc/shadow (Modification time mismatch)
backup file: filesystem: /etc/shadow (Size mismatch)
backup file: filesystem: /etc/shadow (SHA256 checksum mismatch)
backup file: filesystem: /etc/shells (Modification time mismatch)
backup file: filesystem: /etc/shells (Size mismatch)
backup file: filesystem: /etc/shells (SHA256 checksum mismatch)
filesystem: 124 total files, 2 altered files
warning: ghc-libs: /usr/lib/ghc-9.2.8/lib/package.conf.d/package.cache (Modification time mismatch)
warning: ghc-libs: /usr/lib/ghc-9.2.8/lib/package.conf.d/package.cache (Size mismatch)
warning: ghc-libs: /usr/lib/ghc-9.2.8/lib/package.conf.d/package.cache (SHA256 checksum mismatch)
ghc-libs: 1459 total files, 1 altered file
backup file: glibc: /etc/locale.gen (Modification time mismatch)
backup file: glibc: /etc/locale.gen (Size mismatch)
backup file: glibc: /etc/locale.gen (SHA256 checksum mismatch)
warning: intel-ucode: /boot/intel-ucode.img (Permissions mismatch)
warning: intel-ucode: /boot/intel-ucode.img (Modification time mismatch)
warning: java-runtime-common: /usr/lib/jvm/default (Symlink path mismatch)
warning: java-runtime-common: /usr/lib/jvm/default (Modification time mismatch)
warning: java-runtime-common: /usr/lib/jvm/default-runtime (Symlink path mismatch)
warning: java-runtime-common: /usr/lib/jvm/default-runtime (Modification time mismatch)
warning: jdownloader2: /opt/JDownloader (UID mismatch)
warning: jdownloader2: /opt/JDownloader (GID mismatch)
intel-ucode: 159 total files, 1 altered file
backup file: intel-undervolt: /etc/intel-undervolt.conf (Modification time mismatch)
backup file: intel-undervolt: /etc/intel-undervolt.conf (Size mismatch)
backup file: intel-undervolt: /etc/intel-undervolt.conf (SHA256 checksum mismatch)
java-runtime-common: 21 total files, 2 altered files
jdownloader2: 59 total files, 1 altered file
backup file: libpaper: /etc/papersize (Modification time mismatch)
backup file: libpaper: /etc/papersize (Size mismatch)
backup file: libpaper: /etc/papersize (SHA256 checksum mismatch)
warning: libutempter: /usr/lib/utempter/utempter (GID mismatch)
warning: libutempter: /usr/lib/utempter/utempter (Permissions mismatch)
libutempter: 20 total files, 1 altered file
warning: mitogen: /usr/lib/python3.12/site-packages/ansible_mitogen/loaders.py (Modification time mismatch)
warning: mitogen: /usr/lib/python3.12/site-packages/ansible_mitogen/loaders.py (SHA256 checksum mismatch)
warning: mlocate: /var/lib/locate (GID mismatch)
warning: nginx: /var/lib/nginx/proxy (UID mismatch)
mitogen: 245 total files, 1 altered file
backup file: mkinitcpio: /etc/mkinitcpio.conf (Modification time mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (Size mismatch)
backup file: mkinitcpio: /etc/mkinitcpio.conf (SHA256 checksum mismatch)
mlocate: 142 total files, 1 altered file
backup file: nginx: /etc/nginx/nginx.conf (Modification time mismatch)
backup file: nginx: /etc/nginx/nginx.conf (Size mismatch)
backup file: nginx: /etc/nginx/nginx.conf (SHA256 checksum mismatch)
nginx: 46 total files, 1 altered file
warning: npm: /usr/lib/node_modules/npm/node_modules/cmd-shim/lib (UID mismatch)
warning: npm: /usr/lib/node_modules/npm/node_modules/cmd-shim/lib (GID mismatch)
npm: 2357 total files, 1 altered file
backup file: openssh: /etc/ssh/sshd_config (Modification time mismatch)
backup file: openssh: /etc/ssh/sshd_config (Size mismatch)
backup file: openssh: /etc/ssh/sshd_config (SHA256 checksum mismatch)
backup file: pacman: /etc/makepkg.conf (Modification time mismatch)
backup file: pacman: /etc/makepkg.conf (Size mismatch)
backup file: pacman: /etc/makepkg.conf (SHA256 checksum mismatch)
backup file: pacman: /etc/pacman.conf (Modification time mismatch)
backup file: pacman: /etc/pacman.conf (Size mismatch)
backup file: pacman: /etc/pacman.conf (SHA256 checksum mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Modification time mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (Size mismatch)
backup file: pacman-mirrorlist: /etc/pacman.d/mirrorlist (SHA256 checksum mismatch)
backup file: php: /etc/php/php.ini (Modification time mismatch)
backup file: php: /etc/php/php.ini (Size mismatch)
backup file: php: /etc/php/php.ini (SHA256 checksum mismatch)
warning: shadow: /usr/bin/groupmems (GID mismatch)
warning: shadow: /usr/bin/groupmems (Permissions mismatch)
backup file: sane: /etc/sane.d/dll.conf (Modification time mismatch)
backup file: sane: /etc/sane.d/dll.conf (Size mismatch)
backup file: sane: /etc/sane.d/dll.conf (SHA256 checksum mismatch)
backup file: shadow: /etc/login.defs (Modification time mismatch)
backup file: shadow: /etc/login.defs (SHA256 checksum mismatch)
shadow: 588 total files, 1 altered file
warning: systemd: /var/log/journal (GID mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat (Modification time mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat (Size mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat (SHA256 checksum mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat.lua (Modification time mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat.lua (Size mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.dat.lua (SHA256 checksum mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.def (Modification time mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.def (Size mismatch)
warning: texlive-basic: /etc/texmf/tex/generic/config/language.def (SHA256 checksum mismatch)
backup file: sudo: /etc/sudoers (Modification time mismatch)
backup file: sudo: /etc/sudoers (Size mismatch)
backup file: sudo: /etc/sudoers (SHA256 checksum mismatch)
backup file: systemd: /etc/systemd/coredump.conf (Modification time mismatch)
backup file: systemd: /etc/systemd/coredump.conf (Size mismatch)
backup file: systemd: /etc/systemd/coredump.conf (SHA256 checksum mismatch)
backup file: systemd: /etc/systemd/journald.conf (Modification time mismatch)
backup file: systemd: /etc/systemd/journald.conf (Size mismatch)
backup file: systemd: /etc/systemd/journald.conf (SHA256 checksum mismatch)
backup file: systemd: /etc/systemd/logind.conf (Modification time mismatch)
backup file: systemd: /etc/systemd/logind.conf (Size mismatch)
backup file: systemd: /etc/systemd/logind.conf (SHA256 checksum mismatch)
systemd: 1451 total files, 1 altered file
backup file: texlive-basic: /etc/texmf/web2c/fmtutil.cnf (Modification time mismatch)
backup file: texlive-basic: /etc/texmf/web2c/fmtutil.cnf (Size mismatch)
backup file: texlive-basic: /etc/texmf/web2c/fmtutil.cnf (SHA256 checksum mismatch)
texlive-basic: 2673 total files, 3 altered files
backup file: tpacpi-bat: /etc/conf.d/tpacpi (Modification time mismatch)
backup file: tpacpi-bat: /etc/conf.d/tpacpi (SHA256 checksum mismatch)
warning: vlc: /usr/lib/vlc/plugins/plugins.dat (Modification time mismatch)
warning: vlc: /usr/lib/vlc/plugins/plugins.dat (Size mismatch)
warning: vlc: /usr/lib/vlc/plugins/plugins.dat (SHA256 checksum mismatch)
backup file: ufw: /etc/ufw/ufw.conf (Modification time mismatch)
backup file: ufw: /etc/ufw/ufw.conf (Size mismatch)
backup file: ufw: /etc/ufw/ufw.conf (SHA256 checksum mismatch)
backup file: ufw: /etc/ufw/user.rules (Modification time mismatch)
backup file: ufw: /etc/ufw/user.rules (Size mismatch)
backup file: ufw: /etc/ufw/user.rules (SHA256 checksum mismatch)
backup file: ufw: /etc/ufw/user6.rules (Modification time mismatch)
backup file: ufw: /etc/ufw/user6.rules (Size mismatch)
backup file: ufw: /etc/ufw/user6.rules (SHA256 checksum mismatch)
backup file: upower: /etc/UPower/UPower.conf (Modification time mismatch)
backup file: upower: /etc/UPower/UPower.conf (Size mismatch)
backup file: upower: /etc/UPower/UPower.conf (SHA256 checksum mismatch)
vlc: 1085 total files, 1 altered file

Can I conclude that the slowdowns were unrelated to whatever happened to the filesystem?

3. No screenshot. It panicked on the virtual console after exiting X without any error messages shown. There were only successful shutdown messages up to that point. As I do not need the NVIDIA card at the moment, I removed “nvidia”. I am aware of the many threads on this topic. Never spotted any NVMe errors but will check when the system misbehaves.

4. Post it under Community Contributions? The backup strategy… not the collection.

#4 2024-06-10 22:28:35

seth
Member
Registered: 2012-09-03
Posts: 60,378

Re: Data loss & broken system after kernel panic. But why?

1. "XFS" (though I would have bet on btrfs) - did you run xfs_repair?  Did you (have to) reset the log?
But a systematic XFS problem in this regard is delayed allocation. Are you sure about the month-old inkscape version? There was an update on June 8th - like with libogg…
2. more an answer for "how do I check my system installation integrity" (and nothing there looks like trouble)
4.

Markus00000 wrote:

53K posts, which helped me more than once

and now you're trying to get me banned? :P

#5 2024-06-11 06:27:16

Markus00000
Member
Registered: 2011-03-27
Posts: 323

Re: Data loss & broken system after kernel panic. But why?

1. I did not run xfs_repair as the file system mounted normally. Running it now gives:

root@archiso ~ # xfs_repair /dev/mapper/arch
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
clearing reflink flag on inodes when possible
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

If my “pacman.log” does not lie, then I only got “inkscape (1.3.2-6)” when reinstalling broken packages from the live ISO. Same for “libogg (1.3.5-2)”. Previous version was installed on 2021-06-05. I hope allocation is not that delayed.

Are you suggesting I am old and forgetful and that I might have updated on 2024-06-08 and experienced a kernel panic and then got distracted or otherwise forgot that it happened? Intriguing idea. However, I used this system for hours on 2024-06-09 before the corrupting panic and I think I would have noticed the big red i3status banner telling me it failed to run.

4. My dream is to become this forum’s top poster. You must admit that it is more reasonable to get people like you banned than to post nearly 53000 times. Who in their right mind would do such a thing?

#6 2024-06-11 07:40:14

seth
Member
Registered: 2012-09-03
Posts: 60,378

Re: Data loss & broken system after kernel panic. But why?

A basic repair (log replay) runs on mounting the FS anyway; my question only matters if you tried to manually repair the system.

Markus00000 wrote:

Are you suggesting I am old and forgetful and that I might have updated on 2024-06-08 and experienced a kernel panic and then got distracted or otherwise forgot that it happened?

I'm pointing out a pattern that exists in the admittedly very limited sample of broken packages (and that would handily explain the situation). The failed update could, by the way, have happened at any point between 2024-06-08 and the hard reboot on 2024-06-09, including minutes before the reboot.
Were other packages affected that could not possibly have gotten an update on 2024-06-09 (because their last repo update was sometime in April or so)?

#7 2024-06-11 12:35:45

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330

Re: Data loss & broken system after kernel panic. But why?

One small thought to add: you're facing a range of observable symptoms one of which is a kernel panic.  And it seems you have inferred that the kernel panic itself is the cause of the other symptoms.  I see no evidence in this thread that would support that conclusion.  So asking how or why a kernel panic could cause filesystem corruption seems a bit beside the point: we don't know that it did cause it.


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

#8 2024-06-11 13:31:08

Markus00000
Member
Registered: 2011-03-27
Posts: 323

Re: Data loss & broken system after kernel panic. But why?

seth, I applaud your pattern recognition skills.

Full list of corrupted packages: easyeffects ethtool gsl inkscape jq lib2geom libmupdf libogg libotr nspr nss sqlcipher unzip woff2 x264 zathura-pdf-mupdf

All of them were updated on 2024-06-08. Let’s see what “pacman.log” has to say:

[2024-06-08T21:56:48+0200] [PACMAN] Running 'pacman --color=always --sync --refresh'
[2024-06-08T21:56:48+0200] [PACMAN] synchronizing package lists
[2024-06-08T21:56:52+0200] [PACMAN] Running 'pacman --color=always --sync --sysupgrade'
[2024-06-08T21:56:52+0200] [PACMAN] starting full system upgrade
[2024-06-09T10:35:21+0200] ...

This might be the evidence Trilby was looking for.
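
A rough sketch of how such gaps could be found in a pacman log automatically. It relies on the "[PACMAN] Running", "[PACMAN] starting full system upgrade" and "[ALPM] transaction completed" entries that pacman writes; the function name incomplete_upgrades is hypothetical, and the check is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: print the timestamp of every "starting full system
# upgrade" entry in a pacman log that is not followed by
# "transaction completed" before the next pacman invocation (or end of file).
# Heuristic only: an upgrade with nothing to do, or one declined at the
# prompt, is reported as well.
incomplete_upgrades() {
    awk '
        / \[PACMAN\] Running / { if (pending != "") print pending; pending = "" }
        / \[PACMAN\] starting full system upgrade/ { pending = $1 }
        / \[ALPM\] transaction completed/ { pending = "" }
        END { if (pending != "") print pending }
    ' "$1"
}

# Usage:
#   incomplete_upgrades /var/log/pacman.log
```

Each reported timestamp is a candidate for an interrupted update and still needs to be checked by hand against the surrounding log lines.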

To summarize the likely chain of events:

1) Interrupted attempt to fully update the system at 2024-06-08T21:56. All corrupted packages were part of this update.

2) I turn the system off overnight by long-pressing the power button.

3) On 2024-06-09 I boot the system in the morning and everything seems fine. Certainly, there is no big red banner complaining about libogg. I use pacman multiple times that day to install and remove packages without issues. No partial updates. No system updates.

4) I attempt a system update later that day:

[2024-06-09T16:10:02+0200] [PACMAN] Running 'pacman --color=always --sync --refresh'
[2024-06-09T16:10:02+0200] [PACMAN] synchronizing package lists
[2024-06-09T19:14:04+0200] ...

I do not remember what happened here. I do not interrupt updates. Either it was stuck and I thought my Internet connection or WiFi had issues or the system crashed.

5) I reboot the system and all the previously mentioned problems appear (e.g. i3status banner due to libogg being too short).

6) I boot from a live ISO to reinstall the corrupted packages. Reboot. All seems fine.

7) I experience two unexpected slowdowns. Unknown cause and unknown if they are related.

8) Zero issues since.

Sorry for not spotting the interrupted update earlier. If I had Microsoft Recall, I could have retraced every step…

My only remaining question is: Why did I have no issues the next morning if the packages got corrupted the evening before?

#9 2024-06-11 14:19:59

seth
Member
Registered: 2012-09-03
Posts: 60,378

Re: Data loss & broken system after kernel panic. But why?

None of the packages is super-critical for simply booting the system; you probably noticed the corruption when you looked at it.
Why would i3statusbar complain about libogg in the first place? Because it wants to play some ogg file? What would trigger that? Did that status apply the other day?

#10 2024-06-11 17:20:54

Markus00000
Member
Registered: 2011-03-27
Posts: 323

Re: Data loss & broken system after kernel panic. But why?

$ fuser -v /usr/lib/libogg.so.0.8.5
                     USER        PID ACCESS COMMAND
/usr/lib/libogg.so.0.8.5:
[...]
                     m         79311 ....m i3status

No idea why. Need to look into this some more.

seth wrote:

you probably noticed the corruption when you looked at it.

Right. My point was that I used i3status and Firefox after the first post-panic boot and they were fine.

But you made me realize that there is an explanation: The first panic did not corrupt anything. That is why the next day everything was working. Then, when I tried to update again, it obviously tried to update those same packages. That was when the corruption must have occurred.

Makes sense?
