#1 2023-01-05 06:22:55

mwigzell
Member
Registered: 2016-01-29
Posts: 38

[SOLVED] intermittent Failed Load Kernel Modules

For a year or two I have been dogged by intermittent boot failures at the point where systemd-modules-load.service runs. Because I booted with the "quiet" kernel parameter, I had no context to interpret them. After removing "quiet" I can see that it is a systemd service failure, so I examined the journal and found messages like:

Failed to look up module alias 'crypto_user': Function not implemented
ditto 'sg'
ditto 'nvidia-uvm'

I am using the Nvidia drivers, and I suspect I have been since about the time this issue started.
Sometimes I have to try 6 times before the boot succeeds; other times it succeeds several times in a row.
How can I debug this further?
The journal also shows a memory map error; I am not sure whether it is related.

 Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df58000 and size of 0x1 pages
 Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df57000 and size of 0x1 pages
 Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df56000 and size of 0x1 pages
 Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df55000 and size of 0x1 pages
 Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df54000 and size of 0x1 pages
 Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df53000 and size of 0x1 pages
 Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df52000 and size of 0x1 pages
 Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df51000 and size of 0x1 pages
 Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df50000 and size of 0x1 pages
 Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4f000 and size of 0x1 pages
 Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4e000 and size of 0x1 pages
 Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4d000 and size of 0x1 pages

Last edited by mwigzell (2023-01-06 22:32:09)

#2 2023-01-05 07:59:23

d.ALT
Member
Registered: 2019-05-10
Posts: 959

Re: [SOLVED] intermittent Failed Load Kernel Modules

Please provide us complete log(s) / output(s).

What kernel are you on? What version? What nVIDIA drivers are you using?

Cannot map memory with base addr BLAHBLAH and size of BLAHBLAH pages

I can't remember where I've already seen it...


<49,17,III,I>    Fama di loro il mondo esser non lassa;
<50,17,III,I>    misericordia e giustizia li sdegna:
<51,17,III,I>    non ragioniam di lor, ma guarda e passa.

#3 2023-01-05 09:09:33

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,206

Re: [SOLVED] intermittent Failed Load Kernel Modules

That service is for stuff that's in /etc/modules-load.d/. What do you have there? In any case, yes, post a complete journal; you can also use journalctl -b -2 and so forth to check the journal from two boots ago and the like.
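
As a rough sketch of how to check this: according to modules-load.d(5), systemd-modules-load.service reads *.conf files not only from /etc/modules-load.d/ but also from /run/modules-load.d/ and /usr/lib/modules-load.d/ (where packages drop theirs), so an empty /etc directory does not mean nothing is configured. The helper name list_autoload_modules is made up for illustration.

```shell
# Print "file: module" for every module named in the given
# modules-load.d style directories (hypothetical helper).
list_autoload_modules() {
    for dir in "$@"; do
        for conf in "$dir"/*.conf; do
            [ -f "$conf" ] || continue
            # Blank lines and lines starting with # or ; are ignored by systemd.
            grep -Ev '^[[:space:]]*([#;]|$)' "$conf" |
                while IFS= read -r module; do
                    printf '%s: %s\n' "$conf" "$module"
                done
        done
    done
}

# Usage:
#   list_autoload_modules /etc/modules-load.d /run/modules-load.d /usr/lib/modules-load.d
```

Together with journalctl -b -2 -u systemd-modules-load.service, this should show which file requests each module that fails to load.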

Just from that error, this might also be BIOS emulation bugs on a buggy UEFI. What's your situation here? What's your system (mainboard/laptop model), and how did you install Arch: in EFI or BIOS mode?
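
A quick way to answer the EFI-or-BIOS question from the running system: the kernel only creates /sys/firmware/efi when it was booted through UEFI. A minimal sketch, where boot_mode is a hypothetical helper name and the optional argument exists only so the check can be pointed at a different directory:

```shell
# Report how the running system was booted (hypothetical helper).
# $1: firmware sysfs directory, defaults to /sys/firmware.
boot_mode() {
    if [ -d "${1:-/sys/firmware}/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

# Usage:
#   boot_mode
```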

#4 2023-01-05 16:35:58

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

I am using Legacy BIOS boot. No UEFI. This problem is intermittent, not constant.
uname -a

Linux confucius 6.1.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000 x86_64 GNU/Linux

sudo dmidecode -t 2

# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: ASUSTeK COMPUTER INC.
	Product Name: ROG STRIX Z490-E GAMING
	Version: Rev 1.xx
	Serial Number: 201278698901574
	Asset Tag: Default string
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Default string
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

The journal entries were only there whilst I rescued my system, since systemd just failed badly after that. The above shows the bad journal entries for the modules that failed. (I copied them down).
There is nothing in /etc/modules-load.d.
I am using the automatic pacman -Syu update to my Nvidia drivers and kernel:
Nvidia drivers: 525.60.11
The modules that fail to load intermittently, as logged in that lost journal, are crypto_user, sg, nvidia_uvm.
This problem was not in my installation originally, but crept in at some point when I was tweaking things. I'm not sure it was to do with Nvidia drivers at all.
My feeling is that there is some kind of race condition. Is there a way to attach a file? I could attach a complete successful boot. It has some oddities in it for sure.

ok, I was able to get a journal log for the failed boot. How to upload to you?

Last edited by mwigzell (2023-01-05 20:10:19)

#5 2023-01-05 17:25:47

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,206

Re: [SOLVED] intermittent Failed Load Kernel Modules

So this can definitely be an issue here. Boot a live disk in EFI mode, install and set up a bootloader in EFI mode, and boot your system in EFI mode. Many mainboards no longer have a properly tested or working BIOS emulation.

As for attaching files, not directly, use a pastebin service: https://wiki.archlinux.org/title/List_o … n_services

Assuming it is a plain race condition that you can do something about on the Linux side while remaining in "BIOS mode", the most likely candidate is KMS/late loading of the graphics kernel modules: https://wiki.archlinux.org/title/Kernel … _KMS_start

But seriously, your board is modern; there's no telling how well tested and implemented BIOS mode is on it, and you are basically gimping a lot of modern features by relying on it.

Last edited by V1del (2023-01-05 17:29:53)

#6 2023-01-05 20:26:01

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

Yes, I want to move to EFI. However, as I keep trying to indicate, this problem arrived as a result of a tweak; the board worked just fine in legacy BIOS mode until then.
Frankly I'm scared of EFI. It's probably not so bad once I go through the steps, but it will require updates to my grub boot scheme for sure.
I had a journal log, but when I got my system back it wasn't there in "/root". I guess I was on some virtual disk somewhere, so I lost it again. Sorry. I will get it next time.

I looked into the reference to KMS_start. Please see this:

pwd
/etc/pacman.d/hooks
mark@confucius:hooks$ ls -al 
total 16
drwxr-xr-x 2 root root 4096 Feb  5  2021  .
drwxr-xr-x 4 root root 4096 Jan  1 11:17  ..
-rw-r--r-- 1 root root  395 Feb  5  2021 '\'
-rw-r--r-- 1 root root  395 Feb  5  2021  nvidia.hook

I don't know what that '\' entry is doing there. Is it normal?

Last edited by mwigzell (2023-01-05 21:08:18)

#7 2023-01-05 20:50:58

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,206

Re: [SOLVED] intermittent Failed Load Kernel Modules

Well, you need to know what you "tweaked". What did you do? The journal will not be in /root but in /var/log/journal, though its physical location should be largely irrelevant to you; use

sudo journalctl -b-1

for the boot prior to your current one and so forth.

At least the mapping failures in BIOS mode are quite indicative of potential issues with the BIOS emulation.

Last edited by V1del (2023-01-05 20:52:56)

#8 2023-01-05 21:11:33

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

Pls. see edit above regarding the hooks.
Sorry to be such a dunce, I have lived with this issue for at least a year.
I don't know what caused it. I was trying to get the nvidia drivers to install with "pacman -Syu"; that was the last major thing I did.
I get it that BIOS failures can cause issues. But this problem was not there with my current BIOS originally. It crept in without my noticing just what change caused it. It was too late to back out, because it is INTERMITTENT.

When I get back from such a crash as above, the journal is MISSING. (I got a TTY during the fail, and saved it off but that didn't work either because I wasn't on my normal root drive I guess)

What is that hook file (see above): '\' ?

Last edited by mwigzell (2023-01-05 21:17:52)

#9 2023-01-06 17:19:19

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

I discovered why the journal was missing: there were two partitions with the same label, and systemd assigned root to the wrong one because of the errors it hit during boot. I would consider this some kind of systemd bug, since I use GUIDs for my grub.cfg etc. I fixed that partition label. I'm not sure yet whether this fixes the boot problem, but I was able to capture the journal from the latest bad boot, and you can plainly see the "Function not implemented" messages in it.
See: http://0x0.st/oR5A.txt

grep "Function not implemented" <journal_1
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'crypto_user': Function not implemented
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'sg': Function not implemented
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'nvidia-uvm': Function not implemented
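
To pull just the module names out of a saved journal, something like the following sketch should work. It assumes the message format shown above; failed_aliases is a made-up helper name.

```shell
# Read journal text on stdin and print each module alias that
# systemd-modules-load failed to look up, once (hypothetical helper).
failed_aliases() {
    sed -n "s/.*Failed to look up module alias '\([^']*\)'.*/\1/p" | sort -u
}

# Usage:
#   failed_aliases < journal_1
#   journalctl -b -1 -u systemd-modules-load.service | failed_aliases
```

Comparing the list across several bad boots would show whether it is always the same modules that fail.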

Also, I'm now fairly convinced I made a typo with that hook. It looks suspicious, like a finger fumble:

-rw-r--r-- 1 root root  395 Feb  5  2021 '\'

Last edited by mwigzell (2023-01-06 17:21:57)

#10 2023-01-06 17:35:32

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,206

Re: [SOLVED] intermittent Failed Load Kernel Modules

Yes, that is nothing; remove it. If your labels were the same, then that basically answers the question, and it isn't a systemd bug but the kernel simply picking whichever device it sees first for matching tokens, which is why using unique identifiers for everything related to fstab/static partition mounting is highly recommended: https://wiki.archlinux.org/title/Persis … ice_naming
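
A quick check for clashing identifiers, as a sketch: blkid can print a single tag per device (-s LABEL -o value), and sort | uniq -d then keeps only the values that occur more than once. duplicate_values is a hypothetical helper name.

```shell
# Read one identifier per line on stdin and print every value
# that appears more than once (hypothetical helper).
duplicate_values() {
    grep -v '^$' | sort | uniq -d
}

# Usage (run as root so blkid probes every device):
#   blkid -s LABEL -o value | duplicate_values
#   blkid -s UUID -o value | duplicate_values
```

No output means every label (or UUID) that blkid reports is unique.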

#11 2023-01-06 17:45:34

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

Yes, the dup label occurred because I copied a partition I think.
I thought I was controlling the root partition via grub? Obviously not.
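
One way to see which identifier each layer actually uses, sketched under the assumption of a fairly standard setup: the root= parameter that grub passes ends up in /proc/cmdline, while other mounts typically come from /etc/fstab, where LABEL= entries become ambiguous if two filesystems share a label. root_param and label_mounts are hypothetical helper names.

```shell
# Print the value of the root= parameter from a kernel command line on stdin.
root_param() {
    tr ' ' '\n' | sed -n 's/^root=//p'
}

# Print the entries of an fstab-style file ($1) that mount by LABEL= or PARTLABEL=.
label_mounts() {
    grep -E '^[[:space:]]*(PART)?LABEL=' "$1"
}

# Usage:
#   root_param < /proc/cmdline
#   label_mounts /etc/fstab
```

findmnt -no SOURCE / then shows which device really ended up mounted as root, for comparison.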

Update: I rebooted and got a fairly clean boot, with no unmapped memory errors either. Maybe that was the issue then. Whew!
Here is new journal: http://0x0.st/oR5e.txt

#12 2023-01-06 17:51:12

Scimmia
Fellow
Registered: 2012-09-01
Posts: 13,726

Re: [SOLVED] intermittent Failed Load Kernel Modules

Copying a partition and dupe label would also mean a dupe UUID.

#13 2023-01-06 22:31:28

mwigzell
Member
Registered: 2016-01-29
Posts: 38

Re: [SOLVED] intermittent Failed Load Kernel Modules

@Scimmia I don't think that is true. I used "dd" to copy. They are separate partitions with different UUIDs. The relevant partitions are /dev/sdb6 and /dev/sdb9; I have now formatted the latter so it no longer carries the label. You can see the UUIDs are different:

sudo blkid
[sudo] password for root: 
/dev/nvme0n1p1: LABEL_FATBOOT="ESP" LABEL="ESP" UUID="0D51-DAF0" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="BOOT" PARTUUID="f68392e0-32b7-4f68-a18b-27ad8f86a7a2"
/dev/sdd1: LABEL="BACKUP_5TB" UUID="ad38cae7-0469-45e0-9cce-123483da4436" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdb4: PARTUUID="6fe04393-31cf-4e4d-b62c-365d5bd41447"
/dev/sdb2: UUID="23b5d15a-dc24-4d7e-bc5f-ef7b3bb0a75c" TYPE="swap" PARTUUID="be882e2d-d534-4551-9525-75fa93b67197"
/dev/sdb9: UUID="0edd0170-3bd9-48de-9bc8-b53b9e127f7b" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="UNUSED_1" PARTUUID="294fa6e5-d39a-4b58-a9e0-edcad7cd753c"
/dev/sdb3: UUID="c2a8fb11-1612-478a-96d3-580c89f46599" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5ff40e59-316a-4336-9f4f-05dc53d1d5d6"
/dev/sdb1: LABEL="boot" UUID="57183d54-e946-4104-85e7-b7cdca3edd26" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="9b3faad6-8521-493c-8545-c03f6963e9aa"
/dev/sdb8: LABEL="LFS 8.3" UUID="88b7a55e-854b-4e4f-ad6d-0c0d7c9ee0b4" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="c1291794-ce03-4cb6-8ed5-4597cec6593a"
/dev/sdb6: LABEL="ARCH_2021_1" UUID="45616ae4-a8ee-4801-9520-2f2b3724ffae" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="ARCH_2021" PARTUUID="ce46d989-c240-42db-9b75-1229dea3fc1d"
/dev/sdc2: LABEL="VAR" UUID="310d82db-f95a-4538-a610-7300159dfa7c" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="VAR" PARTUUID="9ce74ef1-324c-4375-83e9-e742e8ebaace"
/dev/sdc1: LABEL="SCRATCH" UUID="5e78efd1-989a-44ef-bc04-cb19a5280dfc" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="SCRATCH" PARTUUID="40bd903b-af76-4d6a-89c3-e1b1cea16d55"
/dev/sda1: LABEL="HOME" UUID="0e759725-e822-48f7-ad2e-43eb44120e72" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="00092b70-01"

I have been able to boot without problems since I fixed the label issue, so I think that solves it. Marking as SOLVED.

#14 2023-01-07 07:38:40

Scimmia
Fellow
Registered: 2012-09-01
Posts: 13,726

Re: [SOLVED] intermittent Failed Load Kernel Modules

If you formatted it, that will change the UUID. It's an identifier for the filesystem.
