I have been dogged for a year or two by intermittent boot failures at the point where systemd-modules-load.service runs. Since I had logging set to "quiet", I had no context to interpret them. However, I removed "quiet" and now see that it is a systemd service failure. So I examined the journal and found messages like:
Failed to look up module alias 'crypto_user': Function not implemented
ditto 'sg'
ditto 'nvidia-uvm'
I am using the Nvidia drivers, and I suspect I have been since this issue started happening.
Sometimes I have to try 6 times before it will succeed. Other times it succeeds multiple times in a row.
How can I debug this further?
The journal also shows a memory-mapping error; I'm not sure whether it is related.
Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df58000 and size of 0x1 pages
Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df57000 and size of 0x1 pages
Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df56000 and size of 0x1 pages
Jan 04 00:00:33 confucius kernel: Cannot map memory with base addr 0x7f549df55000 and size of 0x1 pages
Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df54000 and size of 0x1 pages
Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df53000 and size of 0x1 pages
Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df52000 and size of 0x1 pages
Jan 04 00:01:00 confucius kernel: Cannot map memory with base addr 0x7f549df51000 and size of 0x1 pages
Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df50000 and size of 0x1 pages
Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4f000 and size of 0x1 pages
Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4e000 and size of 0x1 pages
Jan 04 00:01:51 confucius kernel: Cannot map memory with base addr 0x7f549df4d000 and size of 0x1 pages
Last edited by mwigzell (2023-01-06 22:32:09)
Please provide us complete log(s) / output(s).
Which kernel are you on, and which version? Which NVIDIA drivers are you using?
Cannot map memory with base addr BLAHBLAH and size of BLAHBLAH pages
I can't remember where I've seen it before...
That service is for stuff that's in /etc/modules-load.d/; what do you have there? In any case, yes, post a complete journal. You can also use journalctl -b -2 and so forth to check the journal from two boots ago, and so on.
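A minimal sketch for seeing what systemd-modules-load is actually asked to load. According to modules-load.d(5), it reads `*.conf` files not only from /etc/modules-load.d but also from /run/modules-load.d and /usr/lib/modules-load.d, ignoring blank lines and lines that start with `#` or `;`. So an empty /etc/modules-load.d does not mean nothing is configured. The helper name `list_load_modules` is hypothetical, and it does not reproduce systemd's rule that a file in /etc overrides a same-named file in /usr/lib.

```sh
#!/bin/sh
# list_load_modules DIR...: print "file: module" for every module named in
# DIR/*.conf, skipping blank lines and comment lines (# or ;) as described
# in modules-load.d(5). Hypothetical helper; it does NOT apply systemd's
# precedence rule where /etc overrides a same-named file in /usr/lib.
list_load_modules() {
    for dir in "$@"; do
        for f in "$dir"/*.conf; do
            [ -e "$f" ] || continue   # glob matched nothing in this dir
            grep -v -E '^[[:space:]]*([#;]|$)' "$f" | sed "s|^|$f: |"
        done
    done
}
```

Usage: `list_load_modules /etc/modules-load.d /run/modules-load.d /usr/lib/modules-load.d`. Comparing its output with the module names in the failure messages shows which configuration file each failing module comes from.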
Just from that error, this might also read like BIOS emulation bugs on buggy UEFIs. What's your situation here? What's your system (mainboard/laptop model), and how did you install Arch? In EFI or BIOS mode?
I am using Legacy BIOS boot. No UEFI. This problem is intermittent, not constant.
uname -a
Linux confucius 6.1.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000 x86_64 GNU/Linux
sudo dmidecode -t 2
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG STRIX Z490-E GAMING
Version: Rev 1.xx
Serial Number: 201278698901574
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
The journal entries were only there while I was rescuing my system, since systemd failed badly after that. The above shows the bad journal entries for the modules that failed (I copied them down).
There is nothing in /etc/modules-load.d.
I update my Nvidia drivers and kernel through the regular pacman -Syu:
Nvidia drivers: 525.60.11
The modules that fail to load intermittently, as logged in that lost journal, are crypto_user, sg, nvidia_uvm.
This problem was not in my installation originally, but crept in at some point when I was tweaking things. I'm not sure it was to do with Nvidia drivers at all.
My feeling is that there is some kind of race condition. Is there a way to attach a file? I could attach the log of a complete successful boot; it certainly has some oddities in it.
OK, I was able to get a journal log for the failed boot. How do I upload it to you?
Last edited by mwigzell (2023-01-05 20:10:19)
So this can definitely be an issue here. Boot a live disk in EFI mode, install and set up a bootloader in EFI mode, and boot your system in EFI mode. Many mainboards no longer have a properly tested or working BIOS emulation.
As for attaching files: not directly. Use a pastebin service: https://wiki.archlinux.org/title/List_o … n_services
Assuming it is a plain race condition that you can do something about on the Linux side while remaining in "BIOS mode", the most likely candidate is KMS/late loading of the graphics kernel modules: https://wiki.archlinux.org/title/Kernel … _KMS_start
But seriously, your board is modern. There's no telling how well tested and implemented its BIOS mode is, and you are basically gimping a lot of modern features by relying on it.
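For the early-KMS route mentioned above, the approach the Arch wiki describes for the NVIDIA driver is to load its modules from the initramfs via the MODULES array in /etc/mkinitcpio.conf. This is a sketch based on that wiki advice, not a tested fix for this particular failure; check the current wiki page before applying it, since the recommended settings have changed over time.

```sh
# /etc/mkinitcpio.conf (excerpt): load the NVIDIA modules from the
# initramfs instead of later in the boot process.
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
```

Afterwards, regenerate the images with `sudo mkinitcpio -P`. With drivers of this generation, actual kernel mode setting additionally needs the `nvidia_drm.modeset=1` kernel parameter. Because the modules then live in the initramfs, it has to be rebuilt on every driver update; a pacman hook in /etc/pacman.d/hooks is the usual way to automate that.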
Last edited by V1del (2023-01-05 17:29:53)
Yes, I want to move to EFI. However, as I keep trying to indicate, this problem arrived as a result of a tweak; the board worked just fine in legacy BIOS mode until then.
Frankly, I'm scared of EFI. It's probably not so bad once I go through the steps, but it will certainly require updates to my GRUB boot scheme.
I had a journal log, but when I got my system back it wasn't there in "/root"; I guess I was on some virtual disk somewhere. So I lost it again. Sorry, I will get it next time.
I looked into the reference to KMS_start. Please see this:
pwd
/etc/pacman.d/hooks
mark@confucius:hooks$ ls -al
total 16
drwxr-xr-x 2 root root 4096 Feb 5 2021 .
drwxr-xr-x 4 root root 4096 Jan 1 11:17 ..
-rw-r--r-- 1 root root 395 Feb 5 2021 '\'
-rw-r--r-- 1 root root 395 Feb 5 2021 nvidia.hook
I don't know what that '\' entry is doing there. Is it normal?
Last edited by mwigzell (2023-01-05 21:08:18)
Well, you need to know what you "tweaked". What did you do? The journal will not be in /root but in /var/log/journal, but its physical location should be largely irrelevant to you. Use
sudo journalctl -b -1
for the boot prior to your current one, and so forth.
At least the mapping failures in BIOS mode are quite indicative of potential issues with the BIOS emulation.
Last edited by V1del (2023-01-05 20:52:56)
Please see the edit above regarding the hooks.
Sorry to be such a dunce; I have lived with this issue for at least a year.
I don't know what caused it. I was trying to get the nvidia drivers to install with "pacman -Syu"; that was the last major thing I did.
I get that BIOS failures can cause issues. But this problem was not there with my current BIOS originally. It crept in without my noticing which change caused it, and by then it was too late to back out, because it is INTERMITTENT.
When I get back from a crash like the one above, the journal is MISSING. (I got a TTY during the failure and saved the journal off, but that didn't work either, I guess because I wasn't on my normal root drive.)
What is that hook file (see above): '\' ?
Last edited by mwigzell (2023-01-05 21:17:52)
I discovered why the journal was missing: there were two partitions with the same label, and systemd assigned root to the wrong one because of the errors it had during boot. I would consider this some kind of systemd bug, since I use GUIDs in my grub.cfg etc. I fixed the partition label. I'm not sure yet whether this fixes the boot problem, but I was able to get a new journal entry for the latest bad boot, and you can plainly see the "Function not implemented" messages in it.
See: http://0x0.st/oR5A.txt
grep "Function not implemented" <journal_1
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'crypto_user': Function not implemented
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'sg': Function not implemented
Jan 06 08:37:33 confucius systemd-modules-load[394]: Failed to look up module alias 'nvidia-uvm': Function not implemented
Also, I now suspect I made a typo with that hook. It looks suspicious, like a finger fumble:
-rw-r--r-- 1 root root 395 Feb 5 2021 '\'
Last edited by mwigzell (2023-01-06 17:21:57)
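Building on the grep shown above, a small filter can reduce a journal dump to the distinct module names that systemd-modules-load failed to look up, which makes good and bad boots easier to compare. A minimal sketch: `failed_module_aliases` is a hypothetical helper, and it only matches the exact "Failed to look up module alias" wording quoted in this thread.

```sh
#!/bin/sh
# failed_module_aliases: read journal text on stdin and print each module
# name found in "Failed to look up module alias '...'" lines, once, sorted.
failed_module_aliases() {
    sed -n "s/.*Failed to look up module alias '\([^']*\)'.*/\1/p" | sort -u
}
```

Usage for the previous boot: `sudo journalctl -b -1 -u systemd-modules-load.service | failed_module_aliases`. Empty output means that boot logged no such failure.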
Yes, that is nothing; remove it. If your labels were the same, then this basically answers the question. It isn't a systemd bug but the kernel simply picking whichever device it sees first for matching tokens, which is why using unique identifiers for everything related to fstab/static partition mounting is highly recommended: https://wiki.archlinux.org/title/Persis … ice_naming
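Since a duplicate label turned out to be the trigger here, a quick check for duplicates can be run over `blkid` output. The same check works for UUIDs, which a `dd` copy of a whole filesystem also duplicates. A minimal sketch: the `dup_tag` name is hypothetical, and it assumes the `DEVICE: TAG="value" ...` line format that plain `blkid` prints. The leading space in the pattern keeps `LABEL` from also matching `PARTLABEL` or `LABEL_FATBOOT`.

```sh
#!/bin/sh
# dup_tag TAG: read plain `blkid` output on stdin and print every value of
# TAG (e.g. LABEL or UUID) that appears on more than one device.
dup_tag() {
    tag=$1
    # " TAG=" with a leading space avoids matching PARTLABEL/PARTUUID etc.
    grep -o " $tag=\"[^\"]*\"" |
        sed "s/^ $tag=\"\(.*\)\"\$/\1/" |
        sort | uniq -d
}
```

Usage: `sudo blkid | dup_tag LABEL` and `sudo blkid | dup_tag UUID`; no output means no duplicates. A shorter variant for a single tag is `sudo blkid -s LABEL -o value | sort | uniq -d`.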
Yes, the duplicate label occurred because I copied a partition, I think.
I thought I was controlling the root partition via grub? Obviously not.
Update: I rebooted and got a fairly clean boot, with no memory-mapping errors either. Maybe that was the issue, then. Whew!
Here is the new journal: http://0x0.st/oR5e.txt
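On the question of controlling the root partition via GRUB: the `root=` kernel parameter in grub.cfg selects the root filesystem, but the other mounts listed in /etc/fstab are resolved separately, so an fstab entry that names its device by label can still become ambiguous once two filesystems share that label. A minimal sketch for spotting such entries; `label_refs` is a hypothetical helper that reads fstab-format text.

```sh
#!/bin/sh
# label_refs: read fstab-format text on stdin and print "<mountpoint> <source>"
# for every non-comment entry whose device is selected by a (possibly
# non-unique) label rather than a UUID/PARTUUID.
label_refs() {
    awk '
        $1 ~ /^#/ || NF < 2 { next }   # skip comments and blank lines
        $1 ~ /^(PART)?LABEL=/ || $1 ~ /^\/dev\/disk\/by-(part)?label\// {
            print $2, $1
        }'
}
```

Usage: `label_refs < /etc/fstab`. To see how the running kernel was told to find root, `cat /proc/cmdline` shows the `root=` argument that GRUB passed.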
Copying a partition (hence the duplicate label) would also mean a duplicate UUID.
@Scimmia I don't think that is true. I used "dd" to copy. They are separate partitions with different UUIDs. The relevant partitions are /dev/sdb6 and /dev/sdb9, one of which I have now reformatted so it no longer carries the label. You can see the UUIDs are different:
sudo blkid
[sudo] password for root:
/dev/nvme0n1p1: LABEL_FATBOOT="ESP" LABEL="ESP" UUID="0D51-DAF0" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="BOOT" PARTUUID="f68392e0-32b7-4f68-a18b-27ad8f86a7a2"
/dev/sdd1: LABEL="BACKUP_5TB" UUID="ad38cae7-0469-45e0-9cce-123483da4436" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdb4: PARTUUID="6fe04393-31cf-4e4d-b62c-365d5bd41447"
/dev/sdb2: UUID="23b5d15a-dc24-4d7e-bc5f-ef7b3bb0a75c" TYPE="swap" PARTUUID="be882e2d-d534-4551-9525-75fa93b67197"
/dev/sdb9: UUID="0edd0170-3bd9-48de-9bc8-b53b9e127f7b" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="UNUSED_1" PARTUUID="294fa6e5-d39a-4b58-a9e0-edcad7cd753c"
/dev/sdb3: UUID="c2a8fb11-1612-478a-96d3-580c89f46599" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5ff40e59-316a-4336-9f4f-05dc53d1d5d6"
/dev/sdb1: LABEL="boot" UUID="57183d54-e946-4104-85e7-b7cdca3edd26" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="9b3faad6-8521-493c-8545-c03f6963e9aa"
/dev/sdb8: LABEL="LFS 8.3" UUID="88b7a55e-854b-4e4f-ad6d-0c0d7c9ee0b4" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="c1291794-ce03-4cb6-8ed5-4597cec6593a"
/dev/sdb6: LABEL="ARCH_2021_1" UUID="45616ae4-a8ee-4801-9520-2f2b3724ffae" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="ARCH_2021" PARTUUID="ce46d989-c240-42db-9b75-1229dea3fc1d"
/dev/sdc2: LABEL="VAR" UUID="310d82db-f95a-4538-a610-7300159dfa7c" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="VAR" PARTUUID="9ce74ef1-324c-4375-83e9-e742e8ebaace"
/dev/sdc1: LABEL="SCRATCH" UUID="5e78efd1-989a-44ef-bc04-cb19a5280dfc" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="SCRATCH" PARTUUID="40bd903b-af76-4d6a-89c3-e1b1cea16d55"
/dev/sda1: LABEL="HOME" UUID="0e759725-e822-48f7-ad2e-43eb44120e72" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="00092b70-01"
I have been able to boot without problems since I fixed the label issue, so I think that solves it. Marking as SOLVED.
If you formatted it, that will change the UUID. It's an identifier for the filesystem.