Topic closed
After both the 4.17.6 and 4.17.8 kernel updates, my system tends to lock up during boot, right after 'Starting version 239'.
System info:
System:
Host: umaro Kernel: 4.17.8-1-ARCH x86_64 bits: 64 Console: tty 2
Distro: Arch Linux
Machine:
Type: Desktop Mobo: Gigabyte model: EP45-UD3R serial: <root required>
BIOS: Award v: F12 date: 01/25/2010
CPU:
Topology: Quad Core model: Intel Core2 Quad Q9650 bits: 64 type: MCP
L2 cache: 6144 KiB
Speed: 2666 MHz min/max: 2000/3000 MHz Core speeds (MHz): 1: 2730 2: 2535
3: 2764 4: 2571
Graphics:
Card-1: AMD Curacao XT / Trinidad XT [Radeon R7 370 / R9 270X/370X]
driver: radeon v: kernel
Display: server: No display server data found. Headless machine?
tty: 80x24
Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
Card-1: Intel 82801JI HD Audio driver: snd_hda_intel
Card-2: AMD Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
driver: snd_hda_intel
Card-3: Creative Labs EMU20k1 [Sound Blaster X-Fi Series]
driver: snd_ctxfi
Sound Server: ALSA v: k4.17.8-1-ARCH
Network:
Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
driver: r8169
IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: 00:1f:d0:d4:3e:ae
IF-ID-1: br-416fd823133a state: up speed: N/A duplex: N/A
mac: 02:42:97:a5:71:8c
IF-ID-2: br-426d78a02f57 state: up speed: N/A duplex: N/A
mac: 02:42:fd:98:c5:8b
IF-ID-3: docker0 state: down mac: 02:42:aa:91:eb:a5
IF-ID-4: veth0f8bd52 state: up speed: 10000 Mbps duplex: full
mac: 7e:fc:50:c0:f2:eb
IF-ID-5: veth4218766 state: up speed: 10000 Mbps duplex: full
mac: 5a:8f:99:e5:fa:6d
IF-ID-6: veth513e6a2 state: up speed: 10000 Mbps duplex: full
mac: 1e:c1:0c:4c:d4:e3
IF-ID-7: veth52b8206 state: up speed: 10000 Mbps duplex: full
mac: 7a:76:60:cc:e6:61
IF-ID-8: vethb9efcc5 state: up speed: 10000 Mbps duplex: full
mac: 3e:1f:c5:94:25:7e
IF-ID-9: vethe00393c state: up speed: 10000 Mbps duplex: full
mac: 7e:62:29:68:5f:15
IF-ID-10: vethe309c34 state: up speed: 10000 Mbps duplex: full
mac: c6:7c:ca:6d:03:50
Drives:
Local Storage: total: 18.43 TiB used: 5.22 TiB (28.3%)
ID-1: /dev/sda vendor: Samsung model: SSD 850 PRO 256GB size: 238.47 GiB
ID-2: /dev/sdb vendor: Toshiba model: HDWE160 size: 5.46 TiB
ID-3: /dev/sdc vendor: Toshiba model: HDWE160 size: 5.46 TiB
ID-4: /dev/sdd vendor: Western Digital model: WD40EFRX-68WT0N0
size: 3.64 TiB
ID-5: /dev/sde vendor: Western Digital model: WD40EFRX-68WT0N0
size: 3.64 TiB
RAID:
Device-1: storage type: zfs status: ONLINE size: 9.06 TiB free: 3.85 TiB
array-1: mirror status: ONLINE size: 3.62 TiB free: 1.44 TiB Components:
online: N/A
array-2: mirror status: ONLINE size: 5.44 TiB free: 2.41 TiB Components:
online: N/A
Partition:
ID-1: / size: 62.75 GiB used: 11.67 GiB (18.6%) fs: ext4 dev: /dev/sda1
ID-2: /home size: 162.86 GiB used: 1.81 GiB (1.1%) fs: ext4 dev: /dev/sda3
ID-3: swap-1 size: 8.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda2
Sensors:
System Temperatures: cpu: 45.0 C mobo: 35.0 C gpu: radeon temp: 40 C
Fan Speeds (RPM): cpu: 1503 fan-2: 0 fan-3: 0
Info:
Processes: 253 Uptime: 26m Memory: 7.79 GiB used: 6.27 GiB (80.5%)
Init: systemd Shell: bash inxi: 3.0.18
I managed to make it boot this time by deleting the intel-ucode image from the initrd line. It's currently running ucode version 0xa07, and the image appears to provide version 0xa0b, while Intel's own information charts show the latest to be 0xa0e. Intel's own update guidance lies and says the CPU is ID 0x10677 with microcode 0x70d, but I know in fact that it is 0x1067a, which their chart says has microcode up to 0xa0e.
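For reference, the signature can be composed from the "cpu family", "model" and "stepping" fields in /proc/cpuinfo. This is a minimal sketch for family 6 CPUs only; the function name cpuid_signature is made up for illustration, and the values 6, 23 and 10 are assumed here because they compose to the 0x1067a signature mentioned above. Read the real values from /proc/cpuinfo on the machine in question.

```shell
# Compose the CPUID signature of an Intel family-6 CPU from the decimal
# "cpu family", "model" and "stepping" values shown in /proc/cpuinfo.
# Assumption: family 6 only, where the extended-family bits are zero.
cpuid_signature() {
    family=$1
    model=$2
    stepping=$3
    # extended model -> bits 19:16, family -> bits 11:8,
    # low model nibble -> bits 7:4, stepping -> bits 3:0
    sig=$(( ((model >> 4) << 16) | (family << 8) | ((model & 15) << 4) | stepping ))
    printf '0x%08x\n' "$sig"
}

cpuid_signature 6 23 10   # prints 0x0001067a
```

With the same family and model, stepping 7 gives 0x00010677 and stepping 6 gives 0x00010676, which correspond to the other signatures in the microcode bundle.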
Microcode check with iucode_tool shows the following:
$ bsdtar -Oxf /boot/intel-ucode.img | iucode_tool -tb -lS -
iucode_tool: system has processor(s) with signature 0x0001067a
microcode bundle 1: (stdin)
selected microcodes:
001/112: sig 0x00010676, pf_mask 0x80, 2010-09-29, rev 0x060f, size 4096
001/113: sig 0x00010676, pf_mask 0x40, 2010-09-29, rev 0x060f, size 4096
001/114: sig 0x00010676, pf_mask 0x10, 2010-09-29, rev 0x060f, size 4096
001/115: sig 0x00010676, pf_mask 0x04, 2010-09-29, rev 0x060f, size 4096
001/116: sig 0x00010676, pf_mask 0x01, 2010-09-29, rev 0x060f, size 4096
001/117: sig 0x00010677, pf_mask 0x10, 2010-09-29, rev 0x070a, size 4096
001/118: sig 0x0001067a, pf_mask 0xa0, 2010-09-28, rev 0x0a0b, size 8192
001/119: sig 0x0001067a, pf_mask 0x44, 2010-09-28, rev 0x0a0b, size 8192
001/120: sig 0x0001067a, pf_mask 0x11, 2010-09-28, rev 0x0a0b, size 8192
I have no idea which of those pf_masks applies to my CPU.
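As far as I understand it, pf_mask is a bitmask of supported platform IDs, and a microcode entry applies when it shares a set bit with the CPU's processor flags. The kernel's microcode driver exposes those flags in sysfs. The following is only a sketch under that assumption; the function name pf_mask_matches is hypothetical, and the three masks are the ones listed for sig 0x0001067a above.

```shell
# Succeeds (exit status 0) when a microcode entry's pf_mask shares at
# least one bit with the CPU's processor flags. Both arguments are hex
# strings such as 0xa0 or 0x10.
pf_mask_matches() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# processor_flags as exported by the kernel microcode driver.
# Guarded, because the file is absent when the driver is not active.
flags_file=/sys/devices/system/cpu/cpu0/microcode/processor_flags
if [ -r "$flags_file" ]; then
    flags=$(cat "$flags_file")
    for mask in 0xa0 0x44 0x11; do
        if pf_mask_matches "$mask" "$flags"; then
            echo "pf_mask $mask matches processor_flags $flags"
        fi
    done
fi
```

Exactly one of the three masks should match on a given machine, since each platform ID bit appears in only one of them.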
For now, I am uninstalling the intel-ucode package, until such time as I can determine the safety of using it on a machine this old.
Last edited by kode54 (2018-08-22 02:52:12)
Offline
Looks like 4.18.3 is doing this whether or not the intel-ucode package is installed. And now, after about a minute or two, some crap about rcu_preempt detecting a stall in some CPU tasks. I can't dump a log since the machine is locked up, so I would have to photograph the monitor and transcribe it by hand.
Offline
Forum topic here covers this issue:
https://bbs.archlinux.org/viewtopic.php?id=239672
It has more information, but no long-term solution.
Offline
I've had a similar problem, and I posted my solution here...
Offline
Quote from my github post...
I've had the same problem for months now on Arch Linux with kernel 4.18.16 (it was driving me mad), but I finally solved it after many, many tests and a lot of debugging. The problem somehow comes from the `systemd-fsck@dev-sdxy.service` units (xy = a1, a2, ..., b1, b2, ...). So I did the following:
1.) I disabled checking my root file system at boot time by adding 'fsck.mode=skip' to the bootloader's kernel command line...
# vi /boot/grub/grub.cfg
...
menuentry "Linux, ..." {
linux /boot/vmlinuz-linux root=... fsck.mode=skip
2.) I disabled boot-time fsck for the other filesystems by setting the pass field (6th column) in /etc/fstab to 0 (the root entry below still has 1; it is covered by fsck.mode=skip from step 1)...
# vi /etc/fstab
...
/dev/sda1 / ext4 rw,relatime,data=ordered 0 1
/dev/sda2 none swap defaults 0 0
/dev/sda3 /data0 ext4 defaults 0 0
/dev/sdb1 /data1 ext4 defaults 0 0
/dev/sdc1 /data2 ext4 defaults 0 0
3.) I no longer start X (startx) automatically; I boot into the text console only
4.) I allowed my user to run mount, umount and fsck without a password, and updated ~/.bashrc so that the main data disks are checked after I log in...
# visudo (edits /etc/sudoers with a syntax check; SHIFT + G, ESC + I)
...
Cmnd_Alias CMDS1 = /usr/bin/mount,/usr/bin/umount,/usr/bin/fsck
myusername ALL = (root) NOPASSWD: CMDS1
# vi /home/myusername/.bashrc (SHIFT + G, ESC + I)
...
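# Run the checks only on a text console (no X session, so DISPLAY is empty)
if [ -z "$DISPLAY" ]
then
    echo " "
    # fsck must not run on a mounted filesystem, so unmount first
    sudo umount /dev/sdb1
    sudo fsck -y /dev/sdb1
    sudo umount /dev/sdc1
    sudo fsck -y /dev/sdc1
    # remount everything listed in /etc/fstab
    sudo mount -a
    echo " "
fi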
That's it!
So now, after I log in to the text console, I first check that all messages are OK, then I type startx to get into my desktop environment (XFCE4). When I have finished my work, I log out from there (not shut down). Then I type shutdown now on the text console to shut down the system.
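Steps 1 and 2 above can be double-checked with a small script. This is a rough sketch, not part of the original post; the function names cmdline_skips_fsck and fsck_enabled_entries are made up for illustration.

```shell
# Succeeds if the given kernel command line contains fsck.mode=skip.
cmdline_skips_fsck() {
    case " $1 " in
        *" fsck.mode=skip "*) return 0 ;;
    esac
    return 1
}

# Prints device, mount point and pass number for every fstab entry
# whose fsck pass field (6th column) is non-zero.
fsck_enabled_entries() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $1, $2, $6 }' "$1"
}

# Usage on the running system (guarded, in case the files are missing)
if [ -r /proc/cmdline ]; then
    if cmdline_skips_fsck "$(cat /proc/cmdline)"; then
        echo "fsck.mode=skip is set on the kernel command line"
    else
        echo "fsck.mode=skip is NOT set"
    fi
fi
if [ -r /etc/fstab ]; then
    fsck_enabled_entries /etc/fstab
fi
```

For the fstab shown above, only the root entry would be listed, since it is the only one with a non-zero pass field.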
Offline
This really doesn't sound like a generally good idea or suggestion, nor is it really related to the original issue. Boot-time fscks can be quite important; you should find the root cause. Did you mess with your mkinitcpio config?
That said, going by the last post, this thread concerned a well-known timing issue with early 4.18 kernels, which has no relation to the issue you are currently seeing. If you'd like to follow this further, please open your own thread.
Closing.
Online