#1 2016-01-13 13:58:00

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Kernel 4.3.3-2 breaks rEFInd

I've been debugging this for a week (and worked with the author of rEFInd as well trying to track this down), but I have not yet been able to solve the problem.

The Problem:

After upgrading from 4.2.5-1 to 4.3.3-2, rEFInd stops recognizing 1440x900 as a valid resolution. This has the knock-on effect of making my console resolution abysmal. I have been able to reproduce this on two different Arch setups.

This does not happen until the system is shut down with the new kernel and then cold booted. I can reboot over and over again without seeing the issue. Also, if I shut down instead of rebooting immediately after the upgrade (i.e. 4.2.5-1 is the in-memory kernel), the problem does not happen until the next shutdown (i.e. 4.3.3-2 is the in-memory kernel). Thus, it is clearly tied to a shutdown performed while 4.3.3-2 is running. That said, I believe the trigger isn't actually executed on shutdown but on first kernel load.

While rEFInd in the Arch repository is 0.9.2, the most recent version (0.10.1) also exhibits the issue (I manually installed it as one of my tests).

Systems:

Both Arch systems are x86_64. One is a full-blown KDE setup; the other is a new build that I was going to use to replace the KDE one. Both are running under VMware Fusion 8.1.0. I have snapshots of both from before the kernel upgrade, so I can easily reproduce this over and over again. Both use SATA-based virtual disk controllers (instead of the default SCSI).

I am aware that VMware does not officially support Arch, nor EFI boot for anything other than OS X.

Both systems have the EFI partition (/dev/sda1) tied to /boot/efi (it is not mounted automatically). My kernels live on /dev/sda2, which is mounted at /boot.

Debugging Steps Thus Far:

I have isolated the problem package to linux-4.3.3-2 (by upgrading packages one at a time until the problem appeared). The issue is not specific to rEFInd 0.9.2.

Moving my kernels to the EFI partition does not solve the issue.

The author of rEFInd had me do the following tests:

* From a working state, create a backup of the original kernel and its
  initrd (for instance, call it vmlinuz-1 and initramfs-1; using numbers
  in the second kernel and its initrd will cause rEFInd's auto-detection
  to pick them up correctly). Then install the kernel update, shut down,
  and reboot into the OLD kernel. Shut down again. If the problem
  reappears, then it's likely caused by the files or package
  installation itself; but if the problem does not reappear, then it's
  likely caused by shutting down with the new kernel. You can then boot
  into the new kernel, shut down, and see what happens. If the problem
  appears then, I recommend doing another test in which you boot into
  the old kernel, shut down, and see if the problem continues. If it
  does, then that suggests a permanent change caused by the old kernel;
  but if not, then it suggests that both kernels are doing something,
  but the new one is doing it in a way that causes problems and the old
  one in a way that prevents the problem.
* From a working state, install the problematic kernel and boot to it;
  but instead of shutting down normally, kill the VMware process
  (equivalent to pulling the plug on a real computer). This should
  prevent the kernel from doing whatever it does at shutdown, and
  should prevent the problem from occurring. Of course, this isn't
  something you'd want to do regularly; it's just meant to test that
  it's a kernel shutdown operation that's causing problems.
* From a working state, install the problematic kernel and reboot. When
  you select the kernel, hit F2 or Insert twice to edit kernel options
  and add "noefi" to the kernel options. This should disable the
  kernel's ability to "talk" to the EFI, so tools like efibootmgr
  should stop working. My suspicion is that the problem will also go
  away, although I'm not 100% positive of that -- the kernel change
  I'm hypothesizing could be caused by something other than an
  explicitly EFI-related call, although I think that's the most likely
  path for something like this to happen.

My results:

Test 1: This one is inconclusive. Booting the old kernel does not result in a usable system because package hooks alter systemd components (such as udev) on kernel upgrade. It seems that post-upgrade but with the old kernel, rEFInd is not affected by shutdown. I call this inconclusive because boot aborted and dropped me to single-user mode. When I put the new kernel back in place, shutdown broke rEFInd.

Test 2: EFI breaks. I didn't actually kill VMware; it has a hidden option (hold down Option and select the Virtual Machine menu) that allows you to pull the plug on a VM. No shutdown handling fires at all; the VM just dies. On startup, the problem returns. This leads me to believe that, if it isn't the package upgrade (assuming we can trust test #1), the kernel and/or systemd is doing something on load. Which brings us to test #3.

Test 3: 'noefi' has no effect; shutdown still causes a break.
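
Test 3 only means something if the hand-edited option actually reached the kernel, which can be confirmed after boot by reading /proc/cmdline. Below is a minimal sketch of such a check. The helper name is made up for illustration, and it does not handle quoted values that contain spaces.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value-or-None} mapping.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline, name):
    """True if the bare flag or name=value option is on the command line."""
    return name in kernel_params(cmdline)
```

On the booted system, `has_param(open("/proc/cmdline").read(), "noefi")` should return True if the option was applied for that boot.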

Additionally, I wrote an efivar dump script that tracked changes to my EFI vars. The only things that changed were MTC on every reboot/cold boot and the MemoryTypeInformation when swapping between a reboot and shutdown/cold boot (R <-> S).
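
For anyone wanting to repeat this, here is a minimal sketch of a dump-and-compare script of this kind. It is not the script used above. It assumes the variables are exposed through efivarfs at /sys/firmware/efi/efivars, and the function names and the JSON snapshot format are hypothetical.

```python
import hashlib
import json
import os
import sys


def snapshot(efivars_dir="/sys/firmware/efi/efivars"):
    """Return {variable file name: SHA-256 hex digest of its raw contents}."""
    result = {}
    for name in sorted(os.listdir(efivars_dir)):
        path = os.path.join(efivars_dir, name)
        try:
            with open(path, "rb") as handle:
                result[name] = hashlib.sha256(handle.read()).hexdigest()
        except OSError as err:
            # Some variables may not be readable; record that and carry on.
            result[name] = "unreadable: " + str(err)
    return result


def diff(before, after):
    """Return (added, removed, changed) variable names between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after) if before[n] != after[n])
    return added, removed, changed


if __name__ == "__main__" and len(sys.argv) >= 3:
    if sys.argv[1] == "dump":
        # script.py dump snapshot.json
        with open(sys.argv[2], "w") as out:
            json.dump(snapshot(), out, indent=1)
    elif sys.argv[1] == "diff" and len(sys.argv) == 4:
        # script.py diff before.json after.json
        with open(sys.argv[2]) as a, open(sys.argv[3]) as b:
            results = diff(json.load(a), json.load(b))
        for label, names in zip(("added", "removed", "changed"), results):
            for name in names:
                print(label, name)
```

Taking one snapshot per boot (after a reboot, after a cold boot) and diffing consecutive snapshots gives the kind of comparison described above. Note that this only sees variables with the runtime-access attribute; boot-services-only variables are invisible from the running OS.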

I'm at a loss as to how to debug this further.

Any help would be greatly appreciated.

Last edited by katsuki (2016-01-13 15:18:55)

#2 2016-01-21 14:05:43

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

I have tried kernel 4.3.3-3 with the same results as 4.3.3-2. I still haven't figured this out, so any help would be appreciated.

#3 2016-01-21 16:46:34

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 20,654

Re: Kernel 4.3.3-2 breaks rEFInd

No good advice for you.  Have you considered the 4.4 kernel in testing?


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
The shortest way to ruin a country is to give power to demagogues.— Dionysius of Halicarnassus
---
How to Ask Questions the Smart Way

#4 2016-01-25 15:37:55

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

I'm willing to give it a shot. I'll look into how to set that up and post the result here.

#5 2016-01-25 15:44:35

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

Kernel 4.4.0-4 exhibits the same broken behavior. Something is going on with the way the kernel interacts with the EFI settings; I just don't know how to figure out what it is doing.

#6 2016-02-02 22:26:57

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

Same issue with 4.4.1-1. I am going to run the OS in full debug mode (debug as a kernel param). I have the OS fully staged on 4.2.5 with debug ready to do the upgrade to 4.4.1, so I can get before and after logs if need be.

Is there anything I should be looking for in particular?

#7 2016-02-03 15:03:06

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

I noticed one odd thing after running a full test run with captured details: rEFInd is choosing initramfs-linux-fallback.img over initramfs-linux.img for the initrd kernel option. While I didn't bother capturing debug details for the non-fallback version, I did run a full test with initrd=initramfs-linux.img specified in /boot/refind_linux.conf and had the same broken behavior; as such I'm certain my log captures are valid.

The logs look innocuous (log output filtered):

Log filtering:

egrep 'efifb|EFI|refind|esi|ESI' journal_full.txt  | grep -v systemd-tmpfiles

4.2.5

Feb 03 09:21:00 kk-archvm kernel: efi: EFI v2.31 by VMware, Inc.
Feb 03 09:21:00 kk-archvm kernel: ACPI: SRAT 0x000000000EF980B4 0008A8 (v02 VMWARE EFISRAT  06040001 VMW  000007CE)
Feb 03 09:21:00 kk-archvm kernel: ACPI: APIC 0x000000000EFAB277 000742 (v02 VMWARE EFIAPIC  06040001 VMW  000007CE)
Feb 03 09:21:00 kk-archvm kernel: ACPI: MCFG 0x000000000EFAB9B9 00003C (v01 VMWARE EFIMCFG  06040001 VMW  000007CE)
Feb 03 09:21:00 kk-archvm kernel: efifb: probing for efifb
Feb 03 09:21:00 kk-archvm kernel: efifb: framebuffer at 0xf0000000, mapped to 0xffffc90001000000, using 5064k, total 5062k
Feb 03 09:21:00 kk-archvm kernel: efifb: mode is 1440x900x32, linelength=5760, pages=1
Feb 03 09:21:00 kk-archvm kernel: efifb: scrolling: redraw
Feb 03 09:21:00 kk-archvm kernel: efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Feb 03 09:21:00 kk-archvm kernel: fb0: EFI VGA frame buffer device
Feb 03 09:21:00 kk-archvm systemd-fstab-generator[200]: Found entry what=/dev/disk/by-partlabel/ESI where=/boot/efi type=vfat nofail=yes noauto=no
Feb 03 09:21:01 kk-archvm systemd[1]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
Feb 03 09:21:01 kk-archvm kernel: fb: switching to svgadrmfb from EFI VGA
Feb 03 09:21:41 kk-archvm systemd[4117]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
Feb 03 09:27:12 kk-archvm systemd-udevd[245]: device /dev/sda6 closed, synthesising 'change'
Feb 03 09:27:12 kk-archvm systemd[1]: dev-disk-by\x2dpartlabel-ESI.device: Failed to send unit remove signal for dev-disk-by\x2dpartlabel-ESI.device: Transport endpoint is not connected

4.4.1-a (post-reboot)

Feb 03 09:27:26 kk-archvm kernel: efi: EFI v2.31 by VMware, Inc.
Feb 03 09:27:26 kk-archvm kernel: ACPI: SRAT 0x000000000EF980B4 0008A8 (v02 VMWARE EFISRAT  06040001 VMW  000007CE)
Feb 03 09:27:26 kk-archvm kernel: ACPI: APIC 0x000000000EFAB277 000742 (v02 VMWARE EFIAPIC  06040001 VMW  000007CE)
Feb 03 09:27:26 kk-archvm kernel: ACPI: MCFG 0x000000000EFAB9B9 00003C (v01 VMWARE EFIMCFG  06040001 VMW  000007CE)
Feb 03 09:27:26 kk-archvm kernel: efifb: probing for efifb
Feb 03 09:27:26 kk-archvm kernel: efifb: framebuffer at 0xf0000000, mapped to 0xffffc90001000000, using 5064k, total 5062k
Feb 03 09:27:26 kk-archvm kernel: efifb: mode is 1440x900x32, linelength=5760, pages=1
Feb 03 09:27:26 kk-archvm kernel: efifb: scrolling: redraw
Feb 03 09:27:26 kk-archvm kernel: efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Feb 03 09:27:26 kk-archvm kernel: fb0: EFI VGA frame buffer device
Feb 03 09:27:26 kk-archvm systemd-fstab-generator[227]: Found entry what=/dev/disk/by-partlabel/ESI where=/boot/efi type=vfat nofail=yes noauto=no
Feb 03 09:27:26 kk-archvm kernel: fb: switching to svgadrmfb from EFI VGA
Feb 03 09:27:26 kk-archvm systemd[1]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
Feb 03 09:27:36 kk-archvm systemd[489]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
Feb 03 09:28:58 kk-archvm systemd-udevd[285]: device /dev/sda6 closed, synthesising 'change'
Feb 03 09:28:59 kk-archvm systemd[1]: dev-disk-by\x2dpartlabel-ESI.device: Failed to send unit remove signal for dev-disk-by\x2dpartlabel-ESI.device: Transport endpoint is not connected

4.4.1-b (post-cold boot)

Feb 03 09:29:16 kk-archvm kernel: efi: EFI v2.31 by VMware, Inc.
Feb 03 09:29:16 kk-archvm kernel: ACPI: SRAT 0x000000000EF980B4 0008A8 (v02 VMWARE EFISRAT  06040001 VMW  000007CE)
Feb 03 09:29:16 kk-archvm kernel: ACPI: APIC 0x000000000EFAB277 000742 (v02 VMWARE EFIAPIC  06040001 VMW  000007CE)
Feb 03 09:29:16 kk-archvm kernel: ACPI: MCFG 0x000000000EFAB9B9 00003C (v01 VMWARE EFIMCFG  06040001 VMW  000007CE)
Feb 03 09:29:16 kk-archvm kernel: efifb: probing for efifb
Feb 03 09:29:16 kk-archvm kernel: efifb: framebuffer at 0xf0000000, mapped to 0xffffc90000c00000, using 3072k, total 3072k
Feb 03 09:29:16 kk-archvm kernel: efifb: mode is 1024x768x32, linelength=4096, pages=1
Feb 03 09:29:16 kk-archvm kernel: efifb: scrolling: redraw
Feb 03 09:29:16 kk-archvm kernel: efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Feb 03 09:29:16 kk-archvm kernel: fb0: EFI VGA frame buffer device
Feb 03 09:29:16 kk-archvm systemd-fstab-generator[228]: Found entry what=/dev/disk/by-partlabel/ESI where=/boot/efi type=vfat nofail=yes noauto=no
Feb 03 09:29:17 kk-archvm systemd[1]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
Feb 03 09:29:17 kk-archvm kernel: fb: switching to svgadrmfb from EFI VGA
Feb 03 09:29:32 kk-archvm systemd[505]: dev-disk-by\x2dpartlabel-ESI.device: Changed dead -> plugged
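
The only efifb difference between the three captures is the mode line, and the reported "total" size is consistent with it: height times linelength, in KiB. Here is a small sketch that parses the mode lines and redoes that arithmetic. The helper names are hypothetical, and rounding down to whole KiB is an assumption that happens to match the logged totals.

```python
import re

MODE_RE = re.compile(r"efifb: mode is (\d+)x(\d+)x(\d+), linelength=(\d+)")


def parse_efifb_mode(line):
    """Return (width, height, depth, linelength) from an efifb mode line, or None."""
    match = MODE_RE.search(line)
    return tuple(int(group) for group in match.groups()) if match else None


def framebuffer_kib(height, linelength):
    """Visible framebuffer size in whole KiB (rounded down)."""
    return (height * linelength) // 1024


lines = [
    "Feb 03 09:27:26 kk-archvm kernel: efifb: mode is 1440x900x32, linelength=5760, pages=1",
    "Feb 03 09:29:16 kk-archvm kernel: efifb: mode is 1024x768x32, linelength=4096, pages=1",
]
for line in lines:
    width, height, depth, linelength = parse_efifb_mode(line)
    print(f"{width}x{height}x{depth}: {framebuffer_kib(height, linelength)}k")
# 1440x900x32: 5062k
# 1024x768x32: 3072k
```

So after the cold boot the kernel is simply handed a 1024x768 framebuffer by whatever ran before it. The log lines do not show the kernel itself picking a different mode.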

I have dmesg output for each boot cycle (with kernel debug enabled) that I can share if it helps. I have no idea what to look for in the output.

Thanks.

Last edited by katsuki (2016-02-03 15:08:32)

#8 2016-02-03 18:27:07

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

Additional test:

Setting efivars to RO mount as per https://bbs.archlinux.org/viewtopic.php?id=208102 does not prevent the issue from happening. I rebooted with efivars set to RO in /etc/fstab prior to doing the kernel upgrade.

So, whatever is doing this is doing so directly and not via the /sys path.
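
To rule out the possibility that the read-only option silently failed to apply, the mount state can be checked from the running system. Below is a minimal sketch that reads text in the /proc/mounts format. It assumes the filesystem type shows up as "efivarfs", and the function names are hypothetical.

```python
def efivarfs_mount_options(mounts_text):
    """Return the option list of the first efivarfs mount, or None if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "efivarfs":
            return fields[3].split(",")
    return None


def efivarfs_is_read_only(mounts_text):
    """True only if efivarfs is mounted and its options include 'ro'."""
    options = efivarfs_mount_options(mounts_text)
    return options is not None and "ro" in options
```

Calling `efivarfs_is_read_only(open("/proc/mounts").read())` before the upgrade and again before shutdown would confirm that the mount stayed read-only for the whole test.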

#9 2016-02-22 15:46:32

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

Utilizing the EFI shell, I have isolated what is changed in the NVRAM:

@@ -76,7 +76,7 @@
 Variable NV+RT+BS '378D7B65-8DA9-4773-B6E4-A47826A833E1:RTC' DataSize = 4
   00000000: FF 07 00 00                                      *....*
 Variable NV+RT+BS 'EB704011-1402-11D3-8E77-00A0C969723B:MTC' DataSize = 4
-  00000000: 35 00 00 00                                      *5...*
+  00000000: 37 00 00 00                                      *7...*
 Variable RT+BS 'Efi:BootOptionSupport' DataSize = 4
   00000000: 03 03 00 00                                      *....*
 Variable RT+BS 'Efi:LangCodes' DataSize = D
@@ -106,7 +106,7 @@
 Variable BS 'IPv4Sb:000C29867463' DataSize = 40
   00000000: 18 A0 74 0D 00 00 00 00-03 00 00 00 00 00 00 00  *..t.............*
   00000010: 98 25 4C 0D 00 00 00 00-00 00 00 00 00 00 00 00  *.%L.............*
-  00000020: 98 F0 4B 0D 00 00 00 00-00 00 00 00 00 00 00 00  *..K.............*
+  00000020: 98 00 4C 0D 00 00 00 00-00 00 00 00 00 00 00 00  *..L.............*
   00000030: 18 CC 4B 0D 00 00 00 00-00 00 00 00 00 00 00 00  *..K.............*
 Variable BS 'UDPv4Sb:000C29867463' DataSize = 28
   00000000: 18 DA 6F 0D 00 00 00 00-01 00 00 00 00 00 00 00  *..o.............*
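
The changed bytes are easier to read as little-endian integers. Here is a minimal sketch that decodes the changed lines from the diff above. The helper names are hypothetical, and treating the IPv4Sb bytes as a 64-bit value is an assumption about the layout, not something the dump states.

```python
def dump_line_bytes(line):
    """Extract the raw bytes from one hex-dump line, e.g.
    '  00000000: 35 00 00 00      *5...*' (a leading diff '+' or '-' is fine)."""
    hex_part = line.split(":", 1)[1].split("*", 1)[0]
    return bytes.fromhex("".join(hex_part.replace("-", " ").split()))


def le_word(data, offset=0, size=4):
    """Interpret `size` bytes at `offset` as a little-endian unsigned integer."""
    return int.from_bytes(data[offset:offset + size], "little")


mtc_old = le_word(dump_line_bytes("-  00000000: 35 00 00 00        *5...*"))
mtc_new = le_word(dump_line_bytes("+  00000000: 37 00 00 00        *7...*"))
print(mtc_old, "->", mtc_new)
# 53 -> 55

sb_old = le_word(dump_line_bytes(
    "-  00000020: 98 F0 4B 0D 00 00 00 00-00 00 00 00 00 00 00 00  *..K.............*"), size=8)
sb_new = le_word(dump_line_bytes(
    "+  00000020: 98 00 4C 0D 00 00 00 00-00 00 00 00 00 00 00 00  *..L.............*"), size=8)
print(hex(sb_old), "->", hex(sb_new), "delta", hex(sb_new - sb_old))
# 0xd4bf098 -> 0xd4c0098 delta 0x1000
```

Read this way, MTC went up by 2 between the dumps, and the IPv4Sb value moved by 0x1000. Both IPv4Sb and UDPv4Sb are BS-only (boot services) variables, so they are not visible through efivarfs from the running kernel.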

Worth noting: if I boot into the new kernel (4.4.1-2) for the first time, then reboot, then shut down the system from rEFInd, and then cold boot, the problem exhibits itself. Thus, I think that whatever is happening is definitely happening the first time the new kernel is loaded/unloaded. However, I don't know why it only shows up on the first cold boot: reboots do not exhibit the issue until that first cold boot, and after it they never work right.

#10 2016-02-26 16:21:57

katsuki
Member
From: NY, USA
Registered: 2015-01-28
Posts: 26

Re: Kernel 4.3.3-2 breaks rEFInd

More information:

Before upgrading the kernel, rEFInd displays 29 supported resolution modes (I intentionally broke refind.conf to see what was supported). After the upgrade, it supports 12 resolutions. Very strange.
