Kernel 5.8.8 fails to recognize software RAID

Nuckal777 · 2020-09-11 16:46:58

Hello,

After upgrading to linux 5.8.8.arch1-1, my system boots into emergency mode because it cannot mount the partitions located on my RAID 0. The respective systemd units fail with a timeout.

Sep 11 20:18:01 GaliusBonus systemd[1]: dev-disk-by\x2duuid-64C646AEC6467FF2.device: Job dev-disk-by\x2duuid-64C646AEC6467FF2.device/start failed with result 'timeout'.
Sep 11 20:18:01 GaliusBonus systemd[1]: mnt-raid.mount: Job mnt-raid.mount/start failed with result 'dependency'.
Sep 11 20:18:01 GaliusBonus systemd[1]: Dependency failed for /mnt/raid.
Sep 11 20:18:01 GaliusBonus systemd[1]: Timed out waiting for device /dev/disk/by-uuid/64C646AEC6467FF2.
Sep 11 20:18:01 GaliusBonus systemd[1]: dev-disk-by\x2duuid-64C646AEC6467FF2.device: Job dev-disk-by\x2duuid-64C646AEC6467FF2.device/start timed out.
Sep 11 20:18:01 GaliusBonus systemd[1]: dev-disk-by\x2duuid-19333c8d\x2dc400\x2d4123\x2d89d5\x2d6590c3a04ea1.device: Job dev-disk-by\x2duuid-19333c8d\x2dc400\x2d4123\x2d89d5\x2d6590c3a04ea1.device/start failed with result 'timeout'.

All physical hard drives are listed by "fdisk -l"; only the RAID "md" device is missing. Downgrading to kernel version 5.8.7 fixed the issue, so I assume something changed in the kernel regarding software RAID. I also dual-boot Windows, and the RAID works perfectly there. What am I missing, or how can I troubleshoot this further?
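As a starting point for troubleshooting, a minimal sketch that lists the md arrays the running kernel knows about. The `list_md_arrays` helper is hypothetical (it is not part of mdadm) and assumes the usual /proc/mdstat layout, where each array line starts with the device name followed by a colon.

```shell
# Hypothetical helper: list md arrays named in an mdstat-style file
# (defaults to /proc/mdstat on the running system).
list_md_arrays() {
  file="${1:-/proc/mdstat}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Array lines begin with the device name, e.g. "md126 : active raid0 ..."
  awk '/^md[^ ]* *:/ { print $1 }' "$file"
}

list_md_arrays || true

# Further checks to run as root on the affected system (not executed here):
#   mdadm --examine --scan    # arrays described by the on-disk metadata
#   mdadm --assemble --scan   # try to assemble them by hand
#   journalctl -k -b          # kernel messages from the current boot
```

If the helper prints nothing under 5.8.8 but prints the array under 5.8.7, that would be consistent with the kernel never assembling the device.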

mdadm reports the same RAID platform details under both kernel versions:

mdadm --detail-platform
       Platform : Intel(R) Rapid Storage Technology
        Version : 11.1.0.1413
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
          Port5 : /dev/sdd (S1D5NSCF342734B)
          Port3 : /dev/sdb (S3Z2NB0M525985W)
          Port4 : /dev/sdc (Z1DB5B6N)
          Port2 : /dev/sda (Z1DB4Q7R)
          Port0 : - no device attached -
          Port1 : - no device attached -

Thank you in advance.

Shinigami92 · 2020-09-11 21:03:30

I had the same issue.

But because I experimented too much with different RAID settings, I have now completely broken it and (I think) lost a VM image that I had no backup of.

Edit: I also opened a followup topic for my problem: https://bbs.archlinux.org/viewtopic.php … 1#p1925781

Last edited by Shinigami92 (2020-09-11 21:55:52)

loqs · 2020-09-11 21:19:43

There are a few dm related commits in https://cdn.kernel.org/pub/linux/kernel … eLog-5.8.8

Can you locate the causal commit, either by reverting them one by one or by bisecting between 5.8.7 and 5.8.8?
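A minimal sketch of the bisection route, assuming a clone of the stable kernel tree that carries the v5.8.7 and v5.8.8 tags (the Arch tree may name its tags differently) and that every step is built, booted and checked by hand. The `bisect_plan` helper is hypothetical; it only prints the commands and runs nothing.

```shell
# Hypothetical helper: print the git bisect commands for a good/bad tag pair.
# Nothing is executed; run the printed commands inside the kernel git tree.
bisect_plan() {
  good="$1"
  bad="$2"
  printf 'git bisect start %s %s\n' "$bad" "$good"
  printf '%s\n' '# build, boot and check whether the md device appears, then one of:'
  printf '%s\n' 'git bisect good   # RAID is recognized'
  printf '%s\n' 'git bisect bad    # RAID is missing'
  printf '%s\n' 'git bisect reset  # once git names the first bad commit'
}

bisect_plan v5.8.7 v5.8.8
```

Bisecting halves the candidate range at every step, so it usually needs fewer builds than reverting commits one at a time.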

Garzet · 2020-09-12 09:06:09

I had the same issue. Thanks for pointing me to downgrading the kernel.

I'll keep checking here to see when it is safe to upgrade, as I don't think I'm skilled enough to understand kernel commit jargon.

Last edited by Garzet (2020-09-12 09:06:35)

loqs · 2020-09-12 10:34:21

Follow Arch_Build_System#Retrieve_PKGBUILD_source to obtain the PKGBUILD and config.
Change the prepare function of the PKGBUILD to the following:

prepare() {
  cd $_srcname

  echo "Setting version..."
  scripts/setlocalversion --save-scmversion
  echo "-$pkgrel" > localversion.10-pkgrel
  echo "${pkgbase#linux}" > localversion.20-pkgname

  git revert -n 4469ea5972ab9c3064af6dcc0d76c1dfa6bb7913
  git revert -n b3c76fdbb11988c5775b684980aabc02886e5d41
  git revert -n d02a33a248258cc0c2803f7af318ddcd8d83ba16
  git revert -n 0a495d145f59939cba68849a721e6cf27babce34
  git revert -n 372236a01bc548c3a0fdb02eb362144a3b10a233

  local src
  for src in "${source[@]}"; do
    src="${src%%::*}"
    src="${src##*/}"
    [[ $src = *.patch ]] || continue
    echo "Applying patch $src..."
    patch -Np1 < "../$src"
  done

  echo "Setting config..."
  cp ../config .config
  make olddefconfig

  make -s kernelrelease > version
  echo "Prepared $pkgbase version $(<version)"
}

This reverts some dm commits. Enable parallel compilation to reduce build time. Build the package and test if the issue is still present.
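One way to enable parallel compilation, sketched under the assumption that MAKEFLAGS is not already set in your makepkg configuration. The job count of one per CPU is only a common starting point, not a recommendation from this thread.

```shell
# Build a MAKEFLAGS line that uses one make job per available CPU.
# nproc (coreutils) prints the number of processing units available.
makeflags_line="MAKEFLAGS=\"-j$(nproc)\""
echo "$makeflags_line"

# Add the printed line to /etc/makepkg.conf or ~/.makepkg.conf,
# then build from the directory containing the PKGBUILD:
#   makepkg -s
```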

Nuckal777 · 2020-09-12 21:03:30

I built the kernel with the commits listed by loqs reverted, but the issue persisted.
I will try to revert the other dm related commits soon.

loqs · 2020-09-12 21:18:02

The following reverts more commits and rebuilds only what has changed (this should now cover all block commits, but not NVMe):

  cd src/archlinux-linux
  git revert -n bf8fe7b755c2ccdf8fd739ad71dd0d035588511a
  git revert -n 3c761332597d1dc3bc527ba5924f300dc43ae9a2
  git revert -n 70d22582c3eb6d50c30574019777d546fbd5cc81
  git revert -n dea6f05d372a2117b581e17a3638a72d696ac6aa
  git revert -n e37bc36aaff38fdf8fafc52bc88ad98ed1ff7a88
  git revert -n 329c9ffc81cfb985c6d131e94e6d220d7c1b19ca
  git revert -n a7a42c1e5023cdac2bbc1038689509595d279cd2
  git revert -n b7df98a8b7b8abce596e9696d5c3183fc4c0019d
  git revert -n 692d0626557451c4b557397f20b7394b612d0289
  cd ../..
  makepkg -e

Nuckal777 · 2020-09-12 22:53:45

Reverting the additional patches fixed the issue for me.

loqs · 2020-09-12 23:07:14

So it was one of those nine extra commits. The following reapplies four of them.

cd src/archlinux-linux
git cherry-pick -n 692d0626557451c4b557397f20b7394b612d0289
git cherry-pick -n b7df98a8b7b8abce596e9696d5c3183fc4c0019d
git cherry-pick -n a7a42c1e5023cdac2bbc1038689509595d279cd2
git cherry-pick -n 329c9ffc81cfb985c6d131e94e6d220d7c1b19ca
cd ../..
makepkg -e

Then repeat reverting or cherry-picking as needed until you have a single commit left as the cause.
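The narrowing-down can be sketched as a repeated halving of the suspect list. The `split_suspects` helper below is hypothetical and only prints a plan; the cherry-picks and rebuilds are still done by hand as shown above.

```shell
# Hypothetical helper: split the suspect commits into a half to reapply
# (cherry-pick) and a half to leave reverted for the next test build.
split_suspects() {
  half=$(( ($# + 1) / 2 ))
  i=0
  for c in "$@"; do
    i=$((i + 1))
    if [ "$i" -le "$half" ]; then
      echo "reapply $c"
    else
      echo "keep-reverted $c"
    fi
  done
}

# Example: if the issue returns once the four commits above are reapplied,
# those four become the suspects for the next round.
split_suspects \
  692d0626557451c4b557397f20b7394b612d0289 \
  b7df98a8b7b8abce596e9696d5c3183fc4c0019d \
  a7a42c1e5023cdac2bbc1038689509595d279cd2 \
  329c9ffc81cfb985c6d131e94e6d220d7c1b19ca
```

If the issue returns with the reapplied half, the culprit is in that half; otherwise it is among the commits still reverted.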
Edit:
If anyone else wants to test, please try the following change to the PKGBUILD, which reverts the five commits that Nuckal777 would not be reapplying:

prepare() {
  cd $_srcname

  echo "Setting version..."
  scripts/setlocalversion --save-scmversion
  echo "-$pkgrel" > localversion.10-pkgrel
  echo "${pkgbase#linux}" > localversion.20-pkgname

  git revert -n bf8fe7b755c2ccdf8fd739ad71dd0d035588511a
  git revert -n 3c761332597d1dc3bc527ba5924f300dc43ae9a2
  git revert -n 70d22582c3eb6d50c30574019777d546fbd5cc81
  git revert -n dea6f05d372a2117b581e17a3638a72d696ac6aa
  git revert -n e37bc36aaff38fdf8fafc52bc88ad98ed1ff7a88

  local src
  for src in "${source[@]}"; do
    src="${src%%::*}"
    src="${src##*/}"
    [[ $src = *.patch ]] || continue
    echo "Applying patch $src..."
    patch -Np1 < "../$src"
  done

  echo "Setting config..."
  cp ../config .config
  make olddefconfig

  make -s kernelrelease > version
  echo "Prepared $pkgbase version $(<version)"
}

Last edited by loqs (2020-09-13 00:23:33)

Shinigami92 · 2020-09-13 11:18:39

I failed with my other topic and have now fallen back to linux-lts.
I reformatted my RAID, lost my VM, and will install it from scratch ¯\_(ツ)_/¯

Hope this bug will be fixed soon

loqs · 2020-09-13 14:38:57

Please test cherry-picking https://git.kernel.org/pub/scm/linux/ke … 5957a60e1a

Nuckal777 · 2020-09-13 16:14:42

After cherry-picking, the RAID is not recognized, so one of the following commits should be the cause:

692d0626557451c4b557397f20b7394b612d0289
b7df98a8b7b8abce596e9696d5c3183fc4c0019d
a7a42c1e5023cdac2bbc1038689509595d279cd2
329c9ffc81cfb985c6d131e94e6d220d7c1b19ca

loqs wrote:

Please test cherry-picking https://git.kernel.org/pub/scm/linux/ke … 5957a60e1a

On top of a clean 5.8.8 build?

loqs · 2020-09-13 16:24:46

Yes, on a clean 5.8.8; that will be the only change needed.

Nuckal777 · 2020-09-13 18:12:37

Commit 692d0626557451c4b557397f20b7394b612d0289 is causing the issue on my machine. I will try to test the 5.8.8 + cherry-pick 88ce2a530cc9865a894454b2e40eba5957a60e1a tomorrow.

loqs · 2020-09-13 18:18:56

~~If it is 692d0626557451c4b557397f20b7394b612d0289 88ce2a530cc9865a894454b2e40eba5957a60e1a will not fix it. Try just reverting 692d0626557451c4b557397f20b7394b612d0289 on 5.8.8 to confirm.~~
Ignore that; I confused the backport commit IDs with the upstream commit IDs.
https://bugs.archlinux.org/task/67891
Nuckal777, thank you for all the testing.
Edit:
Fixed in linux 5.8.9.arch2-1

Last edited by loqs (2020-09-14 02:09:25)
