
#1 2019-01-03 17:52:55

archcry
Member
Registered: 2018-02-23
Posts: 20

IOMMU causing issues with newer kernel versions

Hello,

I've been having crashes on my PC since November, and at first I thought they were caused by the ext4 corruption bug in the kernel. That issue was fixed in 4.20 and backported to the 4.19.8 kernel release. In the meantime I switched from the linux kernel to the linux-lts kernel. Recently the lts kernel was also updated to 4.19, and unfortunately the crashes did not stop.

So I started investigating, and it seems that IOMMU is causing problems on my system. It starts with an IOMMU mapping error saying "IOMMU mapping error in map_sg (io-pages: 5)", followed by my filesystem going into read-only mode, which locks up the system completely. I disabled IOMMU in the BIOS and haven't had any issues since. The strange part is that the crashes only started with 4.19; 4.18 was working just fine. I have searched the internet but haven't found a single report of the same issue.

I do want to report this properly, but I think I need more information on what exactly is causing it before filing a bug report. I have posted a screenshot below which shows the dmesg messages I had. I have now re-enabled IOMMU in the BIOS and used the amd_iommu=off kernel parameter to turn off IOMMU for Arch. What do you guys suggest I should do about this problem?

Screenshot with dmesg: https://i.imgur.com/frF3Onu.jpg
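For anyone trying to check whether they see the same message, a minimal sketch of filtering the kernel log for IOMMU lines follows. The helper name iommu_lines is hypothetical; the pattern assumes the relevant lines contain "IOMMU" or the "AMD-Vi" prefix that the AMD IOMMU driver uses, so other related messages may be missed.

```shell
#!/usr/bin/env bash
# iommu_lines: keep only the lines on stdin that mention the IOMMU.
# The pattern is an assumption; it matches "iommu" (any case) or "AMD-Vi".
iommu_lines() {
  grep -iE 'iommu|AMD-Vi' || true   # "|| true": finding no match is not an error here
}

# Example usage (dmesg may need root when kernel.dmesg_restrict is enabled):
#   dmesg | iommu_lines
#   journalctl -k -b -1 | iommu_lines   # kernel log of the previous boot, if persisted
```

The previous-boot variant only helps if the journal could still be written before the filesystem went read-only.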

System:
Motherboard: Asus Prime X370-Pro (latest bios)
CPU: AMD Ryzen 7 1700X (non-bugged version)
Graphics: MSI GeForce GTX 970 Gaming 4G
RAM: Corsair Vengeance LPX CMK16GX4M2B3200C16
Storage: Samsung 850 EVO 1TB

~ Archcry

Last edited by archcry (2019-01-03 17:56:19)

Offline

#2 2019-01-03 18:03:57

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: IOMMU causing issues with newer kernel versions

If no one else is experiencing the issue and it has persisted through two kernel releases, I would suggest bisecting between 4.18 and 4.19 to find the causal commit and reporting it upstream.
Please do not post images of text (see Code_of_conduct#Pasting_pictures_and_code). Post the output of dmesg as text in code tags, or as a link to a pastebin.
If you need help with the bisection process, please do not hesitate to ask.

Offline

#3 2019-01-03 18:08:05

archcry
Member
Registered: 2018-02-23
Posts: 20

Re: IOMMU causing issues with newer kernel versions

Yeah, but posting the output of dmesg is hard when the filesystem goes read-only, because it becomes impossible to write the output to disk. I do need help with the bisection though :S Unfortunately I have no previous experience with that kind of thing. I do have git skills though.

Last edited by archcry (2019-01-03 18:10:55)

Offline

#4 2019-01-03 18:42:36

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: IOMMU causing issues with newer kernel versions

Can you try booting 4.18 with the kernel parameter scsi_mod.use_blk_mq=Y, or 4.19 with scsi_mod.use_blk_mq=N? This tests whether a config change between 4.18 and 4.19 exposed an already existing issue.
To ensure you will not be hit by the corruption issue, make sure that /sys/block/*/queue/scheduler shows that a scheduler is always in use with scsi_mod.use_blk_mq=Y.
When the filesystem is read-only, can you use the method from the tip box of List_of_applications#Pastebin_clients to save dmesg?

dmesg | curl -F c=@- https://ptpb.pw 
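If the network upload is not possible either, one fallback is to write the log somewhere that is not on the affected disk. The sketch below assumes /dev/shm is a RAM-backed tmpfs that is still writable (typical on Arch, but an assumption here); save_log and /mnt/usb are hypothetical names. Anything in /dev/shm is lost on reboot, so the file has to be copied elsewhere first.

```shell
#!/usr/bin/env bash
# save_log [DIR]: copy stdin into a timestamped file under DIR and print its path.
# DIR defaults to /dev/shm, assumed here to be a writable tmpfs.
save_log() {
  local dir=${1:-/dev/shm}
  local out
  out="$dir/dmesg-$(date +%Y%m%d-%H%M%S).txt"
  cat > "$out" && printf '%s\n' "$out"
}

# Example usage (dmesg may need root when kernel.dmesg_restrict is enabled):
#   dmesg | save_log            # keeps the log in RAM
#   dmesg | save_log /mnt/usb   # hypothetical mount point of a separate USB stick
```

Whether this works once the system has started to lock up depends on how far the failure has progressed; it is only useful while a shell is still responsive.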

Last edited by loqs (2019-01-03 18:42:48)

Offline

#5 2019-01-03 19:18:43

archcry
Member
Registered: 2018-02-23
Posts: 20

Re: IOMMU causing issues with newer kernel versions

Thank you for the instructions, I just don't understand the following part about the scheduler being active:

loqs wrote:

To ensure you will not be hit by the corruption issue make sure that /sys/block/*/queue/scheduler shows that a scheduler is always in use with scsi_mod.use_blk_mq=Y

I tried to use cat on my working system and got the following output, is this what I should see when booting the "faulty" kernel with the kernel param? And should I monitor this continuously with the watch command or something?

$ cat /sys/block/*/queue/scheduler
none
none
none
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none

Also, I assume that I am going to remove the amd_iommu flag during this debugging process?

Last edited by archcry (2019-01-03 19:23:33)

Offline

#6 2019-01-03 20:00:19

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: IOMMU causing issues with newer kernel versions

Yes, the test would be with IOMMU enabled.
The three entries with none for the scheduler should be device-mapper devices, and the other two have a scheduler, so that should be fine.
The corruption issue needs three things together: the device must use scsi_mod (which excludes NVMe and device-mapper), scsi_mod.use_blk_mq=Y must be set (this became the default in 4.19), and no scheduler must be in use. That last one has never been the default, but I wanted to check.
You do not need to watch those settings; they will not change unless you manually change the scheduler.
The error messages should appear in dmesg if the combination is affected.
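The check described above can be sketched as a small script that prints the active scheduler per block device next to the current scsi_mod.use_blk_mq value. The function name active_scheduler is hypothetical, and the parameter path under /sys/module is an assumption that only holds on kernels that still expose this module parameter.

```shell
#!/usr/bin/env bash
# active_scheduler LINE: print the scheduler marked with [brackets] in a
# /sys/block/*/queue/scheduler line, or the line itself if nothing is bracketed
# (e.g. a plain "none" on device-mapper devices).
active_scheduler() {
  local s
  case "$1" in
    *\[*\]*) s=${1#*\[}; printf '%s\n' "${s%%\]*}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# Report the blk-mq setting for scsi_mod, if this kernel exposes it (assumed path).
param=/sys/module/scsi_mod/parameters/use_blk_mq
if [ -r "$param" ]; then
  printf 'scsi_mod.use_blk_mq=%s\n' "$(cat "$param")"
fi

# Report the active scheduler for every block device.
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}
  dev=${dev%%/*}
  printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

A SCSI/SATA disk that reports none together with use_blk_mq=Y would be the combination to avoid.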

Offline

#7 2019-01-04 21:01:36

archcry
Member
Registered: 2018-02-23
Posts: 20

Re: IOMMU causing issues with newer kernel versions

loqs wrote:

Can you try booting 4.18 with the kernel parameter scsi_mod.use_blk_mq=Y, or 4.19 with scsi_mod.use_blk_mq=N? This tests whether a config change between 4.18 and 4.19 exposed an already existing issue.
To ensure you will not be hit by the corruption issue, make sure that /sys/block/*/queue/scheduler shows that a scheduler is always in use with scsi_mod.use_blk_mq=Y.

OK, so I tried booting kernel 4.19.8 with scsi_mod.use_blk_mq=N and the same issue happened: the system freezes completely after some time. I also tried kernel 4.18 with scsi_mod.use_blk_mq=Y, and I did not have any issues that time.
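When switching between test boots like these, it helps to confirm that the intended parameters actually reached the running kernel. A minimal sketch, assuming the parameters appear as space-separated words in /proc/cmdline; has_param is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
has_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Read the running kernel's command line (empty if /proc is unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

printf 'kernel: %s\n' "$(uname -r)"
for p in scsi_mod.use_blk_mq=Y scsi_mod.use_blk_mq=N amd_iommu=off; do
  if has_param "$cmdline" "$p"; then
    printf 'set:     %s\n' "$p"
  else
    printf 'not set: %s\n' "$p"
  fi
done
```

For the tests in this thread, amd_iommu=off should show as not set, since the IOMMU has to be enabled to reproduce the error.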

loqs wrote:

When the filesystem is readonly can you use the method from the tip box of List_of_applications#Pastebin_clients to save dmesg?

dmesg | curl -F c=@- https://ptpb.pw 

I tried using curl to upload the dmesg, but even that resulted in an I/O error, so as far as I know there is no good way to get the complete dmesg in here without taking a picture. I salvaged the dmesg from the last boot, but it is incomplete. It does show the first errors, but there were some ext4 errors in the dmesg that could not be written to disk. They were roughly the same as the ones in the picture in the first post. Here is the content I could salvage in plain text:

https://ptpb.pw/BxLW

Last edited by archcry (2019-01-04 21:05:16)

Offline

#8 2019-01-05 14:59:38

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: IOMMU causing issues with newer kernel versions

The following assumes the base-devel group and git are installed. I recommend enabling Makepkg#Parallel_compilation to reduce build times.

$ git clone git://git.archlinux.org/svntogit/packages.git --single-branch --branch "packages/linux"
$ cd packages/trunk
$ git checkout 3a924e2d7781a981654c00a5b911f02d557a7375 #4.18.10.arch1-1
$ cd ../..
$ cp -r packages/trunk linux-git
$ rm -rf packages
$ cd linux-git
# Replace the PKGBUILD with the one below
$ makepkg -rsi #this is to confirm 4.18 as built on your system does not have the issue; update the bootloader for the new kernel if needed

# All following commands start from inside the linux-git directory
$ cd src/linux
$ git checkout v4.19
$ cd ../..
$ makepkg -ersi #this is to confirm 4.19 as built on your system does have the issue. Select the default option for all prompted options

$ cd src/linux
$ git bisect start
$ git bisect good v4.18
$ git bisect bad v4.19
$ cd ../..
$ makepkg -ersi

$ cd src/linux
$ git bisect $result #substitute good or bad here
$ cd ../..
$ makepkg -ersi #repeat these four lines and test the generated kernel until git has found the bad commit
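For anyone new to bisection, the mechanics can be tried safely on a throwaway repository first. The sketch below is a toy example, not the kernel tree: it creates ten commits in a temporary directory, where commit 7 introduces a fake "regression", and lets git bisect run find it automatically. The file names, tag name and test command are all made up for illustration. With a kernel, each step needs a build and a reboot, so the good/bad answer has to be given by hand as in the steps above.

```shell
#!/usr/bin/env bash
# Toy git bisect demo on a temporary repository (hypothetical data throughout).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "bisect@example.invalid"
git config user.name "Bisect Demo"

# Ten commits; from commit 7 onwards the file "state" says "broken".
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -ge 7 ]; then echo "broken" > state; else echo "ok" > state; fi
  echo "$i" > counter
  git add state counter
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then git tag known-good; fi
done

# Start bisecting: HEAD is bad, the tagged first commit is good.
git bisect start HEAD known-good > /dev/null

# Let git test each candidate: exit status 0 means good, 1 means bad.
git bisect run sh -c 'grep -q ok state' > /dev/null

# After a finished bisection, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
printf 'first bad commit: %s\n' "$first_bad"

# Return the work tree to where it was before bisecting.
git bisect reset > /dev/null 2>&1
```

During a long manual bisection, git bisect log prints the answers given so far, which is worth saving in case a step is marked wrongly and the process has to be replayed.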

PKGBUILD

# Maintainer: Boohbah <boohbah at gmail.com>
# Contributor: Tobias Powalowski <tpowa@archlinux.org>
# Contributor: Thomas Baechler <thomas@archlinux.org>
# Contributor: Jonathan Chan <jyc@fastmail.fm>
# Contributor: misc <tastky@gmail.com>
# Contributor: NextHendrix <cjones12 at sheffield.ac.uk>

pkgbase=linux-git
_srcname=linux
pkgver=0
pkgrel=1
arch=('x86_64')
url="https://www.kernel.org/"
license=('GPL2')
makedepends=('kmod' 'inetutils' 'bc' 'libelf')
options=('!strip')
source=('git+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git#tag=v4.18'
        #'git+https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#tag=vX.X.Y'
  'config'   # the main kernel config file
  '60-linux.hook'  # pacman hook for depmod
  '90-linux.hook'  # pacman hook for initramfs regeneration
  'linux.preset'   # standard config files for mkinitcpio ramdisk
)

sha256sums=('SKIP'
            '1fc23bd2613b821d8bdca1a33dc421e21de296221108ce047176d27d37ce397f'
            'ae2e95db94ef7176207c690224169594d49445e04249d2499e9d2fbc117a0b21'
            '75f99f5239e03238f88d1a834c50043ec32b1dc568f2cc291b07d04718483919'
            'ad6344badc91ad0630caacde83f7f9b97276f80d26a20619a87952be65492c65')

_kernelname=${pkgbase#linux}
: ${_kernelname:=-ARCH}

pkgver() {
  cd "${_srcname}"

  git describe --long | sed -E 's/^v//;s/([^-]*-g)/r\1/;s/-/./g;s/\.rc/rc/'
}

prepare() {
  cd ${_srcname}

  cp -Tf ../config .config

  # set localversion to git commit
  sed -i "s|CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"${_kernelname}\"|g" ./.config
  sed -i "s|^.*CONFIG_LOCALVERSION_AUTO.*|CONFIG_LOCALVERSION_AUTO=y|" ./.config

  # don't run depmod on 'make install'. We'll do this ourselves in packaging
#  git tracks scripts/depmod.sh so do not change it when using the existing source dir for bisection
#  sed -i '2iexit 0' scripts/depmod.sh

  # get kernel version
  make prepare

  # load configuration
  # Configure the kernel. Replace the line below with one of your choice.
  #make menuconfig # CLI menu for configuration
  #make nconfig # new CLI menu for configuration
  #make xconfig # X-based configuration
  #make oldconfig # using old config from previous kernel version
  make olddefconfig # old config from previous kernel, defaults for new options
  # ... or manually edit .config
}

build() {
  cd ${_srcname}

  make bzImage modules
}

_package() {
  pkgdesc="The Linux kernel and modules (git version)"
  depends=('coreutils' 'linux-firmware' 'kmod' 'mkinitcpio>=0.7')
  optdepends=('crda: to set the correct wireless channels of your country')
  backup=("etc/mkinitcpio.d/${pkgbase}.preset")
  install=linux.install

  cd ${_srcname}

  # get kernel version
  _kernver="$(make kernelrelease)"
  _kernver=${_kernver%-dirty} #https://bbs.archlinux.org/viewtopic.php?id=236702
  _basekernel="$(make kernelversion)"
  _basekernel=${_basekernel%.*}

  mkdir -p "${pkgdir}"/{boot,usr/lib/modules}
  make INSTALL_MOD_PATH="${pkgdir}/usr" modules_install
  cp arch/x86/boot/bzImage "${pkgdir}/boot/vmlinuz-${pkgbase}"

  # make room for external modules
  local _extramodules="extramodules-${_basekernel}${_kernelname}"
  ln -s "../${_extramodules}" "${pkgdir}/usr/lib/modules/${_kernver}/extramodules"

  # add real version for building modules and running depmod from hook
  echo "${_kernver}" |
    install -Dm644 /dev/stdin "${pkgdir}/usr/lib/modules/${_extramodules}/version"

  # remove build and source links
  rm "${pkgdir}"/usr/lib/modules/${_kernver}/{source,build}

  # now we call depmod...
  depmod -b "${pkgdir}/usr" -F System.map "${_kernver}"

  # add vmlinux
  install -Dt "${pkgdir}/usr/lib/modules/${_kernver}/build" -m644 vmlinux

  # sed expression for following substitutions
  local _subst="
    s|%PKGBASE%|${pkgbase}|g
    s|%KERNVER%|${_kernver}|g
    s|%EXTRAMODULES%|${_extramodules}|g
  "

  # hack to allow specifying an initially nonexisting install file
  sed "${_subst}" "${startdir}/${install}" > "${startdir}/${install}.pkg"
  true && install=${install}.pkg

  # install mkinitcpio preset file
  sed "${_subst}" ../linux.preset |
    install -Dm644 /dev/stdin "${pkgdir}/etc/mkinitcpio.d/${pkgbase}.preset"

  # install pacman hooks
  sed "${_subst}" ../60-linux.hook |
    install -Dm644 /dev/stdin "${pkgdir}/usr/share/libalpm/hooks/60-${pkgbase}.hook"
  sed "${_subst}" ../90-linux.hook |
    install -Dm644 /dev/stdin "${pkgdir}/usr/share/libalpm/hooks/90-${pkgbase}.hook"
}

_package-headers() {
  pkgdesc="Header files and scripts for building modules for Linux kernel (git version)"

  cd ${_srcname}
  local _builddir="${pkgdir}/usr/lib/modules/${_kernver}/build"

  install -Dt "${_builddir}" -m644 Makefile .config Module.symvers
  install -Dt "${_builddir}/kernel" -m644 kernel/Makefile

  mkdir "${_builddir}/.tmp_versions"

  cp -t "${_builddir}" -a include scripts

  install -Dt "${_builddir}/arch/x86" -m644 arch/x86/Makefile
  install -Dt "${_builddir}/arch/x86/kernel" -m644 arch/x86/kernel/asm-offsets.s

  cp -t "${_builddir}/arch/x86" -a arch/x86/include

  install -Dt "${_builddir}/drivers/md" -m644 drivers/md/*.h
  install -Dt "${_builddir}/net/mac80211" -m644 net/mac80211/*.h

  # http://bugs.archlinux.org/task/13146
  install -Dt "${_builddir}/drivers/media/i2c" -m644 drivers/media/i2c/msp3400-driver.h

  # http://bugs.archlinux.org/task/20402
  install -Dt "${_builddir}/drivers/media/usb/dvb-usb" -m644 drivers/media/usb/dvb-usb/*.h
  install -Dt "${_builddir}/drivers/media/dvb-frontends" -m644 drivers/media/dvb-frontends/*.h
  install -Dt "${_builddir}/drivers/media/tuners" -m644 drivers/media/tuners/*.h

  # add xfs and shmem for aufs building
  mkdir -p "${_builddir}"/{fs/xfs,mm}

  # copy in Kconfig files
  find . -name Kconfig\* -exec install -Dm644 {} "${_builddir}/{}" \;

  # add objtool for external module building and enabled STACK_VALIDATION option
  install -Dt "${_builddir}/tools/objtool" tools/objtool/objtool

  # remove unneeded architectures
  local _arch
  for _arch in "${_builddir}"/arch/*/; do
    [[ ${_arch} == */x86/ ]] && continue
    rm -r "${_arch}"
  done

  # remove files already in linux-docs package
  rm -r "${_builddir}/Documentation"

  # remove now broken symlinks
  find -L "${_builddir}" -type l -printf 'Removing %P\n' -delete

  # Fix permissions
  chmod -R u=rwX,go=rX "${_builddir}"

  # strip scripts directory
  local _binary _strip
  while read -rd '' _binary; do
    case "$(file -bi "${_binary}")" in
      *application/x-sharedlib*)  _strip="${STRIP_SHARED}"   ;; # Libraries (.so)
      *application/x-archive*)    _strip="${STRIP_STATIC}"   ;; # Libraries (.a)
      *application/x-executable*) _strip="${STRIP_BINARIES}" ;; # Binaries
      *) continue ;;
    esac
    /usr/bin/strip ${_strip} "${_binary}"
  done < <(find "${_builddir}/scripts" -type f -perm -u+w -print0 2>/dev/null)
}

_package-docs() {
  pkgdesc="Kernel hackers manual - HTML documentation that comes with the Linux kernel (git version)"

  cd ${_srcname}
  local _builddir="${pkgdir}/usr/lib/modules/${_kernver}/build"

  mkdir -p "${_builddir}"
  cp -t "${_builddir}" -a Documentation

  # Fix permissions
  chmod -R u=rwX,go=rX "${_builddir}"
}

pkgname=("${pkgbase}" "${pkgbase}-headers" "${pkgbase}-docs")
for _p in ${pkgname[@]}; do
  eval "package_${_p}() {
    $(declare -f "_package${_p#${pkgbase}}")
    _package${_p#${pkgbase}}
  }"
done

# vim:set ts=8 sts=2 sw=2 et:

Offline

#9 2019-01-08 07:13:24

archcry
Member
Registered: 2018-02-23
Posts: 20

Re: IOMMU causing issues with newer kernel versions

Thanks for the help up until now. I am afraid I do not have a lot of time this week, so I will most likely do this on Thursday. I also noticed this commit in the new 5.0 kernel: Merge tag 'iommu-updates-v4.21'. It might be interesting as well, because it has some AMD-specific changes in it.

Last edited by archcry (2019-01-08 07:13:59)

Offline

#10 2019-03-26 22:59:34

dman777
Member
Registered: 2019-03-26
Posts: 1

Re: IOMMU causing issues with newer kernel versions

I have been getting this issue as well. It has been driving me up the wall on a build where everything is new.

However, I am using Ubuntu Xenial LTS with kernel 4.15.0-46-generic.

I thought about building yet another new system, but if the issue is a kernel driver (as I suspect), I would really like to know how this thread turns out.

Offline
