#1 2017-08-07 17:57:55

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

[SOLVED] kernel 4.12.4-1 ata bug

During normal desktop operation, a bug appears at random times and affects random ATA HDDs:

sie 07 17:52:25 testowy kernel: ata3.00: exception Emask 0x0 SAct 0xc00 SErr 0x0 action 0x6 frozen
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:25 testowy kernel: ata3.00: cmd 61/30:50:20:01:b2/04:00:3e:00:00/40 tag 10 ncq dma 548864 out
                                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
sie 07 17:52:25 testowy kernel: ata3.00: status: { DRDY }
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:25 testowy kernel: ata3.00: cmd 61/70:58:50:05:b2/04:00:3e:00:00/40 tag 11 ncq dma 581632 out
                                         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
sie 07 17:52:25 testowy kernel: ata3.00: status: { DRDY }
sie 07 17:52:25 testowy kernel: ata3: hard resetting link
sie 07 17:52:35 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:52:35 testowy kernel: ata3: hard resetting link
sie 07 17:52:45 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:52:45 testowy kernel: ata3: hard resetting link
sie 07 17:52:55 testowy kernel: ata3: link is slow to respond, please be patient (ready=0)
sie 07 17:53:20 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:53:20 testowy kernel: ata3: limiting SATA link speed to 1.5 Gbps
sie 07 17:53:20 testowy kernel: ata3: hard resetting link
sie 07 17:53:25 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:53:25 testowy kernel: ata3: reset failed, giving up
sie 07 17:53:25 testowy kernel: ata3.00: disabled
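
To see at a glance which port is affected and how often, a log like the one above can be tallied per port. The following is only a minimal sketch: the function name `summarize_ata_events` and the three event labels are made up for illustration, and the sample lines are copied from the log above. On a live system the input could come from `journalctl -k` instead of the here-document.

```shell
#!/bin/sh
# Tally selected libata messages per port.
# summarize_ata_events and its event labels are illustrative names, not kernel terms.
summarize_ata_events() {
  awk '
    # Find the "ataN: " or "ataN.MM: " prefix in front of each libata message.
    match($0, /ata[0-9]+(\.[0-9]+)?: /) {
      port = substr($0, RSTART, RLENGTH - 2)   # strip the trailing ": "
      msg  = substr($0, RSTART + RLENGTH)      # text after the prefix
      if (msg ~ /^failed command: /)      count[port " failed_command"]++
      else if (msg ~ /^softreset failed/) count[port " softreset_failed"]++
      else if (msg ~ /^disabled/)         count[port " disabled"]++
    }
    END { for (key in count) print key, count[key] }
  ' | LC_ALL=C sort
}

# Sample lines taken from the log above.
summarize_ata_events <<'EOF'
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:35 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:52:45 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:53:25 testowy kernel: ata3.00: disabled
EOF
```

Each output line has the form `<port> <event> <count>`, so a port that collects timeouts and failed resets stands out immediately.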

I am using up-to-date packages from the stable repo, with LVM on top of LUKS.

There was a similar issue a few years ago, so I think this is some kind of regression. I can post additional information if necessary.


Thanks for your help.

Last edited by Al.Piotrowicz (2019-10-04 14:41:45)

#2 2017-08-07 18:23:34

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: [SOLVED] kernel 4.12.4-1 ata bug

Welcome to the Arch Linux forums, Al.Piotrowicz. What was the kernel version before the upgrade?

#3 2017-08-07 18:28:45

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

It was 4.12.3-1; I update as frequently as I can.

Additionally, I don't know whether it is important, but as in the past case I mentioned in the first post, the affected disk is not accessible in the BIOS after a soft restart.

After a shutdown and power-up, everything returns to a normal operational state. I remember well that it was the same with the issue a few years ago.

#4 2017-08-07 18:33:08

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: [SOLVED] kernel 4.12.4-1 ata bug

Looking at https://cdn.kernel.org/pub/linux/kernel … Log-4.12.4, I cannot see an obviously related commit.
Could you please try bisecting between 4.12.3 and 4.12.4 to find which commit is the cause, and report it upstream?

#5 2017-08-07 19:11:32

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

OK, I will try to do my best. It looks a bit challenging for me; I'm not much of a tech guy and have never done a bisect before.

I hope to post my results here soon.

#6 2017-08-07 19:21:24

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: [SOLVED] kernel 4.12.4-1 ata bug

https://bbs.archlinux.org/viewtopic.php … 5#p1700245 explains bisecting the kernel in more detail, although the PKGBUILD there is aimed at bisecting between 4.9 and 4.10.

#7 2017-08-08 13:19:35

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

I'm a bit confused about the whole procedure. Please guide me further if you can. Which version of the Linux git code should I use to bisect: the most recent one or some other? I'm trying to build directly from the latest rc tree, but I don't know whether that is the right way.

Last edited by Al.Piotrowicz (2017-08-08 13:59:03)

#8 2017-08-08 14:33:20

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: [SOLVED] kernel 4.12.4-1 ata bug

Please try this PKGBUILD and the instructions from the link I posted previously, substituting 4.12.3 for 4.9 and 4.12.4 for 4.10 in the commands. The PKGBUILD itself has already been adjusted:

# Maintainer: Boohbah <boohbah at gmail.com>
# Contributor: Tobias Powalowski <tpowa@archlinux.org>
# Contributor: Thomas Baechler <thomas@archlinux.org>
# Contributor: Jonathan Chan <jyc@fastmail.fm>
# Contributor: misc <tastky@gmail.com>
# Contributor: NextHendrix <cjones12 at sheffield.ac.uk>

pkgbase=linux-git
_srcname=linux-stable
pkgver=4.12.3.r0.g8f883aa5b661
pkgrel=1
arch=('i686' 'x86_64')
url="http://www.kernel.org/"
license=('GPL2')
makedepends=('xmlto' 'docbook-xsl' 'kmod' 'inetutils' 'bc' 'git' 'libelf')
options=('!strip')
source=('git+https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#tag=v4.12.3'
        # the main kernel config files
        'config.i686::https://git.archlinux.org/svntogit/packages.git/plain/trunk/config.i686?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93' 
        'config.x86_64::https://git.archlinux.org/svntogit/packages.git/plain/trunk/config.x86_64?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93'
        '90-linux.hook::https://git.archlinux.org/svntogit/packages.git/plain/trunk/90-linux.hook?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93'
        # standard config files for mkinitcpio ramdisk
        "${pkgbase}.preset::https://git.archlinux.org/svntogit/packages.git/plain/trunk/linux.preset?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93")
sha256sums=('SKIP'
            'df55887a43dcbb6bd35fd2fb1ec841427b6ea827334c0880cbc256d4f042a7a1'
            'bf84528c592d1841bba0662242f0339a24a1de384c31f28248631e8be9446586'
            '834bd254b56ab71d73f59b3221f056c72f559553c04718e350ab2a3e2991afe0'
            'ad6344badc91ad0630caacde83f7f9b97276f80d26a20619a87952be65492c65')

_kernelname=${pkgbase#linux}

pkgver() {
  cd "${_srcname}"

  git describe --long | sed -E 's/^v//;s/([^-]*-g)/r\1/;s/-/./g;s/\.rc/rc/'
}

prepare() {
  cd "${_srcname}"

  cat "${srcdir}/config.${CARCH}" > ./.config

  # set localversion to git commit
  sed -i "s|CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"-${pkgver##*.}\"|g" ./.config
  sed -i "s|CONFIG_LOCALVERSION_AUTO=.*|CONFIG_LOCALVERSION_AUTO=n|" ./.config

  # don't run depmod on 'make install'. We'll do this ourselves in packaging
#  git tracks scripts/depmod.sh so do not change it when using the existing source dir for bisection
#  sed -i '2iexit 0' scripts/depmod.sh

  # get kernel version
  make prepare

  # load configuration
  # Configure the kernel. Replace the line below with one of your choice.
  #make menuconfig # CLI menu for configuration
  #make nconfig # new CLI menu for configuration
  #make xconfig # X-based configuration
  #make oldconfig # using old config from previous kernel version
  make olddefconfig # old config from previous kernel, defaults for new options
  # ... or manually edit .config
}

build() {
  cd "${_srcname}"

  make ${MAKEFLAGS} LOCALVERSION= bzImage modules
}

_package() {
  pkgdesc="The Linux kernel and modules (git version)"
  depends=('coreutils' 'linux-firmware' 'kmod' 'mkinitcpio>=0.7')
  optdepends=('crda: to set the correct wireless channels of your country')
  provides=('linux')
  backup=("etc/mkinitcpio.d/${pkgbase}.preset")
  install=linux.install

  cd "${_srcname}"

  KARCH=x86

  # get kernel version
  _kernver="$(make LOCALVERSION= kernelrelease)"
  _basekernel=${_kernver%%-*}
  _basekernel=${_basekernel%.*}

  mkdir -p "${pkgdir}"/{lib/modules,lib/firmware,boot}
  make LOCALVERSION= INSTALL_MOD_PATH="${pkgdir}" modules_install
  cp arch/$KARCH/boot/bzImage "${pkgdir}/boot/vmlinuz-${pkgbase}"

  # set correct depmod command for install
  sed -e "s|%PKGBASE%|${pkgbase}|g;s|%KERNVER%|${_kernver}|g" \
    "${startdir}/${install}" > "${startdir}/${install}.pkg"
  true && install=${install}.pkg

  # install mkinitcpio preset file for kernel
  sed "s|%PKGBASE%|${pkgbase}|g" "${srcdir}/${pkgbase}.preset" |
    install -D -m644 /dev/stdin "${pkgdir}/etc/mkinitcpio.d/${pkgbase}.preset"

  # install pacman hook for initramfs regeneration
  sed "s|%PKGBASE%|${pkgbase}|g" "${srcdir}/90-linux.hook" |
    install -D -m644 /dev/stdin "${pkgdir}/usr/share/libalpm/hooks/90-${pkgbase}.hook"

  # remove build and source links
  rm -f "${pkgdir}"/lib/modules/${_kernver}/{source,build}
  # remove the firmware
  rm -rf "${pkgdir}/lib/firmware"
  # make room for external modules
  ln -s "../extramodules-${_basekernel}${_kernelname:--ARCH}" "${pkgdir}/lib/modules/${_kernver}/extramodules"
  # add real version for building modules and running depmod from post_install/upgrade
  mkdir -p "${pkgdir}/lib/modules/extramodules-${_basekernel}${_kernelname:--ARCH}"
  echo "${_kernver}" > "${pkgdir}/lib/modules/extramodules-${_basekernel}${_kernelname:--ARCH}/version"

  # Now we call depmod...
  depmod -b "${pkgdir}" -F System.map "${_kernver}"

  # move module tree /lib -> /usr/lib
  mkdir -p "${pkgdir}/usr"
  mv "${pkgdir}/lib" "${pkgdir}/usr/"

  # add vmlinux
  install -D -m644 vmlinux "${pkgdir}/usr/lib/modules/${_kernver}/build/vmlinux" 

  # add System.map
  install -D -m644 System.map "${pkgdir}/boot/System.map-${_kernver}"
}

_package-headers() {
  pkgdesc="Header files and scripts for building modules for Linux kernel (git version)"
  provides=('linux-headers')

  install -dm755 "${pkgdir}/usr/lib/modules/${_kernver}"

  cd "${_srcname}"
  install -D -m644 Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/Makefile"
  install -D -m644 kernel/Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/kernel/Makefile"
  install -D -m644 .config \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/.config"

  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include"

  for i in acpi asm-generic config crypto drm generated keys linux math-emu \
    media net pcmcia rdma scsi soc sound trace uapi video xen; do
    cp -a include/${i} "${pkgdir}/usr/lib/modules/${_kernver}/build/include/"
  done

  # copy arch includes for external modules
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/x86"
  cp -a arch/x86/include "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/x86/"

  # copy files necessary for later builds, like nvidia and vmware
  cp Module.symvers "${pkgdir}/usr/lib/modules/${_kernver}/build"
  cp -a scripts "${pkgdir}/usr/lib/modules/${_kernver}/build"

  # fix permissions on scripts dir
  chmod og-w -R "${pkgdir}/usr/lib/modules/${_kernver}/build/scripts"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/.tmp_versions"

  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/kernel"

  cp arch/${KARCH}/Makefile "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/"

  if [ "${CARCH}" = "i686" ]; then
    cp arch/${KARCH}/Makefile_32.cpu "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/"
  fi

  cp arch/${KARCH}/kernel/asm-offsets.s "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/kernel/"

  # add docbook makefile
  install -D -m644 Documentation/DocBook/Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/DocBook/Makefile"

  # add dm headers
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/md"
  cp drivers/md/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/md"

  # add inotify.h
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include/linux"
  cp include/linux/inotify.h "${pkgdir}/usr/lib/modules/${_kernver}/build/include/linux/"

  # add wireless headers
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/net/mac80211/"
  cp net/mac80211/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/net/mac80211/"

  # add dvb headers for external modules
  # in reference to:
  # http://bugs.archlinux.org/task/9912
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-core"
  cp drivers/media/dvb-core/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-core/"
  # and...
  # http://bugs.archlinux.org/task/11194
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include/config/dvb/"
  cp include/config/dvb/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/include/config/dvb/"

  # add dvb headers for http://mcentral.de/hg/~mrec/em28xx-new
  # in reference to:
  # http://bugs.archlinux.org/task/13146
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  cp drivers/media/dvb-frontends/lgdt330x.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/i2c/"
  cp drivers/media/i2c/msp3400-driver.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/i2c/"

  # add dvb headers
  # in reference to:
  # http://bugs.archlinux.org/task/20402
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/usb/dvb-usb"
  cp drivers/media/usb/dvb-usb/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/usb/dvb-usb/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends"
  cp drivers/media/dvb-frontends/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/tuners"
  cp drivers/media/tuners/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/tuners/"

  # add xfs and shmem for aufs building
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/fs/xfs"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/mm"
  # removed in 3.17 series
  # cp fs/xfs/xfs_sb.h "${pkgdir}/usr/lib/modules/${_kernver}/build/fs/xfs/xfs_sb.h"

  # copy in Kconfig files
  for i in $(find . -name "Kconfig*"); do
    mkdir -p "${pkgdir}"/usr/lib/modules/${_kernver}/build/`echo ${i} | sed 's|/Kconfig.*||'`
    cp ${i} "${pkgdir}/usr/lib/modules/${_kernver}/build/${i}"
  done

  # add objtool for external module building and enabled VALIDATION_STACK option
  if [ -f tools/objtool/objtool ];  then
      mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/tools/objtool"
      cp -a tools/objtool/objtool "${pkgdir}/usr/lib/modules/${_kernver}/build/tools/objtool/"
  fi

  chown -R root:root "${pkgdir}/usr/lib/modules/${_kernver}/build"
  find "${pkgdir}/usr/lib/modules/${_kernver}/build" -type d -exec chmod 755 {} \;

  # strip scripts directory
  find "${pkgdir}/usr/lib/modules/${_kernver}/build/scripts" -type f -perm -u+w 2>/dev/null | while read binary ; do
    case "$(file -bi "${binary}")" in
      *application/x-sharedlib*) # Libraries (.so)
        /usr/bin/strip ${STRIP_SHARED} "${binary}";;
      *application/x-archive*) # Libraries (.a)
        /usr/bin/strip ${STRIP_STATIC} "${binary}";;
      *application/x-executable*) # Binaries
        /usr/bin/strip ${STRIP_BINARIES} "${binary}";;
    esac
  done

  # remove unneeded architectures
  rm -rf "${pkgdir}"/usr/lib/modules/${_kernver}/build/arch/{alpha,arc,arm,arm26,arm64,avr32,blackfin,c6x,cris,frv,h8300,hexagon,ia64,m32r,m68k,m68knommu,metag,mips,microblaze,mn10300,openrisc,parisc,powerpc,ppc,s390,score,sh,sh64,sparc,sparc64,tile,unicore32,um,v850,xtensa}

  # remove files already in linux-docs package
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.recursion-issue-01"
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.recursion-issue-02"
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.select-break"
}

_package-docs() {
  pkgdesc="Kernel hackers manual - HTML documentation that comes with the Linux kernel (git version)"
  provides=('linux-docs')

  cd "${_srcname}"

  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build"
#  git tracks the contents of the Documentation dir so do not change it when using the existing source dir for bisection
#  it will be changed by the following chmod commands in conjunction with using hardlinks
#  cp -al Documentation "${pkgdir}/usr/lib/modules/${_kernver}/build"
  cp -a Documentation "${pkgdir}/usr/lib/modules/${_kernver}/build"    
  find "${pkgdir}" -type f -exec chmod 444 {} \;
  find "${pkgdir}" -type d -exec chmod 755 {} \;

  # remove a file already in linux package
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/DocBook/Makefile"
}

pkgname=("${pkgbase}" "${pkgbase}-headers" "${pkgbase}-docs")
for _p in ${pkgname[@]}; do
  eval "package_${_p}() {
    $(declare -f "_package${_p#${pkgbase}}")
    _package${_p#${pkgbase}}
  }"
done

# vim:set ts=8 sts=2 sw=2 et:
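
The `pkgver()` function in the PKGBUILD above turns the output of `git describe --long` into a version string that pacman accepts. As a rough illustration of what its `sed` expression does, the sketch below applies the same expression to two fixed strings instead of calling git. The first input corresponds to the `v4.12.3` tag and the commit id in the `pkgver=` line; the second, an rc tag with a made-up commit id, is purely hypothetical. The helper name `describe_to_pkgver` is also invented for this example.

```shell
#!/bin/sh
# Same sed expression as in pkgver(), fed from stdin instead of git describe.
describe_to_pkgver() {
  sed -E 's/^v//;s/([^-]*-g)/r\1/;s/-/./g;s/\.rc/rc/'
}

# Tagged release, zero commits after the tag (matches the pkgver= line above).
echo 'v4.12.3-0-g8f883aa5b661' | describe_to_pkgver
# prints 4.12.3.r0.g8f883aa5b661

# Hypothetical rc tag with 12 commits on top; the commit id is made up.
echo 'v4.13-rc4-12-g0123456789ab' | describe_to_pkgver
# prints 4.13rc4.r12.g0123456789ab
```

During a bisect the number after `r` is the count of commits since the nearest tag, so every bisect step yields a distinct package version.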

#9 2017-08-08 14:39:48

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [SOLVED] kernel 4.12.4-1 ata bug

First, confirm that this doesn't show up with the older kernel (simply downgrade it).
Is this related to S3 or S4 cycles?

#10 2017-08-08 16:58:21

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

seth wrote:

First confirm that this doesn't show up with the older kernel (simply downgrade it)
Is this related to S3 or S4 cycles?

OK guys, thanks for your kind help. I have downgraded to 4.12.3-1 and am waiting for the bug to trigger. It occurs randomly: the system ran for 10+ hours and then the bug suddenly occurred, locking one HDD out completely (ATA controller in the disabled state).

The drive doesn't show any abnormal SMART entries either.

The first time, a couple of years ago, I wasn't sure whether the HDD itself was dying:

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY5975648
LU WWN Device Id: 5 0014ee 25a73facd
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug  8 18:45:36 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

But after I discovered it was a kernel bug, which did not seem to affect the LTS kernel at the time, I gave up on it, and later (after a few months) tried the stock kernel without any issue.

I don't understand your question about S3 and S4 cycles, seth; please explain.

#11 2017-08-08 17:55:40

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [SOLVED] kernel 4.12.4-1 ata bug

The errors look more like a connection issue than a dying disk (unless it's indeed a kernel bug), i.e. something on the PCI bus or the cable.
S3 and S4 refer to suspending to RAM and to disk, respectively (system sleep / hibernation).

#12 2017-08-08 18:12:05

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: [SOLVED] kernel 4.12.4-1 ata bug

seth wrote:

The errors look more like a connection issue than a dying disk (unless it's indeed a kernel bug), ie. sth. on the pci bus or the cable.

Strongly seconded. I had something similar occur years ago due to a bad power adapter (legacy to SATA), and all too often see the exact same errors occur with poor eSATA connections.

Another thing is the drive model, WD Caviar Green. The "Green" line was plagued with issues due to aggressive idling, google "wd idle". Though admittedly I would have thought it would have been worked out by the time AF drives were released. Still, if all your drives are Greens that could have something to do with it. It might be helpful to see the entire SMART status with "smartctl -a <dev>", or at least the SMART attributes with "smartctl -A <dev>".
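
As a rough illustration of what to look for in the `smartctl -A` table: the raw values of a few attributes are commonly checked first, namely reallocated and pending sectors (IDs 5, 197, 198) for media problems and `UDMA_CRC_Error_Count` (ID 199) for transfer errors on the link. The sketch below filters an attribute table for non-zero raw values of those four IDs. The function name `flag_smart_attrs` and the choice of IDs are assumptions made for this example, and the three sample rows are in the layout smartctl prints (they match rows posted later in this thread).

```shell
#!/bin/sh
# Print NAME=RAW for selected SMART attributes whose raw value is non-zero.
# flag_smart_attrs and the ID list are illustrative choices, not a smartctl feature.
flag_smart_attrs() {
  # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  awk '$1 ~ /^(5|197|198|199)$/ && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Sample rows in smartctl's attribute-table layout.
flag_smart_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   001   000    Old_age   Always       -       1223
EOF
# prints UDMA_CRC_Error_Count=1223
```

On a real system the input would be piped in, for example `smartctl -A /dev/sdX | flag_smart_attrs` run as root, where `/dev/sdX` is a placeholder for the actual device.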


#13 2017-08-08 18:13:31

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

It happened during normal desktop usage, in the active state, as I said before. I don't use sleep states at all. I know it looks like a connection issue, but everything looks fine under the hood: no heat, no loose connectors. Moreover, this is the first time in a long while that it has happened, as I wrote at the beginning.

In sum: I don't use either of them, neither hibernation nor suspend to RAM.

#14 2017-08-08 18:20:43

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

alphaniner wrote:
seth wrote:

The errors look more like a connection issue than a dying disk (unless it's indeed a kernel bug), ie. sth. on the pci bus or the cable.

Strongly seconded. I had something similar occur years ago due to a bad power adapter (legacy to SATA), and all too often see the exact same errors occur with poor eSATA connections.

Another thing is the drive model, WD Caviar Green. The "Green" line was plagued with issues due to aggressive idling, google "wd idle". Though admittedly I would have thought it would have been worked out by the time AF drives were released. Still, if all your drives are Greens that could have something to do with it. It might be helpful to see the entire SMART status with "smartctl -a <dev>", or at least the SMART attributes with "smartctl -A <dev>".

Thanks for your post, alphaniner. No fancy stuff here, just SATA cables plugged into the motherboard's main SATA controller. I disabled head parking with the wdidle3 tool right when I started using those damn piece-of-crap drives, and I know that issue very well. I disabled it on all 3. Here is the SMART output of all of them (the last one is the drive that was affected this time):

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY5975520
LU WWN Device Id: 5 0014ee 25a73fa6e
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug  8 20:16:23 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(40260) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 459) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   140   021    Pre-fail  Always       -       9741
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2332
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       20430
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2296
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       249
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3320
194 Temperature_Celsius     0x0022   110   103   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY5973675
LU WWN Device Id: 5 0014ee 2051ec386
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug  8 20:16:27 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(40260) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 459) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   144   021    Pre-fail  Always       -       9716
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1928
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16720
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1901
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       208
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2544
194 Temperature_Celsius     0x0022   113   101   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   001   000    Old_age   Always       -       1223
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7174         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY5975648
LU WWN Device Id: 5 0014ee 25a73facd
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug  8 20:16:31 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(41100) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 468) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   147   144   021    Pre-fail  Always       -       9650
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2200
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19438
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2171
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       216
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5021
194 Temperature_Celsius     0x0022   112   103   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Offline

#15 2017-08-11 17:28:47

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

Found some spare time and tried to bisect as suggested. Unfortunately, that's where the real problem starts. As mentioned before, because of the nature of this bug (I booted the 4.12.4 kernel at 9am and the bug triggered at 6pm), I'm unable to quickly pin down which commit is causing it. It will take weeks until I find out. Any suggestions from code-minded pros are very welcome, because this is very annoying.
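To put a rough number on "weeks": a bisection over N candidate commits needs about log2(N) test kernels, and with a bug that takes most of a day to show up, each test kernel costs roughly a day of uptime. A minimal sketch of that arithmetic follows. The helper name bisect_steps and the commit count of 100 are made up for illustration; the real count is whatever `git bisect start` reports for the chosen good and bad kernels.

```shell
#!/bin/sh
# bisect_steps N: print the smallest k such that 2^k >= N.
# N is the (assumed) number of candidate commits between the last
# known-good and the first known-bad kernel.
bisect_steps() {
    n=$1
    steps=0
    # Each bisection step halves the remaining range of commits.
    while [ $((1 << steps)) -lt "$n" ]; do
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Hypothetical example: 100 candidate commits.
bisect_steps 100   # prints 7
```

At about one test kernel per day, 7 steps is a week or more of testing, and longer if a "good" verdict needs several days without the error.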

Thanks for help.

Offline

#16 2017-08-11 18:24:29

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [SOLVED] kernel 4.12.4-1 ata bug

dd if=/dev/sda1 of=/dev/null

Run a couple of those, maybe in a loop and see whether that accelerates things ... ;-)
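A minimal sketch of such a loop, assuming a POSIX shell and a dd that accepts bs=1M. The function name read_loop and the pass count are illustrative only; /dev/sda1 is the device from the command above and needs root to read.

```shell
#!/bin/sh
# read_loop SOURCE PASSES: read SOURCE to /dev/null PASSES times.
# Stops and returns non-zero as soon as one dd run fails.
read_loop() {
    src=$1
    passes=$2
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i of $passes"
        # dd writes its transfer statistics to stderr; discard them.
        dd if="$src" of=/dev/null bs=1M 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Example invocation (as root), left commented out on purpose:
# read_loop /dev/sda1 5
```

Note that this only generates read load, while the failed commands in the first post are WRITE FPDMA QUEUED, so it may or may not hit the same path. Repeated passes over a small partition can also be served from the page cache instead of the disk.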

Offline

#17 2017-08-11 18:52:41

pereira_alex
Member
Registered: 2015-04-12
Posts: 6

Re: [SOLVED] kernel 4.12.4-1 ata bug

I have a similar problem (the error is the same, using an SSD).

With an older kernel or another distro, this problem doesn't happen. With a 4.12.x kernel, it happens a lot. (I thought of trying an older kernel or another distro before throwing the disk in the trash, because it started happening when 4.12 came to my system.)

Offline

#18 2017-08-11 19:24:54

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

Thanks for the reply on the subject, pereira_alex. Now I'm 90% sure that 4.12.4-1-arch is causing the issue for me. I tested earlier versions and they appear to work flawlessly. 4.12.3 doesn't seem to trigger the bug either. I'm now trying to reproduce it, without luck, and bisecting in the meantime. dd loops don't help at all. Time is money, friend :)

Last edited by Al.Piotrowicz (2017-08-11 19:26:28)

Offline

#19 2017-08-11 23:30:21

Potomac
Member
Registered: 2011-12-25
Posts: 526

Re: [SOLVED] kernel 4.12.4-1 ata bug

You can try connecting your hard disk to another SATA port.

If all SATA ports on your motherboard are in use, you can swap your SATA devices:

for example, if hard-disk#1 is on SATA port#1 and hard-disk#2 is on SATA port#2, then swap them: hard-disk#1 on SATA port#2 and hard-disk#2 on SATA port#1.

If you have a slow device (DVD burner, DVD player), don't put it on the same SATA controller as a fast device such as a hard disk. In the past I discovered that the kernel can trigger weird bugs when a slow device (DVD player) shares a SATA controller with a hard disk (at startup or on reboot the kernel will freeze randomly).

The workaround I found is to connect the DVD player to another SATA controller (a PCIe SATA controller, instead of the SATA ports of the motherboard).

This kind of ata bug can occur on old motherboards where SATA ports are emulated as ATA in the BIOS.
Check whether the BIOS has a setting related to the SATA ports; sometimes you can disable the ATA emulation in order to use an advanced mode for the SATA ports.

Offline

#20 2017-09-10 21:12:03

pereira_alex
Member
Registered: 2015-04-12
Posts: 6

Re: [SOLVED] kernel 4.12.4-1 ata bug

My problem went away:

Solved with the help of this bugreport -> https://bugs.launchpad.net/elementaryos/+bug/1576634

Apparently it's a TLP issue; commenting out
#SATA_LINKPWR_ON_AC=max_performance
#SATA_LINKPWR_ON_BAT=min_power

solved it. ( at least for a full day with lots of disk testing )
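For anyone who wants to see which SATA link power policy is actually in effect, whatever set it, here is a minimal sketch that reads the kernel's link_power_management_policy attribute for each host. The function name show_lpm_policies and its optional directory argument are my own additions for illustration; hosts that do not expose the attribute are skipped.

```shell
#!/bin/sh
# show_lpm_policies [ROOT]: print "hostN: policy" for every host that
# exposes link_power_management_policy. ROOT defaults to the sysfs
# directory; the override exists only so the function can be tried
# against a scratch directory.
show_lpm_policies() {
    root=${1:-/sys/class/scsi_host}
    for f in "$root"/host*/link_power_management_policy; do
        # Skip the unexpanded glob and hosts without the attribute.
        [ -r "$f" ] || continue
        host=${f%/link_power_management_policy}
        echo "${host##*/}: $(cat "$f")"
    done
}

show_lpm_policies
```

Comparing the output before and after changing the TLP settings shows whether the change really reached the controller.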

Offline

#21 2017-09-11 08:49:01

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: [SOLVED] kernel 4.12.4-1 ata bug

pereira_alex wrote:

My problem went away:

Solved with the help of this bugreport -> https://bugs.launchpad.net/elementaryos/+bug/1576634

Apparently it's a TLP issue; commenting out
#SATA_LINKPWR_ON_AC=max_performance
#SATA_LINKPWR_ON_BAT=min_power

solved it. ( at least for a full day with lots of disk testing )

I'd say that is not a TLP issue. I would be more inclined to say that it is a disk firmware problem or a combination of disk firmware + board chipset see [1,2].

[1] https://mjg59.dreamwidth.org/34868.html
[2] https://mjg59.dreamwidth.org/42156.html


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#22 2017-09-11 09:00:18

pereira_alex
Member
Registered: 2015-04-12
Posts: 6

Re: [SOLVED] kernel 4.12.4-1 ata bug

R00KIE wrote:
pereira_alex wrote:

My problem went away:

Solved with the help of this bugreport -> https://bugs.launchpad.net/elementaryos/+bug/1576634

Apparently it's a TLP issue; commenting out
#SATA_LINKPWR_ON_AC=max_performance
#SATA_LINKPWR_ON_BAT=min_power

solved it. ( at least for a full day with lots of disk testing )

I'd say that is not a TLP issue. I would be more inclined to say that it is a disk firmware problem or a combination of disk firmware + board chipset see [1,2].

[1] https://mjg59.dreamwidth.org/34868.html
[2] https://mjg59.dreamwidth.org/42156.html


I don't know. It might be, it might not be, since my SSD is from a different vendor than the one in the other bug report. I hope it helps anyone with the same problem.

Offline

#23 2017-09-11 12:02:50

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [SOLVED] kernel 4.12.4-1 ata bug

This affects several devices - thus the warning in https://wiki.archlinux.org/index.php/Po … Management

About yours being a "similar issue" to the OP - did you have mentions of CommWake in *your* dmesg errors?

Offline

#24 2017-09-11 12:34:40

pereira_alex
Member
Registered: 2015-04-12
Posts: 6

Re: [SOLVED] kernel 4.12.4-1 ata bug

seth wrote:

This affects several devices - thus the warning in https://wiki.archlinux.org/index.php/Po … Management

About yours being a "similar issue" to the OP - did you have mentions of CommWake in *your* dmesg errors?

I had error messages like those of the OP of this thread, not like those of the OP of the thread I linked.
I don't have a copy of the messages here, but if it is that important I can re-enable the setting and copy them.

Thanks for the link to the wiki.

Offline

#25 2018-02-05 18:57:17

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 116

Re: [SOLVED] kernel 4.12.4-1 ata bug

First of all, thanks for all the replies and your big effort in trying to help with this hard-to-track type of bug. For now, the WD Green libata.c errors are gone after changing the SB PCI SATA mode to AHCI (previously IDE-NATIVE) in the system BIOS.
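To confirm from the running system that the ports are now handled in AHCI mode, one option is to look at which low-level driver sits behind each host. This is a minimal sketch that reads the proc_name attribute under /sys/class/scsi_host; the function name list_host_drivers and the optional directory argument are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# list_host_drivers [ROOT]: print "hostN: driver" for every SCSI/ATA host.
# ROOT defaults to the sysfs directory; the override is only there so
# the function can be tried against a scratch directory.
list_host_drivers() {
    root=${1:-/sys/class/scsi_host}
    for f in "$root"/host*/proc_name; do
        # Skip the unexpanded glob when no host exists.
        [ -r "$f" ] || continue
        host=${f%/proc_name}
        echo "${host##*/}: $(cat "$f")"
    done
}

list_host_drivers
```

With the controller in AHCI mode the SATA hosts should report ahci; in an IDE-style mode a different driver name would be expected.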

I will report any related errors that might occur in this thread.

Thank you.

Offline
