During normal desktop operation, a bug appears at random and affects random ATA HDDs:
sie 07 17:52:25 testowy kernel: ata3.00: exception Emask 0x0 SAct 0xc00 SErr 0x0 action 0x6 frozen
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:25 testowy kernel: ata3.00: cmd 61/30:50:20:01:b2/04:00:3e:00:00/40 tag 10 ncq dma 548864 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
sie 07 17:52:25 testowy kernel: ata3.00: status: { DRDY }
sie 07 17:52:25 testowy kernel: ata3.00: failed command: WRITE FPDMA QUEUED
sie 07 17:52:25 testowy kernel: ata3.00: cmd 61/70:58:50:05:b2/04:00:3e:00:00/40 tag 11 ncq dma 581632 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
sie 07 17:52:25 testowy kernel: ata3.00: status: { DRDY }
sie 07 17:52:25 testowy kernel: ata3: hard resetting link
sie 07 17:52:35 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:52:35 testowy kernel: ata3: hard resetting link
sie 07 17:52:45 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:52:45 testowy kernel: ata3: hard resetting link
sie 07 17:52:55 testowy kernel: ata3: link is slow to respond, please be patient (ready=0)
sie 07 17:53:20 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:53:20 testowy kernel: ata3: limiting SATA link speed to 1.5 Gbps
sie 07 17:53:20 testowy kernel: ata3: hard resetting link
sie 07 17:53:25 testowy kernel: ata3: softreset failed (device not ready)
sie 07 17:53:25 testowy kernel: ata3: reset failed, giving up
sie 07 17:53:25 testowy kernel: ata3.00: disabled
I'm using up-to-date packages from the stable repo, with LVM on top of LUKS.
There was a similar issue a few years ago, so I think this is some kind of regression. I can provide any additional information if necessary.
Thanks for the help.
Last edited by Al.Piotrowicz (2019-10-04 14:41:45)
Welcome to the Arch Linux forums, Al.Piotrowicz. What was the kernel version before the upgrade?
It was 4.12.3-1; I update as frequently as I can.
Additionally, I don't know whether it's important, but as in the past case I mentioned in my first post, the affected disk is not accessible in the BIOS after a soft restart.
After a full shutdown and power-up, everything returns to normal operation. I clearly remember the same issue from a few years ago.
Looking at https://cdn.kernel.org/pub/linux/kernel … Log-4.12.4, I cannot see an obviously related commit.
Could you please try bisecting between 4.12.3 and 4.12.4 to find which commit is the cause, and report it upstream?
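For readers who have not bisected before, here is a minimal sketch of the git side of the procedure. It assumes a clone of the linux-stable repository (the one the PKGBUILD below pulls from) is the current directory. The function name `bisect_start` is illustrative only; the `git bisect` subcommands are the real ones.

```shell
#!/usr/bin/env bash
# Minimal sketch of a kernel bisect between two stable tags.
# Assumption: run inside a linux-stable clone that already has the v4.12.3
# and v4.12.4 tags (e.g. the src/linux-stable directory makepkg creates).

# Hypothetical helper name; it only wraps the standard git commands.
bisect_start() {
  git bisect start
  git bisect bad v4.12.4    # first release where the ata timeouts were seen
  git bisect good v4.12.3   # last release believed to be good
}

# git now checks out a commit roughly halfway between the two tags.
# Build and install that commit, boot it, use the machine until you can
# judge it, then report the result from the same directory:
#   git bisect good   # no "failed command: WRITE FPDMA QUEUED" errors
#   git bisect bad    # the error showed up again
# Repeat until git prints "<sha1> is the first bad commit", then save the
# log for the upstream report and return to the original checkout:
#   git bisect log > bisect.log
#   git bisect reset
```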
OK, I'll try my best. It looks a bit challenging for me; I'm not much of a tech guy and have never done a bisect before.
I hope to post my results here soon.
https://bbs.archlinux.org/viewtopic.php … 5#p1700245 describes bisecting the kernel in more detail, although that PKGBUILD is aimed at bisecting 4.9 to 4.10.
I'm a bit confused about the whole procedure. Please explain it in more detail if you can. Which version of the Linux git tree should I use to bisect: the most recent one or some other? I'm trying to build directly from the latest rc tree, but I don't know whether that's the right way.
Last edited by Al.Piotrowicz (2017-08-08 13:59:03)
Please try this PKGBUILD together with the instructions from the link I posted previously, substituting 4.12.3 for 4.9 and 4.12.4 for 4.10 in the commands. The PKGBUILD has already been adjusted.
# Maintainer: Boohbah <boohbah at gmail.com>
# Contributor: Tobias Powalowski <tpowa@archlinux.org>
# Contributor: Thomas Baechler <thomas@archlinux.org>
# Contributor: Jonathan Chan <jyc@fastmail.fm>
# Contributor: misc <tastky@gmail.com>
# Contributor: NextHendrix <cjones12 at sheffield.ac.uk>
pkgbase=linux-git
_srcname=linux-stable
pkgver=4.12.3.r0.g8f883aa5b661
pkgrel=1
arch=('i686' 'x86_64')
url="http://www.kernel.org/"
license=('GPL2')
makedepends=('xmlto' 'docbook-xsl' 'kmod' 'inetutils' 'bc' 'git' 'libelf')
options=('!strip')
source=('git+https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#tag=v4.12.3'
# the main kernel config files
'config.i686::https://git.archlinux.org/svntogit/packages.git/plain/trunk/config.i686?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93'
'config.x86_64::https://git.archlinux.org/svntogit/packages.git/plain/trunk/config.x86_64?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93'
'90-linux.hook::https://git.archlinux.org/svntogit/packages.git/plain/trunk/90-linux.hook?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93'
# standard config files for mkinitcpio ramdisk
"${pkgbase}.preset::https://git.archlinux.org/svntogit/packages.git/plain/trunk/linux.preset?h=packages/linux&id=e1e9510f6d9d8856e2b36b073be8928a85d77a93")
sha256sums=('SKIP'
'df55887a43dcbb6bd35fd2fb1ec841427b6ea827334c0880cbc256d4f042a7a1'
'bf84528c592d1841bba0662242f0339a24a1de384c31f28248631e8be9446586'
'834bd254b56ab71d73f59b3221f056c72f559553c04718e350ab2a3e2991afe0'
'ad6344badc91ad0630caacde83f7f9b97276f80d26a20619a87952be65492c65')
_kernelname=${pkgbase#linux}
pkgver() {
  cd "${_srcname}"
  git describe --long | sed -E 's/^v//;s/([^-]*-g)/r\1/;s/-/./g;s/\.rc/rc/'
}
prepare() {
  cd "${_srcname}"
  cat "${srcdir}/config.${CARCH}" > ./.config
  # set localversion to git commit
  sed -i "s|CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"-${pkgver##*.}\"|g" ./.config
  sed -i "s|CONFIG_LOCALVERSION_AUTO=.*|CONFIG_LOCALVERSION_AUTO=n|" ./.config
  # don't run depmod on 'make install'. We'll do this ourselves in packaging
  # git tracks scripts/depmod.sh so do not change it when using the existing source dir for bisection
  # sed -i '2iexit 0' scripts/depmod.sh
  # get kernel version
  make prepare
  # load configuration
  # Configure the kernel. Replace the line below with one of your choice.
  #make menuconfig # CLI menu for configuration
  #make nconfig # new CLI menu for configuration
  #make xconfig # X-based configuration
  #make oldconfig # using old config from previous kernel version
  make olddefconfig # old config from previous kernel, defaults for new options
  # ... or manually edit .config
}
build() {
  cd "${_srcname}"
  make ${MAKEFLAGS} LOCALVERSION= bzImage modules
}
_package() {
  pkgdesc="The Linux kernel and modules (git version)"
  depends=('coreutils' 'linux-firmware' 'kmod' 'mkinitcpio>=0.7')
  optdepends=('crda: to set the correct wireless channels of your country')
  provides=('linux')
  backup=("etc/mkinitcpio.d/${pkgbase}.preset")
  install=linux.install
  cd "${_srcname}"
  KARCH=x86
  # get kernel version
  _kernver="$(make LOCALVERSION= kernelrelease)"
  _basekernel=${_kernver%%-*}
  _basekernel=${_basekernel%.*}
  mkdir -p "${pkgdir}"/{lib/modules,lib/firmware,boot}
  make LOCALVERSION= INSTALL_MOD_PATH="${pkgdir}" modules_install
  cp arch/$KARCH/boot/bzImage "${pkgdir}/boot/vmlinuz-${pkgbase}"
  # set correct depmod command for install
  sed -e "s|%PKGBASE%|${pkgbase}|g;s|%KERNVER%|${_kernver}|g" \
    "${startdir}/${install}" > "${startdir}/${install}.pkg"
  true && install=${install}.pkg
  # install mkinitcpio preset file for kernel
  sed "s|%PKGBASE%|${pkgbase}|g" "${srcdir}/${pkgbase}.preset" |
    install -D -m644 /dev/stdin "${pkgdir}/etc/mkinitcpio.d/${pkgbase}.preset"
  # install pacman hook for initramfs regeneration
  sed "s|%PKGBASE%|${pkgbase}|g" "${srcdir}/90-linux.hook" |
    install -D -m644 /dev/stdin "${pkgdir}/usr/share/libalpm/hooks/90-${pkgbase}.hook"
  # remove build and source links
  rm -f "${pkgdir}"/lib/modules/${_kernver}/{source,build}
  # remove the firmware
  rm -rf "${pkgdir}/lib/firmware"
  # make room for external modules
  ln -s "../extramodules-${_basekernel}${_kernelname:--ARCH}" "${pkgdir}/lib/modules/${_kernver}/extramodules"
  # add real version for building modules and running depmod from post_install/upgrade
  mkdir -p "${pkgdir}/lib/modules/extramodules-${_basekernel}${_kernelname:--ARCH}"
  echo "${_kernver}" > "${pkgdir}/lib/modules/extramodules-${_basekernel}${_kernelname:--ARCH}/version"
  # Now we call depmod...
  depmod -b "${pkgdir}" -F System.map "${_kernver}"
  # move module tree /lib -> /usr/lib
  mkdir -p "${pkgdir}/usr"
  mv "${pkgdir}/lib" "${pkgdir}/usr/"
  # add vmlinux
  install -D -m644 vmlinux "${pkgdir}/usr/lib/modules/${_kernver}/build/vmlinux"
  # add System.map
  install -D -m644 System.map "${pkgdir}/boot/System.map-${_kernver}"
}
_package-headers() {
  pkgdesc="Header files and scripts for building modules for Linux kernel (git version)"
  provides=('linux-headers')
  install -dm755 "${pkgdir}/usr/lib/modules/${_kernver}"
  cd "${_srcname}"
  install -D -m644 Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/Makefile"
  install -D -m644 kernel/Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/kernel/Makefile"
  install -D -m644 .config \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/.config"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include"
  for i in acpi asm-generic config crypto drm generated keys linux math-emu \
    media net pcmcia rdma scsi soc sound trace uapi video xen; do
    cp -a include/${i} "${pkgdir}/usr/lib/modules/${_kernver}/build/include/"
  done
  # copy arch includes for external modules
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/x86"
  cp -a arch/x86/include "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/x86/"
  # copy files necessary for later builds, like nvidia and vmware
  cp Module.symvers "${pkgdir}/usr/lib/modules/${_kernver}/build"
  cp -a scripts "${pkgdir}/usr/lib/modules/${_kernver}/build"
  # fix permissions on scripts dir
  chmod og-w -R "${pkgdir}/usr/lib/modules/${_kernver}/build/scripts"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/.tmp_versions"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/kernel"
  cp arch/${KARCH}/Makefile "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/"
  if [ "${CARCH}" = "i686" ]; then
    cp arch/${KARCH}/Makefile_32.cpu "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/"
  fi
  cp arch/${KARCH}/kernel/asm-offsets.s "${pkgdir}/usr/lib/modules/${_kernver}/build/arch/${KARCH}/kernel/"
  # add docbook makefile
  install -D -m644 Documentation/DocBook/Makefile \
    "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/DocBook/Makefile"
  # add dm headers
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/md"
  cp drivers/md/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/md"
  # add inotify.h
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include/linux"
  cp include/linux/inotify.h "${pkgdir}/usr/lib/modules/${_kernver}/build/include/linux/"
  # add wireless headers
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/net/mac80211/"
  cp net/mac80211/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/net/mac80211/"
  # add dvb headers for external modules
  # in reference to:
  # http://bugs.archlinux.org/task/9912
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-core"
  cp drivers/media/dvb-core/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-core/"
  # and...
  # http://bugs.archlinux.org/task/11194
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/include/config/dvb/"
  cp include/config/dvb/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/include/config/dvb/"
  # add dvb headers for http://mcentral.de/hg/~mrec/em28xx-new
  # in reference to:
  # http://bugs.archlinux.org/task/13146
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  cp drivers/media/dvb-frontends/lgdt330x.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/i2c/"
  cp drivers/media/i2c/msp3400-driver.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/i2c/"
  # add dvb headers
  # in reference to:
  # http://bugs.archlinux.org/task/20402
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/usb/dvb-usb"
  cp drivers/media/usb/dvb-usb/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/usb/dvb-usb/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends"
  cp drivers/media/dvb-frontends/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/dvb-frontends/"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/tuners"
  cp drivers/media/tuners/*.h "${pkgdir}/usr/lib/modules/${_kernver}/build/drivers/media/tuners/"
  # add xfs and shmem for aufs building
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/fs/xfs"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/mm"
  # removed in 3.17 series
  # cp fs/xfs/xfs_sb.h "${pkgdir}/usr/lib/modules/${_kernver}/build/fs/xfs/xfs_sb.h"
  # copy in Kconfig files
  for i in $(find . -name "Kconfig*"); do
    mkdir -p "${pkgdir}"/usr/lib/modules/${_kernver}/build/`echo ${i} | sed 's|/Kconfig.*||'`
    cp ${i} "${pkgdir}/usr/lib/modules/${_kernver}/build/${i}"
  done
  # add objtool for external module building and enabled VALIDATION_STACK option
  if [ -f tools/objtool/objtool ]; then
    mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build/tools/objtool"
    cp -a tools/objtool/objtool "${pkgdir}/usr/lib/modules/${_kernver}/build/tools/objtool/"
  fi
  chown -R root:root "${pkgdir}/usr/lib/modules/${_kernver}/build"
  find "${pkgdir}/usr/lib/modules/${_kernver}/build" -type d -exec chmod 755 {} \;
  # strip scripts directory
  find "${pkgdir}/usr/lib/modules/${_kernver}/build/scripts" -type f -perm -u+w 2>/dev/null | while read binary ; do
    case "$(file -bi "${binary}")" in
      *application/x-sharedlib*) # Libraries (.so)
        /usr/bin/strip ${STRIP_SHARED} "${binary}";;
      *application/x-archive*) # Libraries (.a)
        /usr/bin/strip ${STRIP_STATIC} "${binary}";;
      *application/x-executable*) # Binaries
        /usr/bin/strip ${STRIP_BINARIES} "${binary}";;
    esac
  done
  # remove unneeded architectures
  rm -rf "${pkgdir}"/usr/lib/modules/${_kernver}/build/arch/{alpha,arc,arm,arm26,arm64,avr32,blackfin,c6x,cris,frv,h8300,hexagon,ia64,m32r,m68k,m68knommu,metag,mips,microblaze,mn10300,openrisc,parisc,powerpc,ppc,s390,score,sh,sh64,sparc,sparc64,tile,unicore32,um,v850,xtensa}
  # remove files already in linux-docs package
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.recursion-issue-01"
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.recursion-issue-02"
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/kbuild/Kconfig.select-break"
}
_package-docs() {
  pkgdesc="Kernel hackers manual - HTML documentation that comes with the Linux kernel (git version)"
  provides=('linux-docs')
  cd "${_srcname}"
  mkdir -p "${pkgdir}/usr/lib/modules/${_kernver}/build"
  # git tracks the contents of the Documentation dir so do not change it when using the existing source dir for bisection
  # it will be changed by the following chmod commands in conjunction with using hardlinks
  # cp -al Documentation "${pkgdir}/usr/lib/modules/${_kernver}/build"
  cp -a Documentation "${pkgdir}/usr/lib/modules/${_kernver}/build"
  find "${pkgdir}" -type f -exec chmod 444 {} \;
  find "${pkgdir}" -type d -exec chmod 755 {} \;
  # remove a file already in linux package
  rm -f "${pkgdir}/usr/lib/modules/${_kernver}/build/Documentation/DocBook/Makefile"
}
pkgname=("${pkgbase}" "${pkgbase}-headers" "${pkgbase}-docs")
for _p in ${pkgname[@]}; do
  eval "package_${_p}() {
    $(declare -f "_package${_p#${pkgbase}}")
    _package${_p#${pkgbase}}
  }"
done
# vim:set ts=8 sts=2 sw=2 et:
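The `pkgver()` and `prepare()` functions above derive the package version and `CONFIG_LOCALVERSION` from `git describe`. A small sketch that applies the same sed expression and parameter expansion to a plain string, so you can see which version string a given bisect step would produce without running makepkg. The two function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Same sed pipeline as pkgver() in the PKGBUILD, applied to a
# `git describe --long` string instead of the live checkout.
describe_to_pkgver() {
  printf '%s\n' "$1" | sed -E 's/^v//;s/([^-]*-g)/r\1/;s/-/./g;s/\.rc/rc/'
}

# Same expansion prepare() uses: keep everything after the last dot
# (the "g<sha>" part) and prefix it with a dash.
localversion_from_pkgver() {
  local ver=$1
  printf -- '-%s\n' "${ver##*.}"
}
```

For example, `describe_to_pkgver v4.12.3-0-g8f883aa5b661` prints `4.12.3.r0.g8f883aa5b661`, matching the `pkgver=` line, and `localversion_from_pkgver 4.12.3.r0.g8f883aa5b661` prints `-g8f883aa5b661`. A hypothetical bisect commit described as `v4.12.3-17-g0123abc` would become `4.12.3.r17.g0123abc`.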
First confirm that this doesn't show up with the older kernel (simply downgrade it).
Is this related to S3 or S4 cycles?
OK guys, thanks for your kind help. I've downgraded to 4.12.3-1 and am waiting to see whether the bug triggers. By nature it occurs randomly: last time the system worked for 10+ hours and then it suddenly occurred, completely locking one HDD out (ATA controller in a disabled state).
It doesn't show any abnormal SMART entries either.
The first time, a couple of years ago, I wasn't sure it wasn't the HDD itself dying:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD20EARS-00S8B1
Serial Number: WD-WCAVY5975648
LU WWN Device Id: 5 0014ee 25a73facd
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue Aug 8 18:45:36 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
But after I discovered it was a kernel bug that didn't seem to affect the LTS kernel at the time, I gave up on the stock kernel, and later (after a few months) tried it again without any issue.
I don't understand your question about S3 and S4 cycles, Seth; could you please elaborate?
The errors look more like a connection issue than a dying disk (unless it's indeed a kernel bug), i.e. something on the PCI bus or the cable.
S3 and S4 refer to suspending to RAM and to disk, respectively (system sleep / hibernation).
Strongly seconded. I had something similar occur years ago due to a bad power adapter (legacy to SATA), and all too often see the exact same errors occur with poor eSATA connections.
Another thing is the drive model, WD Caviar Green. The "Green" line was plagued with issues due to aggressive idling, google "wd idle". Though admittedly I would have thought it would have been worked out by the time AF drives were released. Still, if all your drives are Greens that could have something to do with it. It might be helpful to see the entire SMART status with "smartctl -a <dev>", or at least the SMART attributes with "smartctl -A <dev>".
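To compare individual attributes across drives without reading each whole report, here is a sketch that pulls the RAW_VALUE column for one attribute out of `smartctl -A` output. It assumes the ten-column attribute table layout shown in the reports posted in this thread, and the function name is hypothetical. Attribute 199, UDMA_CRC_Error_Count, is a natural one to compare here, since it counts CRC errors on the link between drive and host.

```shell
#!/usr/bin/env bash
# Print the RAW_VALUE (10th column) of one SMART attribute, reading
# `smartctl -A` output on stdin. Assumes the standard 10-column table:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_raw() {
  local attr=$1
  awk -v name="$attr" '$2 == name { print $10 }'
}
```

Usage, as root: `for d in /dev/sd[a-c]; do printf '%s ' "$d"; smartctl -A "$d" | smart_raw UDMA_CRC_Error_Count; done`. Taking column 10 rather than the last field avoids picking up trailing annotations such as the "(Min/Max ...)" that some drives append to the temperature value.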
It happened during normal desktop usage in the active state, as I said before. I don't use sleep states whatsoever. I know it looks like a connection issue, but everything looks fine under the hood: no heat, no loose connectors. Moreover, as I wrote at the beginning, this is the first time it has happened in a long time.
In sum, I use neither hibernation nor suspend to RAM.
Thanks for your post, alphaniner. No fancy stuff here, just SATA cables plugged into the motherboard's main SATA controller. I disabled head parking with the wdidle3 tool right when I started using those damn pieces of crap, so I know that issue very well; I disabled it on all three. Here's the SMART output for all of them (the last one is the affected drive):
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD20EARS-00S8B1
Serial Number: WD-WCAVY5975520
LU WWN Device Id: 5 0014ee 25a73fa6e
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue Aug 8 20:16:23 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (40260) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 459) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 145 140 021 Pre-fail Always - 9741
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2332
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 20430
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2296
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 249
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 3320
194 Temperature_Celsius 0x0022 110 103 000 Old_age Always - 42
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD20EARS-00S8B1
Serial Number: WD-WCAVY5973675
LU WWN Device Id: 5 0014ee 2051ec386
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue Aug 8 20:16:27 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (40260) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 459) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 145 144 021 Pre-fail Always - 9716
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1928
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 16720
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1901
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 208
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2544
194 Temperature_Celsius 0x0022 113 101 000 Old_age Always - 39
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 001 000 Old_age Always - 1223
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7174 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.12.3-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD20EARS-00S8B1
Serial Number: WD-WCAVY5975648
LU WWN Device Id: 5 0014ee 25a73facd
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue Aug 8 20:16:31 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (41100) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 468) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 147 144 021 Pre-fail Always - 9650
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2200
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 19438
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2171
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 216
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5021
194 Temperature_Celsius 0x0022 112 103 000 Old_age Always - 40
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I found some spare time and tried to bisect as suggested. Unfortunately, that's where the real problem starts. As mentioned before, due to the nature of this bug (I booted the 4.12.4 kernel at 9 am and the bug triggered at 6 pm), I'm unable to quickly pin down which commit causes it. It will take weeks until I find out. Any suggestions from code-minded pros are very welcome, because this is very annoying.
Thanks for the help.
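Since every step has to wait for the bug, the useful number is how many steps to expect. Each `git bisect good`/`bad` roughly halves the remaining range, so N candidate commits need about ceil(log2 N) test rounds (git itself prints a "roughly N steps" estimate after each step). A small sketch of that arithmetic; the function names are hypothetical, and the 100-commit figure in the usage line is an assumed example, not the real size of the 4.12.3..4.12.4 range.

```shell
#!/usr/bin/env bash
# Approximate number of build/boot/test rounds git bisect needs to isolate
# one bad commit among n candidates: ceil(log2(n)), computed by halving.
bisect_rounds() {
  local n=$1 rounds=0
  while (( n > 1 )); do
    n=$(( (n + 1) / 2 ))   # round up: the larger half may remain
    rounds=$(( rounds + 1 ))
  done
  echo "$rounds"
}

# Rough wall-clock estimate when each round needs `hours` of uptime
# before a kernel can be judged.
bisect_hours() {
  local n=$1 hours=$2
  echo $(( $(bisect_rounds "$n") * hours ))
}
```

For example, `bisect_hours 100 9` prints `63`: seven rounds at about nine hours each, which is why a faster reproducer matters more than anything else here.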
dd if=/dev/sda1 of=/dev/null
Run a couple of those, maybe in a loop, and see whether that accelerates things ... ;-)
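A sketch of that suggestion: start several parallel `dd` reads of one device and repeat the round, so the disk stays busy while you watch the kernel log in another terminal. The function name and its job/pass parameters are illustrative. It only reads, like the command above, whereas the failed commands in the original log were writes.

```shell
#!/usr/bin/env bash
# Run `jobs` parallel sequential reads of a device (or file), `passes` times.
# Returns non-zero if any dd failed, e.g. because the device dropped out.
stress_read() {
  local dev=$1 jobs=${2:-2} passes=${3:-1}
  local p i pid rc=0
  local -a pids
  for (( p = 0; p < passes; p++ )); do
    pids=()
    for (( i = 0; i < jobs; i++ )); do
      dd if="$dev" of=/dev/null bs=1M status=none &
      pids+=("$!")
    done
    for pid in "${pids[@]}"; do
      wait "$pid" || rc=1   # collect each job's exit status
    done
  done
  return "$rc"
}
```

Usage, in a root shell: `stress_read /dev/sda1 4 10`, with `journalctl -k -f` running in a second terminal to catch any new `ata` errors as they happen.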
I have a similar problem (the error is the same, on an SSD).
With an older kernel or another distro, this problem doesn't happen. With a 4.12.x kernel, it happens a lot. (I thought of trying an older kernel or another distro before sending the disk to the trash, because it started happening when 4.12 arrived on my system.)
Thanks for your reply on the subject, pereira_alex. Now I'm 90% sure that 4.12.4-1-ARCH is causing the issue for me. I tested earlier versions and they appear to work flawlessly; 4.12.3 doesn't seem to trigger the bug either. I'm now trying to reproduce it, without luck so far, while bisecting in the meantime. dd loops don't help at all. Time is money, friend.
Last edited by Al.Piotrowicz (2017-08-11 19:26:28)
You can try connecting your hard disk to another SATA port.
If all SATA ports on your motherboard are in use, you can swap your SATA devices:
for example, if hard disk #1 is on SATA port #1 and hard disk #2 is on SATA port #2, swap them so that hard disk #1 is on SATA port #2 and hard disk #2 is on SATA port #1.
If you have a slow device (DVD burner, DVD player), don't put it on the same SATA controller as a fast device such as a hard disk. In the past I discovered that the kernel can trigger weird bugs if a slow device (a DVD player) is on the same SATA controller as a hard disk (at startup or on reboot the kernel would freeze randomly).
The workaround I found was to connect the DVD player to another SATA controller (a PCIe SATA controller instead of the motherboard's SATA ports).
This ATA problem can occur on old motherboards where the SATA ports are emulated as ATA in the BIOS.
Check whether the BIOS has a setting related to the SATA ports; sometimes you can disable the ATA emulation in order to use an advanced mode for the SATA ports.
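Before swapping cables, it helps to know which disk the kernel's `ata3` actually is. A sketch, assuming the usual sysfs layout in which a SATA disk's resolved /sys/block path contains its `ataN` port directory (for example `.../ata3/host2/.../block/sdc`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the "ataN" component from a resolved sysfs block-device path.
ata_port_of() {
  local path=$1
  if [[ $path =~ /(ata[0-9]+)/ ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # not behind libata (e.g. NVMe or USB disks)
  fi
}

# List each sdX disk with the ata port it sits on, or "-" if none.
list_ata_disks() {
  local dev port
  for dev in /sys/block/sd*; do
    [[ -e $dev ]] || continue          # glob matched nothing
    port=$(ata_port_of "$(readlink -f "$dev")") || port="-"
    printf '%s %s\n' "${dev##*/}" "$port"
  done
}
```

Running `list_ata_disks` before and after moving a cable shows whether the port named in the error log follows the disk or stays with the port.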
My problem went away.
Solved with the help of this bug report: https://bugs.launchpad.net/elementaryos/+bug/1576634
Apparently it's a TLP issue; commenting out
#SATA_LINKPWR_ON_AC=max_performance
#SATA_LINKPWR_ON_BAT=min_power
solved it (at least for a full day with lots of disk testing).
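The TLP settings above control SATA aggressive link power management (ALPM), which the kernel exposes per SCSI host in sysfs as `link_power_management_policy`. A sketch that prints the current policy of every host, so you can check what TLP actually applied; the function name is hypothetical and the sysfs root is a parameter only to make it testable.

```shell
#!/usr/bin/env bash
# Print "hostN: <policy>" for every SCSI host exposing an ALPM policy.
# Typical values include max_performance, medium_power and min_power.
show_alpm() {
  local root=${1:-/sys/class/scsi_host}
  local f host
  for f in "$root"/host*/link_power_management_policy; do
    [[ -r $f ]] || continue                      # glob matched nothing / unreadable
    host=${f%/*}                                 # strip the file name...
    printf '%s: %s\n' "${host##*/}" "$(<"$f")"   # ...and keep "hostN"
  done
}
```

To test one host at full power without TLP involved, as root: `echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy` (the setting does not survive a reboot).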
I'd say that is not a TLP issue. I would be more inclined to say that it is a disk firmware problem, or a combination of disk firmware and board chipset; see [1,2].
[1] https://mjg59.dreamwidth.org/34868.html
[2] https://mjg59.dreamwidth.org/42156.html
R00KIE
I don't know... It might be, it might not be, since my SSD is from a different vendor than the one in the other bug report. I hope it helps anyone with the same problem.
This affects several devices, hence the warning at https://wiki.archlinux.org/index.php/Po … Management
About yours being a "similar issue" to the OP's: did you have mentions of CommWake in *your* dmesg errors?
I had error messages like those of the OP of this thread, not like those of the OP of the thread I linked.
I don't have a copy of the messages here, but if it's that important, I can re-enable the setting and copy them.
Thanks for the link to the wiki.
First of all, thanks for all the replies and your great effort in helping with this hard-to-track type of bug. For now, the WD Green libata.c errors are gone after changing the SB PCI SATA mode to AHCI (previously IDE-NATIVE) in the system BIOS.
I will report any related errors that occur in this thread.
Thank you.
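To confirm that the BIOS change took effect, you can check which kernel driver is bound to the storage controller; in AHCI mode it is normally `ahci`. A sketch that filters `lspci -k` output down to SATA, IDE and RAID controllers and their bound driver; the function name and the sample lines used to exercise it are made up.

```shell
#!/usr/bin/env bash
# Read `lspci -k` output on stdin; print "<slot> <driver>" for each
# SATA, IDE or RAID controller that has a driver bound.
storage_drivers() {
  awk '
    /^[^ \t]/ {                        # device header line, e.g. "00:11.0 SATA controller: ..."
      slot = ""
      if ($0 ~ /SATA controller|IDE interface|RAID bus controller/) slot = $1
    }
    slot != "" && /Kernel driver in use:/ { print slot, $NF }
  '
}
```

Usage: `lspci -k | storage_drivers`.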