#1 2024-08-09 18:02:08

GerBra
Forum Fellow
From: Bingen/Germany
Registered: 2007-05-10
Posts: 238

[solved] Kernel bisect - don't find a bad commit

First: sorry for my bad English.

Situation:

In the German Arch Linux forum we have a thread about a problem with extra/hddtemp.
With linux-6.10.2 the tool shows the drive temperature; with 6.10.3 it always reports "drive is sleeping".
I could confirm this myself, also on the zen kernels.
Good and bad output on my system:

bad:
/dev/sda: CT500MX500SSD1                          ...@: drive is sleeping
good:
/dev/sda: CT500MX500SSD1: 40°C

strace shows that the HDIO_DRIVE_CMD ioctl returns a different status (and a different nsector value):

bad (status 0xe5)
ioctl(3, HDIO_DRIVE_CMD, {command=ATA_CMD_CHK_POWER, sector_number=0, feature=0, sector_count=0} => {status=0xe5, error=0, nsector=0}) = 0
good (status 0x50)
ioctl(3, HDIO_DRIVE_CMD, {command=ATA_CMD_CHK_POWER, sector_number=0, feature=0, sector_count=0} => {status=0x50, error=0, nsector=255}) = 0
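For readers comparing the two results, the status byte can be split into individual ATA status-register bits. The following is only a rough sketch: the function names are made up for this example, the bit names are assumed to match the ATA_* masks in the kernel's include/linux/ata.h, and the power-mode mapping assumes the usual CHECK POWER MODE encoding of the sector-count field (0xFF active/idle, 0x80 idle, 0x00 standby). If that assumption holds, nsector=0 in the bad case is what makes the drive look asleep.

```shell
#!/bin/sh
# Illustrative helpers; both function names are hypothetical.

# Print the names of the bits set in an ATA status byte.
# Bit names are assumed from the ATA_* masks in include/linux/ata.h.
decode_ata_status() {
    status=$(( $1 ))
    out=""
    for pair in 128:BUSY 64:DRDY 32:DF 16:DSC 8:DRQ 4:CORR 2:SENSE 1:ERR; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( $status & $mask )) -ne 0 ]; then
            out="${out}${out:+ }${name}"
        fi
    done
    echo "${out:-none}"
}

# Map the sector-count value returned by CHECK POWER MODE to a power state.
decode_power_mode() {
    case $(( $1 )) in
        255) echo "active/idle" ;;
        128) echo "idle" ;;
        0)   echo "standby" ;;
        *)   echo "unknown" ;;
    esac
}

decode_ata_status 0x50   # good case, prints: DRDY DSC
decode_ata_status 0xe5   # bad case,  prints: BUSY DRDY DF CORR ERR
decode_power_mode 255    # nsector in the good case, prints: active/idle
decode_power_mode 0      # nsector in the bad case,  prints: standby
```

Read this way, the good result looks like a normal "ready" status, while the bad result has several bits set that do not make sense together, which hints that the returned bytes are not a real status register at all.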

Problem:

So I thought I could do a kernel bisect to find the (possible) regression. I think I am doing something very wrong, because the bisect always ends by naming Torvalds' tag commit as the "first bad commit", e.g.:

[gerhard@ws01 linux-torvalds]$ git bisect good
83a7eefedc9b56fe7bfeff13b6c7356688ffa670 is the first bad commit
commit 83a7eefedc9b56fe7bfeff13b6c7356688ffa670 (tag: v6.10-rc3)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jun 9 14:19:43 2024 -0700

Linux 6.10-rc3

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Here is the bisect log from the first try (good: v6.10-rc2, bad: v6.10-rc3):

[gerhard@ws01 linux-torvalds]$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [c3f38fa61af77b49866b006939479069cd451173] Linux 6.10-rc2
git bisect good c3f38fa61af77b49866b006939479069cd451173
# status: waiting for bad commit, 1 good commit known
# bad: [83a7eefedc9b56fe7bfeff13b6c7356688ffa670] Linux 6.10-rc3
git bisect bad 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
# good: [d30d0e49da71de8df10bf3ff1b3de880653af562] Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good d30d0e49da71de8df10bf3ff1b3de880653af562
# good: [e60721bf3ccaebcaff8dec3548a2daa6578f9361] Merge tag 'gpio-fixes-for-v6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good e60721bf3ccaebcaff8dec3548a2daa6578f9361
# good: [329f70c5beaefe0e1197b7919e776dc005213b59] Merge tag 'kbuild-fixes-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect good 329f70c5beaefe0e1197b7919e776dc005213b59
# good: [c5dbc2ed0006d1a910b5496202a280138ce596e4] Merge tag '6.10-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good c5dbc2ed0006d1a910b5496202a280138ce596e4
# good: [d6283b160a12010b2113cc64726a3c9eda13dc5f] tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_SUBVOL
git bisect good d6283b160a12010b2113cc64726a3c9eda13dc5f
# good: [637c2dfcd9f5e194ab2e879704460840edcde537] Merge tag 'edac_urgent_for_v6.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good 637c2dfcd9f5e194ab2e879704460840edcde537
# good: [5b3cde198878b2f3269d5e7efbc0d514899b1fd8] Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"
git bisect good 5b3cde198878b2f3269d5e7efbc0d514899b1fd8
# good: [b8481381d4e2549f06812eb6069198144696340c] Merge tag 'perf-tools-fixes-for-v6.10-2-2024-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
git bisect good b8481381d4e2549f06812eb6069198144696340c
# first bad commit: [83a7eefedc9b56fe7bfeff13b6c7356688ffa670] Linux 6.10-rc3

None of the builds showed the problem with hddtemp.
So I thought I had used too early a tag for the initial bad state, because we are already using the 6.10.3 source base.

I started a second attempt, using a commit that had tested good in the first run as the "good" state and the git tag v6.10-rc4 as the bad one. This is the git bisect log from the second try:

[gerhard@ws01 linux-torvalds]$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [637c2dfcd9f5e194ab2e879704460840edcde537] Merge tag 'edac_urgent_for_v6.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good 637c2dfcd9f5e194ab2e879704460840edcde537
# status: waiting for bad commit, 1 good commit known
# bad: [6ba59ff4227927d3a8530fc2973b80e94b54d58f] Linux 6.10-rc4
git bisect bad 6ba59ff4227927d3a8530fc2973b80e94b54d58f
# good: [ac3cb72aea010510eaa1e19ab001a0d28c6eb4ab] Merge tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux
git bisect good ac3cb72aea010510eaa1e19ab001a0d28c6eb4ab
# good: [e39388e430d0b170fdaf319059e719d3c6875d07] Merge tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good e39388e430d0b170fdaf319059e719d3c6875d07
# good: [e12fa4dd64ace0d7cd461d2e0d4b0cffb1d7e8b8] Merge tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good e12fa4dd64ace0d7cd461d2e0d4b0cffb1d7e8b8
# good: [22f00812862564b314784167a89f27b444f82a46] USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
git bisect good 22f00812862564b314784167a89f27b444f82a46
# good: [ae01e52da244af5d650378ada1bfd2d946dc1b45] serial: drop debugging WARN_ON_ONCE() from uart_write()
git bisect good ae01e52da244af5d650378ada1bfd2d946dc1b45
# good: [b5beaa44747bddbabb338377340244f56465cd7d] Merge tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good b5beaa44747bddbabb338377340244f56465cd7d
# good: [7e9bb0cb50fec5d287749a58de5bb32220881b46] Merge tag 'i2c-host-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
git bisect good 7e9bb0cb50fec5d287749a58de5bb32220881b46
# good: [72d95924ee35c8cd16ef52f912483ee938a34d49] parisc: Try to fix random segmentation faults in package builds
git bisect good 72d95924ee35c8cd16ef52f912483ee938a34d49
# good: [6456c4256d1cf1591634b39e58bced37539d35b1] Merge tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good 6456c4256d1cf1591634b39e58bced37539d35b1
# first bad commit: [6ba59ff4227927d3a8530fc2973b80e94b54d58f] Linux 6.10-rc4

And again I ended up with Torvalds' tag commit for Linux 6.10-rc4 as the bad commit...

At this point I am sure that I am doing something seriously wrong...

My steps to find the problem:
1. Installed linux-mainline (from the miffe repository) and confirmed that the problem with hddtemp is still present there.
2. Cloned linux-git.
2a. (Used modprobed-db and localmodconfig to compile only the modules in use.)
3. Added a boot entry in GRUB for vmlinuz-linux-git.
4. Build and kernel install steps (repeated for every git bisect good|bad step, after git bisect start and marking the good/bad tags):
a) $ cd build/linux-git/
b) $ cd src/linux-torvalds/
c) $ git bisect good
d) $ cd ../..
e) $ makepkg -efL
f) $ sudo pacman -Rn linux-git linux-git-headers  # remove the previously built package
g) $ ls -la *.pkg.tar.zst
h) $ sudo pacman -U linux-git-6.10.***** linux-git-headers-6.10.******* # always the most recently built versions
5. Rebooted into the new kernel and checked hddtemp. It always gave the "good" output.
6. Rebooted into my normal kernel and repeated the bisect good|bad and build steps above.

As mentioned, I never found a build with the "bad" status. Of course I double-checked that each boot of linux-git really started the newly built kernel.
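The good/bad decision in step 5 could be taken from the hddtemp output itself rather than by eye. The helper below is only a sketch under that assumption: classify_hddtemp is a made-up name, and it simply looks for the two message shapes quoted at the top of this post.

```shell
#!/bin/sh
# Hypothetical helper: classify one line of hddtemp output for a bisect step.
# Prints "bad" for the "drive is sleeping" message, "good" if a temperature
# in °C is reported, and "skip" for anything it does not recognise.
classify_hddtemp() {
    case "$1" in
        *"drive is sleeping"*) echo bad ;;
        *"°C"*)                echo good ;;
        *)                     echo skip ;;
    esac
}

# The two outputs quoted earlier in this post:
classify_hddtemp "/dev/sda: CT500MX500SSD1                          ...@: drive is sleeping"
classify_hddtemp "/dev/sda: CT500MX500SSD1: 40°C"
```

On a freshly booted test kernel, the argument would be the real output, for example `classify_hddtemp "$(sudo hddtemp /dev/sda)"`, and printing `uname -r` next to it documents which build was actually running. The printed word corresponds to the matching `git bisect good`, `git bisect bad` or `git bisect skip` call.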

My thoughts on my possible mistakes:
a) Must I do a "make clean" between the git bisect good|bad steps?
As mentioned in the note block in https://wiki.archlinux.org/title/Bisect … #Bisecting
I assumed not, because each bisect step changes the source code, so the build system rebuilds each module, binary and vmlinuz as necessary.
b) Should I not give an explicit "bad" tag/commit when starting git bisect?
After all, the problem also exists in mainline (that's 6.11.x).
I assumed that a problem which starts in linux-6.10.3 could be narrowed down with good=v6.10-rc2 and bad=v6.10-rc4.

Sorry again for the long post, but this is my first bisect. I am happy to provide further information if needed.

Last edited by GerBra (2024-08-09 18:49:50)

#2 2024-08-09 18:09:35

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,334

Have you already had a look at https://lore.kernel.org/all/0bf3f2f0-0f … @heusel.eu? This could be related to the changes discussed there :P

#3 2024-08-09 18:18:12

GerBra

@gromit:
Thanks, this looks similar.
I also meant to check hdparm, but forgot; I only checked whether smartctl still shows the temperatures.
Maybe I/we ran into the issue from this report.

Could you also help me understand why I did not find this during my bisect attempts?
I'm very angry at myself... ;-)

#4 2024-08-09 18:32:06

gromit

Yeah, so for me the issue also goes away on a kernel version that does not have the above issue :P

# on bad kernel for the above issue
$ hddtemp /dev/sda
/dev/sda: WDC WD40EFRX-68N32N0                    : drive is sleeping

# on good kernel for the above issue
$ hddtemp /dev/sda                                
/dev/sda: WDC WD40EFRX-68N32N0: 31°C

So the two could indeed be related ...

The issue with your bisects seems to be that you selected the wrong test range and then tested good all the way through, so git bisect concludes that the commit you initially marked bad is the first bad one.
Note that the 6.10-rc3 you selected is something completely different from 6.10.3: the former is a release candidate for a mainline release, while the latter is a stable kernel as released by the stable team. See https://www.kernel.org/category/releases.html for a bit more information on that.
So what you should have done instead is work on the stable tree and mark 6.10.3 as bad and 6.10.2 as good, but it gets a bit more involved there.
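The "tested good all the way through" effect can be reproduced in a throwaway repository. This is just a toy sketch: the repository, file, tag names (good-end, bad-end) and commit messages are all invented for the demonstration and have nothing to do with the kernel tree.

```shell
#!/bin/sh
# Toy reproduction in a temporary repository; every name here is made up.
set -e
export LC_ALL=C

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

# Eight commits; none of them contains the "bug" we pretend to look for.
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
done
git tag good-end HEAD~7
git tag bad-end HEAD

# Start with <bad> <good>, then let every tested commit pass ("true" exits 0,
# which git bisect run treats as "good").
git bisect start bad-end good-end > /dev/null
bisect_result=$(git bisect run true)

# The commit reported as "first bad" is simply the endpoint marked bad.
echo "$bisect_result" | grep "is the first bad commit"
echo "bad endpoint: $(git rev-parse bad-end)"
```

Both printed hashes should be identical, mirroring the logs above where the tag commit itself was blamed. For the real case, the analogous start in a clone of the stable tree would presumably be `git bisect start v6.10.3 v6.10.2`, assuming the usual vX.Y.Z tag names.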

#5 2024-08-09 18:43:17

gromit

I have now also added your issue to the thread on the kernel mailing list, so the developers are aware of this breakage as well.

Edit: This is the message https://lore.kernel.org/all/df43ed14-97 … heusel.eu/

Last edited by gromit (2024-08-09 18:45:08)

#6 2024-08-09 18:44:40

GerBra

gromit wrote:

So note that the selected 6.10-rc3 is something completely different from 6.10.3, as the first one is a release candidate for a mainline release and the other one is a stable kernel as released by the stable team.

Aah, that's something I also asked myself: whether tag v6.10-rc3 would be the "same" as (arch)linux-6.10.3.
Thanks for the clarification...

In the linux-git source tree I have meanwhile found the mentioned bad commit in git log; it seems it was created in tag 6.20-rc6.
So my test range was always too short, as you say. For fun, I will start a new bisect on linux-git and give this tag (or master) as the "bad" state, just to see whether I would find the problem.
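Whether a planned bisect range can contain a known suspect commit at all can be checked before building anything. The function below is a hedged sketch: range_contains is a hypothetical name, and it relies only on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second.

```shell
#!/bin/sh
# Hypothetical helper: is <commit> inside the range <good>..<bad>?
# Usage (inside the repository): range_contains <good> <bad> <commit>
range_contains() {
    good=$1
    bad=$2
    commit=$3
    # In range means: reachable from <bad>, but not already part of <good>.
    if git merge-base --is-ancestor "$commit" "$bad" &&
       ! git merge-base --is-ancestor "$commit" "$good"; then
        echo yes
    else
        echo no
    fi
}
```

Called with the two tags of an intended range and the hash of the suspect commit, an answer of "no" means the bisect cannot land on that commit, however carefully each step is tested. `git describe --contains <commit>` is a related way to see which tag first includes a commit.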

And I will try the same on the stable kernel tree, as you suggest.
I'm new to this, so I thought I MUST use the linux-git repo.

Thank you very much for the "things to think about.." <g>

//Edit: And also thanks for bringing this to LKML!

#7 2024-08-09 19:52:40

bsdice
Member
Registered: 2016-08-06
Posts: 15

I first saw the issue with UDisks, which is also affected. I reverted three patches and the problem went away on my 6.6.44 kernel based on linux-lts:

https://github.com/storaged-project/udi … 2271733527
