#26 2023-05-14 18:33:38

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

This kernel is bad.

Offline

#27 2023-05-14 19:19:54

loqs
Member
Registered: 2014-03-06
Posts: 17,737

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Offline

#28 2023-05-14 19:44:50

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Damn! I will try this current kernel another few days until the next freeze, then revert back to your last version.

Offline

#29 2023-05-24 16:23:27

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

I’ve used this current kernel for 10 more days now and it has always been stable after standby. It remains a mystery what caused the previous freeze.

Offline

#30 2023-05-27 12:24:07

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Still good. To clarify: I’m not talking about the 6.3 kernel line but your last bisection kernel.

Offline

#31 2023-05-27 18:37:00

loqs
Member
Registered: 2014-03-06
Posts: 17,737

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

frumble wrote:

Still good. To clarify: I’m not talking about the 6.3 kernel line but your last bisection kernel.

To further clarify: linux-6.1.3.r151.g4def68cc15f3-1-x86_64.pkg.tar.zst is now testing good?

Offline

#32 2023-05-27 19:49:20

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Yes, this one proves surprisingly stable nonetheless.

Offline

#33 2023-05-27 20:04:51

loqs
Member
Registered: 2014-03-06
Posts: 17,737

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[d988f0bcf579b4bcb0b7aba217a882ec150bcc2a] drm/connector: send hotplug uevent on connector cleanup

https://drive.google.com/file/d/1syqd4Q … share_link linux-6.1.3.r153.gd988f0bcf579-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1uiSe55 … share_link linux-headers-6.1.3.r153.gd988f0bcf579-1-x86_64.pkg.tar.zst
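
For readers matching package files to bisection steps: the version string in the file names appears to follow the usual `git describe --long` pattern (tag, number of commits since the tag, abbreviated hash). A minimal sketch of that mapping, assuming this is how the packages were versioned (the thread does not say so); the helper names `pkgver_from_describe` and `commit_from_pkgver` are made up for illustration:

```shell
#!/bin/sh
# Assumption: package versions are derived from `git describe --long` output.
# Both helper names are hypothetical and only illustrate the naming scheme.

# v6.1.3-153-gd988f0bcf579 -> 6.1.3.r153.gd988f0bcf579
pkgver_from_describe() {
    # Drop the leading "v", mark the commit count with "r", turn dashes into dots.
    printf '%s\n' "$1" | sed 's/^v//; s/\([^-]*-g\)/r\1/; s/-/./g'
}

# 6.1.3.r151.g4def68cc15f3 -> 4def68cc15f3
commit_from_pkgver() {
    # Keep only the hex digits after the final ".g".
    printf '%s\n' "$1" | sed 's/.*\.g\([0-9a-f]*\)$/\1/'
}

pkgver_from_describe v6.1.3-153-gd988f0bcf579
commit_from_pkgver 6.1.3.r151.g4def68cc15f3
```

Read this way, r151 and r153 would be commits 151 and 153 after the Linux 6.1.3 tag, and the trailing hash identifies the commit under test.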

Offline

#34 2023-06-02 17:33:08

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

This kernel is good.

Offline

#35 2023-06-02 19:42:41

loqs
Member
Registered: 2014-03-06
Posts: 17,737

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Offline

#36 2023-06-02 20:29:22

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Hm, then let me test it longer… It should occur after roughly 2 days, and it has been six days by now without a freeze.

Offline

#37 2023-06-07 01:51:19

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Hurray, a freeze! Finally, lol! After 12 days this time. Considering that the commit "drm/connector: send hotplug uevent on connector cleanup" sounds like the only one likely to cause this, and the freeze is now proven to occur with it, do we have certainty here?
The strange previous kernel that froze after two days but then ran 13 [10 + 3 in post here] more days perfectly stable still confuses me…

Offline

#38 2023-06-07 21:11:13

loqs
Member
Registered: 2014-03-06
Posts: 17,737

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

Almost

$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[35fe1c238437155153c1aeeb94572b04fa60e0b5] device_cgroup: Roll back to original exceptions after copy failure

The last commit to check would be 35fe1c238437155153c1aeeb94572b04fa60e0b5.
I cannot see how the issue is related to cgroups, so assuming it is good:

$ git bisect good
d988f0bcf579b4bcb0b7aba217a882ec150bcc2a is the first bad commit
commit d988f0bcf579b4bcb0b7aba217a882ec150bcc2a
Author: Simon Ser <contact@emersion.fr>
Date:   Mon Oct 17 15:32:01 2022 +0000

    drm/connector: send hotplug uevent on connector cleanup
    
    commit 6fdc2d490ea1369d17afd7e6eb66fecc5b7209bc upstream.
    
    A typical DP-MST unplug removes a KMS connector. However care must
    be taken to properly synchronize with user-space. The expected
    sequence of events is the following:
    
    1. The kernel notices that the DP-MST port is gone.
    2. The kernel marks the connector as disconnected, then sends a
       uevent to make user-space re-scan the connector list.
    3. User-space notices the connector goes from connected to disconnected,
       disables it.
    4. Kernel handles the IOCTL disabling the connector. On success,
       the very last reference to the struct drm_connector is dropped and
       drm_connector_cleanup() is called.
    5. The connector is removed from the list, and a uevent is sent to tell
       user-space that the connector disappeared.
    
    The very last step was missing. As a result, user-space thought the
    connector still existed and could try to disable it again. Since the
    kernel no longer knows about the connector, that would end up with
    EINVAL and confused user-space.
    
    Fix this by sending a hotplug uevent from drm_connector_cleanup().
    
    Signed-off-by: Simon Ser <contact@emersion.fr>
    Cc: stable@vger.kernel.org
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Lyude Paul <lyude@redhat.com>
    Cc: Jonas Ådahl <jadahl@redhat.com>
    Tested-by: Jonas Ådahl <jadahl@redhat.com>
    Reviewed-by: Lyude Paul <lyude@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221017153150.60675-2-contact@emersion.fr
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/gpu/drm/drm_connector.c | 3 +++
 1 file changed, 3 insertions(+)
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [2cb8e624295ffa0c4d659fcec7d9e7a6c48de156] Linux 6.1.4
git bisect bad 2cb8e624295ffa0c4d659fcec7d9e7a6c48de156
# good: [4adc0fbe03a69d3189607bf74e82a79c29c08b4a] Linux 6.1.3
git bisect good 4adc0fbe03a69d3189607bf74e82a79c29c08b4a
# good: [e9f7a3bbaa5c0bc1c9dab5bf3ea5f2802034e50b] cifs: fix confusing debug message
git bisect good e9f7a3bbaa5c0bc1c9dab5bf3ea5f2802034e50b
# bad: [3650c063f22d03795026bd6f3d473e5bbdabb442] drm/mgag200: Fix PLL setup for G200_SE_A rev >=4
git bisect bad 3650c063f22d03795026bd6f3d473e5bbdabb442
# good: [abbb887da77408892c0c8fb4cbbc2a5bb03b140e] riscv: Fixup compile error with !MMU
git bisect good abbb887da77408892c0c8fb4cbbc2a5bb03b140e
# good: [17183187dc862a828f8e54380d0596eafa0b09f8] hugetlb: really allocate vma lock for all sharable vmas
git bisect good 17183187dc862a828f8e54380d0596eafa0b09f8
# good: [553bc5890ed96a8d006224c3a4673c47fee0d12a] parisc: Fix locking in pdc_iodc_print() firmware call
git bisect good 553bc5890ed96a8d006224c3a4673c47fee0d12a
# good: [4def68cc15f37287a6b3bb8ccaaaba2aee6c5185] parisc: Drop PMD_SHIFT from calculation in pgtable.h
git bisect good 4def68cc15f37287a6b3bb8ccaaaba2aee6c5185
# bad: [d988f0bcf579b4bcb0b7aba217a882ec150bcc2a] drm/connector: send hotplug uevent on connector cleanup
git bisect bad d988f0bcf579b4bcb0b7aba217a882ec150bcc2a
# good: [35fe1c238437155153c1aeeb94572b04fa60e0b5] device_cgroup: Roll back to original exceptions after copy failure
git bisect good 35fe1c238437155153c1aeeb94572b04fa60e0b5
# first bad commit: [d988f0bcf579b4bcb0b7aba217a882ec150bcc2a] drm/connector: send hotplug uevent on connector cleanup

If you want to test it, I can build a kernel for 35fe1c238437155153c1aeeb94572b04fa60e0b5. However, I would suggest opening a bug report upstream at this point.
The low reproducibility may be related to an intermittent issue in the DP cable/connector that triggers random disconnect events.
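
When attaching the result to an upstream report, it can help to condense the saved log into one line. A minimal sketch, assuming the commands from the log above were saved as text; the function name `summarise_bisect_log` and the variable `BISECT_LOG` are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: count the good/bad verdicts in a `git bisect log`
# and print the commit named on the "first bad commit" line.
summarise_bisect_log() {
    awk '
        /^git bisect good / { good++ }
        /^git bisect bad /  { bad++ }
        /^# first bad commit: / {
            # Field 5 is "[<hash>]"; drop the surrounding brackets.
            hash = substr($5, 2, length($5) - 2)
        }
        END { printf "good=%d bad=%d first_bad=%s\n", good, bad, hash }
    '
}

# The command lines from the log above, embedded so the sketch runs anywhere.
BISECT_LOG='git bisect start
git bisect bad 2cb8e624295ffa0c4d659fcec7d9e7a6c48de156
git bisect good 4adc0fbe03a69d3189607bf74e82a79c29c08b4a
git bisect good e9f7a3bbaa5c0bc1c9dab5bf3ea5f2802034e50b
git bisect bad 3650c063f22d03795026bd6f3d473e5bbdabb442
git bisect good abbb887da77408892c0c8fb4cbbc2a5bb03b140e
git bisect good 17183187dc862a828f8e54380d0596eafa0b09f8
git bisect good 553bc5890ed96a8d006224c3a4673c47fee0d12a
git bisect good 4def68cc15f37287a6b3bb8ccaaaba2aee6c5185
git bisect bad d988f0bcf579b4bcb0b7aba217a882ec150bcc2a
git bisect good 35fe1c238437155153c1aeeb94572b04fa60e0b5
# first bad commit: [d988f0bcf579b4bcb0b7aba217a882ec150bcc2a] drm/connector: send hotplug uevent on connector cleanup'

printf '%s\n' "$BISECT_LOG" | summarise_bisect_log
```

In a kernel checkout, the same saved text can be fed to `git bisect replay <file>` to restore the bisection state. Note that the final "good" verdict for 35fe1c238437 was assumed rather than tested.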

Offline

#39 2023-06-09 23:52:49

frumble
Member
From: Germany
Registered: 2012-05-20
Posts: 162
Website

Re: AMDGPU freeze problems with Linux 6.1.4+, need kernel bisection [SVD]

I’m trusting your expertise that this will be sufficient. Now I’m trying the latest stable kernel, 6.3.6, once again to verify that this isn’t fixed upstream. When I get the freeze, I will submit a regular bug report. Thank you very much for your help in this exhausting process.

Offline
