Hello,
Suddenly, there seems to be a problem with my nvidia video driver on my ASUS A17 laptop.
- SDDM won't start; the screen stays black. When I disable SDDM, I can boot to a terminal login.
- Also, when booting the laptop (even after a complete power-off), no POST, BIOS, or GRUB screen is shown.
From the command line, I can start KDE using startx.
I suspect a hardware problem (rather than something with Arch), since no boot messages before GRUB are visible either.
However, the dmesg output seems to indicate that the error is 'correctable':
Journalctl log: http://0x0.st/XkXR.txt
dmesg: http://0x0.st/XkXC.txt
lspci: http://0x0.st/XkKj.csv
Any thoughts on how 'correctable' this is, and on how to correct it?
Thanks,
Alex.
Last edited by _lex_1234 (2024-11-29 09:28:31)
There's a metric shit-ton of
nov 11 18:15:04 alexarch kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0: device [10de:28e0] error status/mask=00000040/0000a000
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0: [ 6] BadTLP
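To get a feel for how many of these there are, and of which kind, a small filter can tally the bracketed AER status bits (such as "[ 6] BadTLP") in kernel log text. This is a minimal sketch that assumes the log format shown above; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count AER error-status bits (e.g. "[ 6] BadTLP")
# in kernel log text read from stdin, most frequent first.
summarize_aer() {
    # "[ 6] BadTLP" -> "BadTLP"; timestamps such as "[ 26.712]" contain a
    # dot and therefore do not match the pattern
    grep -oE '\[ *[0-9]+\] [A-Za-z]+' \
        | sed -E 's/^\[ *[0-9]+\] //' \
        | sort | uniq -c | sort -rn
}

# Usage on the affected machine:
#   journalctl -k -b | summarize_aer
```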
nov 11 18:14:44 alexarch kernel: amdgpu 0000:36:00.0: [drm] Cannot find any crtc or sizes
nov 11 18:14:50 alexarch sddm-greeter-qt6[835]: Adding view for ":0.0" QRect(0,0 0x0)
Smells like there's no output attached to the AMD GPU?
Is this a notebook?
nov 11 18:14:44 alexarch kernel: nvme0n1: p1 p2 p3
nov 11 18:14:44 alexarch kernel: nvme1n1: p1 p2 p3 p4 p5 p6
Is there a parallel windows installation?
From the commandline, I can start KDE by using startx.
Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
Hi Seth, thanks for your reply.
Is this a notebook?
Yes. It is an ASUS TUF Gaming A17 laptop.
nov 11 18:14:44 alexarch kernel: nvme0n1: p1 p2 p3
nov 11 18:14:44 alexarch kernel: nvme1n1: p1 p2 p3 p4 p5 p6
Is there a parallel windows installation?
Yes, but I never use or boot it. Arch is installed on nvme1n1.
From the commandline, I can start KDE by using startx.
Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
There are two Xorg logs from yesterday in /var/log:
Xorg0.log: http://0x0.st/XkqH.txt
Xorg1.log: http://0x0.st/XkqX.txt
And then there is the 'rootless' one (currently in use) from running startx: http://0x0.st/XkqK.txt
Alex.
Last edited by _lex_1234 (2024-11-12 05:17:48)
Yes, but i never use/boot it.
Borderline irrelevant, see the 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.
[ 9.887] (WW) modeset(0): Unable to find connected outputs - setting 1024x768 initial framebuffer
The system tries to run on the AMD APU, but finds no output.
[ 10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): connected
[ 10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): Internal DisplayPort
[ 10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): 2670.0 MHz maximum pixel clock
[ 10.179] (--) NVIDIA(GPU-0):
[ 10.186] (II) NVIDIA(G0): Validated MetaModes:
[ 10.186] (II) NVIDIA(G0): "NULL"
[ 10.186] (II) NVIDIA(G0): Virtual screen size determined to be 640 x 480
[ 10.239] (WW) NVIDIA(G0): Cannot find size of first mode for AU Optronics Corporation
So with nvidia there is an eDP, but no EDID?
This is in the SDDM and the startx log.
What kind of monitor do you use w/ the system and what's the BIOS config? Did you try to disable the APU or the GPU?
Eventually, from the startx run
[ 26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): connected
[ 26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): Internal DisplayPort
[ 26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): 2670.0 MHz maximum pixel clock
[ 26.712] (--) NVIDIA(GPU-0):
[ 26.956] (II) modeset(0): Allocate new frame buffer 1920x1080 stride
[ 27.002] (II) NVIDIA(G0): Setting mode "DP-1-2: 1920x1080_144 @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
But that's probably kscreen setting a saved mode - SDDM doesn't benefit from that.
You can inject a mode or EDID, https://wiki.archlinux.org/title/Xrandr … esolutions & https://wiki.archlinux.org/title/Kernel … s_and_EDID but that ignores the nvidia-related PCI errors and the general "weirdness" of the situation.
You'd expect, in hybrid mode, the eDP being wired to the AMD APU and run the server on that.
Try toggling the hybrid/dGPU mode(s) in the BIOS/UEFI (i.e., use the nvidia GPU only, the APU only, or hybrid mode, anything; if in doubt, back and forth).
What kind of monitor do you use w/ the system and what's the BIOS config? Did you try to disable the APU or the GPU?
No monitor, just the laptop screen.
Try toggling the hybrid/dGPU mode(s) in the BIOS/UEFI (i.e., use the nvidia GPU only, the APU only, or hybrid mode, anything; if in doubt, back and forth).
I cannot enter the BIOS/UEFI (i.e., there is no screen).
Is there any way I can do that from within Arch, i.e. force a certain setting on the next reboot?
I do not see the BIOS screen, the boot screen, or GRUB (all black). The first thing I see (after disabling SDDM) is the command-line login.
I'm considering opening the machine later and removing the Arch SSD, since that might force the system to boot from the other one.
You can inject a mode or EDID, https://wiki.archlinux.org/title/Xrandr … esolutions & https://wiki.archlinux.org/title/Kernel … s_and_EDID but that ignores the nvidia-related PCI errors and the general "weirdness" of the situation.
Looking into that. So far no EDID file is present, and extracting one with the tools mentioned (read-edid) gives errors:
[root@alexarch ]# get-edid -m 0
0
This is read-edid version 3.0.2. Prepare for some fun.
Attempting to use i2c interface
No EDID on bus 0
No EDID on bus 1
No EDID on bus 2
No EDID on bus 3
No EDID on bus 4
No EDID on bus 5
No EDID on bus 6
No EDID on bus 7
No EDID on bus 8
No EDID on bus 9
No EDID on bus 10
No EDID on bus 11
No EDID on bus 12
No EDID on bus 13
No EDID on bus 14
No EDID on bus 15
No EDID on bus 16
No EDID on bus 17
No EDID on bus 18
No EDID on bus 19
No EDID on bus 20
Problem requesting slave address: Device or resource busy
No EDID on bus 22
No EDID on bus 24
No EDID on bus 25
No EDID on bus 26
No EDID on bus 27
1 potential busses found: 23
Bus 23 doesn't really have an EDID...
Couldn't find an accessible EDID on this computer.
Attempting to use the classical VBE interface
Performing real mode VBE call
Interrupt 0x10 ax=0x4f00 bx=0x0 cx=0x0
Function unsupported
Call failed
VBE version 0
VBE string at 0x0 "O"
VBE/DDC service about to be called
Report DDC capabilities
Performing real mode VBE call
Interrupt 0x10 ax=0x4f15 bx=0x0 cx=0x0
Function unsupported
Call failed
Reading next EDID block
VBE/DDC service about to be called
Read EDID
Performing real mode VBE call
Interrupt 0x10 ax=0x4f15 bx=0x1 cx=0x0
Function unsupported
Call failed
The EDID data should not be trusted as the VBE call failed
Error: output block unchanged
I'm sorry nothing was successful. Maybe try some other arguments
if you played with them, or send an email to Matthew Kern <pyrophobicman@gmail.com>.
Last edited by _lex_1234 (2024-11-12 20:06:00)
Would you have any possibility on how I can perform that from within Arch?
Nope. Do you have an external output you can (temporarily) attach?
For the EDID
for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done
You'll need https://aur.archlinux.org/packages/edid-decode-git but I doubt you'll get anything from there.
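The loop above runs edid-decode on every connector, including the many empty ones. A variant that only names the connectors whose "edid" attribute actually returns data can narrow things down first. It reads the first byte rather than using "test -s", because sysfs may report a size of 0 for this file even when it has content. A sketch; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DRM connectors whose "edid" attribute
# returns at least one byte. The sysfs root is a parameter so the function
# can also be run against a copied tree.
list_edid_connectors() {
    local root="${1:-/sys/class/drm}" conn
    for conn in "$root"/card*-*; do
        [ -e "$conn/edid" ] || continue
        # Read one byte instead of using "test -s": sysfs may report
        # size 0 for this attribute even when it has content.
        if [ "$(head -c 1 "$conn/edid" | wc -c)" -gt 0 ]; then
            basename "$conn"
        fi
    done
}

# Usage:
#   list_edid_connectors
#   edid-decode /sys/class/drm/"$(list_edid_connectors | head -n 1)"/edid
```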
Nope. Do you have an external output you can (temporarily) attach?
I have now attached a screen to the HDMI port. It duplicates (or extends) the screen, but does not light up any earlier than the laptop screen itself. So BIOS/UEFI/GRUB/boot is still dark.
For the EDID
for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done
You'll need https://aur.archlinux.org/packages/edid-decode-git but I doubt you'll get anything from there.
I installed edid-decode and this time did find an EDID:
[alex@alexarch ~]$ for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done
/sys/class/drm/card0
/sys/class/drm/card0/edid: No such file or directory
=================
/sys/class/drm/card0-DP-1
EDID of '/sys/class/drm/card0-DP-1/edid' was empty.
=================
/sys/class/drm/card0-DP-2
EDID of '/sys/class/drm/card0-DP-2/edid' was empty.
=================
/sys/class/drm/card0-DP-3
EDID of '/sys/class/drm/card0-DP-3/edid' was empty.
=================
/sys/class/drm/card0-DP-4
EDID of '/sys/class/drm/card0-DP-4/edid' was empty.
=================
/sys/class/drm/card0-DP-5
EDID of '/sys/class/drm/card0-DP-5/edid' was empty.
=================
/sys/class/drm/card0-DP-6
EDID of '/sys/class/drm/card0-DP-6/edid' was empty.
=================
/sys/class/drm/card0-DP-7
EDID of '/sys/class/drm/card0-DP-7/edid' was empty.
=================
/sys/class/drm/card0-DP-8
EDID of '/sys/class/drm/card0-DP-8/edid' was empty.
=================
/sys/class/drm/card0-eDP-1
EDID of '/sys/class/drm/card0-eDP-1/edid' was empty.
=================
/sys/class/drm/card0-Writeback-1
EDID of '/sys/class/drm/card0-Writeback-1/edid' was empty.
=================
/sys/class/drm/card1
/sys/class/drm/card1/edid: No such file or directory
=================
/sys/class/drm/card1-DP-9
EDID of '/sys/class/drm/card1-DP-9/edid' was empty.
=================
/sys/class/drm/card1-eDP-2
edid-decode (hex):
00 ff ff ff ff ff ff 00 06 af a7 dd 00 00 00 00
e5 20 01 04 a5 26 16 78 03 70 75 93 58 5a 94 29
20 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 ec 3b 80 b6 70 38 88 40 30 20
a5 00 7e d7 10 00 00 18 00 00 00 fd 00 3c 90 b0
b0 25 01 0a 20 20 20 20 20 20 00 00 00 fe 00 41
55 4f 0a 20 20 20 20 20 20 20 20 20 00 00 00 fc
00 42 31 37 33 48 41 4e 30 34 2e 39 20 0a 01 a1
70 20 79 02 00 22 00 14 0b 9e 05 84 7f 07 b5 00
2f 80 1f 00 37 04 87 00 09 00 04 00 2b 00 06 27
00 3c 8f 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 90
----------------
Block 0, Base EDID:
EDID Structure Version & Revision: 1.4
Vendor & Product Identification:
Manufacturer: AUO
Model: 56743
Made in: week 229 of 2022
Basic Display Parameters & Features:
Digital display
Bits per primary color channel: 8
DisplayPort interface
Maximum image size: 38 cm x 22 cm
Gamma: 2.20
Supported color formats: RGB 4:4:4
First detailed timing includes the native pixel format and preferred refresh rate
Display supports continuous frequencies
Color Characteristics:
Red : 0.5751, 0.3466
Green: 0.3515, 0.5781
Blue : 0.1611, 0.1279
White: 0.3134, 0.3291
Established Timings I & II: none
Standard Timings: none
Detailed Timing Descriptors:
DTD 1: 1920x1080 60.014898 Hz 16:9 72.978 kHz 153.400000 MHz (382 mm x 215 mm)
Hfront 48 Hsync 32 Hback 102 Hpol N
Vfront 10 Vsync 5 Vback 121 Vpol N
Display Range Limits:
Monitor ranges (Range Limits Only): 60-144 Hz V, 176-176 kHz H, max dotclock 370 MHz
Alphanumeric Data String: 'AUO'
Display Product Name: 'B173HAN04.9 '
Extension blocks: 1
Checksum: 0xa1
----------------
Block 1, DisplayID Extension Block:
Version: 2.0
Extension Count: 0
Display Product Primary Use Case: None of the listed primary use cases; generic display
Video Timing Modes Type 7 - Detailed Timings Data Block:
DTD: 1920x1080 144.027931 Hz 16:9 175.138 kHz 368.140000 MHz (aspect 16:9, no 3D stereo, preferred)
Hfront 48 Hsync 32 Hback 102 Hpol P
Vfront 10 Vsync 5 Vback 121 Vpol N
Adaptive Sync Data Block:
Descriptor #1:
Native Panel Range
Fixed Average V-Total and Adaptive V-Total
Supports Seamless Transition
'Max Single Frame Duration Increase' field value without jitter impact
'Max Single Frame Duration Decrease' field value without jitter impact
Max Duration Increase: 0.00 ms
Max Duration Decrease: 0.00 ms
Min Refresh Rate: 60 Hz
Max Refresh Rate: 144 Hz
Checksum: 0x02
Checksum: 0x90
=================
/sys/class/drm/card1-HDMI-A-1
EDID of '/sys/class/drm/card1-HDMI-A-1/edid' was empty.
=================
[alex@alexarch ~]$
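The detailed timing descriptors (DTDs) above contain everything an X modeline needs, which is one way to "inject a mode" via xrandr as suggested earlier. Each sync value is the active size plus the front porch, then the sync width, then the back porch: for DTD 1 the horizontal total is 1920 + 48 + 32 + 102 = 2102 and the vertical total is 1080 + 10 + 5 + 121 = 1216, and 153.4 MHz / (2102 × 1216) ≈ 60.015 Hz, matching the reported refresh rate. A minimal sketch; the function name and argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn edid-decode DTD fields into the argument list
# of "xrandr --newmode".
# Args: name clock_MHz hactive hfront hsync hback hpol(P|N)
#       vactive vfront vsync vback vpol(P|N)
dtd_to_modeline() {
    local name=$1 clock=$2
    local ha=$3 hfp=$4 hs=$5 hbp=$6 hpol=$7
    local va=$8 vfp=$9 vs=${10} vbp=${11} vpol=${12}
    local hflag=-hsync vflag=-vsync
    [ "$hpol" = P ] && hflag=+hsync
    [ "$vpol" = P ] && vflag=+vsync
    # sync start = active + front porch; sync end = sync start + sync width;
    # total = sync end + back porch
    echo "$name $clock" \
        "$ha $((ha + hfp)) $((ha + hfp + hs)) $((ha + hfp + hs + hbp))" \
        "$va $((va + vfp)) $((va + vfp + vs)) $((va + vfp + vs + vbp))" \
        "$hflag $vflag"
}

# Usage (OUTPUT is a placeholder; take the real name from "xrandr -q"):
#   xrandr --newmode $(dtd_to_modeline 1920x1080_60 153.40 1920 48 32 102 N 1080 10 5 121 N)
#   xrandr --addmode OUTPUT 1920x1080_60
```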
I copied the file for later use, but I am still trying to figure out how to use it:
[alex@alexarch edidfiles]$ cp /sys/class/drm/card1-eDP-2/edid ./edid-card1-eDP-2
(Probably by referencing it on the kernel command line, but I do not want to mess up my GRUB config too much, since that might prevent me from booting.)
https://wiki.archlinux.org/title/Kernel … s_and_EDID
Put edid-card1-eDP-2 into /usr/lib/firmware/edid, also add it to the initramfs (FILES array in mkinitcpio.conf), and add "drm.edid_firmware=edid/edid-card1-eDP-2" to the kernel parameters.
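The three steps above can be sketched as a small shell function plus the follow-up commands. Assumptions: GRUB reads the kernel parameters from /etc/default/grub (GRUB is mentioned in the previous post), the default initramfs preset is in use, and the file name is edid-card1-eDP-2 as copied earlier; the helper name is hypothetical. The file name in the parameter has to match the installed file exactly, relative to /usr/lib/firmware.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a raw EDID dump as firmware and print the
# matching kernel parameter. ROOT defaults to "/" so it can be tried
# against a scratch directory first.
install_edid_firmware() {
    local src=$1 name=$2 root=${3:-/}
    local dest="${root%/}/usr/lib/firmware/edid/$name"
    # install -D also creates the edid/ directory if it is missing
    install -D -m 0644 "$src" "$dest" || return 1
    # drm.edid_firmware takes a path relative to /usr/lib/firmware
    printf 'drm.edid_firmware=edid/%s\n' "$name"
}

# Usage, as root:
#   install_edid_firmware ./edid-card1-eDP-2 edid-card1-eDP-2
#   # /etc/mkinitcpio.conf: FILES=(/usr/lib/firmware/edid/edid-card1-eDP-2)
#   mkinitcpio -P
#   # /etc/default/grub: append the printed parameter to GRUB_CMDLINE_LINUX_DEFAULT
#   grub-mkconfig -o /boot/grub/grub.cfg
```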
You can edit … no you can't because you can't see anything.
You could at some point write it into /sys/kernel/debug/dri/1/eDP-2/edid_override but it shows up there anyway and you probably want it in dri/0/eDP-1 to begin with.
Hard resetting the BIOS (without access to it) seems to require removing the CMOS battery on that model; some models can apparently be reset by holding the power button down for minutes while booting the system.
No idea whether that works.
https://wiki.archlinux.org/title/Kernel … s_and_EDID
Put edid-card1-eDP-2 into /usr/lib/firmware/edid, also add it to the initramfs (FILES array in mkinitcpio.conf), and add "drm.edid_firmware=edid/edid-card1-eDP-2" to the kernel parameters.
You can edit … no you can't because you can't see anything.
I did this, but to no avail. The parameter is there after booting, so it was added:
[root@alexarch alex]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux root=UUID=7c36495b-3393-4af4-a40c-41301c38e98c rw loglevel=3 drm.edid_firmware=edid/edid-card1-eDP-2
But loading fails: 'Direct firmware load for edid/edid-card1-eDP-2 failed with error -2'
DMESG: http://0x0.st/Xkgy.txt
You can edit … no you can't because you can't see anything.
Right, indeed: usually I would just edit the GRUB entry to try things... no use here :-)
Hard resetting the BIOS (without access to it) seems to require removing the CMOS battery on that model; some models can apparently be reset by holding the power button down for minutes while booting the system.
No idea whether that works.
Yeah, the next step (for tomorrow) is probably opening the laptop and doing a hard reset somehow.
Unless I come up with another idea (maybe I'll try booting from a different medium, but it needs to be detected first, and I can't change the boot order...).
PS: I now notice that my keyboard backlight also seems to have stopped working. It is turned off, and I can't get it back on ('echo 1 > /sys/class/leds/asus\:\:kbd_backlight/brightness'). Mentioning it since it could be a coincidence, but maybe not.
Last edited by _lex_1234 (2024-11-14 22:15:47)
Ok, I connected the laptop via HDMI to an external device and again extracted the edid file.
Still the same problem:
[ 8.613769] nvidia 0000:01:00.0: Direct firmware load for edid/edid-card1-HDMI-A1 failed with error -2
[ 8.613776] nvidia 0000:01:00.0: [drm] *ERROR* [CONNECTOR:103:DP-9] Requesting EDID firmware "edid/edid-card1-HDMI-A1" failed (err=-2)
So this makes me wonder whether I am handling the EDID file the right way...
Do I need to do something else with the file?
(So far, for me that means copying it from /sys/class/drm/card1-something/edid to /var/lib/firmware?)
To be honest, the whole concept of the EDID file was new to me earlier this week :-) .
See dmesg:
Alex.
Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and
i.e. /usr/lib/firmware/edid/edid-card1-eDP-2, or in this case /usr/lib/firmware/edid/edid-card1-HDMI-A1
"-2" means the file doesn't exist; the parameter is the path relative to /usr/lib/firmware.
But the nvidia driver loads really late anyway:
[ 7.206624] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
and you'll also never get a BIOS or bootloader output this way.
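Since -2 is ENOENT ("No such file or directory"), a quick check is to resolve every file named in drm.edid_firmware against /usr/lib/firmware. The kernel parameter accepts an optional "<connector>:" prefix and a comma-separated list, which the sketch strips and splits; the helper name is hypothetical. This only checks the root filesystem: if the driver is loaded from the initramfs, "lsinitcpio /boot/initramfs-linux.img | grep edid" should list the file as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each file named in drm.edid_firmware
# exists below the firmware directory (error -2 in dmesg means it was not
# found). Args: [cmdline] [firmware_root]
check_edid_firmware() {
    local cmdline=${1:-$(cat /proc/cmdline)} fwroot=${2:-/usr/lib/firmware}
    local param entry file missing=0
    local -a entries
    for param in $cmdline; do
        case $param in
            drm.edid_firmware=*) ;;
            *) continue ;;
        esac
        # value is [<connector>:]<file>[,[<connector>:]<file>...]
        IFS=',' read -r -a entries <<< "${param#drm.edid_firmware=}"
        for entry in "${entries[@]}"; do
            file=${entry#*:}    # drop the optional "<connector>:" prefix
            if [ -f "$fwroot/$file" ]; then
                echo "ok:      $fwroot/$file"
            else
                echo "missing: $fwroot/$file"
                missing=1
            fi
        done
    done
    return "$missing"
}

# Usage: check_edid_firmware    (reads /proc/cmdline)
```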
I tried resetting by disconnecting the battery and holding the power button for 60 seconds (as far as I understand, there is no separate CMOS battery).
Unfortunately, that didn't help.
I have now asked ASUS how to do this, since it seems really hard to find exact instructions on the internet on how to 'hard reset the CMOS' for the ASUS A17 707. (I will post any results here for the record.)
seth wrote: Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and
i.e. /usr/lib/firmware/edid/edid-card1-eDP-2, or in this case /usr/lib/firmware/edid/edid-card1-HDMI-A1
"-2" means the file doesn't exist; the parameter is the path relative to /usr/lib/firmware.
But the nvidia driver loads really late anyway
[ 7.206624] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
and you'll also never get a BIOS or bootloader output this way.
Yes, I understand this will not solve the 'main problem' (no screen during boot: POST, splash, GRUB, BIOS).
But I would already be happy if I can make SDDM load in the regular manner, to make sure it is not a hardware problem.
Also, 'file doesn't exist' seems weird, since the file is actually there, so I am wondering what is going on.
Best,
Alexander.
Iff the module is in the initramfs (which, given the late load, seems unlikely) and the EDID isn't, you'll get this.
ls -l /usr/lib/firmware/edid
Edit: for X11, the nvidia driver has an option to load a custom EDID and consider the GPU present (this should help you with SDDM):
Option "CustomEDID" "string"
This option forces the X driver to use the EDID specified in a file rather
than the display's EDID. You may specify a semicolon separated list of
display names and filename pairs. Valid display device names include
"CRT-0", "CRT-1", "DFP-0", "DFP-1", "TV-0", "TV-1", or one of the generic
names "CRT", "DFP", "TV", which apply the EDID to all devices of the
specified type. Additionally, if SLI Mosaic is enabled, this name can be
prefixed by a GPU name (e.g., "GPU-0.CRT-0"). The file contains a raw EDID
(e.g., a file generated by nvidia-settings).
For example:
Option "CustomEDID" "CRT-0:/tmp/edid1.bin; DFP-0:/tmp/edid2.bin"
will assign the EDID from the file /tmp/edid1.bin to the display device
CRT-0, and the EDID from the file /tmp/edid2.bin to the display device
DFP-0. Note that a display device name must always be specified even if
only one EDID is specified.
Caution: Specifying an EDID that doesn't exactly match your display may
damage your hardware, as it allows the driver to specify timings beyond
the capabilities of your display. Use with care.
When this option is set for an X screen, it will be applied to all X
screens running on the same GPU.
Option "ConnectedMonitor" "string"
Allows you to override what the NVIDIA kernel module detects is connected
to your graphics card. This may be useful, for example, if you use a KVM
(keyboard, video, mouse) switch and you are switched away when X is
started. In such a situation, the NVIDIA kernel module cannot detect which
display devices are connected, and the NVIDIA X driver assumes you have a
single CRT.
Valid values for this option are "CRT" (cathode ray tube) or "DFP"
(digital flat panel); if using multiple display devices, this option may
be a comma-separated list of display devices; e.g.: "CRT, CRT" or "CRT,
DFP".
It is generally recommended to not use this option, but instead use the
"UseDisplayDevice" option.
NOTE: anything attached to a 15 pin VGA connector is regarded by the
driver as a CRT. "DFP" should only be used to refer to digital flat panels
connected via DVI, HDMI, or DisplayPort.
When this option is set for an X screen, it will be applied to all X
screens running on the same GPU.
Default: string is NULL (the NVIDIA driver will detect the connected
display devices).
/usr/share/doc/nvidia/README
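Applied to this laptop, the CustomEDID option could look like the Xorg fragment below, written from the shell. Assumptions: the panel keeps the name DFP-3 that the Xorg log gives it, the EDID dump was installed as /usr/lib/firmware/edid/edid-card1-eDP-2, and the NVIDIA GPU is the device at 0000:01:00.0 (BusID "PCI:1:0:0"). The file name, Identifier, and helper name are arbitrary. Per the README caution above, only use an EDID read from this very panel; a Device section like this may also change which GPU Xorg treats as primary, so keep the file easy to remove.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an Xorg snippet that makes the NVIDIA driver
# use a saved EDID for the internal panel.
# Args: edid_file [config_dir]
write_nvidia_edid_conf() {
    local edid=$1 confdir=${2:-/etc/X11/xorg.conf.d}
    mkdir -p "$confdir" || return 1
    cat > "$confdir/20-nvidia-custom-edid.conf" <<EOF
Section "Device"
    Identifier "nvidia-dgpu"
    Driver     "nvidia"
    # 0000:01:00.0 from dmesg, in Xorg's decimal PCI:bus:device:function form
    BusID      "PCI:1:0:0"
    # DFP-3 is the name the Xorg log uses for the internal AUO panel
    Option     "CustomEDID" "DFP-3:$edid"
EndSection
EOF
}

# Usage, as root:
#   write_nvidia_edid_conf /usr/lib/firmware/edid/edid-card1-eDP-2
```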
Last edited by seth (2024-11-16 21:40:22)
Hi all,
Seth, thanks for your help and for bearing with me on this!
The problem seems solved: video output when switching on the laptop works once again.
I am posting here for completeness and to document my attempts. However, I am not 100% sure what fixed the problem in the end, so this is a bit dissatisfying and unhelpful for others.
It looks a bit like the 'Windows voodoo' Seth mentioned above worked a miracle, but a slow one: get into Windows, try to update, and finally reboot Windows as well as Linux a couple of times... and then some more times.
Best,
Alexander.
Overview
Current (solved) situation:
- I managed to get the display back at laptop startup, and I can enter the BIOS once more. The laptop seems to be back in its old working order.
- It is not 100% clear which of the actions 'fixed' it, since no single action restored functionality in one attempt. Rather, after doing all of this, at some point the splash screen ('asus incredible') reappeared on startup.
First:
- all steps above in this thread, among others:
- Hard-reset the computer (disconnected the battery and held the power button for 3 minutes, since it is unclear how long is actually needed).
- attached an external monitor
Start Windows (tricky):
- Attempt to start the Windows installation on the original drive (without seeing anything) by pressing F8, then waiting 10-15 seconds, then pressing the down arrow twice and Enter.
- Note that F8 opens the boot menu, but without a working screen, getting the timing and key combination right took some trial and error. For example, I did not know which entry in the list was Windows, so I tried various combinations.
Once I practiced enough so I could relatively reliably restart windows:
- update windows (which took multiple passes: update, reboot, update even more, etc)
- install 'MyAsus' (create an account...)
- update asus drivers, reboot when needed, update the rest.
- update BIOS & install fixes. Note that not all Asus/BIOS updates could be finished, since they required entering the BIOS (which was not possible)
- Reboot, back to arch, start sddm manually.
- and suddenly, the next day, the splash screen reappears, allowing me to finish the BIOS update. System now works again.
Current DMESG with working system: https://0x0.st/XRWU.txt
Some lessons:
- I originally planned to wipe the 500 GB Windows disk for additional space, but have now decided to keep it on the laptop (in my case no real problem, since I added a 2 TB NVMe disk that I boot from for daily use with Arch).
- I am still not sure what actually fixed the problem (or even what caused the actual problem in the first place), but:
- It seems to be a BIOS/CMOS software problem, and managing to get into Windows and gradually updating with some ASUS tools might have done the trick.
Last edited by _lex_1234 (2024-11-29 09:34:04)