
#1 2024-11-11 17:30:56

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

[Solved] NVIDIA card error (hardware failure?)

Hello,

Suddenly, there seems to be a problem with my nvidia video driver on my ASUS A17 laptop.

- sddm won't start; the screen stays black. With sddm disabled I can boot to a terminal login.
- Also, when booting the laptop (even after a complete power-off), no POST, BIOS or GRUB screen is shown.

From the commandline, I can start KDE by using startx.

I suspect a hardware problem (rather than something with Arch), since no boot messages before GRUB are visible either.

However, the dmesg output seems to indicate it is 'correctable':

Journalctl log:  http://0x0.st/XkXR.txt
dmesg:  http://0x0.st/XkXC.txt
lspci:  http://0x0.st/XkKj.csv

Any thoughts on how 'correctable' this is, and on how to correct it?

Thanks,

Alex.

Last edited by _lex_1234 (2024-11-29 09:28:31)


#2 2024-11-11 22:48:47

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

There's a metric shit-ton of

nov 11 18:15:04 alexarch kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0:   device [10de:28e0] error status/mask=00000040/0000a000
nov 11 18:15:04 alexarch kernel: nvidia 0000:01:00.0:    [ 6] BadTLP                
nov 11 18:14:44 alexarch kernel: amdgpu 0000:36:00.0: [drm] Cannot find any crtc or sizes
nov 11 18:14:50 alexarch sddm-greeter-qt6[835]: Adding view for ":0.0" QRect(0,0 0x0)

smells like there's no output attached to the amd GPU?
Is this a notebook?
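
To gauge how frequent the AER messages quoted above are, a rough sketch along these lines could tally them. It assumes the kernel log has been saved to a file first (for example with `journalctl -k -b > kernel.log`) and that the lines look like the ones quoted; the function name `count_aer` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: tally PCIe AER lines in a saved kernel log.
# Assumes the log format quoted above ("severity=Correctable", "[ 6] BadTLP").
count_aer() {
    log=$1
    # Total number of correctable-error reports.
    total=$(grep -c 'PCIe Bus Error: severity=Correctable' "$log")
    echo "correctable: $total"
    # Breakdown by the bracketed error name, e.g. "[ 6] BadTLP".
    grep -E ': +\[ *[0-9]+\] +[A-Za-z]+' "$log" \
        | sed -E 's/.*: +\[ *[0-9]+\] +([A-Za-z]+).*/\1/' \
        | sort | uniq -c
}
```

Usage would be `count_aer kernel.log`. A large count only shows that the link to the GPU is noisy; it does not by itself say whether the hardware is failing.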

nov 11 18:14:44 alexarch kernel:  nvme0n1: p1 p2 p3
nov 11 18:14:44 alexarch kernel:  nvme1n1: p1 p2 p3 p4 p5 p6

Is there a parallel windows installation?

From the commandline, I can start KDE by using startx.

Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General


#3 2024-11-12 05:14:22

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

Hi Seth, thanks for your reply.

Is this a notebook?

Yes. It is an ASUS TUF Gaming A17 laptop.

nov 11 18:14:44 alexarch kernel:  nvme0n1: p1 p2 p3
nov 11 18:14:44 alexarch kernel:  nvme1n1: p1 p2 p3 p4 p5 p6

Is there a parallel windows installation?

Yes, but I never use/boot it. Arch is installed on nvme1n1.

From the commandline, I can start KDE by using startx.

Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General


There are two Xorg logs from yesterday in /var/log:
Xorg0.log:  http://0x0.st/XkqH.txt
Xorg1.log: http://0x0.st/XkqX.txt
And then there is the 'rootless' one (currently in use) from using startx: http://0x0.st/XkqK.txt


Alex.

Last edited by _lex_1234 (2024-11-12 05:17:48)


#4 2024-11-12 07:34:14

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

Yes, but I never use/boot it.

Borderline irrelevant, see the 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.

[     9.887] (WW) modeset(0): Unable to find connected outputs - setting 1024x768 initial framebuffer

The system tries to run on the AMD APU, but finds no output.

[    10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): connected
[    10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): Internal DisplayPort
[    10.179] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): 2670.0 MHz maximum pixel clock
[    10.179] (--) NVIDIA(GPU-0): 
[    10.186] (II) NVIDIA(G0): Validated MetaModes:
[    10.186] (II) NVIDIA(G0):     "NULL"
[    10.186] (II) NVIDIA(G0): Virtual screen size determined to be 640 x 480
[    10.239] (WW) NVIDIA(G0): Cannot find size of first mode for AU Optronics Corporation

w/ nvidia there is an eDP? but no EDID.
This is in the SDDM and the startx log.

What kind of monitor do you use w/ the system and what's the BIOS config? Did you try to disable the APU or the GPU?

Eventually, from the startx run

[    26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): connected
[    26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): Internal DisplayPort
[    26.712] (--) NVIDIA(GPU-0): AU Optronics Corporation B173HAN04.9 (DFP-3): 2670.0 MHz maximum pixel clock
[    26.712] (--) NVIDIA(GPU-0): 
[    26.956] (II) modeset(0): Allocate new frame buffer 1920x1080 stride
[    27.002] (II) NVIDIA(G0): Setting mode "DP-1-2: 1920x1080_144 @1920x1080 +0+0 {AllowGSYNC=Off, ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"

But that's probably kscreen setting a saved mode - SDDM doesn't benefit from that.
You can inject a mode or EDID, https://wiki.archlinux.org/title/Xrandr … esolutions & https://wiki.archlinux.org/title/Kernel … s_and_EDID but that ignores the nvidiaz related PCI errors and general "weirdness" of the situation.

In hybrid mode you'd expect the eDP to be wired to the AMD APU and the server to run on that.
Try to toggle the hybrid/dgpu mode(s) in the BIOS/UEFI (ie, to use the nvidiaz GPU only or the APU only or hybrid mode, anything - in doubt forth and back)


#5 2024-11-12 20:03:59

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

What kind of monitor do you use w/ the system and what's the BIOS config? Did you try to disable the APU or the GPU?

No monitor, just the laptop screen.

Try to toggle the hybrid/dgpu mode(s) in the BIOS/UEFI (ie, to use the nvidiaz GPU only or the APU only or hybrid mode, anything - in doubt forth and back)

I cannot enter the BIOS/UEFI (i.e. no screen).
Would you have any possibility on how I can perform that from within Arch? I.e. force a certain setting on a reboot?
I do not see the BIOS screen, the boot screen or GRUB (all black). The first thing I see (after disabling sddm) is the command-line login.

I am considering opening the machine later and removing the Arch SSD, since that might force the system to boot from the other one.



You can inject a mode or EDID, https://wiki.archlinux.org/title/Xrandr … esolutions & https://wiki.archlinux.org/title/Kernel … s_and_EDID but that ignores the nvidiaz related PCI errors and general "weirdness" of the situation.

Looking into that. So far no EDID file is present, and extracting one with the tool mentioned (read-edid) gives errors:

[root@alexarch ]# get-edid -m 0 
0
This is read-edid version 3.0.2. Prepare for some fun.
Attempting to use i2c interface
No EDID on bus 0
No EDID on bus 1
No EDID on bus 2
No EDID on bus 3
No EDID on bus 4
No EDID on bus 5
No EDID on bus 6
No EDID on bus 7
No EDID on bus 8
No EDID on bus 9
No EDID on bus 10
No EDID on bus 11
No EDID on bus 12
No EDID on bus 13
No EDID on bus 14
No EDID on bus 15
No EDID on bus 16
No EDID on bus 17
No EDID on bus 18
No EDID on bus 19
No EDID on bus 20
Problem requesting slave address: Device or resource busy
No EDID on bus 22
No EDID on bus 24
No EDID on bus 25
No EDID on bus 26
No EDID on bus 27
1 potential busses found: 23
Bus 23 doesn't really have an EDID...
Couldn't find an accessible EDID on this computer.
Attempting to use the classical VBE interface

        Performing real mode VBE call
        Interrupt 0x10 ax=0x4f00 bx=0x0 cx=0x0
        Function unsupported
        Call failed

        VBE version 0
        VBE string at 0x0 "O"

VBE/DDC service about to be called
        Report DDC capabilities

        Performing real mode VBE call
        Interrupt 0x10 ax=0x4f15 bx=0x0 cx=0x0
        Function unsupported
        Call failed

Reading next EDID block

VBE/DDC service about to be called
        Read EDID

        Performing real mode VBE call
        Interrupt 0x10 ax=0x4f15 bx=0x1 cx=0x0
        Function unsupported
        Call failed

The EDID data should not be trusted as the VBE call failed
Error: output block unchanged
I'm sorry nothing was successful. Maybe try some other arguments
if you played with them, or send an email to Matthew Kern <pyrophobicman@gmail.com>.

Last edited by _lex_1234 (2024-11-12 20:06:00)


#6 2024-11-12 20:07:46

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

Would you have any possibility on how I can perform that from within Arch?

Nope. Do you have an external output you can (temporarily) attach?

For the EDID

for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done

You'll need https://aur.archlinux.org/packages/edid-decode-git but I doubt you'll get anything from there.
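
Without edid-decode, a quick first look at which connectors expose any EDID at all is possible with plain shell. This is only a sketch: the function name is made up, and the sysfs root is a parameter purely so it can be tried on a copy of the directory.

```shell
#!/bin/sh
# Hypothetical helper: show which DRM connectors expose a non-empty EDID.
# The sysfs root is a parameter only so the function can be tried on a copy.
list_edid_connectors() {
    drm_root=${1:-/sys/class/drm}
    for conn in "$drm_root"/card*-*; do
        [ -e "$conn/edid" ] || continue
        # Count bytes by reading; the size reported for sysfs files
        # is not a reliable indicator.
        bytes=$(wc -c < "$conn/edid")
        status=$(cat "$conn/status" 2>/dev/null || echo unknown)
        echo "${conn##*/} status=$status edid_bytes=$((bytes))"
    done
}
```

Calling `list_edid_connectors` with no argument reads the live system. A connector with `edid_bytes=0` has nothing to decode.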


#7 2024-11-13 06:10:19

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

Nope. Do you have an external output you can (temporarily) attach?

I have now attached a screen to the HDMI port. It duplicates (or extends) the laptop screen, but does not light up any earlier than the laptop screen itself, so BIOS/UEFI, GRUB and boot are still dark.

For the EDID

for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done

You'll need https://aur.archlinux.org/packages/edid-decode-git but I doubt you'll get anything from there.

I installed edid-decode and have now found an EDID.

[alex@alexarch ~]$ for OUT in /sys/class/drm/card*; do echo $OUT; edid-decode $OUT/edid; echo "================="; done 
/sys/class/drm/card0
/sys/class/drm/card0/edid: No such file or directory
=================
/sys/class/drm/card0-DP-1
EDID of '/sys/class/drm/card0-DP-1/edid' was empty.
=================
/sys/class/drm/card0-DP-2
EDID of '/sys/class/drm/card0-DP-2/edid' was empty.
=================
/sys/class/drm/card0-DP-3
EDID of '/sys/class/drm/card0-DP-3/edid' was empty.
=================
/sys/class/drm/card0-DP-4
EDID of '/sys/class/drm/card0-DP-4/edid' was empty.
=================
/sys/class/drm/card0-DP-5
EDID of '/sys/class/drm/card0-DP-5/edid' was empty.
=================
/sys/class/drm/card0-DP-6
EDID of '/sys/class/drm/card0-DP-6/edid' was empty.
=================
/sys/class/drm/card0-DP-7
EDID of '/sys/class/drm/card0-DP-7/edid' was empty.
=================
/sys/class/drm/card0-DP-8
EDID of '/sys/class/drm/card0-DP-8/edid' was empty.
=================
/sys/class/drm/card0-eDP-1
EDID of '/sys/class/drm/card0-eDP-1/edid' was empty.
=================
/sys/class/drm/card0-Writeback-1
EDID of '/sys/class/drm/card0-Writeback-1/edid' was empty.
=================
/sys/class/drm/card1
/sys/class/drm/card1/edid: No such file or directory
=================
/sys/class/drm/card1-DP-9
EDID of '/sys/class/drm/card1-DP-9/edid' was empty.
=================
/sys/class/drm/card1-eDP-2
edid-decode (hex):

00 ff ff ff ff ff ff 00 06 af a7 dd 00 00 00 00
e5 20 01 04 a5 26 16 78 03 70 75 93 58 5a 94 29
20 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 ec 3b 80 b6 70 38 88 40 30 20
a5 00 7e d7 10 00 00 18 00 00 00 fd 00 3c 90 b0
b0 25 01 0a 20 20 20 20 20 20 00 00 00 fe 00 41
55 4f 0a 20 20 20 20 20 20 20 20 20 00 00 00 fc
00 42 31 37 33 48 41 4e 30 34 2e 39 20 0a 01 a1

70 20 79 02 00 22 00 14 0b 9e 05 84 7f 07 b5 00
2f 80 1f 00 37 04 87 00 09 00 04 00 2b 00 06 27
00 3c 8f 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 90

----------------

Block 0, Base EDID:
  EDID Structure Version & Revision: 1.4
  Vendor & Product Identification:
    Manufacturer: AUO
    Model: 56743
    Made in: week 229 of 2022
  Basic Display Parameters & Features:
    Digital display
    Bits per primary color channel: 8
    DisplayPort interface
    Maximum image size: 38 cm x 22 cm
    Gamma: 2.20
    Supported color formats: RGB 4:4:4
    First detailed timing includes the native pixel format and preferred refresh rate
    Display supports continuous frequencies
  Color Characteristics:
    Red  : 0.5751, 0.3466
    Green: 0.3515, 0.5781
    Blue : 0.1611, 0.1279
    White: 0.3134, 0.3291
  Established Timings I & II: none
  Standard Timings: none
  Detailed Timing Descriptors:
    DTD 1:  1920x1080   60.014898 Hz  16:9     72.978 kHz    153.400000 MHz (382 mm x 215 mm)
                 Hfront   48 Hsync  32 Hback  102 Hpol N
                 Vfront   10 Vsync   5 Vback  121 Vpol N
    Display Range Limits:
      Monitor ranges (Range Limits Only): 60-144 Hz V, 176-176 kHz H, max dotclock 370 MHz
    Alphanumeric Data String: 'AUO'
    Display Product Name: 'B173HAN04.9 '
  Extension blocks: 1
Checksum: 0xa1

----------------

Block 1, DisplayID Extension Block:
  Version: 2.0
  Extension Count: 0
  Display Product Primary Use Case: None of the listed primary use cases; generic display
  Video Timing Modes Type 7 - Detailed Timings Data Block:
    DTD:  1920x1080  144.027931 Hz  16:9    175.138 kHz    368.140000 MHz (aspect 16:9, no 3D stereo, preferred)
               Hfront   48 Hsync  32 Hback  102 Hpol P
               Vfront   10 Vsync   5 Vback  121 Vpol N
  Adaptive Sync Data Block:
    Descriptor #1:
      Native Panel Range
      Fixed Average V-Total and Adaptive V-Total
      Supports Seamless Transition
      'Max Single Frame Duration Increase' field value without jitter impact
      'Max Single Frame Duration Decrease' field value without jitter impact
      Max Duration Increase: 0.00 ms
      Max Duration Decrease: 0.00 ms
      Min Refresh Rate: 60 Hz
      Max Refresh Rate: 144 Hz
  Checksum: 0x02
Checksum: 0x90
=================
/sys/class/drm/card1-HDMI-A-1
EDID of '/sys/class/drm/card1-HDMI-A-1/edid' was empty.
=================
[alex@alexarch ~]$ 

I copied the file for later use, but I am still trying to figure out how to use it:

[alex@alexarch edidfiles]$ cp /sys/class/drm/card1-eDP-2/edid ./edid-card1-eDP-2

(Probably by referencing it on the kernel command line, but I do not want to mess with my GRUB config too much, since that might prevent me from booting.)
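
Before reusing a copied EDID dump, it may be worth checking that the copy is intact. The sketch below checks only the generic EDID layout (128-byte blocks, the fixed 8-byte header, and the base-block checksum); the function name is made up, and passing these checks does not prove the file matches the panel.

```shell
#!/bin/sh
# Hypothetical helper: basic sanity checks on a raw EDID dump.
validate_edid() {
    file=$1
    size=$(($(wc -c < "$file")))
    # An EDID consists of one or more 128-byte blocks.
    if [ "$size" -eq 0 ] || [ $((size % 128)) -ne 0 ]; then
        echo "bad size: $size bytes"; return 1
    fi
    # The base block starts with the fixed header 00 ff ff ff ff ff ff 00.
    header=$(od -An -v -tx1 -N8 "$file" | tr -d ' \n')
    if [ "$header" != "00ffffffffffff00" ]; then
        echo "bad header: $header"; return 1
    fi
    # All 128 bytes of the base block must sum to 0 modulo 256.
    if ! od -An -v -tu1 -N128 "$file" \
        | awk '{ for (i = 1; i <= NF; i++) s += $i } END { exit (s % 256 != 0) }'; then
        echo "bad checksum in base block"; return 1
    fi
    echo "ok: $size bytes, $((size / 128)) block(s)"
}
```

For the file copied above this would be `validate_edid ./edid-card1-eDP-2`.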


#8 2024-11-13 14:36:28

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

https://wiki.archlinux.org/title/Kernel … s_and_EDID
Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and also add it to the initramfs (FILES array in mkinitcpio.conf) and add "drm.edid_firmware=edid/edid-card1-eDP-2" to the kernel parameters.
You can edit … no you can't because you can't see anything.

You could  at some point write it into /sys/kernel/debug/dri/1/eDP-2/edid_override but it shows up there anyway and you probably want it in dri/0/eDP-1 to begin with.
Hard resetting the BIOS (w/o access to it) seems to require to remove the cmos battery in that model, some models can apparently be reset by holding the power button for minutes to boot the system.
No idea whether that works.
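
The firmware/initramfs/kernel-parameter steps described above can be sketched roughly as follows. The function name and the optional root argument are made up for illustration (the root argument allows a dry run in a scratch directory); the two config edits are only printed, not applied.

```shell
#!/bin/sh
# Hypothetical helper: copy an EDID dump to the firmware directory and print
# the two settings that still have to be added by hand.
# The root defaults to "/" (needs root); pass another directory for a dry run.
stage_edid_firmware() {
    src=$1
    name=$2
    root=${3:-/}
    dest_dir="${root%/}/usr/lib/firmware/edid"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/$name" || return 1
    echo "copied to: $dest_dir/$name"
    # mkinitcpio.conf: the file has to be inside the initramfs as well.
    echo "FILES=(/usr/lib/firmware/edid/$name)"
    # Kernel parameter: the path is relative to /usr/lib/firmware.
    echo "drm.edid_firmware=edid/$name"
}
# Afterwards (on Arch with GRUB): edit /etc/mkinitcpio.conf and
# /etc/default/grub accordingly, then run
#   mkinitcpio -P
#   grub-mkconfig -o /boot/grub/grub.cfg
```

A dry run would be `stage_edid_firmware ./edid-card1-eDP-2 edid-card1-eDP-2 /tmp/scratch`. Since a broken boot entry cannot be fixed blind on this machine, it seems wise to keep a known-good GRUB entry untouched.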


#9 2024-11-14 21:33:08

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

seth wrote:

https://wiki.archlinux.org/title/Kernel … s_and_EDID
Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and also add it to the initramfs (FILES array in mkinitcpio.conf) and add "drm.edid_firmware=edid/edid-card1-eDP-2" to the kernel parameters.
You can edit … no you can't because you can't see anything.

I did this, but to no avail.... It is there after booting, so it was added:

[root@alexarch alex]# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-linux root=UUID=7c36495b-3393-4af4-a40c-41301c38e98c rw loglevel=3 drm.edid_firmware=edid/edid-card1-eDP-2

But loading is unsuccessful: 'Direct firmware load for edid/edid-card1-eDP-2 failed with error -2'

dmesg: http://0x0.st/Xkgy.txt


You can edit … no you can't because you can't see anything.

Right, indeed, usually I would just edit the grub entry to try..... no use here :-)

Hard resetting the BIOS (w/o access to it) seems to require to remove the cmos battery in that model, some models can apparently be reset by holding the power button for minutes to boot the system.
No idea whether that works.

Yeah, the next step (for tomorrow) is probably opening the laptop and doing a hard reset somehow.
Unless I have some other idea (maybe I will try booting from a different medium, but the system needs to detect it first and I can't change the boot order...).



PS: I now notice that my keyboard backlight also seems to have stopped working. It is off, and I can't get it back on ('echo 1 > /sys/class/leds/asus\:\:kbd_backlight/brightness'). Mentioning it since it could be a coincidence, but maybe not.

Last edited by _lex_1234 (2024-11-14 22:15:47)


#10 2024-11-15 17:05:54

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

Ok, I connected the laptop via HDMI to an external device and again extracted the edid file.
Still the same problem:

[    8.613769] nvidia 0000:01:00.0: Direct firmware load for edid/edid-card1-HDMI-A1 failed with error -2
[    8.613776] nvidia 0000:01:00.0: [drm] *ERROR* [CONNECTOR:103:DP-9] Requesting EDID firmware "edid/edid-card1-HDMI-A1" failed (err=-2)

So this makes me wonder whether I am treating the EDID file the right way...
Do I need to do something else with the file?
(So far that means copying it from /sys/class/drm/card1-something/edid to /var/lib/firmware?)

To be honest, the whole concept of the EDID file was new to me earlier this week :-) .

See dmesg:

http://0x0.st/XkRV.txt


Alex.


#11 2024-11-15 21:02:52

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

seth wrote:

Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and

ie /usr/lib/firmware/edid/edid-card1-eDP-2 or in this case /usr/lib/firmware/edid/edid-card1-HDMI-A1
"-2" means the file doesn't exist, the parameter is the path in relation to /usr/lib/firmware

But the nvidia driver loads really late anyway

[    7.206624] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

and you'll also never get a BIOS or bootloader output this way.
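
Error -2 is ENOENT ("No such file or directory"). A small sketch for checking where the kernel parameter actually points, assuming a single entry with an optional "CONNECTOR:" prefix; the function name is made up, and the second argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: resolve a drm.edid_firmware= value the way described
# above (relative to the firmware root) and check that the file is there.
# Only a single "[CONNECTOR:]path" entry is handled.
check_edid_firmware() {
    value=$1
    fw_root=${2:-/usr/lib/firmware}
    rel=${value##*:}          # drop an optional "CONNECTOR:" prefix
    path="$fw_root/$rel"
    if [ -s "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

For example `check_edid_firmware edid/edid-card1-HDMI-A1`. If the module were loaded from the initramfs, the image would need the file too; something like `lsinitcpio /boot/initramfs-linux.img | grep edid` should show whether it is in there (the image path is an assumption).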


#12 2024-11-16 17:52:25

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

I tried resetting by disconnecting the battery and holding the power button for 60 seconds (as I understand it, there is no separate CMOS battery).
Unfortunately, that didn't help.
I have now filed a question with ASUS on how to do this, since it is really hard to find exact instructions online on how to 'hard reset CMOS' for the Asus A17 707. (I will post any results here for the record.)


seth wrote:
seth wrote:

Putting edid-card1-eDP-2 into /usr/lib/firmware/edid and

ie /usr/lib/firmware/edid/edid-card1-eDP-2 or in this case /usr/lib/firmware/edid/edid-card1-HDMI-A1
"-2" means the file doesn't exist, the parameter is the path in relation to /usr/lib/firmware

But the nvidia driver loads really late anyway

[    7.206624] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

and you'll also never get a BIOS or bootloader output this way.


Yes, I understand this will not solve the 'main problem' (no screen during boot, POST, splash, GRUB, BIOS).
But I would already be happy if I could make SDDM load in the regular manner, to make sure it is not a hardware problem.
Also, 'file doesn't exist' seems weird since the file is actually there, so I am wondering what is going on.

Best,

Alexander.


#13 2024-11-16 21:37:19

seth
Member
Registered: 2012-09-03
Posts: 59,042

Re: [Solved] NVIDIA card error (hardware failure?)

Iff the module is in the initramfs (which, given the late load, seems unlikely) and the EDID isn't, you'll get this.

ls -l /usr/lib/firmware/edid

Edit: for X11 the nvidia driver has options to load a custom EDID and to treat a display as connected (this should help you w/ SDDM)

Option "CustomEDID" "string"

    This option forces the X driver to use the EDID specified in a file rather
    than the display's EDID. You may specify a semicolon separated list of
    display names and filename pairs. Valid display device names include
    "CRT-0", "CRT-1", "DFP-0", "DFP-1", "TV-0", "TV-1", or one of the generic
    names "CRT", "DFP", "TV", which apply the EDID to all devices of the
    specified type. Additionally, if SLI Mosaic is enabled, this name can be
    prefixed by a GPU name (e.g., "GPU-0.CRT-0"). The file contains a raw EDID
    (e.g., a file generated by nvidia-settings).

    For example:
   
        Option "CustomEDID" "CRT-0:/tmp/edid1.bin; DFP-0:/tmp/edid2.bin"
   
    will assign the EDID from the file /tmp/edid1.bin to the display device
    CRT-0, and the EDID from the file /tmp/edid2.bin to the display device
    DFP-0. Note that a display device name must always be specified even if
    only one EDID is specified.

    Caution: Specifying an EDID that doesn't exactly match your display may
    damage your hardware, as it allows the driver to specify timings beyond
    the capabilities of your display. Use with care.

    When this option is set for an X screen, it will be applied to all X
    screens running on the same GPU.

Option "ConnectedMonitor" "string"

    Allows you to override what the NVIDIA kernel module detects is connected
    to your graphics card. This may be useful, for example, if you use a KVM
    (keyboard, video, mouse) switch and you are switched away when X is
    started. In such a situation, the NVIDIA kernel module cannot detect which
    display devices are connected, and the NVIDIA X driver assumes you have a
    single CRT.

    Valid values for this option are "CRT" (cathode ray tube) or "DFP"
    (digital flat panel); if using multiple display devices, this option may
    be a comma-separated list of display devices; e.g.: "CRT, CRT" or "CRT,
    DFP".

    It is generally recommended to not use this option, but instead use the
    "UseDisplayDevice" option.

    NOTE: anything attached to a 15 pin VGA connector is regarded by the
    driver as a CRT. "DFP" should only be used to refer to digital flat panels
    connected via DVI, HDMI, or DisplayPort.

    When this option is set for an X screen, it will be applied to all X
    screens running on the same GPU.

    Default: string is NULL (the NVIDIA driver will detect the connected
    display devices).

/usr/share/doc/nvidia/README
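
A rough sketch of how the two options quoted above might be combined in an xorg.conf.d snippet. The file name, the Identifier and the helper function are made up; the display name DFP-3 is taken from the Xorg log earlier in the thread and is an assumption for any other machine. On a hybrid laptop a Device section like this may also affect which GPU X picks, so treat it as untested.

```shell
#!/bin/sh
# Hypothetical helper: write an xorg.conf.d snippet that points the nvidia X
# driver at a saved EDID. File name, Identifier and the DFP-3 default
# (taken from the Xorg log earlier in the thread) are assumptions.
write_custom_edid_conf() {
    edid_path=$1
    display=${2:-DFP-3}
    conf_dir=${3:-/etc/X11/xorg.conf.d}
    conf="$conf_dir/20-nvidia-custom-edid.conf"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf" <<EOF
Section "Device"
    Identifier "nvidia-custom-edid"
    Driver     "nvidia"
    Option     "ConnectedMonitor" "DFP"
    Option     "CustomEDID" "$display:$edid_path"
EndSection
EOF
    # Print the path of the file that was written.
    echo "$conf"
}
```

For example `write_custom_edid_conf /usr/lib/firmware/edid/edid-card1-eDP-2` (as root). Note the README's caution that an EDID not matching the panel may damage hardware, so only the dump read from this very panel should be used.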

Last edited by seth (2024-11-16 21:40:22)


#14 2024-11-29 09:26:20

_lex_1234
Member
Registered: 2019-09-08
Posts: 30

Re: [Solved] NVIDIA card error (hardware failure?)

Hi all,

Seth, thanks for your help and for bearing with me on this!
Problem seems solved, the video (when switching on the laptop) works once again.

I am posting here for completeness and to document my attempts. However, I am not 100% sure what fixed the problem in the end, so this is a bit unsatisfying and not very helpful for others.
It looks a bit like the 'Windows voodoo' mentioned above by Seth worked a miracle, but a slow one: get into Windows, try to update, and finally reboot Windows as well as Linux a couple of times... and then some more times.

Best,

Alexander.




Overview
Current (solved) situation:
- I managed to get the display back on startup of the laptop, and I can enter the BIOS once more. The laptop seems to be back in working order.
- It is not 100% clear which of the actions 'fixed' it, since no specific action restored functionality in one attempt. Rather, after doing all this, at some point the 'splash screen' ('asus incredible') reappeared on startup.


First:
- all steps above in this thread, including:
- Hard-reset the computer (disconnect the battery and hold the power button for 3 minutes, since it is unclear how long is actually needed).
- attached an external monitor

Start Windows (tricky): 
- Attempt to start the Windows installation on the original drive (without seeing anything) by pressing F8, waiting 10-15 seconds, then pressing the down arrow twice and Enter.
- Note that F8 opens the boot menu, but without a working screen, getting the right timing and key combination took some trial and error. For example, I did not know which item on the list was Windows, so I tried various combinations.

Once I practiced enough so I could relatively reliably restart windows:
- update Windows (which took multiple passes: update, reboot, update some more, etc.)
- install 'MyAsus' (and create an account)
- update the ASUS drivers, reboot when needed, update the rest
- update the BIOS & install fixes. Note that not all ASUS/BIOS updates could be finished, since they required entering the BIOS (which was not possible).
- Reboot, back to Arch, start sddm manually.
- And suddenly, the next day, the splash screen reappeared, allowing me to finish the BIOS update. The system now works again.


Current DMESG with working system:  https://0x0.st/XRWU.txt

Some lessons:
- I originally planned to wipe the 500 GB Windows disk for additional space, but have now decided to keep it in the laptop (no real problem in my case, since I added a 2 TB NVMe disk that I boot Arch from for daily use).
- I am still not sure what actually fixed the problem (or even what caused it in the first place), but:
- It seems to have been a BIOS/CMOS software problem, and managing to get into Windows and gradually updating with the ASUS tools might have done the trick.

Last edited by _lex_1234 (2024-11-29 09:34:04)
