#51 2026-06-21 21:20:17

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 178

Re: [PARTIALLY SOLVED] Nvidia system wide crash GPU Reset Required

Here is what happens after a few days of uptime on the latest driver (610.43.02):

cze 21 22:54:05 testowy kernel: NVRM: GPU0 kgmmuInvalidateTlb_GM107: TLB invalidation failed waiting for completion (status=0x00000065) for PDB 0x10e63000, vaspaceFlags 0x84004020, scope 0x2, GFID 0
cze 21 22:54:05 testowy kernel: NVRM: GPU at PCI:0000:07:00: GPU-fb51809b-350a-99db-67b5-e16fc6d498ab
cze 21 22:54:05 testowy kernel: NVRM: Xid (PCI:0000:07:00): 62, 000121ea 00012246 00011d77 00015c6c 00016077 00014092 00000011 00000000
cze 21 22:54:05 testowy kernel: NVRM: GPU0 _kgspRpcGspEventPmuHalted: Received signal from GSP that PMU has halted.
cze 21 22:54:05 testowy kernel: NVRM: Xid (PCI:0000:07:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (PF FLR)
cze 21 22:54:09 testowy at-spi2-registryd[5043]: Disabling unresponsive app with pid 5112
cze 21 22:54:09 testowy at-spi2-registryd[5043]: Disabling unresponsive app with pid 1205682
cze 21 22:54:12 testowy accounts-daemon[5373]: Unable to open /etc/tcb: Nie ma takiego pliku ani katalogu
cze 21 22:54:12 testowy dbus-daemon[4133]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.24' (uid=0 pid=5373 comm="/usr/lib/accounts-daemon")
cze 21 22:54:12 testowy dbus-daemon[4133]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.home1.service': Unit dbus-org.freedesktop.home1.service not found.
cze 21 22:54:12 testowy accounts-daemon[5373]: couldn't list homed users: GDBus.Error:org.freedesktop.systemd1.NoSuchUnit: Unit dbus-org.freedesktop.home1.service not found.
cze 21 22:54:35 testowy kernel: NVRM: kgmmuInvalidateTlb_GM107: TLB invalidation failed waiting for prior invalidate (status=0x00000065), vaspaceFlags 0x101, scope 0x2, GFID 0
cze 21 22:54:35 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:842
cze 21 22:54:35 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: progress == entryIndexHi - entryIndexLo + 1 @ mmu_walk_fill.c:130
cze 21 22:54:35 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ mmu_walk.c:541
cze 21 22:54:35 testowy kernel: NVRM: mmuWalkUnmap: Failed to unmap VA Range 0x128800000 to 0x1289fffff. Status = 0x00000040
cze 21 22:54:35 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_unmap.c:62
cze 21 22:54:35 testowy kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from mmuWalkUnmap(userCtx.pGpuState->pWalk, vaLo, vaHi) @ gpu_vaspace.c:2080
cze 21 22:55:05 testowy kernel: NVRM: kgmmuInvalidateTlb_GM107: TLB invalidation failed waiting for prior invalidate (status=0x00000065), vaspaceFlags 0x101, scope 0x2, GFID 0
cze 21 22:55:05 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: pEntries != NULL @ gmmu_walk.c:842
cze 21 22:55:05 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: progress == entryIndexHi - entryIndexLo + 1 @ mmu_walk_fill.c:130
cze 21 22:55:05 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ mmu_walk.c:541
cze 21 22:55:05 testowy kernel: NVRM: mmuWalkUnmap: Failed to unmap VA Range 0x128800000 to 0x1289fffff. Status = 0x00000040
cze 21 22:55:05 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mmu_walk_unmap.c:62
cze 21 22:55:05 testowy kernel: NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:4913
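
In case anyone wants to compare their own logs against this one, here is a minimal sketch for pulling the Xid events out of journal lines like the ones above. The function name `extract_xids` and the regular expression are my own assumptions based only on the line format shown in this post, not anything provided by the driver, and the sketch does not try to interpret what each Xid number means.

```python
import re

# Pattern assumed from the log lines in this post, e.g.
# "NVRM: Xid (PCI:0000:07:00): 62, 000121ea ..."
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def extract_xids(lines):
    """Return (pci_address, xid_number) tuples in order of appearance."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# Two Xid lines and one unrelated line copied from the log above.
sample = [
    "cze 21 22:54:05 testowy kernel: NVRM: Xid (PCI:0000:07:00): 62, "
    "000121ea 00012246 00011d77 00015c6c 00016077 00014092 00000011 00000000",
    "cze 21 22:54:05 testowy kernel: NVRM: GPU0 _kgspRpcGspEventPmuHalted: "
    "Received signal from GSP that PMU has halted.",
    "cze 21 22:54:05 testowy kernel: NVRM: Xid (PCI:0000:07:00): 154, "
    "GPU recovery action changed from 0x0 (None) to 0x1 (PF FLR)",
]

for pci, xid in extract_xids(sample):
    print(pci, xid)
```

To use it on a real log, one option is to save the kernel messages to a file (for example from `journalctl -k`) and pass the file's lines to `extract_xids` instead of `sample`.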

I tested the GPU with cuda_memtest, and it ran flawlessly for a few hours, so could it still be a dying GPU? This probably sounds related, even though it concerns newer GPUs.
