Hi everyone. This afternoon I wanted to check the health of my GPU (especially for hardware defects), so I downloaded the GPU Deployment Kit from the NVIDIA website and ran nvidia-healthmon, which produced the following output:
Using config file path: nvidia-healthmon.conf
Loading Config: SUCCESS
Global Tests
    Black-Listed Modules: SKIPPED
    Black-Listed Drivers: SUCCESS
    Load NVML: SUCCESS
    NVML Sanity
        The driver version "440.82" does not contain a supported version of NVML.
        Result: CRITICAL ERROR
    Global Test Results: 5 success, 1 errors, 0 warnings, 4 did not run
System Results: 5 success, 1 errors, 0 warnings, 4 did not run
One or more tests didn't run.
One or more tests failed.
NVML reports one error, and I would like to ask whether this is a common problem with this version of the NVIDIA driver or something unusual. I am also including the information about my GPU below.
$ nvidia-smi -q
==============NVSMI LOG==============

Timestamp : Sat May 16 19:02:43 2020
Driver Version : 440.82
CUDA Version : 10.2

Attached GPUs : 1
GPU 00000000:26:00.0
    Product Name : GeForce RTX 2060 SUPER
    Product Brand : GeForce
    Display Mode : Enabled
    Display Active : Enabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-00d4d170-5b93-3b2b-3bc2-b36da9c3d91f
    Minor Number : 0
    VBIOS Version : 90.06.44.40.9F
    MultiGPU Board : No
    Board ID : 0x2600
    GPU Part Number : N/A
    Inforom Version
        Image Version : G001.0000.02.04
        OEM Object : 1.1
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x26
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x1F0610DE
        Bus Id : 00000000:26:00.0
        Sub System Id : 0x3FF81458
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 14000 KB/s
        Rx Throughput : 203000 KB/s
    Fan Speed : 0 %
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        HW Thermal Slowdown : Not Active
        HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 7979 MiB
        Used : 857 MiB
        Free : 7122 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 8 MiB
        Free : 248 MiB
    Compute Mode : Default
    Utilization
        Gpu : 4 %
        Memory : 3 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
        Aggregate
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Temperature
        GPU Current Temp : 44 C
        GPU Shutdown Temp : 96 C
        GPU Slowdown Temp : 93 C
        GPU Max Operating Temp : 89 C
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 14.29 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Enforced Power Limit : 225.00 W
        Min Power Limit : 105.00 W
        Max Power Limit : 300.00 W
    Clocks
        Graphics : 300 MHz
        SM : 300 MHz
        Memory : 405 MHz
        Video : 540 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 2175 MHz
        SM : 2175 MHz
        Memory : 7001 MHz
        Video : 1950 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes
        Process ID : 642
            Type : G
            Name : /usr/lib/Xorg
            Used GPU Memory : 358 MiB
        Process ID : 838
            Type : G
            Name : /usr/bin/kwin_x11
            Used GPU Memory : 87 MiB
        Process ID : 844
            Type : G
            Name : /usr/bin/plasmashell
            Used GPU Memory : 55 MiB
        Process ID : 871
            Type : G
            Name : /usr/bin/latte-dock
            Used GPU Memory : 36 MiB
        Process ID : 1055
            Type : G
            Name : /usr/lib/brave-bin/brave --type=gpu-process --field-trial-handle=14782561975971967460,2234195610422329566,131072 --enable-features=AutoupgradeMixedContent,DnsOverHttps,MixedContentSiteSetting,PassiveMixedContentWarning,PasswordImport,SimplifyHttpsIndicator,WebUIDarkMode --disable-features=AllowPopupsDuringPageUnload,AudioServiceOutOfProcess,AutofillServerCommunication,LookalikeUrlNavigationSuggestionsUI,NotificationTriggers,SmsReceiver,TextFragmentAnchor,VideoPlaybackQuality --gpu-preferences=MAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAQAAAAAAAAAAAAAAAAAAAACAAAAAAAAAA= --shared-files
            Used GPU Memory : 291 MiB
        Process ID : 31182
            Type : G
            Name : /usr/bin/krunner
            Used GPU Memory : 20 MiB
My current kernel:
$ uname -r
5.6.12-arch1-1
Last edited by Seooo (2020-05-16 17:26:23)
What exactly did you download? When I search for the GPU Deployment Kit, I land on a page saying that the tool is deprecated and that the CUDA Toolkit should be used instead.
Last edited by V1del (2020-05-16 17:50:13)
> What exactly did you download? When I search for the GPU Deployment Kit, I land on a page saying that the tool is deprecated and that the CUDA Toolkit should be used instead.
I downloaded the first kit listed here: https://developer.nvidia.com/gpu-deployment-kit
Last edited by Seooo (2020-05-16 18:14:08)
Yes, and that is 4 years old and unlikely to represent the current state.
> Yes, and that is 4 years old and unlikely to represent the current state.
The page says the kit is now included with CUDA, but I can't find it inside the CUDA folder. I will do some additional checks. Thank you!
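For the search itself, a small Python sketch along these lines could look for the binary on `PATH` and under likely CUDA install directories. The directories `/opt/cuda` and `/usr/local/cuda` are assumptions about where CUDA might be installed, and `find_tool` is a hypothetical helper; whether a current CUDA package ships `nvidia-healthmon` at all is not confirmed here.

```python
import os
import shutil


def find_tool(name, search_roots):
    """Return paths to `name` found on PATH or anywhere under search_roots."""
    hits = []
    on_path = shutil.which(name)
    if on_path:
        hits.append(on_path)
    for root in search_roots:
        # os.walk yields nothing for a missing directory, so no check is needed.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                candidate = os.path.join(dirpath, name)
                if candidate not in hits:
                    hits.append(candidate)
    return hits


# Assumed install locations; adjust to the actual CUDA prefix.
print(find_tool("nvidia-healthmon", ["/opt/cuda", "/usr/local/cuda"]))
```

An empty list means the tool was not found in those places.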
Last edited by Seooo (2020-05-16 18:25:33)