Hi folks,
The fan for an NVIDIA T1000 8GB was idling for over a year, except on occasion when I'd run Lizzie and Leela for analyzing Go games. After stopping the software, the GPU fan would slip into idle again. Normally, I never hear any fan running. The case is an HDPLEX Fanless PC Chassis, so there's no CPU fan, only GPU.
The GPU fan activity changed after I installed EasyDiffusion:
https://easydiffusion.github.io/docs/installation/
The GPU fan will no longer return to idle even after deleting the software, upgrading the system, resetting the settings (nvidia-settings --load-config-only) and rebooting. The GPU temperature rarely goes above 60 C. However, the GPU fan is now at 2500 RPM when idle. When inside the BIOS settings, the GPU fan skyrockets, which never happened before.
The lowest I can set the fan myself is 33%, which I understand NVIDIA has done deliberately to prevent users from accidentally pooching their GPU.
$ uname -a
Linux hostname 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
$ nvidia-settings --version
nvidia-settings: version 525.60.11
I've placed a copy of the NVIDIA bug report file at: https://easyupload.io/ddgj3f
Any ideas how I can diagnose the issue and return the system back to the way it was before running EasyDiffusion?
$ nvidia-smi
Thu Sep  7 23:36:36 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA T1000 8GB               On  | 00000000:0A:00.0  On |                  N/A |
| 40%   56C    P8              N/A /  50W |    562MiB /  8192MiB |     14%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       633      G   /usr/lib/Xorg                               415MiB |
|    0   N/A  N/A       810      G   xfwm4                                         2MiB |
|    0   N/A  N/A      1009      G   /usr/lib/thunderbird/thunderbird              8MiB |
|    0   N/A  N/A      1256      G   /usr/lib/firefox/firefox                    133MiB |
+---------------------------------------------------------------------------------------+
Thank you!
Last edited by thangalin (2023-09-08 06:38:38)
Offline
The particular script doesn't seem to be root-run and "When inside the BIOS settings, the GPU fan skyrockets" isn't sth. you'd get out of the OS.
=> Likely coincidental
Was the previous behavior stock or did you play w/ https://wiki.archlinux.org/title/NVIDIA … nd_cooling ?
Is there a parallel OS (windows)?
This isn't a hybrid graphics system and you switched from the IGP to nvidia?
Does the fan shut down when you cool down the system externally (eg. w/ an external fan, many hairdryers have a mode for cold air)?
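To put numbers on the "fan is loud while the card is cool" observation, a minimal sketch along these lines might help. It assumes nvidia-smi's CSV query output with the fields fan.speed, temperature.gpu, pstate and utilization.gpu; the function name flag_noisy_idle and the thresholds (35% fan, 60 C) are made up for illustration, not anything the driver defines.

```shell
# Hypothetical helper: reads lines of "fan%, tempC, pstate, util%" on stdin
# and prints only the samples where the fan is above fan_max while the
# temperature is at or below temp_max.
# Assumed way to feed it (needs an NVIDIA GPU and driver):
#   nvidia-smi --query-gpu=fan.speed,temperature.gpu,pstate,utilization.gpu \
#       --format=csv,noheader,nounits -l 5 | flag_noisy_idle 35 60
flag_noisy_idle() {
    awk -F', *' -v fan_max="${1:-35}" -v temp_max="${2:-60}" '
        $1 + 0 > fan_max && $2 + 0 <= temp_max {
            print "fan " $1 "% at " $2 "C (" $3 ", util " $4 "%)"
        }
    '
}
```

If every idle sample gets flagged, including right after a cold boot, that would point more towards the card's own fan curve or firmware than towards anything running in the OS.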
Online
Thanks for helping out!
The particular script doesn't seem to be root-run
I ran it as root; the script won't run as non-root:
$ nvidia-bug-report.sh
ERROR: Please run nvidia-bug-report.sh as root.
and "When inside the BIOS settings, the GPU fan skyrockets" isn't sth. you'd get out of the OS.
=> Likely coincidental
Seems rather unlikely to be a coincidence? I've been rebooting this computer for years without the GPU fan going haywire on startup. It started happening on the first reboot after installing EasyDiffusion.
Was the previous behavior stock
I didn't even know about the GPU fan settings until it started meowing for attention. Everything was stock and ultra-quiet up until after running EasyDiffusion. At that point, I mucked with a few settings to try and shush the fan.
Is there a parallel OS (windows)?
No.
This isn't a hybrid graphics system and you switched from the IGP to nvidia?
Not to my knowledge.
Does the fan shut down when you cool down the system externally (eg. w/ an external fan, many hairdryers have a mode for cold air)?
I haven't tried that. There's no easy way to cool it down. An ice pack on the case, perhaps, but it probably wouldn't help the GPU. (I can't open the case, I don't have the right screwdriver.)
Offline
No, I meant the EasyDiffusion script.
Seems rather unlikely to be a coincidence?
That's the very nature of a coincidence.
Not to my knowledge.
lspci
I can't open the case, I don't have the right screwdriver.
Get one? Or rather bits, but it looks like it's just an Allen key?
Did you update the UEFI?
Do the fans blow up if you boot some live distro (grml)?
Online
No, I meant the EasyDiffusion script.
Yes, ran it as a regular user.
lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller (rev 02)
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset SATA Controller (rev 02)
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b2 (rev 02)
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
08:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10)
09:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
0a:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T1000 8GB] (rev a1)
0a:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)
0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
0b:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device
0b:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
0c:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
Get one? Or rather bits, but it looks like it's just an Allen key?
It's one of those star-shaped Allen keys, but a peculiar size. I always get a confused reaction when I bring the case into a shop.
Did you update the UEFI?
There's no GPU fan setting in the UEFI. I did, afterwards, set the CPU fan to quiet mode. However, there's no CPU fan, so it shouldn't affect anything.
Do the fans blow up if you boot some live distro (grml)?
Good idea, I'll give that a try and see what happens, thank you.
Offline
There's only one VGA device, no hybrid graphics.
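For reference, that check can be sketched as a filter over the lspci listing. The function name count_display_adapters is hypothetical; the three class names are the ones lspci prints for display-class devices, and a result above 1 would suggest an IGP next to the discrete card.

```shell
# Hypothetical helper: counts display-class devices in lspci output read
# from stdin. Typical use: lspci | count_display_adapters
count_display_adapters() {
    grep -cE 'VGA compatible controller|3D controller|Display controller'
}
```

The NVIDIA "Audio device" function at 0a:00.1 is deliberately not counted, since it is the card's HDMI/DP audio and not a second GPU.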
The "star-shaped Allen key" isn't just Torx??
https://en.wikipedia.org/wiki/Torx
Online