System:
- Kernel: 6.10.6-zen1-1-zen (or Arch kernel, or vanilla kernel)
- Hardware: Lenovo T15G Gen1
- CPU: Intel i9-10885H (16) @ 4.301GHz
- GPU: NVIDIA GeForce RTX 2070 SUPER Mobile / Max-Q 90W (Nvidia-Open-dkms)
- RAM: 48GB
- Power Supply: Original NEW - PSU 230w - Always ON.
Scaling Driver:
- Scaling Driver: intel_pstate
- Power Manager: auto_cpufreq (tested most power managers, the issue is the same: powerdevil, kdepowermanager, etc.)
The Issue:
- Problem: When running a rendering job or a power-demanding Stable Diffusion workload, it works well at first, until the CPU slows down to 800 MHz on all cores.
- Conditions: This happens in Turbo Mode (5.3GHz) (95°C), and even in No Turbo Mode (2.4GHz max) (65°C).
Settings:
- Governor: Set to Performance
- Cores: Set to not more than 2.4 GHz and not lower than 2.4 GHz.
- Idle State: Cores remain at 2.4 GHz when the system is not under load.
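The settings above can be double-checked against what the kernel actually reports. This is a minimal sketch, assuming the standard cpufreq sysfs files (scaling_governor, scaling_min_freq, scaling_max_freq) are exposed on this machine; the function names read_policy and mismatches are made up for illustration.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # standard Linux cpufreq sysfs root


def read_policy(cpu_dir):
    """Return governor and min/max limits (in MHz) for one cpuN directory."""
    cpufreq = Path(cpu_dir) / "cpufreq"
    return {
        "governor": (cpufreq / "scaling_governor").read_text().strip(),
        # sysfs reports frequencies in kHz
        "min_mhz": int((cpufreq / "scaling_min_freq").read_text()) // 1000,
        "max_mhz": int((cpufreq / "scaling_max_freq").read_text()) // 1000,
    }


def mismatches(policy, governor="performance", mhz=2400):
    """List the fields of a policy that differ from the intended settings."""
    wanted = {"governor": governor, "min_mhz": mhz, "max_mhz": mhz}
    return [key for key, value in wanted.items() if policy[key] != value]


if __name__ == "__main__":
    for cpu_dir in sorted(CPU_ROOT.glob("cpu[0-9]*")):
        if (cpu_dir / "cpufreq").is_dir():
            policy = read_policy(cpu_dir)
            print(cpu_dir.name, policy, mismatches(policy) or "ok")
```

If a core prints anything other than "ok", the power manager did not apply the limits to it, which would be worth ruling out before blaming the firmware.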
Sensors:
- Thermal Readings: Sensor temperatures under heavy load are healthy, even good. Verified with a laser thermometer to check for hot spots. One anomaly: some CPU core sensors appear to go offline.
- Monitoring: Using NetData Agent, which shows all system statistics, including thermal throttling.
System Behavior Analysis:
Upon examining the CPU frequency behavior, it's evident that the CPU ignores the 2.4 GHz minimum limit and drops to 800 MHz, fluctuating between 800 MHz and 2.4 GHz when this anomaly occurs. Notably, sensor data from some cores is missing during full load, as seen in the middle of the screenshot.
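To put numbers on "ignores the minimum limit", the current frequency can be polled and compared with the configured floor. A rough sketch, assuming scaling_cur_freq and scaling_min_freq are readable under cpu0 (picked arbitrarily); find_dips, sample and the 5% tolerance are illustrative choices, not something from the monitoring tool used above.

```python
import time
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")  # cpu0 chosen arbitrarily


def find_dips(samples_khz, min_khz, tolerance=0.05):
    """Return indexes of samples more than `tolerance` below the configured minimum."""
    floor = min_khz * (1 - tolerance)
    return [i for i, khz in enumerate(samples_khz) if khz < floor]


def sample(count=10, interval_s=0.2, cpufreq=CPUFREQ):
    """Poll scaling_cur_freq (reported in kHz) a few times."""
    samples = []
    for _ in range(count):
        samples.append(int((cpufreq / "scaling_cur_freq").read_text()))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__" and CPUFREQ.is_dir():
    min_khz = int((CPUFREQ / "scaling_min_freq").read_text())
    readings = sample()
    for i in find_dips(readings, min_khz):
        print(f"sample {i}: {readings[i] // 1000} MHz is below the {min_khz // 1000} MHz minimum")
```

Running this during a render would show how often and how far the clock falls below the limit, but not why.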
The Anomaly:
Before the abnormal core behavior started, 2 sensors went down.
Power Delivery:
- Power Supply: 230w
- Peak Power Draw: Up to 231w for short periods with everything maxed out
- Typical Power Draw: 100-190w
Important Note:
The CPU throttling is not related to temperature or power issues. The problem arises when sensors from some cores go down, causing the CPU to behave abnormally.
Question:
Can someone provide guidance on how to access low-level logs to determine exactly why the CPU frequency was throttled down to 800 MHz? Specifically, which kernel module decided to throttle the CPU, and what triggered this decision? Viewing frequencies and temperatures alone is not sufficient to understand the root cause of this issue.
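One low-level place to look is the kernel's per-core thermal throttle counters. This is a sketch under the assumption that this kernel exposes core_throttle_count and package_throttle_count under each cpuN/thermal_throttle directory; the helper names are hypothetical. These counters only cover thermal events the kernel itself saw, so a clock drop forced by the firmware or embedded controller may well leave them unchanged.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")
COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_throttle_counts(root=CPU_ROOT):
    """Map 'cpuN/counter' -> event count for every core that exposes the files."""
    counts = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        for name in COUNTERS:
            path = cpu_dir / "thermal_throttle" / name
            if path.is_file():
                counts[f"{cpu_dir.name}/{name}"] = int(path.read_text())
    return counts


def changed(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] > before.get(key, 0)
    }


if __name__ == "__main__":
    # Take one snapshot before the load and one after, then diff them.
    print(read_throttle_counts())
```

If the clock drops to 800 MHz while changed() stays empty, that points away from kernel-visible thermal throttling and toward something below the OS.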
Currently:
I disabled the scaling driver by adding intel_pstate=disable to the /boot/loader/entries/2024-08-26-zen.conf file.
So currently my laptop has no scaling driver, and tests are being done.
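Whether the parameter took effect can be checked after a reboot. A small sketch, assuming /proc/cmdline and the cpufreq scaling_driver file are available; kernel_params and active_scaling_driver are illustrative names, and the parser does not handle quoted values that contain spaces.

```python
from pathlib import Path


def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict (bare flags map to '')."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def active_scaling_driver(cpu="cpu0"):
    """Return the cpufreq driver bound to a core, or None if cpufreq is absent."""
    path = Path("/sys/devices/system/cpu") / cpu / "cpufreq" / "scaling_driver"
    return path.read_text().strip() if path.is_file() else None


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.is_file():
        params = kernel_params(proc.read_text())
        print("intel_pstate =", params.get("intel_pstate", "(not set)"))
    print("scaling driver =", active_scaling_driver())
```

A driver name in the second line would mean another cpufreq driver took over; None would mean no cpufreq driver is bound at all.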
Solution:
If the battery is fully charged (95-100%), the CPU throttles as described above.
If the battery is not fully charged (around 85%), the CPU behaves normally.
For now, this means using TLP to limit the computer's charge to 85%.
The root cause is still unknown, probably some false battery mode bug.
Last edited by polytect (2024-10-02 13:55:35)
Offline
Behaviour is the same even with intel_pstate=disable
This might mean that the BIOS is responsible for throttling the CPU down.
Very hard puzzle to solve.
Offline
Another update, this time using the acpi-cpufreq scaling driver. I will keep adding more info as I run more tests.
Problem remains the same.
With another test done, I realised that when this laptop goes into Turbo mode at 4.3 GHz, it works normally for a few seconds and then fluctuates between 800 MHz, 2.4 GHz and 4.3 GHz. The problem is the 800 MHz. Regardless of what I set, and even when temperatures are normal, under load it flickers between 800 MHz and 4.3 GHz.
Especially when used in combination with the GPU, let's say gaming: it runs at 80 FPS, then 20 FPS, 80 FPS, 20 FPS, on and on.
This makes me wonder whether there is some other layer that confuses the power states, or whether this is a pure hardware issue, or some thermal issue (which is unlikely, I think).
This happens far more often when the GPU is involved. All the Nvidia drivers tested (nvidia, nvidia-dkms, nvidia-open, nvidia-open-dkms) give the same result.
This laptop supports Hybrid Graphics, but in the BIOS I specified to use the dedicated GPU only.
If I enabled Hybrid Graphics again, would this change anything?
Last edited by polytect (2024-09-09 08:53:13)
Offline
Another update.
Hybrid graphics doesn't change anything.
The laptop still goes to 800 MHz under load. Lenovo is probably to blame.
Offline
https://www.google.com/search?client=qu … o%20800MHz has a shit ton of hits, so "yes", Lenovo.
https://superuser.com/questions/1584717 … ugged-into seems to blame it on the AC - do you get the same on battery?
Offline
Seth,
has a shit ton of hits, so "yes", lenovo. - This is Lenovo's proprietary nightmare. I blame Lenovo for not providing documentation on the sensors, for poor thermals, and for undocumented power delivery requirements.
do you get the same on battery? - This is a good question. What I noted is that when the drop to 0.8 GHz happens and I unplug the AC, it goes back to the normal 2.4 GHz immediately on battery, and it works faster. But as this down-clock is not permanent, the laptop generally works faster on AC, without question.
1. On AC and down-clocked to 0.8 GHz: removing AC and running on battery brings it back to the normal 2.4 GHz.
2. On AC and NOT down-clocked (2.4 GHz): on battery it stays at the normal 2.4 GHz.
3. On battery only, under a stress test: to be answered...
Now to fully answer the question: I don't know for sure. I will do intensive testing while monitoring the CPU again, to see how far it can go without AC, ASAP.
Offline
Hi,
I did an intensive test. The results don't make any sense. But let's see them.
The results:
1. AC Attached
CPU is fluctuating between 2.4GHz and 0.8GHz as seen in the picture. Temperatures stable.
2. AC Removed, battery only.
CPU is 2.4GHz, no change, stable
GPU is in Power state 1 on 30w - lower power mode due to being on battery. Temperatures lower.
3. AC Attached again, AC only.
CPU is 2.4GHz, no change, stable
I uncapped the CPU to 4.3GHz. Temperatures very HIGH, but stable.
GPU is in Power state 3 on 90w, stable.
I did everything possible to make sure it works under maximum stress.
Summary:
As you have seen, when AC is attached and the GPU and CPU (turbo or no turbo) are utilized, the frequency fluctuates between 2.4GHz and 0.8GHz, or between 4.3GHz and 0.8GHz: unstable.
When removing the AC, the CPU becomes stable (not on turbo, at 2.4GHz); the GPU was in power state 1 at 30w, stable, but with lower performance.
When I attached the AC again, the CPU was stable at 2.4GHz with no turbo, or at 4.3GHz with turbo. Temperature was very high, and the GPU was in power state 3 at 90w. Stable, even with CPU temperature throttling kicking in due to the high temperature, which is normal.
Now what the hell is this result? What do you think?
To be honest, I never ran my laptop with battery only during a stress test, as I always wanted maximum performance.
Why does it work normally after running on battery for a while and then attaching the AC? This is the million-dollar question.
Any input appreciated.
Last edited by polytect (2024-10-02 09:53:40)
Offline
I found that when the battery is fully charged (100%), the CPU becomes unstable, fluctuating between 0.8 GHz and 2.4 GHz.
When the battery is not full (~85%), the laptop is very stable.
So what I did is: using TLP, I restricted charging to 80% of the battery. The laptop now works great.
Now I don't know what to blame, probably no one, probably coincidence. Why would a 100% charged battery destabilize the system? It's like searching the world for answers only to find the problem was right under my nose. Turns out, my laptop just wanted a little less charge! Never seen this before.
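To confirm that the charge limit is really in effect, the battery's sysfs attributes can be read directly. A minimal sketch, assuming the battery shows up as BAT0 and that the driver exposes charge_control_end_threshold (both are assumptions about this machine); read_int and charge_limit_status are made-up helper names.

```python
from pathlib import Path

BATTERY = Path("/sys/class/power_supply/BAT0")  # assumption: battery is named BAT0


def read_int(path):
    """Read an integer sysfs attribute, or return None if it is not exposed."""
    path = Path(path)
    return int(path.read_text()) if path.is_file() else None


def charge_limit_status(battery=BATTERY):
    """Report current charge (%) and the stop threshold, if the driver exposes one."""
    capacity = read_int(Path(battery) / "capacity")
    stop = read_int(Path(battery) / "charge_control_end_threshold")
    return {
        "capacity": capacity,
        "stop_threshold": stop,
        "limit_active": stop is not None and stop < 100,
    }


if __name__ == "__main__":
    print(charge_limit_status())
```

A stop_threshold of None would mean the attribute is not available under that name, not necessarily that no limit is set.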
Offline
Once the system hits 100% it might falsely interpret "not charging" as "on battery" and move to some power saving mode.
You might want to add your findings to the device's remarks at https://wiki.archlinux.org/title/Laptop/Lenovo#T_series
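The "not charging" idea could be tested by logging the power supply state next to the CPU frequency whenever the drop happens. A sketch only, assuming the usual power_supply sysfs class with type, online, status and capacity files; power_snapshot and looks_like_suspect_state are hypothetical names, and this shows what the kernel reports, not what the firmware decides internally.

```python
from pathlib import Path

POWER_ROOT = Path("/sys/class/power_supply")


def power_snapshot(root=POWER_ROOT):
    """Summarise AC and battery state from the power_supply sysfs class."""
    snap = {"ac_online": None, "battery_status": None, "capacity": None}
    for dev in sorted(Path(root).glob("*")):
        type_file = dev / "type"
        if not type_file.is_file():
            continue
        kind = type_file.read_text().strip()
        if kind == "Mains" and (dev / "online").is_file():
            snap["ac_online"] = (dev / "online").read_text().strip() == "1"
        elif kind == "Battery" and (dev / "status").is_file():
            # Limitation: a peripheral battery (e.g. a wireless mouse) would
            # also match here and could overwrite the laptop battery's values.
            snap["battery_status"] = (dev / "status").read_text().strip()
            if (dev / "capacity").is_file():
                snap["capacity"] = int((dev / "capacity").read_text())
    return snap


def looks_like_suspect_state(snap):
    """True when AC is plugged in but the battery is idle (full or not charging)."""
    return snap["ac_online"] is True and snap["battery_status"] in ("Full", "Not charging")


if __name__ == "__main__":
    snap = power_snapshot()
    print(snap, "suspect" if looks_like_suspect_state(snap) else "normal")
```

If the 800 MHz episodes line up with "suspect" snapshots and never with "Charging", that would support the explanation above.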
Offline
I will definitely try to add my findings to the Arch wiki. I don't want anyone to go through the road of confusion like I did with this model.
Thank you Seth.
Offline
I added the remarks and notes to https://wiki.archlinux.org/title/Laptop/Lenovo#T_series under ThinkPad T15g (Intel) Gen 1
Offline