
#26 2010-02-28 19:53:08

me4tux
Member
Registered: 2010-02-28
Posts: 1

Re: ( FIXED ) Linux CPU Scheduler Optimized for Nehalem i7?

hunterthomson wrote:

Hum, after doing some more reading and getting no definitive answers yet.

< cut >

So, I guess Linux CPU scheduling is better than I thought.
It seems that all my problems are solved by running cpufreq with the Performance governor.

Before, my system was running like crap because it was running at <1GHz almost all the time. I was thinking then that Turbo Boost only worked if things were on as few cores as possible. I see now that is not the case. Now that Turbo Boost is getting the CPU cores to >2GHz whenever it is working, my system runs way faster.

Hello hunter,

I am interested in the Linux scheduler's behaviour with respect to pcore frequency and its relationship with throttling of the associated cores' P-states.

I am just a newbie reading up on this topic, so I am really pleased with this discussion. Thanks for this discussion thread.

However, it is not clear to me whether the Linux scheduler is indeed optimized for Nehalem. I see your observations, but the whitepaper linked below claims it is not. Is it?

http://www.cs.sfu.ca/~fedorova/papers/T … uation.pdf
Quote:"... We expected the frequency of idle cores to reduce to 1.5 GHz
during the sequential phases but the on-demand governor on
Linux is not aggressively making this adjustment.
From these results, we can conclude that Turbo Boost is
sensitive to changes in load which enables it to accelerate
sequential phases of the code. However, software power manager
is not aggressive enough at reducing the frequency of idle
cores to enable frequent and extended activation of TurboBoost."

Can you help clarify?

Thanks.


#27 2010-03-01 01:51:40

hunterthomson
Member
Registered: 2008-06-22
Posts: 794

Re: ( FIXED ) Linux CPU Scheduler Optimized for Nehalem i7?

Intel Core i7-720QM
Rated speed 1.6GHz - Max Turbo Speed 2.8GHz

Bus clock frequency 133MHz

Lowest CPU multiplier (7x) // Rated CPU multiplier (12x) // Max Turbo multiplier (21x)
7 x 133 = 931MHz           // 12 x 133 = 1596MHz         // 21 x 133 = 2793MHz


# (The following seems to be true.)
#
# MAX Multiplier that Turbo Boost can add = 9x
#
# 7x + 9x = 16x
#
# 12x + 9x = 21x
#
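
The multiplier arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this post: the 133MHz bus clock, the 7x/12x/21x multipliers and the 9x Turbo Boost headroom are taken from the post, the function and constant names are made up here, and the "+9x" rule is an observation from this one CPU rather than documented behaviour.

```python
# Illustrative only: names are hypothetical, numbers come from the post above.
BUS_CLOCK_MHZ = 133   # bus clock quoted for the i7-720QM
MAX_TURBO_BINS = 9    # assumption: the most Turbo Boost seems to add (9x)


def core_mhz(multiplier: int) -> int:
    """Core frequency in MHz for a given clock multiplier."""
    return multiplier * BUS_CLOCK_MHZ


def max_turbo_multiplier(base_multiplier: int) -> int:
    """Highest multiplier reachable if Turbo Boost adds its full 9x."""
    return base_multiplier + MAX_TURBO_BINS


if __name__ == "__main__":
    for label, mult in (("lowest", 7), ("rated", 12)):
        top = max_turbo_multiplier(mult)
        print(f"{label}: {mult}x = {core_mhz(mult)}MHz, "
              f"with full turbo {top}x = {core_mhz(top)}MHz")
```

Starting from the lowest multiplier, the ceiling would be 16x, which is why the base multiplier the boost is added to matters so much.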
-----------------------------------------
I really don't know all that much either. Just what I have read and observed from my mobile i7-720QM quad-core CPU.

It seems not to be optimized for Nehalem in the sense of getting the most use out of the CPU. It seems to just hand out threads in order as they come in: the first thread goes to core 0, the second thread goes to core 1, the third thread goes to core 2, and so on. But it doesn't try to place them on the logical cores in any sort of optimized way. It doesn't try to put 4 threads on 4 physical cores to make the most use of the L1, L2 and L3 caches, nor does it try to consolidate threads on the smallest number of cores to gain the highest frequency.
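
For anyone who wants to experiment with placement by hand, here is a rough Python sketch of working out which logical CPUs share a physical core and then restricting a process to one logical CPU per physical core. It assumes a Linux system that exposes `topology/thread_siblings_list` under `/sys/devices/system/cpu`; the helper names are invented for this example, and it says nothing about what the kernel scheduler itself does.

```python
# Sketch: group logical CPUs by physical core using Linux sysfs topology files.
# Helper names are hypothetical; the sysfs layout is assumed to be present.
import glob
import os


def parse_cpu_list(text: str) -> list:
    """Parse a sysfs CPU list such as "0,4" or "0-3,8" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_cores(sysfs_root: str = "/sys/devices/system/cpu") -> list:
    """Return one list of logical CPUs per physical core (empty if unreadable)."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology",
                           "thread_siblings_list")
    groups = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as fh:
                groups.add(tuple(parse_cpu_list(fh.read())))
        except OSError:
            continue
    return sorted(list(group) for group in groups)


def one_cpu_per_core(cores: list) -> set:
    """Pick the first logical CPU of each physical core."""
    return {core[0] for core in cores if core}


if __name__ == "__main__":
    cores = physical_cores()
    print("physical cores:", cores)
    targets = one_cpu_per_core(cores)
    if targets and hasattr(os, "sched_setaffinity"):
        try:
            # pid 0 means "this process"; child processes inherit the mask.
            os.sched_setaffinity(0, targets)
            print("affinity now:", sorted(os.sched_getaffinity(0)))
        except OSError as exc:
            print("could not set affinity:", exc)
```

Pinning to a smaller set of cores instead (to test the "fewer cores, higher clocks" idea) only needs a different set passed to `os.sched_setaffinity`.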

By default the CPU will sit at its LOWEST clock multiplier,

7x for the i7-720QM, i.e. 7 x 133MHz = 931MHz

Then, when a physical core is put under load, Turbo Boost will only add its multiplier boost to that lowest possible multiplier of 7x. So it results in the CPU running BELOW its rated clock speed under load, i.e. under 12x (1.6GHz).

That SUCKS !.... However,

I run cpufreq with the default "Performance" governor. The "Performance" governor is supposed to run the CPU at the max reported clock speed all the time. That would be 1.6GHz, or the 12x multiplier. However, it does not do that. When a core is unused, the clock speed will drop all the way down to 7x (931MHz). BUT if the core gets the slightest load at all, the clock speed for that physical core will jump up fast to anywhere from 8x to 21x.

What seems to be happening is that the cpufreq "Performance" governor tells the CPU to run at the max reported clock speed of 12x (1.6GHz). BUT then Intel SpeedStep underclocks the core back down to 7x if it is not in use. Then, if the core is used, SpeedStep stops underclocking and sets it to 12x, and Turbo Boost adds its multiplier on top of 12x. This results in the expected behavior of clock multipliers higher than 12x on cores that are under load. HOWEVER, I still see clock multipliers between 7x and 12x, like 8x, 9x, 10x, and 11x. So there must be more to the story.
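
If you want to check this on your own machine, the rough Python sketch below prints the governor and the current frequency the kernel reports for each logical CPU, plus an approximate multiplier. It assumes the cpufreq sysfs files `scaling_governor` and `scaling_cur_freq` are present, and it assumes the 133MHz bus clock of my CPU; the function names are made up for this example. The reported value is whatever the cpufreq driver exposes, which may not reflect the clocks Turbo Boost actually reaches.

```python
# Sketch: per-CPU governor and reported frequency from the cpufreq sysfs files.
# Function names are hypothetical; the 133MHz bus clock is an assumption.
import glob
import os
import re

BUS_CLOCK_KHZ = 133_000  # assumption: 133MHz bus clock, as on the i7-720QM


def read_text(path):
    """Return stripped file contents, or None if the file cannot be read."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def approx_multiplier(freq_khz: int) -> int:
    """Nearest whole multiplier for a frequency given in kHz."""
    return round(freq_khz / BUS_CLOCK_KHZ)


def cpufreq_snapshot(sysfs_root="/sys/devices/system/cpu"):
    """Return sorted (cpu, governor, freq_khz, multiplier) tuples."""
    rows = []
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(cpu_dir))
        if not match:
            continue
        base = os.path.join(cpu_dir, "cpufreq")
        governor = read_text(os.path.join(base, "scaling_governor"))
        current = read_text(os.path.join(base, "scaling_cur_freq"))
        if governor is None or current is None or not current.isdigit():
            continue  # cpufreq not available for this CPU
        khz = int(current)
        rows.append((int(match.group(1)), governor, khz, approx_multiplier(khz)))
    return sorted(rows)


if __name__ == "__main__":
    for cpu, governor, khz, mult in cpufreq_snapshot():
        print(f"cpu{cpu}: governor={governor} {khz // 1000}MHz (~{mult}x)")
```

Running it a few times while idle and then under load should show whether the in-between multipliers (8x to 11x) turn up in what the kernel reports.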

Last edited by hunterthomson (2010-03-01 01:51:59)


OpenBSD-current Thinkpad X230, i7-3520M, 16GB CL9 Kingston, Samsung 830 256GB
Contributor: linux-grsec

