VLSmooth wrote: I recently watched Rajesh Kumar's "Next Generation Intel Core Microarchitecture (Nehalem) Processors" presentation[1]:
http://intelstudios.edgesuite.net/fall_ ... 01/msm.htm
Here are some questions off the top of my head[2], mostly pertaining to Turbo mode.
- Is turbo mode essentially dynamic overclocking with the ability to turn off cores due to low instruction level parallelism (ILP)?
????
You don't switch off a core if your workload has poor ILP. You switch off a core when the OS has nothing to schedule on it. This could be a function of TLP, but in client it's mostly multitasking, amirite?
Also, no. Overclocking, to me, implies running transistors faster than the design target by increasing the voltage or eating into the engineering margins. Turbo mode, on the other hand, runs a thermally constrained design at dynamic frequencies within the design target.
Since Pentium 4, all Intel designs have been thermally constrained in frequency, not switching-speed constrained. We could have run Pentium 4s faster if we could have cooled the hotspots sufficiently. The same holds true for Nehalem: the design target was X GHz, and all the speed paths in the core were fixed to that speed. However, there's not enough cooling available to run four cores at X GHz. When the PCU (power control unit) senses thermal headroom, either because cores have been shut off for lack of load or because cores are running at less than max power (lower frequency and/or low-power workloads), it will dynamically bump up the frequency of the cores that need it.
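The decision described above can be sketched as a toy model. To be clear, this is purely illustrative: the base frequency, bin limits, and watts-per-bin figures below are made up, and the real PCU logic is far more involved than this.

```python
# Toy model of a PCU-style turbo decision. Illustrative only;
# all constants here are invented, not real Nehalem parameters.

BASE_MHZ = 2666   # hypothetical design-target frequency
BIN_MHZ = 133     # Nehalem's coarse clock-step granularity
# Hypothetical fused per-part turbo-bin limits, by active-core count:
MAX_BINS = {1: 2, 2: 2, 3: 1, 4: 1}

def turbo_frequency(active_cores: int, thermal_headroom_w: float,
                    watts_per_bin: float = 5.0) -> int:
    """Frequency each active core may run at (MHz).

    The PCU can only spend thermal headroom that actually exists,
    and never grants more bins than the part's fused limit for
    this active-core count.
    """
    if active_cores == 0:
        return 0
    affordable = int(thermal_headroom_w // (watts_per_bin * active_cores))
    bins = min(affordable, MAX_BINS[active_cores])
    return BASE_MHZ + bins * BIN_MHZ
```

With all four cores loaded and no headroom, the part stays at base; with one core active and plenty of headroom, it climbs to its fused limit of two bins.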
[*] Will the 266 MHz turbo-mode overclock cap severely interfere with enthusiast overclocking?
The extreme edition part must be overclockable. It's part of the customer requirement for that particular market segment. In addition, turbo can be disabled via BIOS setup.
AnandTech's Nehalem article (pg. 13) states coarse clock steps of 133 MHz each, based on sensed TDP compared to the chip segment's TDP envelope. If someone slapped a better cooling system[3] on a Nehalem chip, the sensed heat would be dramatically reduced; would the maximum clock speed increase still be capped at 266 MHz per core?
Better cooling would lead to more time spent in Turbo mode and at higher frequencies, but the max number of turbo bins is fused per part. So for this hypothetical part you are discussing, it would spend more time at that +266 MHz bin.
[*] Provided that increasing clock speed uses more power, are there on-die power-draw sensors, since TDP seems to be only a measure of temperature (please correct me if I'm wrong)? If there are power-draw sensors, would they interfere with enthusiast overclocking?
As I said before, the system is thermally constrained, thus TDP is the appropriate limiter. I don't think our sensor arrays are public info yet.
[*] What is the typical speed increase possible per active core as cores are turned off? (very similar in spirit to a question asked during the presentation)
For example, given four cores running at 1000 MHz each, if two cores are turned off, what can the two remaining active cores be run at (ignoring the artificial 266 MHz cap from before)? Ideally the total number of MHz (4000) would be conserved and each active core could run at 2000 MHz, but I'm sure heat generation and power consumption scale super-linearly with clock speed, so the attainable speed of the two active cores should be much lower.
There is no conservation of MHz. In large part, this question is unanswerable because the power dissipated varies wildly from workload to workload: some tightly hand-optimized floating-point code will consume much more power than a test bound by memory latency, for example. Intel CPUs feature lots of clock gating, so the part can dynamically shut off sections of the chip not in use; the LSD (loop stream detector) is a great example, where we can shut off the entire front end if a tight loop fits into the queue. However, I can say that the maximum number of bins available for turbo differs depending on the number of active cores.
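The super-linear intuition in the question is roughly right, and a back-of-the-envelope model shows why MHz isn't conserved. Dynamic CMOS power scales as C·V²·f, and since voltage must itself rise roughly linearly with frequency within the DVFS range, power grows roughly with the cube of frequency. The constants below are invented for illustration, not measured Nehalem data:

```python
# Back-of-the-envelope dynamic power model: P = C_eff * V(f)^2 * f,
# with V rising linearly with frequency. All constants are made up.

def dynamic_power(freq_ghz: float, v_min: float = 0.8,
                  v_per_ghz: float = 0.15, c_eff: float = 10.0) -> float:
    """Simplified dynamic power (arbitrary watt-like units)."""
    v = v_min + v_per_ghz * freq_ghz   # voltage tracks frequency
    return c_eff * v * v * freq_ghz

# Same total MHz, very different power:
four_slow = 4 * dynamic_power(1.0)   # four cores at 1.0 GHz
two_fast = 2 * dynamic_power(2.0)    # two cores at 2.0 GHz
```

In this toy model the two fast cores burn noticeably more power than the four slow ones despite the identical MHz sum, which is exactly why turning off two cores does not buy a doubling of the survivors' clocks.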
As a sanity check, shouldn't a single core at 4000 MHz always perform at least as well as four of the same cores running at 1000 MHz?
I'm not sure I understand your question. There's no negative scaling with frequency for the same design; performance scales with frequency anywhere between 0 and 100%. Standard benchmarks such as SPEC CPU2006 scale about 90% with frequency. If you have four threads to schedule, though, I could make up a workload that performs better on 4 cores at 1/4 the speed: for instance, four memory-latency-bound tests (traversal of a 1 GB linked list of 1 MB objects).
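In the spirit of that linked-list example, here is a minimal Python sketch of a memory-latency-bound workload: a random pointer chase where every access depends on the previous one, so a faster core mostly just waits faster. This is a toy stand-in for the 1 GB list, and in Python the interpreter overhead dominates, so it only illustrates the dependency structure, not real memory latency:

```python
# Pointer-chasing microbenchmark sketch: each load depends on the
# previous one, defeating out-of-order execution and prefetching.
import random

def make_chain(n: int) -> list[int]:
    """Build a single random cycle: chain[i] is the next index."""
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        chain[a] = b
    return chain

def chase(chain: list[int], steps: int) -> int:
    """Follow the chain; every step is serially dependent."""
    i = 0
    for _ in range(steps):
        i = chain[i]
    return i
```

Because the chain is one cycle over all n slots, following it for n steps returns to the start; timing `chase` at different core frequencies (in a compiled language, with a working set larger than the caches) would show the near-zero frequency scaling described above.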
[*] Is each active core independently overclocked, or are they overclocked as a set (i.e., do active cores always run at the same clock speed)? I'm curious since independent overclocking could benefit performance depending on thread dependencies and lifetimes, but it should incur massive additional complexity and likely diminishing returns due to the super-linear relations mentioned before.
I think it's safe to say that all four cores can operate at different frequencies.
[*] What's with the extra hip-hop music starting at 51:19 and lasting to the end?
Also, woo for static CMOS instead of taking advantage of domino/capacitance-based circuit solutions.
[1] Sucked that I couldn't watch it in full screen due to the slides, nor could I resize it with Firefox 3 since WMP video isn't scaled!
[2] I felt bad about the vague/poorly worded questions Rajesh Kumar had to answer
[3] Say a third-party heatsink by Thermalright plus some sweet high CFM low acoustic dB fans (or a peltier, but that's not exactly low power...)
Intel is too funky for you.
Also, don't confuse system power with socket power. The latter is solely a function of the CPU, not the cooling or anything else, and that's the important quantity for the above discussion.
Disclaimer: The postings on this site are my own and don't necessarily represent Intel's positions, strategies, or opinions.