VLSmooth wrote: I recently watched Rajesh Kumar's "Next Generation Intel Core Microarchitecture (Nehalem) Processors" presentation[1]:
http://intelstudios.edgesuite.net/fall_ ... 01/msm.htm
Here are some questions off the top of my head[2], mostly pertaining to Turbo mode.
- Is turbo mode essentially dynamic overclocking with the ability to turn off cores due to low instruction level parallelism (ILP)?
????
You don't switch off a core if your workload has poor ILP. You switch off a core when the OS has nothing to schedule on it. This could be a function of TLP, but in client it's mostly multitasking, amirite?
Also, no. Overclocking, to me, implies running transistors faster than the design target by increasing the voltage or eating into the engineering margins. Turbo mode, on the other hand, runs a thermally constrained design at dynamic frequencies within the design target.
Since Pentium 4, all Intel designs have been thermally constrained in frequency, not switching-speed constrained. We could have run Pentium 4s faster if we could have cooled the hotspots sufficiently. The same holds true for Nehalem: the design target was X GHz, and all the speed paths in the core were fixed to that speed. However, there's not enough cooling available to run four cores at X GHz. When the PCU (power control unit) senses thermal headroom, either because cores have been shut off for lack of load or because cores are running at less than max power (lower frequency and/or low-power workloads), it will dynamically bump up the frequency of the cores that need it.
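The decision described above can be sketched as a toy model. To be clear, this is purely illustrative: the base frequency, bin limits, and watts-per-bin figures below are made up, and the real PCU logic is far more involved than this.

```python
# Toy model of a PCU-style turbo decision. Illustrative only;
# all constants here are invented, not real Nehalem parameters.

BASE_MHZ = 2666   # hypothetical design-target frequency
BIN_MHZ = 133     # Nehalem's coarse clock-step granularity
# Hypothetical fused per-part turbo-bin limits, by active-core count:
MAX_BINS = {1: 2, 2: 2, 3: 1, 4: 1}

def turbo_frequency(active_cores: int, thermal_headroom_w: float,
                    watts_per_bin: float = 5.0) -> int:
    """Frequency each active core may run at (MHz).

    The PCU can only spend thermal headroom that actually exists,
    and never grants more bins than the part's fused limit for
    this active-core count.
    """
    if active_cores == 0:
        return 0
    affordable = int(thermal_headroom_w // (watts_per_bin * active_cores))
    bins = min(affordable, MAX_BINS[active_cores])
    return BASE_MHZ + bins * BIN_MHZ
```

With all four cores loaded and no headroom, the part stays at base; with one core active and plenty of headroom, it climbs to its fused limit of two bins.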
[*] Will the 266 MHz turbo-mode overclock cap severely interfere with enthusiast overclocking?
The extreme edition part must be overclockable. It's part of the customer requirement for that particular market segment. In addition, turbo can be disabled via BIOS setup.
AnandTech's Nehalem article (pg. 13) states coarse clock steps of 133 MHz each, based on sensed TDP compared to the chip segment's TDP envelope. If someone slapped a better cooling system[3] on a Nehalem chip, the sensed heat would be dramatically reduced; would the maximum clock speed increase still be capped at 266 MHz per core?
Better cooling would lead to more time spent in Turbo mode and at higher frequencies, but the max number of turbo bins is fused per part. So for this hypothetical part you are discussing, it would spend more time at that +266 MHz bin.
[*] Provided that increasing clock speed uses more power, are there on-die power-draw sensors, since TDP seems to be only a measure of temperature (please correct me if I'm wrong)? If there are power-draw sensors, would they interfere with enthusiast overclocking?
As I said before, the system is thermally constrained, thus TDP is the appropriate limiter. I don't think our sensor arrays are public info yet.
[*] What is the typical speed increase possible per active core as cores are turned off? (very similar in spirit to a question asked during the presentation)
For example, given four cores running at 1000 MHz each, if two cores are turned off, what can the two remaining active cores be run at (ignoring the artificial 266 MHz cap from before)? Ideally the total number of MHz (4000) would be conserved and each active core could run at 2000 MHz, but I'm sure heat generation and power consumption scale super-linearly with clock speed, so the attainable speed of the two active cores should be much lower.
There is no conservation of MHz. In large part, this question is unanswerable because the power dissipated varies wildly from workload to workload: some tightly hand-optimized floating-point code will consume much more power than a test bound by memory latency, for example. Intel CPUs feature lots of clock gating, so the part can dynamically shut off sections of the chip not in use; the LSD (loop stream detector) is a great example, where we can shut off the entire front end if a tight loop fits into the queue. However, I can say that the maximum number of bins available for turbo differs depending on the number of active cores.
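The super-linear intuition in the question is roughly right, and a back-of-the-envelope model shows why MHz isn't conserved. Dynamic CMOS power scales as C·V²·f, and since voltage must itself rise roughly linearly with frequency within the DVFS range, power grows roughly with the cube of frequency. The constants below are invented for illustration, not measured Nehalem data:

```python
# Back-of-the-envelope dynamic power model: P = C_eff * V(f)^2 * f,
# with V rising linearly with frequency. All constants are made up.

def dynamic_power(freq_ghz: float, v_min: float = 0.8,
                  v_per_ghz: float = 0.15, c_eff: float = 10.0) -> float:
    """Simplified dynamic power (arbitrary watt-like units)."""
    v = v_min + v_per_ghz * freq_ghz   # voltage tracks frequency
    return c_eff * v * v * freq_ghz

# Same total MHz, very different power:
four_slow = 4 * dynamic_power(1.0)   # four cores at 1.0 GHz
two_fast = 2 * dynamic_power(2.0)    # two cores at 2.0 GHz
```

In this toy model the two fast cores burn noticeably more power than the four slow ones despite the identical MHz sum, which is exactly why turning off two cores does not buy a doubling of the survivors' clocks.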
As a sanity check, shouldn't a single core at 4000 MHz always perform at least as well as four of the same cores running at 1000 MHz?
I'm not sure I understand your question. There's no negative scaling with frequency for the same design; performance scales with frequency anywhere between 0 and 100%. Standard benchmarks such as SPEC CPU2006 scale about 90% with frequency. If you have four threads to schedule, though, I could make up a workload that performs better on 4 cores at 1/4 the speed: for instance, four memory-latency-bound tests (traversal of a 1 GB linked list of 1 MB objects).
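In the spirit of that linked-list example, here is a minimal Python sketch of a memory-latency-bound workload: a random pointer chase where every access depends on the previous one, so a faster core mostly just waits faster. This is a toy stand-in for the 1 GB list, and in Python the interpreter overhead dominates, so it only illustrates the dependency structure, not real memory latency:

```python
# Pointer-chasing microbenchmark sketch: each load depends on the
# previous one, defeating out-of-order execution and prefetching.
import random

def make_chain(n: int) -> list[int]:
    """Build a single random cycle: chain[i] is the next index."""
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        chain[a] = b
    return chain

def chase(chain: list[int], steps: int) -> int:
    """Follow the chain; every step is serially dependent."""
    i = 0
    for _ in range(steps):
        i = chain[i]
    return i
```

Because the chain is one cycle over all n slots, following it for n steps returns to the start; timing `chase` at different core frequencies (in a compiled language, with a working set larger than the caches) would show the near-zero frequency scaling described above.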
[*] Is each active core independently overclocked, or are they overclocked as a set (i.e., do active cores always run at the same clock speed)? I'm curious since independent overclocking could benefit performance depending on thread dependencies and lifetimes, but it should incur massive additional complexity and likely diminishing returns due to the super-linear relations mentioned before.
I think it's safe to say that all four cores can operate at different frequencies.
[*] What's with the extra hip-hop music starting at 51:19 and lasting to the end?
Also, woo for static CMOS instead of taking advantage of domino/capacitance-based circuit solutions.
[1] Sucked that I couldn't watch it in full screen due to the slides, nor could I resize it with Firefox 3 since WMP video isn't scaled!
[2] I felt bad about the vague/poorly worded questions Rajesh Kumar had to answer
[3] Say a third-party heatsink by Thermalright plus some sweet high CFM low acoustic dB fans (or a peltier, but that's not exactly low power...)
Intel is too funky for you.
Also, don't confuse system power with socket power. The latter is solely a function of the CPU, not the cooling or anything else, and that's the important quantity for the above discussion.
Disclaimer: The postings on this site are my own and don't necessarily represent Intel's positions, strategies, or opinions.