Barcelona

Post by **Jonathan** » Mon Jun 11, 2007 11:13 pm

http://www.anandtech.com/tradeshows/sho ... i=3006&p=2

Post by **quantus** » Mon Jun 11, 2007 11:45 pm

The reason for this is that getting the server-oriented Barcelona to work reliably in 2P-or-above configurations is a leading factor in the reduced core speeds expected at launch.

Do you know what sort of cache coherency, if any, they're using? It's either something related to the data cache, or they messed up their simulation of the chip-to-chip interconnect somehow.

Post by **Jonathan** » Tue Jun 12, 2007 12:52 am

Caveat: I know diddly about AMD uarch. However, I still know more than you.

It very much sounds like an HT speedpath, because the chipset link and the remote-socket links are not the same. But jeez, HT runs on a different clock. Lowering your core clock simply won't improve your HT frequency unless it's a heat problem. That doesn't make any sense. Perhaps they have some logic that handles transactions running on the core clock...

The last level cache (L3, I suppose) on Barcelona is neither inclusive nor exclusive. It is a large victim cache holding data evicted from the core caches. I would assume it uses MESI, but if I've been shown the info I have forgotten it.

Post by **quantus** » Tue Jun 12, 2007 1:22 am

Maybe it is an HT speedpath and they're doing something funky like counting cycles for an invalidate signal to reach the other caches and declaring the line exclusive locally after that much time, so they don't have to wait for an ACK. Slowing down the core gives HT more time. Anyways, at the top level, it still sounds like a coherency problem. The root cause will be pretty much impossible for us to diagnose here.
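To put numbers on that guess: if some logic waits a fixed number of core cycles before treating a remote invalidate as delivered, the window lasts (cycles / core GHz) nanoseconds, while the HT latency doesn't shrink with the core clock. A minimal sketch only; the 200-cycle wait and the 90 ns invalidate latency are made-up values, not Barcelona figures.

```python
# Hypothetical illustration of the "count cycles instead of waiting for an ACK"
# speculation above; none of these numbers come from AMD.

WAIT_CYCLES = 200        # hypothetical fixed wait, in core cycles
HT_INVALIDATE_NS = 90.0  # hypothetical time for an invalidate to reach remote caches


def window_ns(cycles: int, core_ghz: float) -> float:
    """Wall-clock length of a fixed cycle count: 1 GHz = 1 cycle per ns."""
    return cycles / core_ghz


def is_safe(core_ghz: float) -> bool:
    """True if the fixed wait outlasts the (core-clock-independent) HT latency."""
    return window_ns(WAIT_CYCLES, core_ghz) >= HT_INVALIDATE_NS


for ghz in (2.0, 2.2, 2.5):
    print(f"{ghz} GHz: window {window_ns(WAIT_CYCLES, ghz):.1f} ns, safe={is_safe(ghz)}")
```

With these made-up numbers the wait covers 100 ns at 2.0 GHz but only 80 ns at 2.5 GHz, which would fall short of the 90 ns latency. That's the shape of "slowing down the core gives HT more time."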

Aren't victim caches typically exclusive? As you said, you know better than me, but "holding data evicted from the core caches" sounds exclusive. Also, I would think that a large victim cache would be pretty damn slow, since it would typically have high associativity, especially as the ratio of L3 to L1/L2 increases... What are the sizes of the L1/L2 and L3 caches?

Post by **Jonathan** » Tue Jun 12, 2007 1:48 am

64k il1/dl1
512k ul2
2mb per core l3, but even as I type it it seems high. maybe it's 1mb per core, but I'm pretty sure it's 2.

Post by **Jonathan** » Tue Jun 12, 2007 2:00 am

A victim cache needn't dump the line to a core on a hit, but an exclusive cache must. I could be wrong, but I recall their l3 being neither inclusive nor exclusive.
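A toy model of that distinction: both caches below are filled only by L2 evictions, but on a hit an exclusive L3 must drop the line it hands to the core, while a non-exclusive victim cache may keep a copy. The `L3` class and its methods are hypothetical; lines are just addresses in a set with no capacity limit.

```python
class L3:
    """Toy L3 filled only by L2 evictions; lines are just addresses (no capacity limit)."""

    def __init__(self, exclusive: bool):
        self.exclusive = exclusive
        self.lines = set()

    def fill_from_l2_eviction(self, addr: int) -> None:
        self.lines.add(addr)

    def hit(self, addr: int, keep_copy: bool = True) -> bool:
        """Return True on a hit and hand the line to the core.

        An exclusive L3 must drop its copy; a non-exclusive victim cache
        may keep it (keep_copy=True) or drop it (keep_copy=False).
        """
        if addr not in self.lines:
            return False
        if self.exclusive or not keep_copy:
            self.lines.discard(addr)
        return True


exclusive_l3 = L3(exclusive=True)
victim_l3 = L3(exclusive=False)
for cache in (exclusive_l3, victim_l3):
    cache.fill_from_l2_eviction(0x40)
    cache.hit(0x40)
print(0x40 in exclusive_l3.lines, 0x40 in victim_l3.lines)  # False True
```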

Post by **Jonathan** » Thu Jun 14, 2007 7:20 pm

http://theinquirer.net/default.aspx?article=40348
http://www.theinquirer.net/default.aspx?article=40347

Edit: I was on bad crack. Expect Phenoms for Christmas 2007

Post by **Jonathan** » Tue Jul 10, 2007 6:04 pm

http://online.wsj.com/article/SB1183100 ... YWORDS=AMD

Shipping parts in August, systems available in September. Launch frequency is 2.0 GHz.

Post by **Jonathan** » Tue Jul 10, 2007 6:08 pm

Dwindlehop wrote:64k il1/dl1
512k ul2
2mb per core l3, but even as I type it it seems high. maybe it's 1mb per core, but I'm pretty sure it's 2.

It's 2MB L3 (not 2MB per core). This is why the number was weird.

http://www.anandtech.com/cpuchipsets/sh ... i=2939&p=9
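Quick arithmetic with the corrected numbers, assuming a quad-core die where "64k il1/dl1" means 64 KB each of L1I and L1D per core, plus 512 KB of L2 per core and one shared 2 MB L3. It also gives the L3-to-L2 ratio quantus asked about.

```python
KB = 1
MB = 1024 * KB

CORES = 4
L1I_PER_CORE = 64 * KB
L1D_PER_CORE = 64 * KB
L2_PER_CORE = 512 * KB
L3_SHARED = 2 * MB  # one L3 for the whole die, not per core

per_core = L1I_PER_CORE + L1D_PER_CORE + L2_PER_CORE
total = CORES * per_core + L3_SHARED
l3_to_l2 = L3_SHARED / (CORES * L2_PER_CORE)

print(f"per core: {per_core} KB, sum on die: {total} KB, L3:L2 ratio {l3_to_l2}")
```

That works out to 640 KB of private cache per core, 4608 KB summed across the die, and an L3 exactly as large as the four L2s combined (ratio 1.0).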

Post by **Jonathan** » Tue Jul 10, 2007 6:14 pm

Dwindlehop wrote:A victim cache needn't dump the line to a core on a hit, but an exclusive cache must. I could be wrong, but I recall their l3 being neither inclusive nor exclusive.

I am 100% right.

The new L3 cache acts as a victim cache for the L2 cache. So when the small L2 cache fills up, evicted data is sent to the larger L3 cache, where it is kept until space is needed. The algorithms that govern the L3 cache's operation are designed to accommodate data that is likely to be needed by multiple cores. If the CPU fetches a bit of code, a copy is left in the L3 cache, since the code is likely to be shared among the four cores. Pure data load requests, however, go through a separate process. The cache controller looks at history, and if the data has been shared before, a copy will be left in the L3 cache; otherwise it will be invalidated.

So new fetches fill into the L2 (not L3). They will eventually be evicted into the L3. If the line is hit from L3, sometimes it is invalidated and sometimes it isn't. Therefore, the L3 is neither exclusive nor inclusive.
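The quoted policy can be sketched as a small decision function. This follows the AnandTech description only; the article doesn't say how the controller records sharing history, so the `shared_before` set here is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class BarcelonaStyleL3:
    """Toy model of the L3 retention policy described above (not AMD's actual logic)."""
    lines: set = field(default_factory=set)
    # Hypothetical stand-in for the controller's sharing history.
    shared_before: set = field(default_factory=set)

    def evict_from_l2(self, addr: int) -> None:
        # New fetches fill the L2; the L3 only receives L2 victims.
        self.lines.add(addr)

    def read_hit(self, addr: int, is_code: bool) -> bool:
        """Serve a core's read from the L3 and decide whether to keep a copy."""
        if addr not in self.lines:
            return False
        keep = is_code or addr in self.shared_before
        if not keep:
            self.lines.discard(addr)  # acts exclusive for this line
        return True


l3 = BarcelonaStyleL3(shared_before={0x200})
for addr in (0x100, 0x200, 0x300):
    l3.evict_from_l2(addr)
l3.read_hit(0x100, is_code=True)   # code: copy stays
l3.read_hit(0x200, is_code=False)  # data that was shared before: copy stays
l3.read_hit(0x300, is_code=False)  # data never shared: invalidated
print(sorted(l3.lines))  # [256, 512]
```

The same hit path keeps the line in one case and drops it in another, which is why the L3 is neither inclusive nor exclusive.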

Post by **quantus** » Wed Jul 11, 2007 12:33 am

Interesting, so it saves from having to go to each other core's cache directly, and allows you to just read out of the L3 instead. Then I guess the L3 manages the exclusivity for writing data by knowing which cores have a copy of the data?

Post by **Jonathan** » Wed Jul 11, 2007 1:12 am

Seems like the L3 can't have the line in E (Exclusive) if a core does an RFO (read for ownership), so I doubt they bother with core valid bits. It's sorta like a third caching agent. If the line is in this L3, you don't need to snoop the cores. If it's not, you do. You couldn't really get any benefit from core valid bits, since the MESI state sorts you out.

Now, in an inclusive cache, core valid bits would help because the cache could filter unnecessary snoops to the cores. The inclusivity would guarantee correctness.
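A sketch of the two cases, counting how many cores have to be snooped for a request. It assumes four cores and models the core valid bits of the hypothetical inclusive design as a set of core IDs per line; neither function describes Barcelona's actual protocol.

```python
from typing import Optional, Set

NUM_CORES = 4


def snoops_non_inclusive(l3_hit: bool) -> int:
    """Non-inclusive L3: a hit is served directly; a miss says nothing
    about the core caches, so every core has to be snooped."""
    return 0 if l3_hit else NUM_CORES


def snoops_inclusive(core_valid: Optional[Set[int]]) -> int:
    """Inclusive L3 with core valid bits. None means an L3 miss, and inclusion
    guarantees no core holds the line; on a hit, snoop only the flagged cores."""
    if core_valid is None:
        return 0
    return len(core_valid)


print(snoops_non_inclusive(False), snoops_inclusive(None), snoops_inclusive({2}))  # 4 0 1
```

The miss case is where inclusion pays off: the non-inclusive L3 has to broadcast to all four cores, while the inclusive one can answer "nobody has it" on its own.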

Post by **Jonathan** » Fri Jul 13, 2007 12:17 am

http://www.dailytech.com/AMD+to+Can+Sin ... le7960.htm

New price cuts and EOL announcements. If you have an existing AMD platform you want to upgrade, now is probably the time to pounce.

Post by **Jonathan** » Tue Sep 11, 2007 6:05 pm

Barcelona Opterons are out of NDA this week. AMD is already sampling 2.5 GHz Opterons, so I guess they worked out the kinks, just not in time to launch.

http://www.anandtech.com/cpuchipsets/sh ... i=3092&p=5

This is a review pitting Barcelona against the old Opteron on the same motherboard in desktop benchmarks. Barcelona gets about 15% more performance per clock than the old Opteron on most apps. The games are some of the better outliers. The expectation is that this trend will hold for the Phenom launch.
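Back-of-envelope, assuming performance is roughly per-clock performance times clock speed (which ignores memory-bound code that doesn't scale with frequency): a 15% per-clock gain puts a 2.0 GHz Barcelona roughly level with an old Opteron at about 2.3 GHz. The 15% is the review's approximate figure, so treat the outputs as rough.

```python
PER_CLOCK_GAIN = 1.15  # "about 15% perf per clock" from the review


def equivalent_old_opteron_ghz(barcelona_ghz: float) -> float:
    """Clock an old Opteron would need to match, assuming perf scales linearly with clock."""
    return barcelona_ghz * PER_CLOCK_GAIN


for ghz in (2.0, 2.5):  # launch clock and the sampling clock mentioned above
    print(f"{ghz} GHz Barcelona ~ {equivalent_old_opteron_ghz(ghz):.1f} GHz old Opteron")
```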

In other news, Penryn Xeons launch in November. I don't think I've seen the date for desktop Penryns.