Barcelona

Just the urls, ma'am.
Jonathan
Grand Pooh-Bah
Posts: 6722
Joined: Tue Sep 19, 2006 8:45 pm
Location: Portland, OR

Post by Jonathan »


quantus
Tenth Dan Procrastinator
Posts: 4891
Joined: Fri Jul 18, 2003 3:09 am
Location: San Jose, CA

Post by quantus »

The reason for this is that getting the server-oriented Barcelona to work reliably in 2P-or-larger configurations is a leading factor in the reduced core speeds expected at launch.
Do you know what sort of cache coherency, if any, they're using? It's either something related to the data cache, or they somehow messed up their simulation of the chip-to-chip interconnect.
Have you clicked today? Check status, then: People, Jobs or Roads

Post by Jonathan »

Caveat: I know diddly about AMD uarch. However, I still know more than you.

It very much sounds like an HT speedpath, because the chipset link and the remote-socket links are not the same. But jeez, it's a different clock. Lowering your core clock simply won't improve your HT frequency unless it's a heat problem. That doesn't make any sense. Perhaps they have some logic operating on a core clock that handles transactions...

The last-level cache (L3, I suppose) on Barcelona is neither inclusive nor exclusive. It is a large victim cache holding data evicted from the core caches. I would assume it uses MESI, but if I've been shown the info, I have forgotten it.
Disclaimer: The postings on this site are my own and don't necessarily represent Intel's positions, strategies, or opinions.

Post by quantus »

Dwindlehop wrote: Caveat: I know diddly about AMD uarch. However, I still know more than you.

It very much sounds like an HT speedpath, because the chipset link and the remote-socket links are not the same. But jeez, it's a different clock. Lowering your core clock simply won't improve your HT frequency unless it's a heat problem. That doesn't make any sense. Perhaps they have some logic operating on a core clock that handles transactions...

The last-level cache (L3, I suppose) on Barcelona is neither inclusive nor exclusive. It is a large victim cache holding data evicted from the core caches. I would assume it uses MESI, but if I've been shown the info, I have forgotten it.
Maybe it is an HT speedpath and they're doing something funky, like counting cycles for an invalidate signal to reach the other caches and declaring the line exclusive locally after that many cycles so they don't have to wait for an ACK. Slowing down the core gives HT more time. Anyway, at the top level, it still sounds like a coherency problem. The root cause will be pretty much impossible for us to diagnose here.
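
To make that guess concrete, here is a minimal Python sketch of the arithmetic, assuming the hypothetical scheme above: the core waits a fixed number of its own cycles instead of an ACK. The 100-cycle budget and the 45 ns link latency are made-up illustration values, not Barcelona figures.

```python
# Hypothetical "count cycles instead of waiting for an ACK" scheme.
# A fixed cycle budget covers more wall-clock time at a lower core clock,
# while the HyperTransport link latency does not depend on the core clock.

def budget_ns(cycles: int, core_ghz: float) -> float:
    """Wall-clock time covered by a fixed number of core cycles (1 GHz = 1 cycle per ns)."""
    return cycles / core_ghz


def invalidate_safe(cycles: int, core_ghz: float, link_latency_ns: float) -> bool:
    """True if the invalidate reaches the other sockets before the core
    declares the line exclusive on its own."""
    return budget_ns(cycles, core_ghz) >= link_latency_ns


# Illustrative assumptions only, not measured values.
CYCLES = 100
LINK_NS = 45.0

for ghz in (2.0, 2.2, 2.5):
    ok = invalidate_safe(CYCLES, ghz, LINK_NS)
    print(f"{ghz} GHz: budget {budget_ns(CYCLES, ghz):.1f} ns, safe={ok}")
```

With these invented numbers the scheme holds at 2.0 and 2.2 GHz but breaks at 2.5 GHz, so lowering the core clock would hide the bug without touching the HT clock at all.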

Aren't victim caches typically exclusive? As you said, you know better than me, but "holding data evicted from the core caches" sounds exclusive. Also, I would think that a large victim cache would be pretty damn slow, since it would typically have high associativity, especially as the ratio of L3 to L1/L2 size increases... What are the sizes of the L1, L2, and L3 caches?

Post by Jonathan »

64KB IL1/DL1
512KB UL2
2MB per core L3, but even as I type it, it seems high. Maybe it's 1MB per core, but I'm pretty sure it's 2.

Post by Jonathan »

A victim cache needn't give the line up to a core on a hit (i.e., invalidate its own copy), but an exclusive cache must. I could be wrong, but I recall their L3 being neither inclusive nor exclusive.
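
A toy Python summary of the distinction being drawn here, purely illustrative: `keep_hint` is a hypothetical stand-in for whatever heuristic a real non-inclusive design uses to decide whether to keep its copy.

```python
from enum import Enum


class L3Policy(Enum):
    INCLUSIVE = "inclusive"
    EXCLUSIVE = "exclusive"
    VICTIM = "non-inclusive victim"


def l3_keeps_copy_on_hit(policy: L3Policy, keep_hint: bool = False) -> bool:
    """Does the L3 still hold a line after supplying it to a core on a hit?"""
    if policy is L3Policy.INCLUSIVE:
        # Inclusive: the L3 must hold everything the cores hold, so it keeps the line.
        return True
    if policy is L3Policy.EXCLUSIVE:
        # Exclusive: a line lives in only one level, so it moves up to the core.
        return False
    # Non-inclusive victim cache: free to do either (keep_hint is hypothetical).
    return keep_hint


for p in L3Policy:
    print(p.value, l3_keeps_copy_on_hit(p), l3_keeps_copy_on_hit(p, keep_hint=True))
```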

Post by Jonathan »

Last edited by Jonathan on Wed Jul 11, 2007 1:18 am, edited 1 time in total.

Post by Jonathan »

http://online.wsj.com/article/SB1183100 ... YWORDS=AMD

Shipping parts in August, systems available in September. Launch frequency is 2.0 GHz.

Post by Jonathan »

Dwindlehop wrote: 64KB IL1/DL1
512KB UL2
2MB per core L3, but even as I type it, it seems high. Maybe it's 1MB per core, but I'm pretty sure it's 2.
It's 2MB of L3 total, shared by all four cores (not 2MB per core). That's why the number seemed weird.

http://www.anandtech.com/cpuchipsets/sh ... i=2939&p=9

Post by Jonathan »

Dwindlehop wrote: A victim cache needn't give the line up to a core on a hit (i.e., invalidate its own copy), but an exclusive cache must. I could be wrong, but I recall their L3 being neither inclusive nor exclusive.
I am 100% right.
The new L3 cache acts as a victim cache for the L2 cache. So when the small L2 cache fills up, evicted data is sent to the larger L3 cache, where it is kept until space is needed. The algorithms that govern the L3 cache's operation are designed to accommodate data that is likely to be needed by multiple cores. If the CPU fetches a bit of code, a copy is left in the L3 cache since the code is likely to be shared among the four cores. Pure data load requests, however, go through a separate process. The cache controller looks at history, and if the data has been shared before, a copy will be left in the L3 cache; otherwise, it will be invalidated.
So new fetches fill into the L2 (not the L3) and eventually get evicted into the L3. When a line hits in the L3, sometimes the L3 copy is invalidated and sometimes it isn't. Therefore, the L3 is neither exclusive nor inclusive.
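
A minimal Python sketch of the behaviour the quoted description implies, assuming only what it states: L2 victims go into the L3, code hits keep a copy, and data hits keep a copy only if the line "has been shared before". The class, its methods, and the sharing-history set are hypothetical; capacity, replacement, and coherence states are left out.

```python
class ToyL3:
    """Toy model of the L3 hit policy described in the quote above."""

    def __init__(self) -> None:
        self.lines: set[int] = set()   # line addresses currently in the L3
        self.shared: set[int] = set()  # hypothetical "shared before" history

    def accept_l2_victim(self, addr: int) -> None:
        # New fetches fill the L2; the L3 only receives lines evicted from an L2.
        self.lines.add(addr)

    def note_shared(self, addr: int) -> None:
        # Hypothetical hook: record that more than one core has used this line.
        self.shared.add(addr)

    def read_hit(self, addr: int, is_code: bool) -> bool:
        """Serve a core read that hits in the L3; return True if the L3 keeps its copy."""
        if addr not in self.lines:
            raise KeyError(f"{addr:#x} is not in the L3")
        keep = is_code or addr in self.shared
        if not keep:
            self.lines.discard(addr)  # unshared data moves up, exclusive-style
        return keep


l3 = ToyL3()
for a in (0x100, 0x200, 0x300):
    l3.accept_l2_victim(a)
l3.note_shared(0x200)
print(l3.read_hit(0x100, is_code=True))   # code: copy kept
print(l3.read_hit(0x200, is_code=False))  # previously shared data: copy kept
print(l3.read_hit(0x300, is_code=False))  # private data: copy invalidated
```

Because one hit path keeps the line and the other drops it, the model is neither inclusive nor exclusive, which is the conclusion drawn above.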

Post by quantus »

Interesting, so it saves you from having to go to each of the other cores' caches directly and lets you just read out of the L3 instead. Then I guess the L3 manages exclusivity for writes by knowing which cores have a copy of the data?

Post by Jonathan »

It seems like the L3 can't hold the line in E if a core does an RFO (read for ownership), so I doubt they bother with core valid bits. It's sort of like a third caching agent: if the line is in this L3, you don't need to snoop the cores; if it's not, you do. You couldn't really get any benefit from core valid bits, since the MESI state sorts you out.
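
A minimal Python sketch of the snoop decision as reasoned in this post, assuming a non-inclusive L3 with no core valid bits. The function names are hypothetical, and whether Barcelona actually works this way is the poster's inference, not a documented fact.

```python
def cores_to_snoop(addr: int, l3_lines: set[int], requester: int, num_cores: int) -> list[int]:
    """Non-inclusive L3 as described: if the L3 holds the line, skip the core snoops;
    otherwise snoop every other core, since nothing records which cores might have it."""
    if addr in l3_lines:
        return []
    return [c for c in range(num_cores) if c != requester]


def rfo_hit(addr: int, l3_lines: set[int]) -> None:
    """A read-for-ownership that hits in the L3: the requesting core takes the line
    for writing, so the L3 cannot keep its own E-state copy and drops it."""
    l3_lines.discard(addr)


lines = {0x40}
print(cores_to_snoop(0x40, lines, requester=0, num_cores=4))  # line in L3: no core snoops
print(cores_to_snoop(0x80, lines, requester=0, num_cores=4))  # not in L3: snoop cores 1-3
rfo_hit(0x40, lines)
print(0x40 in lines)
```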

Now, in an inclusive cache, core valid bits would help, because the cache could filter out unnecessary snoops to the cores. Inclusivity is what makes that filtering safe: a line that isn't in the L3 can't be in any core's cache.
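
For contrast, a minimal Python sketch of core valid bits in an inclusive L3, as described above. The class and method names are hypothetical; a real design would also clear a core's bit when that core drops the line, which this omits.

```python
class InclusiveL3:
    """Inclusive L3 with a core-valid bitmask per line (hypothetical sketch)."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.cv: dict[int, int] = {}  # line address -> bitmask of cores that may hold it

    def core_fill(self, addr: int, core: int) -> None:
        # Every core fill also allocates in the L3 (inclusion) and sets that core's bit.
        self.cv[addr] = self.cv.get(addr, 0) | (1 << core)

    def cores_to_snoop(self, addr: int, requester: int) -> list[int]:
        if addr not in self.cv:
            return []  # inclusion: not in the L3 means not in any core
        mask = self.cv[addr] & ~(1 << requester)
        return [c for c in range(self.num_cores) if (mask >> c) & 1]

    def evict(self, addr: int) -> list[int]:
        """Evicting from an inclusive L3 forces back-invalidation of cores that may hold it."""
        mask = self.cv.pop(addr, 0)
        return [c for c in range(self.num_cores) if (mask >> c) & 1]


l3 = InclusiveL3(num_cores=4)
l3.core_fill(0x40, core=1)
l3.core_fill(0x40, core=3)
print(l3.cores_to_snoop(0x40, requester=0))  # only cores 1 and 3 are snooped
print(l3.cores_to_snoop(0x80, requester=0))  # filtered out entirely
print(l3.evict(0x40))                        # cores to back-invalidate
```

The price of the filtering is the back-invalidation on eviction, which a non-inclusive L3 never has to do.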

Post by Jonathan »

http://www.dailytech.com/AMD+to+Can+Sin ... le7960.htm

New price cuts and EOL (end-of-life) announcements. If you have an existing AMD platform you want to upgrade, now is probably the time to pounce.

Post by Jonathan »

Barcelona Opterons are out of NDA this week. AMD is already sampling 2.5 GHz Opterons, so I guess they worked out the kinks, just not in time to launch.

http://www.anandtech.com/cpuchipsets/sh ... i=3092&p=5

This is a review pitting Barcelona against the old Opteron on the same motherboard in desktop benchmarks. Barcelona gets about 15% more performance per clock than the old Opteron in most apps. The games are among the outliers on the high side. The expectation is that this trend will hold for the Phenom launch.
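
A back-of-the-envelope Python sketch of what that ~15% means in clock terms, assuming performance scales as per-clock performance times frequency (which ignores memory-bound effects) and taking the 15% at face value. The 2.0 GHz launch and 2.5 GHz sampling clocks are the ones mentioned earlier in this thread.

```python
def equivalent_old_clock(new_ghz: float, per_clock_gain: float = 0.15) -> float:
    """Clock an old Opteron would need to match Barcelona running at new_ghz,
    assuming performance ~ (per-clock performance) x (frequency)."""
    return new_ghz * (1.0 + per_clock_gain)


for ghz in (2.0, 2.5):
    print(f"Barcelona @ {ghz} GHz ~ old Opteron @ {equivalent_old_clock(ghz):.2f} GHz")
```

Under that simple model, a 2.0 GHz Barcelona lands roughly where a 2.3 GHz old Opteron does, per core.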

In other news, Penryn Xeons launch in November. I don't think I've seen the date for desktop Penryns.