Diagonal SpMVM case study
- Grand Pooh-Bah
- Posts: 6722
- Joined: Tue Sep 19, 2006 8:45 pm
- Location: Portland, OR
Re: Diagonal SpMVM case study
They break through a bandwidth barrier by repurposing the texture sampler hardware to load & cache the matrix values.
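For anyone who hasn't seen diagonal SpMV before, here is a minimal Python sketch of the access pattern. It assumes a DIA-style layout where each stored diagonal is a dense array indexed by row (`offsets` and `diags` are illustrative names, not taken from the case study). Each multiply-add does two loads: one diagonal value, which is streamed exactly once, and one entry of `x`, which every diagonal reads again.

```python
from typing import List, Sequence


def dia_spmv(offsets: Sequence[int],
             diags: Sequence[Sequence[float]],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for an n-by-n matrix stored by diagonals.

    Assumed layout: diags[d][i] holds A[i][i + offsets[d]]; slots that
    fall outside the matrix are padding and are never read.
    """
    n = len(x)
    y = [0.0] * n
    for off, vals in zip(offsets, diags):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                # Load 1: vals[i] is touched once per SpMV (pure streaming).
                # Load 2: x[j] is re-read by every diagonal (reuse candidate).
                y[i] += vals[i] * x[j]
    return y


# Tridiagonal example: A = [[2,1,0],[1,2,1],[0,1,2]], x = [1,2,3]
print(dia_spmv([-1, 0, 1],
               [[0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0]],
               [1.0, 2.0, 3.0]))  # [4.0, 8.0, 8.0]
```

On a GPU each row would map to a thread, but the load pattern is the same: half of the loads go to a small, read-only vector that is reused across diagonals.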
Disclaimer: The postings on this site are my own and don't necessarily represent Intel's positions, strategies, or opinions.
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
Re: Diagonal SpMVM case study
That trick requires that no more than 256MB (8k*8k*4 bytes) of data be reused by nearly every other calculation, and I think that data has to be read-only as well. In this case, caching it let them eliminate half of the load operations, essentially doubling the bandwidth available for loading the other, larger dataset: the diagonal values.
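A quick check of the numbers above, as a sketch: it assumes the 8k*8k figure is a maximum 2D texture size and that each element is a 4-byte float. `fits_in_texture` and `dram_bytes_per_fma` are hypothetical helpers made up for illustration.

```python
TEXTURE_DIM = 8 * 1024   # elements per side, from the "8k*8k" figure
BYTES_PER_ELEM = 4       # assumption: one 32-bit float per element

max_texture_bytes = TEXTURE_DIM * TEXTURE_DIM * BYTES_PER_ELEM
print(max_texture_bytes, max_texture_bytes // 2**20)  # 268435456 256


def fits_in_texture(n_entries: int) -> bool:
    """Hypothetical check: can n read-only floats live in one texture?"""
    return n_entries * BYTES_PER_ELEM <= max_texture_bytes


def dram_bytes_per_fma(x_cached: bool, value_bytes: int = BYTES_PER_ELEM) -> int:
    # Each multiply-add reads one diagonal value and one vector entry.
    # If the vector is served from the texture cache, only the diagonal
    # value still has to come from DRAM.
    return value_bytes if x_cached else 2 * value_bytes


# Halving the DRAM bytes per FMA doubles the FMA rate a bandwidth-bound
# kernel can sustain.
print(dram_bytes_per_fma(False) // dram_bytes_per_fma(True))  # 2
```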
So, in summary, they got their first doubling of perf by vectorizing the computation until it hit the memory bandwidth limit. Then they figured out that half their loads were wasted re-reading the same data, cached that data instead, and doubled perf again.
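The two steps fit a simple roofline-style model, where attainable throughput is the lesser of the compute peak and bandwidth divided by bytes per operation. The machine numbers below are hypothetical and chosen only to make the arithmetic easy to follow. They are not figures from the case study.

```python
def attainable_fma_rate(compute_peak: float, bandwidth: float,
                        bytes_per_fma: float) -> float:
    """Roofline bound: whichever of compute or memory is slower wins."""
    return min(compute_peak, bandwidth / bytes_per_fma)


# Hypothetical machine (illustrative only).
BANDWIDTH = 100e9      # bytes/s of DRAM bandwidth
SCALAR_PEAK = 6.25e9   # FMA/s before vectorization
VECTOR_PEAK = 50e9     # FMA/s after vectorization (8x the scalar peak)

UNCACHED = 8.0         # 4-byte diagonal value + 4-byte vector entry from DRAM
CACHED = 4.0           # vector entry served from the texture cache

baseline = attainable_fma_rate(SCALAR_PEAK, BANDWIDTH, UNCACHED)    # compute-bound
vectorized = attainable_fma_rate(VECTOR_PEAK, BANDWIDTH, UNCACHED)  # bandwidth-bound
with_cache = attainable_fma_rate(VECTOR_PEAK, BANDWIDTH, CACHED)    # bandwidth-bound

print(vectorized / baseline, with_cache / vectorized)  # 2.0 2.0
```

In this model, vectorization buys 8x compute but only 2x in practice because the memory roof caps it. Caching the reused data lifts the roof, and that yields the second doubling.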