Diagonal SpMVM case study

Posted: Tue May 25, 2010 6:55 pm
by Jonathan

Re: Diagonal SpMVM case study

Posted: Tue May 25, 2010 11:56 pm
by Jonathan
They break through a bandwidth barrier by repurposing the texture sampler hardware to load & cache the matrix values.

Re: Diagonal SpMVM case study

Posted: Wed May 26, 2010 12:47 am
by quantus
That trick only works if the frequently reused data, the data touched by pretty much every other calculation, fits in 256MB (8k*8k*4 bytes). I think that data has to be read-only as well. In this case, it let them get rid of half the load operations and spend essentially twice the bandwidth on loading the other, larger dataset containing the diagonal values.
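
To make the "half the load operations" point concrete, here is a rough sketch in plain Python (not the case study's actual code) of a matrix-vector product with the matrix stored by diagonals. It counts one matrix load and one vector load per multiply-add. The names `dia_spmv`, `offsets` and `diags`, and the tiny tridiagonal test matrix, are my own assumptions for illustration.

```python
# Sketch only: y = A @ x with A stored by diagonals, counting loads.
# dia_spmv, offsets and diags are illustrative names, not from the case study.

def dia_spmv(offsets, diags, x):
    """Multiply a square matrix stored by diagonals with the vector x.

    diags[d][i] holds A[i][i + offsets[d]]; entries that would fall
    outside the matrix are skipped.
    Returns (y, matrix_loads, vector_loads).
    """
    n = len(x)
    y = [0.0] * n
    matrix_loads = 0
    vector_loads = 0
    for d, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                # Each multiply-add reads one matrix value and one vector value.
                y[i] += diags[d][i] * x[j]
                matrix_loads += 1
                vector_loads += 1
    return y, matrix_loads, vector_loads


# Tridiagonal 4x4 example: -1 on the sub/super diagonals, 2 on the main one.
offsets = [-1, 0, 1]
diags = [[-1.0] * 4, [2.0] * 4, [-1.0] * 4]
x = [1.0, 2.0, 3.0, 4.0]
y, matrix_loads, vector_loads = dia_spmv(offsets, diags, x)

# Size limit quoted above: 8k * 8k elements of 4 bytes each.
texture_limit_bytes = 8192 * 8192 * 4
```

The two counters always come out equal, so if the vector reads are served from a cache, only the matrix reads go to memory, which would be the halving described above.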

So, in summary, they got their first doubling of perf by vectorizing the computation, which took them to the memory bandwidth limit. Then they figured out that they were wasting half the loads and cached that data instead, doubling the perf again.