The Radeon RX 6000-series cards announced so far increase the size of the GPU to as many as 80 CUs from a more modest 40, while also increasing the operating frequency by up to 1.3x. That gives the GPU a lot of processing power, and leaves it ever hungry for data streamed in from VRAM; if that data can’t be provided quickly enough, the render pipeline effectively bottlenecks, throttling performance.
There are two obvious solutions. The first is a natural progression of memory development - improved speed - which the RX 6000-series takes advantage of with new GDDR6 chips clocked at up to 16Gbps (for 512 GB/s total bandwidth). The second is to widen the memory bus from 256 bits to 384 or 512 bits, but that is expensive in terms of development cost, silicon area and power consumption. AMD, however, took cues from Zen architecture design to devise a third way.
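As a rough check on those figures, peak GDDR6 bandwidth is the per-pin data rate multiplied by the bus width, divided by eight to convert bits to bytes. The short sketch below applies that to the 256-bit bus AMD chose and to the wider buses it passed over; the function name is purely illustrative, and the 384-bit and 512-bit results are theoretical values at the same 16Gbps, not shipping configurations.

```python
def gddr6_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbps) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Same 16Gbps memory on the three bus widths discussed above.
for width in (256, 384, 512):
    print(f"{width}-bit @ 16Gbps: {gddr6_bandwidth_gbs(16, width):.0f} GB/s")
```

The 256-bit case reproduces the 512 GB/s quoted for the RX 6000-series; the wider buses would offer 768 GB/s and 1024 GB/s respectively, at the costs described above.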
Infinity Cache is a new oversized last-level cache situated between L2 and VRAM, roughly comparable to the L3 cache of AMD’s new Zen 3 processors. It’s also huge in capacity - 128MB on Navi 21 - dwarfing the 4MB L2 cache present on the GPU. However, techniques developed for Zen’s server L3 cache design allow Infinity Cache to be far denser than GPU L2, ensuring that the overall die size doesn’t bloat to a level that significantly impacts yields.
As in Zen 3, this large cache can connect to the GPU at Infinity Fabric speeds (of up to 1.94 GHz) and with latencies far lower than VRAM, leading to an average reduction in overall latency of 34% and up to 4x the peak bandwidth of GDDR6 (when the memory is connected over a 256-bit bus). Furthermore, accessing data held in Infinity Cache is much cheaper power-wise than accessing VRAM, making the process more efficient while also improving performance.
Another advantage of leveraging Infinity Fabric as a technology is that fabric frequencies can be opportunistically increased to meet higher demands, or tuned downwards when an application is not bandwidth constrained. This level of fine control once again improves power efficiency while not impacting performance, so long as the controller is in place to make fast (preferably predictive) changes to fabric clocks.
To put all that into context: compared with a theoretical GDDR6 configuration on a 384-bit bus (a potential alternative considered for Big Navi), AMD project ~2.4x the effective bandwidth per watt from this implementation.
AMD SMART ACCESS MEMORY
With RDNA 2, AMD are also taking the opportunity to attack another aspect of the rendering pipeline: CPU access to GPU memory. In Windows environments, the CPU can typically only map a fraction of the video memory at once, with the size of that window set by a core part of the PCI Express specification known as the Base Address Register (BAR). Modern PCs usually limit access to 256MB at once, far less than the 8GB or more available to an enthusiast-class card.
The value of this data and the CPU’s need to access it varies significantly by game, but it can be a substantial bottleneck to performance. That’s where AMD Smart Access Memory comes in.
Simply put, it unlocks CPU access to the entirety of GPU memory, allowing the CPU to map the entire pool and draw more data over the PCI Express bus. To enable this feature, AMD have leveraged their desktop motherboard and CPU platform to unlock resizing of the BAR and remove GPU-side impediments.
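For readers who want to see how large a window their own card currently exposes, the sketch below shows one way to check on a Linux system, which is an assumption on our part since the discussion above concerns Windows. It reads the kernel's per-device `resource` file under `/sys/bus/pci/devices/`, where each line holds the start address, end address and flags of one region in hexadecimal. The function names, the PCI address in the docstring and the sample values are illustrative and were not captured from real hardware.

```python
from pathlib import Path
from typing import List


def bar_sizes(resource_text: str) -> List[int]:
    """Return the size in bytes of each populated region in a sysfs 'resource' file.

    Each line has three hex fields: start address, end address, flags.
    Unused regions are all zeros and are skipped.
    """
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue
        sizes.append(end - start + 1)
    return sizes


def largest_region_mib(pci_address: str) -> float:
    """Largest region of one device in MiB; pci_address (e.g. '0000:03:00.0') is a placeholder."""
    text = Path(f"/sys/bus/pci/devices/{pci_address}/resource").read_text()
    return max(bar_sizes(text)) / 2**20


# Illustrative input only: one 256MB window followed by an unused slot.
sample = (
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
)
print([size // 2**20 for size in bar_sizes(sample)])
```

With the sample input, the single populated region works out to 256MB, the traditional limit. On a system with a resized BAR, the largest region would instead be expected to be at least as big as the card's VRAM.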
At launch the feature is available on system configurations consisting of an RX 6000-series GPU, a Ryzen 5000-series CPU and an X570 motherboard updated to the latest BIOS release. BIOSes unlocking the feature are in development for B550 motherboards, and AMD have left open the possibility of making it available to other platforms.
There’s currently some debate whether this feature is likely to be exclusive to AMD for long; indeed their competition have already raised the idea of implementing it on their own graphics hardware. But it’s a sign that AMD’s unique position in the market makes novel solutions like this more feasible, rather than negotiating independent standards with competing brands.
AMD’s internal testing projects gains ranging from nothing at all up to 18%, but results are highly variable. Still, it’s a welcome addition for owners of an all-AMD system.
UEFI BIOSes unlocking this feature on X570 should now be available for all models from all vendors. Once installed, simply enable both ‘Above 4G Decoding’ and ‘Re-Size BAR Support’ within the BIOS settings (the location varies by motherboard model and vendor, but it’s typically on the PCIe settings page).