While it only runs up to 2000, the growing discrepancies of the 1980s led to the development of the first CPU caches.

## How Caching Works

CPU caches are small pools of memory that store information the CPU is most likely to need next. Which information is loaded into cache depends on sophisticated algorithms and certain assumptions about programming code. The goal of the cache system is to ensure that the CPU has the next bit of data it will need already loaded into cache by the time it goes looking for it (also called a cache hit).

A cache miss, on the other hand, means the CPU has to go scampering off to find the data elsewhere. This is where the L2 cache comes into play - while it’s slower, it’s also much larger. Some processors use an inclusive cache design (meaning data stored in the L1 cache is also duplicated in the L2 cache) while others are exclusive (meaning the two caches never share data).
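To make the hit/miss flow concrete, here’s a minimal sketch in C of the lookup chain just described: check L1, fall back to L2, then go all the way to DRAM. The hit rates and cycle counts are illustrative assumptions, not measurements from any real CPU.

```c
/* Minimal sketch of the lookup chain: L1, then L2, then DRAM.
 * Hit rates and cycle counts below are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for real tag-RAM lookups: ~95% of loads
 * hit L1, ~4% more hit L2, and ~1% fall through to DRAM. */
static bool l1_contains(unsigned long addr) { return addr % 100 < 95; }
static bool l2_contains(unsigned long addr) { return addr % 100 < 99; }

static int load_latency_cycles(unsigned long addr)
{
    if (l1_contains(addr)) return 4;   /* L1 hit: a few cycles    */
    if (l2_contains(addr)) return 12;  /* L1 miss, L2 hit         */
    return 200;                        /* miss everywhere: DRAM   */
}

int main(void)
{
    unsigned long total = 0;
    for (unsigned long addr = 0; addr < 1000; addr++)
        total += load_latency_cycles(addr);
    /* Prints ~6.3 cycles: high hit rates hide the 200-cycle DRAM trip. */
    printf("average load latency: %.1f cycles\n", total / 1000.0);
    return 0;
}
```

Even a 1 percent fallthrough to DRAM adds noticeably to the average when the penalty is that large, which is why the hit-rate figures discussed below matter so much.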
If data can’t be found in the L2 cache, the CPU continues down the chain to L3 (typically still on-die), then L4 (if it exists) and main memory (DRAM).

This chart shows the relationship between an L1 cache with a constant hit rate and an increasingly large L2 cache. Note that the total hit rate goes up sharply as the size of the L2 increases. A larger, slower, cheaper L2 can provide all the benefits of a large L1, but without the die size and power consumption penalty. Most modern L1 caches have hit rates far above the theoretical 50 percent shown here - Intel and AMD both typically field cache hit rates of 95 percent or higher.

The next important topic is set-associativity. Every CPU contains a specific type of RAM called tag RAM. The tag RAM is a record of all the memory locations that can map to any given block of cache. If a cache is fully associative, it means that any block of RAM data can be stored in any block of cache. The advantage of such a system is that the hit rate is high, but the search time is extremely long - the CPU has to look through its entire cache to find out if the data is present before searching main memory.

At the opposite end of the spectrum, we have direct-mapped caches. A direct-mapped cache is a cache where each cache block can contain one and only one block of main memory. This type of cache can be searched extremely quickly, but since it maps 1:1 to memory locations, it has a low hit rate. In between these two extremes are n-way associative caches. A 2-way associative cache (Piledriver’s L1 is 2-way) means that each main memory block can map to one of two cache blocks, while an eight-way associative cache means that each block of main memory could be in one of eight cache blocks. Ryzen’s L1 instruction cache is 4-way set associative, while the L1 data cache is 8-way set associative.
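The associativity trade-off is easiest to see in how a cache splits an address into an offset, a set index, and a tag: more ways means fewer sets, so each address has more candidate slots, but each lookup has to compare more tags. The sketch below assumes a 32KB cache with 64-byte lines purely for illustration, not the geometry of any particular CPU.

```c
/* How an n-way set-associative cache decomposes an address.
 * Cache size and line size are assumptions for illustration only. */
#include <stdio.h>

#define CACHE_BYTES (32 * 1024)
#define LINE_BYTES  64

static void decompose(unsigned long addr, unsigned ways)
{
    unsigned long sets   = CACHE_BYTES / (LINE_BYTES * ways);
    unsigned long offset = addr % LINE_BYTES;          /* byte within line */
    unsigned long index  = (addr / LINE_BYTES) % sets; /* which set        */
    unsigned long tag    = addr / (LINE_BYTES * sets); /* identifies owner */
    printf("%3u-way: %3lu sets, set %3lu, tag %#lx, offset %lu\n",
           ways, sets, index, tag, offset);
}

int main(void)
{
    unsigned long addr = 0x12345678;
    decompose(addr, 1);   /* direct-mapped: exactly one possible slot  */
    decompose(addr, 2);   /* 2-way: two candidate slots per set        */
    decompose(addr, 8);   /* 8-way: eight candidate slots per set      */
    decompose(addr, 512); /* fully associative here: one set, any slot */
    return 0;
}
```

A direct-mapped lookup compares a single tag; the fully associative case has to compare all 512, which is why fully associative designs are slow to search despite their high hit rates.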
This chart from Anandtech’s Haswell review is useful because it illustrates the performance impact of adding a huge (128MB) L4 cache as well as the conventional L1/L2/L3 structures. Each stair step represents a new level of cache. The red line is the chip with an L4 - note that for large file sizes, it’s still almost twice as fast as the other two Intel chips.

It might seem logical, then, to devote huge amounts of on-die resources to cache - but it turns out there’s a diminishing marginal return to doing so. Larger caches are both slower and more expensive. At six transistors per bit of SRAM (6T), cache is also expensive in terms of die size, and therefore dollar cost. Past a certain point, it makes more sense to spend the chip’s power budget and transistor count on more execution units, better branch prediction, or additional cores. At the top of the story, you can see an image of the Pentium M (Centrino/Dothan) chip; the entire left side of the die is dedicated to a massive L2 cache.
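As a back-of-the-envelope illustration of that 6T cost, the arithmetic below counts only the data array, not the tag RAM or control logic (which add further overhead); the cache sizes are arbitrary examples.

```c
/* Rough transistor cost of 6T SRAM data arrays of various sizes.
 * Counts data cells only; tag RAM and control logic add more. */
#include <stdio.h>

int main(void)
{
    const long long BITS_PER_MB = 8LL * 1024 * 1024;
    const long long T_PER_BIT   = 6;  /* classic 6T SRAM cell */
    const long long sizes_mb[]  = {1, 8, 128};

    for (int i = 0; i < 3; i++) {
        long long t = sizes_mb[i] * BITS_PER_MB * T_PER_BIT;
        printf("%3lld MB of 6T SRAM: ~%lld million transistors\n",
               sizes_mb[i], t / 1000000);
    }
    return 0;
}
```

A 128MB cache built from 6T SRAM would need roughly 6.4 billion transistors for storage alone - one reason Haswell’s 128MB L4 was implemented as eDRAM rather than SRAM.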