2. Inclusive caches

locality of reference, the concept of caches was introduced in computers. Data and instructions are fetched from main memory into the cache before being accessed by the CPU, which improves performance. The memory hierarchy thus runs from registers through caches, main memory and secondary memory to tertiary memory. The performance of the memory system is measured as the average memory access time (AMAT). Multiple cache levels are prevalent in modern processors.

A processor cache is denoted by the tuple (C, k, L), where C is the capacity, k the associativity and L the line size. Based on the value of k, three types of caches are known: the direct mapped cache with k = 1, the set associative cache with k > 1, and the fully associative cache with one set of n blocks. For a cache of S sets, an address a is mapped to the set a mod S with tag value a div S. If a line is present in the cache, the access is a cache hit; otherwise it is a cache miss. Cache misses are of three kinds: cold, capacity and conflict. A miss on the first reference to a line is called a cold miss. A capacity miss is a miss, other than a cold miss, that would occur even in a fully associative cache of the same capacity. A conflict miss occurs when a line maps to a set whose ways are all occupied; the conflict misses are the difference between the misses in the cache and the misses in a fully associative cache of the same capacity.

A computer system usually has many cache levels. In inclusive caches, a line that resides in one cache level also resides in the higher cache levels. In exclusive caches, a line resides in only one cache level. Processors usually have separate caches dedicated to instructions and data, called the instruction cache and the data cache, respectively. Certain systems use the same cache for both instructions and data; these are called unified caches. A system with caches of two or more kinds (direct mapped, set associative, fully associative) is called a multilateral or hybrid cache. As the address mapping may vary with each cache level, the average memory access time (AMAT) is used to measure the cache performance.

As the number of active computer components increases, the energy consumed also increases. The power consumed by a cache depends on the number of active components. The energy is given as E = power × time. For an electronic component, the power is given by the formula P = (1/2)cv<sup>2</sup>f, where c is the capacitance, v the voltage and f the frequency.

The performance of CPU caches is measured by the execution time of various applications. Benchmarks are used to measure CPU performance. The SPEC2000 benchmarks are one of the standard benchmark suites. The integer benchmarks of SPEC2K are given as follows:

| Name | Description |
| --- | --- |
| 256.bzip2 | Compression |
| 181.mcf | Combinatorial optimization |
| 197.parser | Word processing |
| 300.twolf | Place and route simulator |
| 255.vortex | Object-oriented database |
| 175.vpr | FPGA circuit placement and routing |

The memory performance is improved by adding caches. The inclusive, exclusive and two-type data cache models are presented in this chapter. The proposed models are simulated using the SPEC2000 benchmarks. The benchmarks are run using the SimpleScalar Toolkit for simulations.
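The set and tag mapping described earlier in this section (set a mod S, tag a div S) can be illustrated with a short sketch. The names `CacheGeometry` and `map_address` are hypothetical, and the sketch assumes that a denotes a line address (byte address divided by the line size L) and that the number of sets is S = C / (k · L); the chapter does not spell out either detail, and the cache sizes below are illustrative only.

```python
from typing import NamedTuple, Tuple


class CacheGeometry(NamedTuple):
    """The (C, k, L) tuple: capacity, associativity and line size (in bytes)."""
    capacity: int
    associativity: int
    line_size: int

    @property
    def num_sets(self) -> int:
        # Assumption: S = C / (k * L).
        return self.capacity // (self.associativity * self.line_size)


def map_address(byte_address: int, cache: CacheGeometry) -> Tuple[int, int]:
    """Return (set index, tag) for a byte address."""
    # Assumption: the address a in the text is the line address.
    line_address = byte_address // cache.line_size
    sets = cache.num_sets
    return line_address % sets, line_address // sets


# Three caches of the same capacity (illustrative sizes).
direct_mapped = CacheGeometry(capacity=1024, associativity=1, line_size=32)
two_way = CacheGeometry(capacity=1024, associativity=2, line_size=32)
fully_associative = CacheGeometry(capacity=1024, associativity=32, line_size=32)

for cache in (direct_mapped, two_way, fully_associative):
    print(cache.num_sets, map_address(0x1234, cache))
```

With the same capacity and line size, raising k reduces the number of sets, so more of the address ends up in the tag; the fully associative case has a single set and the whole line address becomes the tag.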

Consider a cache system of n cache levels and main memory. Let the cache levels be L1, L2, …, Ln. Let the cache be inclusive. Then L1 ⊆ L2 ⊆ L3 ⊆ … ⊆ Ln. Denote this system as Cincl. This is shown in Figure 1. The cache sizes grow with the level number [1, 2]. Consider a three-level cache system.

Let the levels be L1, L2, L3. Let an address trace have R references. Let h1, h2, h3 be the number of hits in level one, level two and level three, respectively. Let t1, t2, t3 be the access times of level one, level two and level three, and let t21 and t32 be the transfer times between level two and level one and between level three and level two, respectively. Let M be the miss penalty. The average memory access time is given by

$$AMAT(\mathbf{C}\_{\rm incl}) = \frac{1}{R}(h\_1t\_1 + h\_2(t\_1 + t\_2 + t\_{21}) + h\_3(t\_1 + t\_2 + t\_3 + t\_{32} + t\_{21}) + (R - h\_1 - h\_2 - h\_3)M) \tag{1}$$

The first three terms in Eq. (1) are the access times for hits in level one, level two and level three, respectively. The last term is the miss penalty for the references that miss in all three levels. This expression can be extended to any number of cache levels.
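As a minimal sketch, Eq. (1) can be evaluated for any number of levels with a small function. The function name `amat_inclusive` and the numeric values below are hypothetical and chosen only for illustration; they are not measurements from the chapter. A hit at level i is charged the access times of levels 1 to i plus the transfer times back down to level one, as in Eq. (1).

```python
from typing import Sequence


def amat_inclusive(
    references: int,
    hits: Sequence[int],             # [h1, h2, ..., hn]
    access_times: Sequence[float],   # [t1, t2, ..., tn]
    transfer_times: Sequence[float], # [t21, t32, ...], one fewer than levels
    miss_penalty: float,             # M
) -> float:
    """Average memory access time of an inclusive cache hierarchy, Eq. (1)."""
    if len(hits) != len(access_times) or len(transfer_times) != len(hits) - 1:
        raise ValueError("inconsistent number of cache levels")
    total = 0.0
    for level, level_hits in enumerate(hits):
        # Access every level up to the hit level, then transfer the line down.
        cost = sum(access_times[: level + 1]) + sum(transfer_times[:level])
        total += level_hits * cost
    # References that miss in every level pay the miss penalty.
    total += (references - sum(hits)) * miss_penalty
    return total / references


# Illustrative three-level example (times in cycles, values assumed).
amat = amat_inclusive(
    references=1000,
    hits=[900, 60, 30],
    access_times=[1, 10, 30],
    transfer_times=[2, 5],
    miss_penalty=200,
)
print(amat)
```

For these assumed values the four terms are 900, 780, 1440 and 2000 cycles, so the average is 5120 / 1000 = 5.12 cycles per reference.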

The energy consumed in a cache depends on the number of active components. Individual lines in a cache can be selectively switched on using certain software techniques or hardware circuits, which reduces the total power consumed. Consider a w-way set associative cache of S sets. Let the power consumed per line be p watts. The total power consumed is wpS. Consider a circuit which enables a line only if it is occupied. This is shown in Figure 2. If the power consumed by the circuit is q watts and the number of occupied lines is y, the total power consumed is q + yp. An improvement in power consumption is observed if q + yp < wpS.
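The condition q + yp < wpS can be checked numerically. The sketch below uses hypothetical function names and assumed values (expressed in milliwatts to keep the arithmetic in integers); it only restates the inequality above and does not model the circuit of Figure 2.

```python
def full_cache_power(ways: int, sets: int, line_power: int) -> int:
    """Power with every line enabled: w * p * S."""
    return ways * line_power * sets


def gated_cache_power(occupied_lines: int, line_power: int, circuit_power: int) -> int:
    """Power with only occupied lines enabled: q + y * p."""
    return circuit_power + occupied_lines * line_power


def gating_saves_power(ways, sets, line_power, occupied_lines, circuit_power) -> bool:
    """True when q + y*p < w*p*S."""
    return gated_cache_power(occupied_lines, line_power, circuit_power) < full_cache_power(
        ways, sets, line_power
    )


# Assumed values: 4-way cache, 256 sets, 1 mW per line, 50 mW enabling circuit.
w, S, p, q = 4, 256, 1, 50
print(full_cache_power(w, S, p))                 # all 1024 lines enabled
print(gated_cache_power(600, p, q))              # 600 occupied lines
print(gating_saves_power(w, S, p, 600, q))       # partly occupied cache
print(gating_saves_power(w, S, p, w * S, q))     # fully occupied cache
```

With a fully occupied cache (y = wS) the gated design always consumes q more than the ungated one, so the saving depends on how many lines stay unoccupied.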

Power saving using software techniques involves mapping lines to fixed ways by address mapping techniques [4].


Figure 1. Inclusive cache of n levels.

Figure 2. Sequential circuit in a cache way to save power consumption. For details, refer to [3].
