

CS 271 Computer Architecture Purdue University Fort Wayne

#### Chapter 6 Objectives

- Master the concepts of hierarchical memory organization.
- Understand how each level of memory contributes to system performance, and how the performance is measured.
- Master the concepts behind cache memory, virtual memory, memory segmentation, paging and address translation.

#### 6.1 Introduction

- Memory lies at the heart of the stored-program computer.
- In previous chapters, we studied the components from which memory is built and the ways in which memory is accessed by various ISAs.
- In this chapter, we focus on memory organization. A clear understanding of these ideas is essential for the analysis of system performance.


#### Outline

- Types of memory and the memory hierarchy
- Cache memory
- Virtual memory

#### 6.2 Types of Memory

- There are two kinds of main memory: *random access memory* (RAM) and *read-only memory* (ROM).
- There are two types of RAM, dynamic RAM (DRAM) and static RAM (SRAM).
- DRAM consists of capacitors that slowly leak their charge over time. Thus, they must be refreshed every few milliseconds to prevent data loss.
- DRAM is "cheap" memory owing to its simple design.

#### 6.2 Types of Memory

- SRAM consists of circuits similar to the D flip-flop that we studied in Chapter 3.
- SRAM is very fast memory and it doesn't need to be refreshed like DRAM does. It is used to build cache memory, which we will discuss in detail later.
- ROM does not need to be refreshed, either. In fact, it needs very little charge to retain its memory.
- ROM is used to store permanent, or semi-permanent data that persists even while the system is turned off.

#### 6.3 The Memory Hierarchy

- Generally speaking, faster memory is more expensive than slower memory.
- To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion.
- Small, fast storage elements are kept in the CPU; larger, slower main memory is accessed through the data bus.
- Larger, (almost) permanent storage in the form of disk and tape drives is still further from the CPU.



#### 6.3 The Memory Hierarchy

- We are most interested in the memory hierarchy that involves registers, cache, main memory, and virtual memory.
- Registers are storage locations available on the processor itself.
- Virtual memory is typically implemented using a hard drive; it extends the address space from RAM to the hard drive.
- Virtual memory provides more space; cache memory provides speed.

#### 6.3 The Memory Hierarchy

- To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache.
- If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.
- Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.


#### 6.4 Cache Memory

- The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.
- Although cache is much smaller than main memory, its access time is a fraction of that of main memory.
- Three types of cache:
  - Direct mapped cache
  - Fully associative cache
  - Set associative cache


#### 6.4 Cache Memory

- The simplest cache mapping scheme is *direct mapped cache*.
- In a direct mapped cache consisting of *N* blocks of cache, block *X* of main memory maps to cache block *Y* = *X* mod *N*.
- Thus, if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, . . . of main memory.
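The modular mapping can be tried out with a minimal Python sketch. The function name `direct_mapped_block` is hypothetical and is used here only for illustration.

```python
def direct_mapped_block(memory_block: int, num_cache_blocks: int) -> int:
    """Return the cache block Y = X mod N that main memory block X maps to."""
    return memory_block % num_cache_blocks


# With 10 blocks of cache, memory blocks 7, 17, 27, and 37 all map to cache block 7.
for x in (7, 17, 27, 37):
    print(f"memory block {x} -> cache block {direct_mapped_block(x, 10)}")
```

Because several memory blocks share one cache block, a tag is needed to tell which of them currently occupies it.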

The next slide illustrates this mapping.

















#### 6.4 Cache Memory

EXAMPLE 6.3 Cont'd: The mapping for memory references is shown below. Each 4-bit address consists of a 1-bit tag, a 2-bit block field, and a 1-bit offset.

| Tag   | Block  | Offset |
|-------|--------|--------|
| 1 bit | 2 bits | 1 bit  |

| Main Memory                        | Maps To Cache |
|------------------------------------|---------------|
| (000) Block 0 (addresses 0x0, 0x1) | Block 0 (00)  |
| (001) Block 1 (addresses 0x2, 0x3) | Block 1 (01)  |
| (010) Block 2 (addresses 0x4, 0x5) | Block 2 (10)  |
| (011) Block 3 (addresses 0x6, 0x7) | Block 3 (11)  |
| (100) Block 4 (addresses 0x8, 0x9) | Block 0 (00)  |
| (101) Block 5 (addresses 0xA, 0xB) | Block 1 (01)  |
| (110) Block 6 (addresses 0xC, 0xD) | Block 2 (10)  |
| (111) Block 7 (addresses 0xE, 0xF) | Block 3 (11)  |

#### 6.4 Cache Memory

- Consider 16-bit memory addresses and 64 blocks of cache, where each block contains 8 bytes. We have:
  - 3 bits for the offset
  - 6 bits for the block
  - 7 bits for the tag
- The address 0x0404 maps as follows:

|          | Tag     | Block  | Offset |
|----------|---------|--------|--------|
| 0x0404 = | 0000010 | 000000 | 100    |
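The field split for 0x0404 can be checked with a short Python sketch. The helper name `split_address` is hypothetical; the field widths are the 6-bit block and 3-bit offset from this example.

```python
def split_address(address: int, block_bits: int, offset_bits: int):
    """Split a memory address into (tag, block, offset) fields."""
    offset = address & ((1 << offset_bits) - 1)                 # lowest bits
    block = (address >> offset_bits) & ((1 << block_bits) - 1)  # middle bits
    tag = address >> (offset_bits + block_bits)                 # remaining high bits
    return tag, block, offset


tag, block, offset = split_address(0x0404, block_bits=6, offset_bits=3)
print(f"{tag:07b} {block:06b} {offset:03b}")  # prints: 0000010 000000 100
```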




- In summary, direct mapped cache maps main memory blocks in a modular fashion to cache blocks. The mapping depends on:
  - The number of bits in the main memory address (how many addresses exist in main memory)
  - The number of blocks in cache (which determines the size of the block field)
  - How many addresses (either bytes or words) are in a block (which determines the size of the offset field)
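These three quantities fix the field sizes. A minimal Python sketch follows, assuming the number of cache blocks and the block size are both powers of two. The function name `field_widths` is hypothetical.

```python
def field_widths(address_bits: int, num_cache_blocks: int, block_size: int):
    """Return (tag_bits, block_bits, offset_bits) for a direct mapped cache.

    Assumes num_cache_blocks and block_size are powers of two.
    """
    for n in (num_cache_blocks, block_size):
        if n <= 0 or n & (n - 1):
            raise ValueError("sizes must be powers of two")
    offset_bits = (block_size - 1).bit_length()       # log2(block_size)
    block_bits = (num_cache_blocks - 1).bit_length()  # log2(num_cache_blocks)
    tag_bits = address_bits - block_bits - offset_bits
    return tag_bits, block_bits, offset_bits


print(field_widths(16, 64, 8))  # 16-bit addresses, 64 blocks of 8 bytes
print(field_widths(4, 4, 2))    # 4-bit addresses, 4 blocks of 2 addresses
```

The tag is whatever is left of the address after the block and offset fields are taken out.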




- Suppose instead of placing memory blocks in specific cache locations based on memory address, we could allow a block to go anywhere in cache.
- In this way, cache would have to fill up before any blocks are evicted.
- This is how *fully associative* cache works.
- A memory address is partitioned into only two fields: the tag and the word.
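A fully associative lookup can be sketched in a few lines of Python. The class name and the FIFO eviction policy are assumptions made for illustration; the slides do not specify a replacement policy at this point.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Toy fully associative cache: any block may occupy any cache slot."""

    def __init__(self, num_blocks: int, block_size: int):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()  # tag -> placeholder for the block's data

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        tag = address // self.block_size  # all address bits above the word field
        if tag in self.blocks:
            return True
        if len(self.blocks) == self.num_blocks:
            self.blocks.popitem(last=False)  # cache is full: evict oldest (FIFO, assumed)
        self.blocks[tag] = None
        return False


cache = FullyAssociativeCache(num_blocks=2, block_size=8)
for addr in (0, 4, 64, 128, 0):
    print(addr, "hit" if cache.access(addr) else "miss")
```

No block is evicted until both slots are occupied, which matches the behavior described above.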











#### 6.5 Virtual Memory

- Cache memory enhances performance by providing faster memory access speed.
- Virtual memory enhances performance by providing greater memory capacity, without the expense of adding main memory.
- Instead, a portion of a disk drive serves as an extension of main memory.
- If a system uses paging, virtual memory partitions main memory into individually managed *page frames* that are written (*or paged*) to disk when they are not immediately needed.

#### 6.5 Virtual Memory

- A physical address is the actual memory address of physical memory.
- Programs create virtual addresses that are mapped to physical addresses by the memory manager.
- *Page faults* occur when a logical address requires that a page be brought in from disk.
- Memory fragmentation occurs when the paging process results in the creation of small, unusable clusters of memory addresses.


#### 6.5 Virtual Memory

- Main memory and virtual memory are divided into equal-sized pages.
- The entire address space required by a process need not be in memory at once. Some parts can be on disk, while others are in main memory.
- Further, the pages allocated to a process do not need to be stored contiguously, either on disk or in memory.
- In this way, only the needed pages are in memory at any time; the unnecessary pages are in slower disk storage.




#### 6.5 Virtual Memory

- When a process generates a virtual address, the operating system translates it into a physical memory address.
- To accomplish this, the virtual address is divided into two fields: a *page* field and an *offset* field.
- The page field determines the page location of the address, and the offset indicates the location of the address within the page.
- The logical page number is translated into a physical page frame through a lookup in the page table.

#### 6.5 Virtual Memory

- If the valid bit is zero in the page table entry for the logical address, this means that the page is not in memory and must be fetched from disk.
  - This is a page fault.
  - If necessary, a page is evicted from memory and is replaced by the page retrieved from disk, and the valid bit is set to 1.
- If the valid bit is 1, the virtual page number is replaced by the physical frame number.
- The data is then accessed by adding the offset to the starting address of the physical frame.

#### 6.5 Virtual Memory

- As an example, suppose a system has a virtual address space of 8K and a physical address space of 4K, and the system uses byte addressing.
  - We have $2^{13}/2^{10} = 2^3$ virtual pages.
- A virtual address has 13 bits ($8\text{K} = 2^{13}$), with 3 bits for the page field and 10 for the offset, because the page size is 1024.
- A physical memory address requires 12 bits: the first 2 bits for the page frame and the trailing 10 bits for the offset.

| Address  | Total bits | Page / Frame bits | Offset bits |
|----------|------------|-------------------|-------------|
| Virtual  | 13         | 3 (page)          | 10          |
| Physical | 12         | 2 (frame)         | 10          |
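The field widths in this example can be derived mechanically. The short Python sketch below does so; the variable names are chosen for illustration only.

```python
# Sizes from the example, in bytes (the system is byte-addressed).
virtual_space = 8 * 1024
physical_space = 4 * 1024
page_size = 1024

# For a power of two n, (n - 1).bit_length() equals log2(n).
offset_bits = (page_size - 1).bit_length()
virtual_bits = (virtual_space - 1).bit_length()
physical_bits = (physical_space - 1).bit_length()

page_bits = virtual_bits - offset_bits    # page field of a virtual address
frame_bits = physical_bits - offset_bits  # frame field of a physical address
num_virtual_pages = virtual_space // page_size

print(virtual_bits, page_bits, offset_bits)    # virtual address layout
print(physical_bits, frame_bits, offset_bits)  # physical address layout
print(num_virtual_pages)
```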





#### 6.5 Virtual Memory

- The address 1010101010011<sub>2</sub> (0x1553) is converted to physical address 010101010011<sub>2</sub> = 0x553 because the page field 101 is replaced by frame number 01 through a lookup in the page table.

| Page | Frame | Valid Bit | Addresses (Base 10) | Addresses (Base 16) |
|------|-------|-----------|---------------------|---------------------|
| 0    | -     | 0         | 0 - 1023            | 0 - 3FF             |
| 1    | 3     | 1         | 1024 - 2047         | 400 - 7FF           |
| 2    | 0     | 1         | 2048 - 3071         | 800 - BFF           |
| 3    | -     | 0         | 3072 - 4095         | C00 - FFF           |
| 4    | -     | 0         | 4096 - 5119         | 1000 - 13FF         |
| 5    | 1     | 1         | 5120 - 6143         | 1400 - 17FF         |
| 6    | 2     | 1         | 6144 - 7167         | 1800 - 1BFF         |
| 7    | -     | 0         | 7168 - 8191         | 1C00 - 1FFF         |

#### 6.5 Virtual Memory

- What happens when the CPU generates address 1000000000100<sub>2</sub>?

| Page | Frame | Valid Bit | Addresses (Base 10) | Addresses (Base 16) |
|------|-------|-----------|---------------------|---------------------|
| 0    | -     | 0         | 0 - 1023            | 0 - 3FF             |
| 1    | 3     | 1         | 1024 - 2047         | 400 - 7FF           |
| 2    | 0     | 1         | 2048 - 3071         | 800 - BFF           |
| 3    | -     | 0         | 3072 - 4095         | C00 - FFF           |
| 4    | -     | 0         | 4096 - 5119         | 1000 - 13FF         |
| 5    | 1     | 1         | 5120 - 6143         | 1400 - 17FF         |
| 6    | 2     | 1         | 6144 - 7167         | 1800 - 1BFF         |
| 7    | -     | 0         | 7168 - 8191         | 1C00 - 1FFF         |
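The page table lookup can be sketched in Python using the frame assignments from the table above. The names `translate` and `PageFault` are hypothetical, and a real system would go on to service the fault by loading the page from disk rather than simply raising an exception.

```python
OFFSET_BITS = 10
PAGE_SIZE = 1 << OFFSET_BITS

# Page table from the slide: only pages whose valid bit is 1 appear here.
PAGE_TABLE = {1: 3, 2: 0, 5: 1, 6: 2}  # virtual page -> physical frame


class PageFault(Exception):
    """Raised when the referenced page has valid bit 0 (not in memory)."""


def translate(virtual_address: int) -> int:
    page = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    if page not in PAGE_TABLE:
        raise PageFault(f"page {page} is not in memory")
    # Replace the page field with the frame number; the offset is unchanged.
    return (PAGE_TABLE[page] << OFFSET_BITS) | offset


print(hex(translate(0b1010101010011)))  # page 5 -> frame 1, prints 0x553

try:
    translate(0b1000000000100)  # 4100 decimal lies in page 4, whose valid bit is 0
except PageFault as fault:
    print("page fault:", fault)
```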


#### 6.5 Virtual Memory

- We said earlier that effective access time (EAT) takes all levels of memory into consideration.
- Thus, virtual memory is also a factor in the calculation, and we also have to consider page table access time.
- Suppose a main memory access takes 200 ns, the page fault rate is 1%, and it takes 10 ms to load a page from disk. We have:

EAT = 0.99(200 ns + 200 ns) + 0.01(10 ms) = 100.396 µs.
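The calculation above can be reproduced with a small Python sketch. The function name is hypothetical, and the model is the simple one used on this slide: two memory accesses when there is no fault, and the disk load time alone when there is one.

```python
def effective_access_time_ns(memory_ns: float, fault_rate: float,
                             fault_service_ns: float) -> float:
    """EAT for paged memory with no TLB, in nanoseconds."""
    no_fault_ns = 2 * memory_ns  # one access for the page table, one for the data
    return (1 - fault_rate) * no_fault_ns + fault_rate * fault_service_ns


eat_ns = effective_access_time_ns(memory_ns=200, fault_rate=0.01,
                                  fault_service_ns=10_000_000)  # 10 ms in ns
print(f"EAT = {eat_ns / 1000:.3f} us")
```

Even a 1% fault rate dominates the result, because a disk access is tens of thousands of times slower than a memory access.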


#### 6.5 Virtual Memory

- Even if we had no page faults, the EAT would be 400 ns because memory is always read twice: first to access the page table, and second to access the requested data in memory.
- Because page tables are read constantly, it makes sense to keep them in a special cache called a *translation look-aside buffer* (TLB).
- A TLB is a special associative cache that stores the mapping of virtual pages to physical pages.





#### 6.5 Virtual Memory

- Another approach to virtual memory is the use of segmentation.
- Instead of dividing memory into equal-sized pages, virtual address space is divided into variable-length segments, often under the control of the programmer.
- A segment is located through its entry in a segment table, which contains the segment's memory location and a bounds limit that indicates its size.
- After a page fault, the operating system searches for a location in memory large enough to hold the segment that is retrieved from disk.


#### 6.5 Virtual Memory

- Both paging and segmentation can cause fragmentation.
- Paging is subject to *internal* fragmentation because a process may not need the entire range of addresses contained within the page. Thus, there may be many pages containing unused fragments of memory.
- Segmentation is subject to *external* fragmentation, which occurs when contiguous chunks of memory become broken up as segments are allocated and deallocated over time.

The next slides illustrate internal and external fragmentation.







#### 6.5 Virtual Memory

- Despite the fact that there are enough free bytes in memory to load the fourth process, that process has to wait for one of the other three to terminate, because there are no unallocated frames.
- This is an example of *internal* fragmentation.

[Figure: memory map of 4K page frames, with addresses from 4K to 32K, occupied by processes P1, P2, and P3.]
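Internal fragmentation can be quantified with a short Python sketch. The 4K frame size matches the memory map above, but the process names and sizes below are hypothetical and are not the values from the slides.

```python
K = 1024


def frames_needed(process_bytes: int, frame_size: int) -> int:
    """Number of whole frames a process occupies (ceiling division)."""
    return -(-process_bytes // frame_size)


def internal_fragmentation(process_bytes: int, frame_size: int) -> int:
    """Bytes allocated to the process but never used by it."""
    return frames_needed(process_bytes, frame_size) * frame_size - process_bytes


# Hypothetical process sizes, for illustration only.
for name, size in {"A": 9 * K, "B": 4 * K, "C": 5 * K}.items():
    wasted = internal_fragmentation(size, 4 * K)
    print(f"{name}: {frames_needed(size, 4 * K)} frames, {wasted // K}K unused")
```

The unused bytes inside allocated frames are free in principle, yet no other process can claim them.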



#### 6.5 Virtual Memory

- All of the segments of P1 and one of the segments of P2 are loaded as shown in the figure.
- Segment S2 of process P2 requires 11K of memory, and there is only 1K free, so it waits.

[Figure: memory map showing the loaded segments of processes P1 and P2; segment S2 of P2 (11K) is waiting.]





#### 6.6 A Real-World Example

- The Pentium architecture supports both paging and segmentation, and they can be used in various combinations including unpaged unsegmented, segmented unpaged, and unsegmented paged.
- The processor supports two levels of cache (L1 and L2), both having a block size of 32 bytes.
- The L1 cache is next to the processor, and the L2 cache sits between the processor and memory.
- The L1 cache is in two parts: an instruction cache (I-cache) and a data cache (D-cache).

The next slide shows this organization schematically.



#### Chapter 6 Conclusion

- Computer memory is organized in a hierarchy, with the smallest, fastest memory at the top and the largest, slowest memory at the bottom.
- Cache memory gives faster access to main memory, while virtual memory uses disk storage to give the illusion of having a large main memory.
- Cache maps blocks of main memory to blocks of cache memory. Virtual memory maps page frames to virtual pages.
- There are three general types of cache: direct mapped, fully associative, and set associative.
- All virtual memory must deal with fragmentation: internal for paged memory, external for segmented memory.