Virtual Memory

Provides a single system call interface for many file systems

VFS

Pointers to functions that manipulate different file systems (e.g. mount, umount). Also stores all relevant file system data

Vnode

Pointer to inode with pointers to functions to manipulate said inode

Segmentation

Worse than paging.
Program is divided into segments (e.g. different subroutines, its stack). We store a segment table which records where each segment starts in memory and what its limit is (how big it is). It also stores the flags (rwx) and whether the segment is valid.

segmentation.png

Paging

Paging (or swapping) involves loading processes into memory when they're in use, storing them out to disk when they're not, and swapping them back in when they're next needed (all during execution).

page1.png

If a process is larger than main memory, we do partial swaps (e.g. we swap out instructions we're not currently using)

  • Partition memory into equal chunks (frames)
  • Divide the process' logical addresses (virtual address space) into same sized chunks (pages)
  • Make a page table that stores the frame number for each page and some other info (see the translation sketch below)

paging2.png

  • Don't have to be contiguous in physical memory
  • We store them in main memory so that access is fast!
  • Multiple pages can point to same frame
  • E.g. if two processes run the same code, they can share the same text section: both page tables point to the same frames

paging3.png
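A minimal sketch of that lookup, assuming 4 KB pages, a 32-bit address space and a single-level page table (the struct and field names are made up for illustration):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)          /* 4096 */
#define NPAGES     (1u << (32 - PAGE_SHIFT))   /* 2^20 pages in the address space */

struct pte {
    uint32_t frame;     /* frame number this page maps to */
    uint8_t  valid;     /* is the mapping present? */
    uint8_t  prot;      /* rwx bits */
};

static struct pte page_table[NPAGES];

/* Returns the physical address, or 0 on a page fault (for illustration). */
uint32_t translate(uint32_t vaddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;          /* top 20 bits: page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);      /* bottom 12 bits: offset   */
    if (!page_table[page].valid)
        return 0;                                   /* would raise a page fault */
    return (page_table[page].frame << PAGE_SHIFT) | offset;
}
```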

Page Size

Page size affects a lot of things! The optimal page size depends on the process, and ideally we would support multiple page sizes (e.g. large pages for code, small pages for the stack). In practice this is hard, so we probably don't.

Increasing page size:

  • Increases: internal fragmentation; TLB hits; IO throughput (fewer seek/rotational delays); page fault latency (we have to read a bigger page in before resuming the process)
  • Decreases: page table size; TLB misses; number of pages
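For example (illustrative numbers): with 4 KB pages a process wastes on average about half a page (~2 KB) of internal fragmentation in its last page; with 16 KB pages that average waste grows to ~8 KB, but the same address space needs 4× fewer pages (and page table entries).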

2-level Page Table

To make things more efficient we have one top-level array which just stores pointers to the actual (second-level) page tables. We split the page-number part of the virtual address in half: the first half is the index into the top-level array, and the second half is the index into the second-level page table it points to; the remaining low bits are the offset within the page.

2 level page table.png
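A minimal sketch of the 2-level lookup, again assuming 4 KB pages and a 32-bit address space, so the 20-bit page number splits into two 10-bit indices (names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT   12
#define LEVEL_BITS   10
#define LEVEL_SIZE   (1u << LEVEL_BITS)   /* 1024 entries per table */

struct pte {
    uint32_t frame;
    uint8_t  valid;
};

/* Top level: 1024 pointers to second-level tables (NULL = nothing mapped there). */
static struct pte *top_level[LEVEL_SIZE];

/* Returns the physical address, or 0 on a page fault (for illustration). */
uint32_t translate2(uint32_t vaddr)
{
    uint32_t index1 = vaddr >> (PAGE_SHIFT + LEVEL_BITS);          /* bits 31..22 */
    uint32_t index2 = (vaddr >> PAGE_SHIFT) & (LEVEL_SIZE - 1);    /* bits 21..12 */
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);            /* bits 11..0  */

    struct pte *second = top_level[index1];
    if (second == NULL || !second[index2].valid)
        return 0;                            /* would raise a page fault */
    return (second[index2].frame << PAGE_SHIFT) | offset;
}
```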

A process can have as many pages as there are frames, but it might not be allowed to use them all. We could let its frame allocation grow organically, or we could give it a fixed allocation of how many it can use. A fixed allocation can be bad, because we get some processes constantly overusing their share and some underusing it.

We can either allocate frames globally (take what you need from all remaining spare frames) or locally (you get an amount allocated to you based on some criteria, re-evaluated every so often, and you take frames from your own set, not others').

Inverted Page Table (Frame Table)

We store an array indexed by frame number, which stores the page number held in that frame. The IPT grows with the size of RAM, not the virtual address space (because it stores physical frames, and only the ones that are used), so it is space efficient.

To access it, we hash the page number into a hash table, which gives us an index into the frame table. If IPT[hash] stores our pid and page number, then that index is our frame number. Hurrah. If it doesn't match, we jump along to the index in IPT[hash].next, which points us to where a colliding entry got stored, and keep following the chain.
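A minimal sketch of that lookup, assuming a hash anchor table in front of the IPT with collisions chained through each entry's 'next' field (all structure and field names are made up for illustration):

```c
#define NFRAMES   1024
#define HASH_SIZE 1024
#define NO_FRAME  (-1)

struct ipt_entry {
    int pid;     /* owning process */
    int page;    /* page number mapped into this frame */
    int next;    /* next frame index with the same hash, or NO_FRAME */
};

static struct ipt_entry ipt[NFRAMES];
static int hash_anchor[HASH_SIZE];   /* hash(pid, page) -> first frame to try;
                                        assume initialised to NO_FRAME at boot */

static unsigned ipt_hash(int pid, int page)
{
    return ((unsigned)pid * 31u + (unsigned)page) % HASH_SIZE;
}

/* Returns the frame holding (pid, page), or NO_FRAME if it is not resident. */
int ipt_lookup(int pid, int page)
{
    int frame = hash_anchor[ipt_hash(pid, page)];
    while (frame != NO_FRAME) {
        if (ipt[frame].pid == pid && ipt[frame].page == page)
            return frame;            /* hit: the index itself is the frame number */
        frame = ipt[frame].next;     /* follow the collision chain */
    }
    return NO_FRAME;                 /* not in memory -> page fault */
}
```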

Physical vs Virtual Addresses

As we know, programs use virtual addresses; code in userland only ever deals with these. We have to translate a virtual address to a physical address in order to get the actual content: e.g. we take the page number from the virtual address, index into the page table, and combine the frame number stored there with the page offset.

Memory Management Unit/Translation Lookaside Buffer

The MMU is a bit of hardware, but the TLB can be filled by either hardware or software (through a TLB miss exception).

tlb.png

Each virtual memory reference can cause two physical memory accesses

  • One to fetch the page table entry
  • One to fetch/store the data

TLB = a cache in the MMU that stores page numbers (or virtual addresses) and frame numbers. When we are given a vaddr, we check the TLB to see if the matching frame number is there. If yes (TLB HIT) then yolo, just go right on over there. Else (TLB MISS), walk the page table to find the frame number and store that translation in the TLB.

Instead of it being 2 memory accesses per reference, it's P(TLB hit) × 1 + (1 − P(TLB hit)) × 2. So at most 2, at best 1.
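For example (illustrative numbers): with a 95% TLB hit rate that's 0.95 × 1 + 0.05 × 2 = 1.05 physical memory accesses per reference on average.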

As each process has its own page table, on a context switch we have to make sure the TLB is still valid. Either we store an address space ID (ASID) in the TLB EntryHi and only use entries whose ASID matches the current process, or we just flush the TLB and delete all entries. That way we don't have anything left over, inaccurately, from the last process.

C0 Registers

  • C0_EPC = address to jump back to after the exception
  • C0_status = KU/IM bits (kernel/user mode, interrupt mask)
  • C0_cause = exception cause
  • C0_badvaddr = faulting address

EntryHi/EntryLo

  • C0_index = tlb index
  • C0_entryhi = page number (VPN) and address space ID; on a TLB miss it is loaded with the faulting virtual address
  • C0_entrylo = frame number (PFN) and the protection/valid bits (N, D, V, G)
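A hedged sketch of a software TLB refill using those registers, for an R3000-style MMU; page_table_lookup and tlb_write_random are hypothetical helpers (the latter standing in for the tlbwr instruction), not a real kernel's API:

```c
#include <stdint.h>

#define PAGE_SHIFT          12         /* 4 KB pages */
#define ENTRYHI_ASID_SHIFT  6          /* R3000: ASID lives in bits 6-11 of EntryHi */
#define ENTRYLO_V           (1u << 9)  /* valid bit */
#define ENTRYLO_D           (1u << 10) /* dirty bit, doubles as write-enable */

/* hypothetical page-table lookup: returns frame number, or -1 if not resident */
extern int page_table_lookup(uint32_t asid, uint32_t vpn, int *writeable);
/* hypothetical wrapper around the tlbwr instruction */
extern void tlb_write_random(uint32_t entryhi, uint32_t entrylo);

void tlb_refill(uint32_t faultvaddr, uint32_t asid)
{
    uint32_t vpn = faultvaddr >> PAGE_SHIFT;
    int writeable;
    int frame = page_table_lookup(asid, vpn, &writeable);
    if (frame < 0)
        return;   /* not resident: a real page fault, handled elsewhere */

    uint32_t entryhi = (vpn << PAGE_SHIFT) | (asid << ENTRYHI_ASID_SHIFT);
    uint32_t entrylo = ((uint32_t)frame << PAGE_SHIFT) | ENTRYLO_V;
    if (writeable)
        entrylo |= ENTRYLO_D;
    tlb_write_random(entryhi, entrylo);   /* drop the translation into a random TLB slot */
}
```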

On-Demand Paging/Segmentation

Unused pages/segments get stored to disk until we hit a page fault and need to bring them back in. It's slower (page faults = several milliseconds of delay) but means we can give each process less physical memory.

The dirty bit tells us if a page/segment has been modified, which is useful in deciding what to evict when bringing a page back into main memory: a clean page can just be dropped, a dirty one has to be written to disk first.

Principle of locality = good. A program spends 90% of its time in 10% of its code; programs tend to reuse data and instructions that were just used.
Temporal locality = just accessed memory will be accessed again soon.
Spatial locality = nearby memory addresses tend to be accessed close together in time.

Fetching Policies

Demand: Load pages back after a page fault (so lots of loading in at the start of a process).
Pre-paging: Good idea, hard to get right. When the disk is idle we pre-fetch pages we think we'll need. It can waste bandwidth if we don't use them, and it can be bad if we replace useful pages with non-useful ones.

Replacement Policies

When we have a page fault but a full physical memory set, we have to replace a page. Optimally one that won't be used for the longest time. This is impossible, but we try to get close. But how?

  • Least Recently Used
    • Requires updating of a time stamp on every access. Hard to do efficiently, but we like to approximate.
    • Works really well if locality holds.
  • Clock (see the sketch after this list)
    • Cycle through the frames, marking each one as 'untouched' as we pass it.
    • Any time we access a frame, mark it 'touched' (e.g. we use the valid bit as a stand-in for a reference bit)
      • Can be done with hardware
    • To evict, cycle through again and take the first 'untouched' one.
  • First-in First-out
    • Cycle through them all, by storing a queue of the order we inserted them in.
    • Super easy, but not necessarily a good heuristic for usefulness.
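A minimal sketch of the clock policy from the list above, assuming a fixed frame array with a software-maintained reference bit:

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 64

struct frame {
    bool referenced;   /* set whenever the page in this frame is accessed */
    /* ... page number, dirty bit, etc. would live here too ... */
};

static struct frame frames[NFRAMES];
static size_t hand = 0;   /* the clock hand */

/* Returns the index of the frame to evict. */
size_t clock_choose_victim(void)
{
    for (;;) {
        if (frames[hand].referenced) {
            /* give it a second chance: clear the bit and move on */
            frames[hand].referenced = false;
            hand = (hand + 1) % NFRAMES;
        } else {
            size_t victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```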

Cleaning

It's much quicker to replace a clean page than a dirty one (a dirty page has changes, so we have to write it to disk; we can't just stop using it).

Demand Cleaning

Write out pages when we need to replace them = lots of latency

Pre-Cleaning

Write out pages in batches every so often. Overlaps the IO with other work, and means we're more likely to be able to replace a clean page.

R3000 Address Space Layout

4 segments: Kseg0, kseg1, kseg2 and user space (kuseg).

  • Kseg0 = direct translation to physical memory
    • 0x80000000 - 0x9fffffff virtual = 0x00000000 - 0x1fffffff physical
    • Kernel code lives here
    • Cacheable, no TLB
  • Kuseg
    • 4 KB page size, 2 GB in size
    • Both user mode and kernel mode can access it
  • Kseg1
    • 0xa0000000 - 0xbfffffff virtual = 0x00000000 - 0x1fffffff physical
    • Not cacheable, no TLB
    • Only kernel mode
    • 512 MB
  • Kseg2
    • TLB mapped, no direct translation
    • Cacheable
    • Stores page tables, probably
    • 1024 MB
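Because kseg0 and kseg1 are direct-mapped, translation there is just arithmetic: e.g. the kseg0 address 0x80001234 maps to physical address 0x00001234 (paddr = vaddr − 0x80000000, equivalently vaddr & 0x1fffffff); kseg1 works the same way from 0xa0000000.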

Amdahl's Law:
Make the common case fast; enhancements are only useful if they affect a frequently occurring case.
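Stated as a formula: if an enhancement speeds up a fraction f of execution time by a factor s, the overall speedup is 1 / ((1 − f) + f/s). E.g. speeding up 10% of the time by 10× gives 1 / (0.9 + 0.01) ≈ 1.10× overall.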

Thrashing

Thrashing happens when we can't find a runnable process because there is insufficient memory. It happens because CPU utilisation tends to increase with the number of processes in the system, so we keep adding processes until their working sets no longer fit in RAM. Then we get lots of page faults and the system grinds to a halt (IO limited, decreased CPU usage).
To fix it, we just suspend some processes (dumping their pages back to storage and freeing up frames) until the memory pressure eases up.

Working Set

The set of pages/segments required by an application in a time window (Δ) is called its memory working set.
The working set is an approximation of a program's locality.

working set.png
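A minimal sketch of computing a working set from a page reference string; the reference string, window position and sizes below are made-up illustrative values:

```c
#include <stdio.h>
#include <stdbool.h>

#define MAX_PAGE 16

/* Size of the working set W(t, delta): the number of distinct pages
 * referenced in the last delta references ending at time t. */
int working_set_size(const int *refs, int nrefs, int t, int delta)
{
    bool in_set[MAX_PAGE] = { false };
    int size = 0;
    int start = (t - delta + 1 > 0) ? t - delta + 1 : 0;
    for (int i = start; i <= t && i < nrefs; i++) {
        if (!in_set[refs[i]]) {
            in_set[refs[i]] = true;
            size++;
        }
    }
    return size;
}

int main(void)
{
    int refs[] = { 1, 2, 1, 3, 3, 2, 5, 5, 5, 1 };   /* example reference string */
    int t = 9, delta = 4;
    /* pages referenced in the window [6..9] are {5, 1} -> size 2 */
    printf("|W(t=%d, delta=%d)| = %d\n", t, delta,
           working_set_size(refs, 10, t, delta));
    return 0;
}
```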