File Systems

user%20vs%20hood.png

Storage Stack

storage%20stack.png
  • Application: Does a syscall (e.g. open)
  • FD Table/OF Table: Store file descriptors and open files info (e.g. file pointer and mode - flags). All the info needed to access a file or create a new one. We usually store a global array of open files (with seek positions and pointers to the file's vnode), and each process has its own array of file descriptors which point to an entry in the global array. That way forked processes share the same file status info.
  • VFS: Allows multiple FS to be accessed uniformly in the same way (e.g. multiple hard drives/etc)
  • FS: Hides the physical location of things and allows us to use file names to access things, gives us security and file directory hierarchies
  • Buffer cache/disk scheduler: Optimise file access. Cache keeps frequently accessed files in a faster memory and disk scheduler performs operations from multiple processes for fairenss and efficiency.
  • Device driver: Hides device-specific stuff, and gives us a straight linear sequence of blocks to interface with.
  • Disk Controller: Hides bad sectors and physical geometry of disk (gives us straight linear sequences too).
  • Hard Disk: Exists, stores files, etc.

Storing Files

If we store them as streams of bytes then they end up different lengths and we can't just randomly index. Tera-bad (But no wasted space).

So we store them in fixed-size chunks. Great! We can store in a file descriptor the indices for which chunks are used in it.

Access Rights

  • None
    • User can't read directory file, doesn't get to know file exists
  • Knowledge
    • User can see that file exists and who owns it
  • Executing
    • User can run file
  • Reading
    • User can read file, even copy file
  • Appending
    • User can add new content, cannot modify existing content
  • Updating
    • User can update all content - e.g. file is writeable. Can delete or modify existing and add new
  • Changing protection
  • Deleting
rwx.png D = is it a directory
Three sets of rwx;user, group, other

Why do we have different file management systems

E.g. fat32, ext3

Because different ones are optimised for different hardware (e.g. flash disks vs magnetic or memory under 2gb), and some are closed (requiring people to invent their own).

A file system must:

  • Know what blocks belong to what files, in what order
  • Know what blocks are free for allocation

Allocation strategies:

  • Contiguous
    • Easy mode, but have to know how big the file will be
    • Deleting files leaves stuff fragmented = bad
  • Dynamic
    • Allocate fix sized blocks whenever they're needed
    • No external fragmentation, but some internal (partially filled blocks)
    • Complicated to maintain record of what is where
  • Linked list
    • Each block contains a pointer to the next block in the chain.
    • Free blocks are also linked in a chain.
    • Not sequential and bad for random access
  • Table
    • Table entry contains number of next block in file, with last block and empty blocks having default values.
    • Store the table in memory = fast to access
    • Free block lookup is slow.
  • I nodes
    • Stores file attributes and a list of the addresses of each chunk of the file
    • Stores pointer to another block containing more addresses
    • Also need to store free nodes somehow, perhaps in a linked list of free blocks
    • Directory is just another kind of file, a list of directory entries. A directory entry contains file name, attributes, and the file i-node number

Fragmentation

  • External fragmentation: Chunks of free space are frequent but small, scattered throughout memory
    • "external to the allocated memory regions" - space exists but isn't contiguous
  • Internal fragmentation: Blocks are not filled and so lots of wasted unuseable free space
    • "internal to the allocated memory regions" - allocated memory > request memory = wasted space

Larger blocks = more internal fragmentation, but less disk management overhead and fewer IO operations (load one block from memory, not 3)