Commit f53494c2 authored by rsc

DO NOT MAIL: xv6 web pages

Parent ee3f75f2
index.html: index.txt mkhtml
mkhtml index.txt >_$@ && mv _$@ $@
<html>
<head>
<title>OS Bugs</title>
</head>
<body>
<h1>OS Bugs</h1>
<p>Required reading: Bugs as Deviant Behavior
<h2>Overview</h2>
<p>Operating systems must obey many rules for correctness and
performance. Example rules:
<ul>
<li>Do not call blocking functions with interrupts disabled or spin
lock held
<li>Check for NULL results
<li>Do not allocate large stack variables
<li>Do not re-use already-allocated memory
<li>Check user pointers before using them in kernel mode
<li>Release acquired locks
</ul>
<p>In addition, there are standard software engineering rules, such as
using function results in consistent ways.
<p>These rules are typically not checked by a compiler, even though
they could be checked by a compiler, in principle. The goal of the
meta-level compilation project is to allow system implementors to
write system-specific compiler extensions that check the source code
for rule violations.
<p>The results are good: many new bugs found (500-1000) in Linux
alone. The paper for today studies these bugs and attempts to draw
lessons from them.
<p>Are kernel errors worse than user-level errors? That is, if we get
the kernel correct, will we no longer have system crashes?
<h2>Errors in JOS kernel</h2>
<p>What are unstated invariants in the JOS?
<ul>
<li>Interrupts are disabled in kernel mode
<li>Only env 1 has access to disk
<li>All registers are saved & restored on context switch
<li>Application code is never executed with CPL 0
<li>Don't allocate an already-allocated physical page
<li>Propagate error messages to user applications (e.g., out of
resources)
<li>Map pipe before fd
<li>Unmap fd before pipe
<li>A spawned program should have open only file descriptors 0, 1, and 2.
<li>Pass size sometimes in bytes and sometimes in blocks to a
given file system function.
<li>User pointers should be run through TRUP before being used by the kernel
</ul>
<p>Could these errors have been caught by metacompilation? Would
metacompilation have caught the pipe race condition? (Probably not,
it happens in only one place.)
<p>How confident are you that your code is correct? For example,
are you sure interrupts are always disabled in kernel mode? How would
you test?
<h2>Metacompilation</h2>
<p>A system programmer writes the rule checkers in a high-level,
state-machine language (metal). These checkers are dynamically linked
into an extensible version of g++, xg++. Xg++ applies the rule
checkers to every possible execution path of a function that is being
compiled.
<p>An example rule from
the <a
href="http://www.stanford.edu/~engler/exe-ccs-06.pdf">OSDI
paper</a>:
<pre>
sm check_interrupts {
   decl { unsigned } flags;
   pat enable  = { sti(); } | { restore_flags(flags); };
   pat disable = { cli(); };

   is_enabled: disable ==> is_disabled
             | enable  ==> { err("double enable") };
   ...
</pre>
A more complete version found 82 errors in the Linux 2.3.99 kernel.
<p>Common mistake:
<pre>
get_free_buffer ( ... ) {
    ....
    save_flags (flags);
    cli ();
    if ((bh = sh->buffer_pool) == NULL)
        return NULL;    /* bug: returns with interrupts still disabled */
    ....
}
</pre>
<p>(Figure 2 also lists a simple metarule.)
<p>Some checkers produce false positives, because of limitations of
both static analysis and the checkers, which mostly use local
analysis.
<p>How does the <b>block</b> checker work? The first pass is a rule
that marks functions as potentially blocking. After processing a
function, the checker emits the function's flow graph to a file
(including annotations and functions called). The second pass takes
the merged flow graph of all function calls, and produces a file with
all functions that have a path in the control-flow-graph to a blocking
function call. For the Linux kernel this results in 3,000 functions
that potentially could call sleep. Yet another checker like
check_interrupts checks if a function calls any of the 3,000 functions
with interrupts disabled. Etc.
<h2>This paper</h2>
<p>Writing rules is painful. First, you have to write them. Second,
how do you decide what to check? Was it easy to enumerate all
conventions for JOS?
<p>Insight: infer programmer "beliefs" from code and cross-check
for contradictions. If <i>cli</i> is always followed by <i>sti</i>,
except in one case, perhaps something is wrong. This simplifies
life because we can write generic checkers instead of checkers
that specifically check for <i>sti</i>, and perhaps we get lucky
and find other temporal ordering conventions.
<p>Do we know which case is wrong? The 999 times or the 1 time that
<i>sti</i> is absent? (No, this method cannot figure out what the correct
sequence is, but it can flag that something is weird, which in practice
is useful.) The method just detects inconsistencies.
<p>Is every inconsistency an error? No, some inconsistencies don't
indicate an error. If a call to function <i>f</i> is often followed
by call to function <i>g</i>, does that imply that f should always be
followed by g? (No!)
<p>Solution: MUST beliefs and MAYBE beliefs. MUST beliefs are
invariants that must hold; any inconsistency indicates an error. If a
pointer is dereferenced, then the programmer MUST believe that the
pointer is pointing to something that can be dereferenced (i.e., the
pointer is definitely not zero). MUST beliefs can be checked using
"internal inconsistencies".
<p>An aside: can zero pointers be detected at runtime?
(Sure, unmap the page at address zero.) Why is metacompilation still
valuable? (At runtime you will find only the null pointers that your
test code dereferenced, not all possible dereferences of null
pointers.) An even more convincing example for metacompilation is
tracking user pointers that the kernel dereferences. (Is this a MUST
belief?)
<p>MAYBE beliefs are invariants that are suggested by the code, but
they may be coincidences. MAYBE beliefs are ranked by statistical
analysis, and perhaps augmented with input about function names
(e.g., alloc and free are important). Is it computationally feasible
to check every MAYBE belief? Could there be much noise?
<p>What errors won't this approach catch?
<h2>Paper discussion</h2>
<p>This paper is best discussed by studying every code fragment. Most
code fragments are pieces of code from Linux distributions; these
mistakes are real!
<p>Section 3.1. what is the error? how does metacompilation catch
it?
<p>Figure 1. what is the error? is there one?
<p>Code fragments from 6.1. what is the error? how does metacompilation catch
it?
<p>Figure 3. what is the error? how does metacompilation catch
it?
<p>Section 8.3. what is the error? how does metacompilation catch
it?
</body>
</html>
<html>
<head>
<title>L9</title>
</head>
<body>
<h1>Coordination and more processes</h1>
<p>Required reading: remainder of proc.c, sys_exec, sys_sbrk,
sys_wait, sys_exit, and sys_kill.
<h2>Overview</h2>
<p>Big picture: more programs than processors. How to share the
limited number of processors among the programs? Last lecture
covered basic mechanism: threads and the distinction between process
and thread. Today we expand on that: how to coordinate the interactions
between threads explicitly, and some operations on processes.
<p>Sequence coordination. This is a different type of coordination
from mutual-exclusion coordination (whose goal is to make
actions atomic so that threads don't interfere). The goal of
sequence coordination is for threads to coordinate the sequences in
which they run.
<p>For example, a thread may want to wait until another thread
terminates. One way to do so is to have the thread run periodically,
let it check if the other thread terminated, and if not give up the
processor again. This is wasteful, especially if there are many
threads.
<p>With primitives for sequence coordination one can do better. The
thread could tell the thread manager that it is waiting for an event
(e.g., another thread terminating). When the other thread
terminates, it explicitly wakes up the waiting thread. This is more
work for the programmer, but more efficient.
<p>Sequence coordination often interacts with mutual-exclusion
coordination, as we will see below.
<p>The operating system literature has a rich set of primitives for
sequence coordination. We study a very simple version of condition
variables in xv6: sleep and wakeup, with a single lock.
<h2>xv6 code examples</h2>
<h3>Sleep and wakeup - usage</h3>
Let's consider implementing a producer/consumer queue
(like a pipe) that can be used to hold a single non-null char pointer:
<pre>
struct pcq {
  void *ptr;
};

void*
pcqread(struct pcq *q)
{
  void *p;

  while((p = q-&gt;ptr) == 0)
    ;
  q-&gt;ptr = 0;
  return p;
}

void
pcqwrite(struct pcq *q, void *p)
{
  while(q-&gt;ptr != 0)
    ;
  q-&gt;ptr = p;
}
</pre>
<p>Easy and correct, at least assuming there is at most one
reader and at most one writer at a time.
<p>Unfortunately, the while loops are inefficient.
Instead of polling, it would be great if there were
primitives saying ``wait for some event to happen''
and ``this event happened''.
That's what sleep and wakeup do.
<p>Second try:
<pre>
void*
pcqread(struct pcq *q)
{
  void *p;

  if(q-&gt;ptr == 0)
    sleep(q);
  p = q-&gt;ptr;
  q-&gt;ptr = 0;
  wakeup(q);  /* wake pcqwrite */
  return p;
}

void
pcqwrite(struct pcq *q, void *p)
{
  if(q-&gt;ptr != 0)
    sleep(q);
  q-&gt;ptr = p;
  wakeup(q);  /* wake pcqread */
}
</pre>
That's better, but there is still a problem.
What if the wakeup happens between the check in the if
and the call to sleep?
<p>Add locks:
<pre>
struct pcq {
  void *ptr;
  struct spinlock lock;
};

void*
pcqread(struct pcq *q)
{
  void *p;

  acquire(&amp;q-&gt;lock);
  if(q-&gt;ptr == 0)
    sleep(q, &amp;q-&gt;lock);
  p = q-&gt;ptr;
  q-&gt;ptr = 0;
  wakeup(q);  /* wake pcqwrite */
  release(&amp;q-&gt;lock);
  return p;
}

void
pcqwrite(struct pcq *q, void *p)
{
  acquire(&amp;q-&gt;lock);
  if(q-&gt;ptr != 0)
    sleep(q, &amp;q-&gt;lock);
  q-&gt;ptr = p;
  wakeup(q);  /* wake pcqread */
  release(&amp;q-&gt;lock);
}
</pre>
This is okay, and now safer for multiple readers and writers,
except that wakeup wakes up everyone who is asleep on chan,
not just one guy.
So some of the guys who wake up from sleep might not
be cleared to read or write from the queue. Have to go back to looping:
<pre>
struct pcq {
  void *ptr;
  struct spinlock lock;
};

void*
pcqread(struct pcq *q)
{
  void *p;

  acquire(&amp;q-&gt;lock);
  while(q-&gt;ptr == 0)
    sleep(q, &amp;q-&gt;lock);
  p = q-&gt;ptr;
  q-&gt;ptr = 0;
  wakeup(q);  /* wake pcqwrite */
  release(&amp;q-&gt;lock);
  return p;
}

void
pcqwrite(struct pcq *q, void *p)
{
  acquire(&amp;q-&gt;lock);
  while(q-&gt;ptr != 0)
    sleep(q, &amp;q-&gt;lock);
  q-&gt;ptr = p;
  wakeup(q);  /* wake pcqread */
  release(&amp;q-&gt;lock);
}
</pre>
The difference between this and our original version is that
the body of the while loop is a much more efficient way to pause.
<p>Now we've figured out how to use it, but we
still need to figure out how to implement it.
<h3>Sleep and wakeup - implementation</h3>
<p>
Simple implementation:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = curproc[cpu()];

  release(lk);
  p-&gt;chan = chan;
  p-&gt;state = SLEEPING;
  sched();
}

void
wakeup(void *chan)
{
  for(each proc p) {
    if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
      p-&gt;state = RUNNABLE;
  }
}
</pre>
<p>What's wrong? What if the wakeup runs right after
the release(lk) in sleep?
It still misses the sleep.
<p>Move the lock down:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = curproc[cpu()];

  p-&gt;chan = chan;
  p-&gt;state = SLEEPING;
  release(lk);
  sched();
}

void
wakeup(void *chan)
{
  for(each proc p) {
    if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
      p-&gt;state = RUNNABLE;
  }
}
</pre>
<p>This almost works. Recall from last lecture that we also need
to acquire the proc_table_lock before calling sched, to
protect p-&gt;jmpbuf.
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = curproc[cpu()];

  p-&gt;chan = chan;
  p-&gt;state = SLEEPING;
  acquire(&amp;proc_table_lock);
  release(lk);
  sched();
}
</pre>
<p>The problem is that now we're using lk to protect
access to the p-&gt;chan and p-&gt;state variables
but other routines besides sleep and wakeup
(in particular, proc_kill) will need to use them and won't
know which lock protects them.
So instead of protecting them with lk, let's use proc_table_lock:
<pre>
void
sleep(void *chan, struct spinlock *lk)
{
  struct proc *p = curproc[cpu()];

  acquire(&amp;proc_table_lock);
  release(lk);
  p-&gt;chan = chan;
  p-&gt;state = SLEEPING;
  sched();
}

void
wakeup(void *chan)
{
  acquire(&amp;proc_table_lock);
  for(each proc p) {
    if(p-&gt;state == SLEEPING &amp;&amp; p-&gt;chan == chan)
      p-&gt;state = RUNNABLE;
  }
  release(&amp;proc_table_lock);
}
</pre>
<p>One could probably make things work with lk as above,
but the relationship between data and locks would be
more complicated with no real benefit. Xv6 takes the easy way out
and says that elements in the proc structure are always protected
by proc_table_lock.
<h3>Use example: exit and wait</h3>
<p>If proc_wait decides there are children to be waited for,
it calls sleep at line 2462.
When a process exits, proc_exit scans the process table
to find the parent and wakes it at 2408.
<p>Which lock protects sleep and wakeup from missing each other?
Proc_table_lock. Have to tweak sleep again to avoid double-acquire:
<pre>
  if(lk != &amp;proc_table_lock) {
    acquire(&amp;proc_table_lock);
    release(lk);
  }
</pre>
<h3>New feature: kill</h3>
<p>Proc_kill marks a process as killed (line 2371).
When the process finally exits the kernel to user space,
or if a clock interrupt happens while it is in user space,
it will be destroyed (line 2886, 2890, 2912).
<p>Why wait until the process ends up in user space?
<p>What if the process is stuck in sleep? It might take a long
time to get back to user space.
Don't want to have to wait for it, so make sleep wake up early
(line 2373).
<p>This means all callers of sleep should check
whether they have been killed, but none do.
Bug in xv6.
<h3>System call handlers</h3>
<p>Sheet 32
<p>Fork: discussed copyproc in earlier lectures.
Sys_fork (line 3218) just calls copyproc
and marks the new proc runnable.
Does fork create a new process or a new thread?
Is there any shared context?
<p>Exec: we'll talk about exec later, when we talk about file systems.
<p>Sbrk: Saw growproc earlier. Why setupsegs before returning?
</body>
</html>
<html>
<head>
<title>L10</title>
</head>
<body>
<h1>File systems</h1>
<p>Required reading: iread, iwrite, and wdir, and code related to
these calls in fs.c, bio.c, ide.c, file.c, and sysfile.c
<h2>Overview</h2>
<p>The next 3 lectures are about file systems:
<ul>
<li>Basic file system implementation
<li>Naming
<li>Performance
</ul>
<p>Users want to store their data durably, so that it survives when
the user turns off the computer. The primary media for doing so are:
magnetic disks, flash memory, and tapes. We focus on magnetic disks
(e.g., through the IDE interface in xv6).
<p>To allow users to remember where they stored a file, they can
assign a symbolic name to a file, which appears in a directory.
<p>The data in a file can be organized in a structured way or not.
The structured variant is often called a database. UNIX uses the
unstructured variant: files are streams of bytes. Any particular
structure is likely to be useful to only a small class of
applications, and other applications will have to work hard to fit
their data into one of the pre-defined structures. Besides, if you
want structure, you can easily write a user-mode library program that
imposes that format on any file. The end-to-end argument in action.
(Databases have special requirements and support an important class of
applications, and thus have a specialized plan.)
<p>The API for a minimal file system consists of: open, read, write,
seek, close, and stat. Dup duplicates a file descriptor. For example:
<pre>
fd = open("x", O_RDWR);
read(fd, buf, 100);
write(fd, buf, 512);
close(fd);
</pre>
<p>Maintaining the file offset behind the read/write interface is an
interesting design decision. The alternative is that the state of a
read operation should be maintained by the process doing the reading
(i.e., that the pointer should be passed as an argument to read).
This argument is compelling in view of the UNIX fork() semantics,
which clone a process that shares the file descriptors of its
parent: a read by the parent on a shared file descriptor (e.g.,
stdin) changes the read pointer seen by the child. On the other
hand, the alternative would make it difficult to get "(date; ls) > x"
right.
<p>The Unix API doesn't specify that the effects of a write are on
disk before the write returns. That is left to the implementation
of the file system, within certain bounds. Choices (not mutually
exclusive) include:
<ul>
<li>At some point in the future, if the system stays up (e.g., after
30 seconds);
<li>Before the write returns;
<li>Before close returns;
<li>User specified (e.g., before fsync returns).
</ul>
<p>A design issue is the semantics of a file system operation that
requires multiple disk writes. In particular, what happens if the
logical update requires writing multiple disk blocks and the power
fails during the update? For example, creating a new file
requires allocating an inode (which requires updating the list of
free inodes on disk) and writing a directory entry to record the
allocated i-node under the name of the new file (which may require
allocating a new block and updating the directory inode). If the
power fails during the operation, the list of free inodes and blocks
may be inconsistent with the blocks and inodes in use. Again, it is
up to the implementation of the file system to keep the on-disk data
structures consistent:
<ul>
<li>Don't worry about it much, but use a recovery program to bring
file system back into a consistent state.
<li>Journaling file system. Never let the file system get into an
inconsistent state.
</ul>
<p>Another design issue is the semantics of concurrent writes to
the same data item. What is the order of two updates that happen at
the same time? For example, two processes open the same file and write
to it. Modern Unix operating systems allow the application to lock a
file to get exclusive access. If file locking is not used and if the
file descriptor is shared, then the bytes of the two writes will get
into the file in some order (this happens often for log files). If
the file descriptor is not shared, the end result is not defined. For
example, one write may overwrite the other one (e.g., if they are
writing to the same part of the file.)
<p>An implementation issue is performance, because writing to magnetic
disk is relatively expensive compared to computing. Three primary ways
to improve performance are: careful file system layout that induces
few seeks, an in-memory cache of frequently-accessed blocks, and
overlapping I/O with computation so that file operations don't have to
wait for completion and so that the disk driver has more
data to write, which allows disk scheduling. (We will talk about
performance in detail later.)
<h2>xv6 code examples</h2>
<p>xv6 implements a minimal Unix file system interface. xv6 doesn't
pay attention to file system layout. It overlaps computation and I/O,
but doesn't do any disk scheduling. Its cache is write-through, which
simplifies keeping on-disk data structures consistent, but is bad for
performance.
<p>On disk files are represented by an inode (struct dinode in fs.h),
and blocks. Small files have up to 12 block addresses in their inode;
large files use the last address in the inode as a disk address
of a block with 128 further disk addresses (512/4). The size of a file is
thus limited to 12 * 512 + 128*512 bytes. What would you change to
support larger files? (Ans: e.g., double indirect blocks.)
<p>Directories are files with a bit of structure to them. The file
consists of records of type struct dirent. Each entry contains the
name for a file (or directory) and its corresponding inode number.
How many files can appear in a directory?
<p>In memory files are represented by struct inode in fsvar.h. What is
the role of the additional fields in struct inode?
<p>What is xv6's disk layout? How does xv6 keep track of free blocks
and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this
layout a good one for performance? What are other options?
<p>Let's assume that an application created a file x that
contains 512 bytes, and that the application now calls read(fd, buf,
100); that is, it is requesting to read 100 bytes into buf.
Furthermore, let's assume that the inode for x is i. Let's trace
what happens by investigating readi(), line 4483.
<ul>
<li>4488-4492: can iread be called on other objects than files? (Yes.
For example, read from the keyboard.) Everything is a file in Unix.
<li>4495: what does bmap do?
<ul>
<li>4384: what block is being read?
</ul>
<li>4483: what does bread do? does bread always cause a read to disk?
<ul>
<li>4006: what does bget do? it implements a simple cache of
recently-read disk blocks.
<ul>
<li>How big is the cache? (see param.h)
<li>3972: look if the requested block is in the cache by walking down
a circular list.
<li>3977: we had a match.
<li>3979: some other process has "locked" the block; wait until it
releases it. The other process releases the block using brelse().
Why lock a block?
<ul>
<li>Atomic read and update. For example, allocating an inode: read
block containing inode, mark it allocated, and write it back. This
operation must be atomic.
</ul>
<li>3982: it is ours now.
<li>3987: it is not in the cache; we need to find a cache entry to
hold the block.
<li>3987: what is the cache replacement strategy? (see also brelse())
<li>3988: found an entry that we are going to use.
<li>3989: mark it ours but don't mark it valid (there is no valid data
in the entry yet).
</ul>
<li>4007: if the block was in the cache and the entry has the block's
data, return.
<li>4010: if the block wasn't in the cache, read it from disk. are
read's synchronous or asynchronous?
<ul>
<li>3836: a bounded buffer of outstanding disk requests.
<li>3809: tell the disk to move arm and generate an interrupt.
<li>3851: go to sleep and run some other process to run. time sharing
in action.
<li>3792: interrupt: arm is in the right position; wakeup requester.
<li>3856: read block from disk.
<li>3860: remove request from bounded buffer. wakeup processes that
are waiting for a slot.
<li>3864: start next disk request, if any. xv6 can overlap I/O with
computation.
</ul>
<li>4011: mark the cache entry as holding the data.
</ul>
<li>4498: To where is the block copied? is dst a valid user address?
</ul>
<p>Now let's suppose that the process is writing 512 bytes at the end
of the file. How many disk writes will happen?
<ul>
<li>4567: allocate a new block
<ul>
<li>4518: allocate a block: scan block map, and write entry
<li>4523: How many disk operations if the process would have been appending
to a large file? (Answer: read indirect block, scan block map, write
block map.)
</ul>
<li>4572: read the block that the process will be writing, in case the
process writes only part of the block.
<li>4574: write it. is it synchronous or asynchronous? (Ans:
synchronous but with timesharing.)
</ul>
<p>Lots of code to implement reading and writing of files. How about
directories?
<ul>
<li>4722: look through the directory, reading directory blocks to see
if a directory entry is unused (inum == 0).
<li>4729: use it and update it.
<li>4735: write the modified block.
</ul>
<p>Reading and writing of directories is trivial.
</body>
</html>
<html>
<head><title>Lecture 6: Interrupts &amp; Exceptions</title></head>
<body>
<h1>Interrupts &amp; Exceptions</h1>
<p>
Required reading: xv6 <code>trapasm.S</code>, <code>trap.c</code>, <code>syscall.c</code>, <code>usys.S</code>.
<br>
You will need to consult
<a href="../readings/ia32/IA32-3.pdf">IA32 System
Programming Guide</a> chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).
<h2>Overview</h2>
<p>
Big picture: kernel is trusted third-party that runs the machine.
Only the kernel can execute privileged instructions (e.g.,
changing MMU state).
The processor enforces this protection through the ring bits
in the code segment.
If a user application needs to carry out a privileged operation
or other kernel-only service,
it must ask the kernel nicely.
How can a user program change to the kernel address space?
How can the kernel transfer to a user address space?
What happens when a device attached to the computer
needs attention?
These are the topics for today's lecture.
<p>
There are three kinds of events that must be handled
by the kernel, not user programs:
(1) a system call invoked by a user program,
(2) an illegal instruction or other kind of bad processor state (memory fault, etc.),
and
(3) an interrupt from a hardware device.
<p>
Although these three events are different, they all use the same
mechanism to transfer control to the kernel.
This mechanism consists of three steps that execute as one atomic unit.
(a) change the processor to kernel mode;
(b) save the old processor state somewhere (usually the kernel stack);
and (c) change the processor state to the values set up as
the &ldquo;official kernel entry values.&rdquo;
The exact implementation of this mechanism differs
from processor to processor, but the idea is the same.
<p>
We'll work through examples of these today in lecture.
You'll see all three in great detail in the labs as well.
<p>
A note on terminology: sometimes we'll
use interrupt (or trap) to mean both interrupts and exceptions.
<h2>
Setting up traps on the x86
</h2>
<p>
See handout Table 5-1, Figure 5-1, Figure 5-2.
<p>
xv6 Sheet 07: <code>struct gatedesc</code> and <code>SETGATE</code>.
<p>
xv6 Sheet 28: <code>tvinit</code> and <code>idtinit</code>.
Note setting of gate for <code>T_SYSCALL</code>
<p>
xv6 Sheet 29: <code>vectors.pl</code> (also see generated <code>vectors.S</code>).
<h2>
System calls
</h2>
<p>
xv6 Sheet 16: <code>init.c</code> calls <code>open("console")</code>.
How is that implemented?
<p>
xv6 <code>usys.S</code> (not in book).
(No saving of registers. Why?)
<p>
Breakpoint <code>0x1b:"open"</code>,
step past <code>int</code> instruction into kernel.
<p>
See handout Figure 9-4 [sic].
<p>
xv6 Sheet 28: in <code>vectors.S</code> briefly, then in <code>alltraps</code>.
Step through to <code>call trap</code>, examine registers and stack.
How will the kernel find the argument to <code>open</code>?
<p>
xv6 Sheet 29: <code>trap</code>, on to <code>syscall</code>.
<p>
xv6 Sheet 31: <code>syscall</code> looks at <code>eax</code>,
calls <code>sys_open</code>.
<p>
(Briefly)
xv6 Sheet 52: <code>sys_open</code> uses <code>argstr</code> and <code>argint</code>
to get its arguments. How do they work?
<p>
xv6 Sheet 30: <code>fetchint</code>, <code>fetcharg</code>, <code>argint</code>,
<code>argptr</code>, <code>argstr</code>.
<p>
What happens if a user program divides by zero
or accesses unmapped memory?
Exception. Same path as system call until <code>trap</code>.
<p>
What happens if kernel divides by zero or accesses unmapped memory?
<h2>
Interrupts
</h2>
<p>
Like system calls, except:
devices generate them at any time,
there are no arguments in CPU registers,
nothing to return to,
usually can't ignore them.
<p>
How do they get generated?
Device essentially phones up the
interrupt controller and asks to talk to the CPU.
Interrupt controller then buzzes the CPU and
tells it, &ldquo;keyboard on line 1.&rdquo;
Interrupt controller is essentially the CPU's
<strike>secretary</strike> administrative assistant,
managing the phone lines on the CPU's behalf.
<p>
Have to set up interrupt controller.
<p>
(Briefly) xv6 Sheet 63: <code>pic_init</code> sets up the interrupt controller,
<code>irq_enable</code> tells the interrupt controller to let the given
interrupt through.
<p>
(Briefly) xv6 Sheet 68: <code>pit8253_init</code> sets up the clock chip,
telling it to interrupt on <code>IRQ_TIMER</code> 100 times/second.
<code>console_init</code> sets up the keyboard, enabling <code>IRQ_KBD</code>.
<p>
In Bochs, set breakpoint at 0x8:"vector0"
and continue, loading kernel.
Step through clock interrupt, look at
stack, registers.
<p>
Was the processor executing in kernel or user mode
at the time of the clock interrupt?
Why? (Have any user-space instructions executed at all?)
<p>
Can the kernel get an interrupt at any time?
Why or why not? <code>cli</code> and <code>sti</code>,
<code>irq_enable</code>.
</body>
</html>
<html>
<head>
<title>L11</title>
</head>
<body>
<h1>Naming in file systems</h1>
<p>Required reading: namei(), and all other file system code.
<h2>Overview</h2>
<p>To help users to remember where they stored their data, most
systems allow users to assign their own names to their data.
Typically the data is organized in files and users assign names to
files. To deal with many files, users can organize their files in
directories, in a hierarchical manner. Each name is a pathname, with
the components separated by "/".
<p>To avoid users having to type long absolute names (i.e., names
starting with "/" in Unix), users can change their working directory
and use relative names (i.e., names that don't start with "/").
<p>User file namespace operations include create, mkdir, mv, ln
(link), unlink, and chdir. (How is "mv a b" implemented in xv6?
Answer: "link a b"; "unlink a".) To be able to name the current
directory and the parent directory, every directory includes two
entries "." and "..". Files and directories can be reclaimed if users
cannot name them anymore (i.e., after the last unlink).
<p>Recall from last lecture that all directory entries contain a name,
followed by an inode number. The inode number names an inode of the
file system. How can we merge file systems from different disks into
a single name space?
<p>A user grafts new file systems on a name space using mount. Umount
removes a file system from the name space. (In DOS, a file system is
named by its device letter.) Mount takes the root inode of the
to-be-mounted file system and grafts it on the inode of the name space
entry where the file system is mounted (e.g., /mnt/disk1). The
in-memory inode of /mnt/disk1 records the major and minor number of
the file system mounted on it. When namei sees an inode on which a
file system is mounted, it looks up the root inode of the mounted file
system, and proceeds with that inode.
<p>Mount is not a durable operation; it doesn't survive power failures.
After a power failure, the system administrator must remount the file
system (i.e., often in a startup script that is run from init).
<p>Links are convenient, because with them users can create synonyms for
file names. But they create the potential for cycles in
the naming tree. For example, consider link("a/b/c", "a"). This
makes c a synonym for a. This cycle can complicate matters; for
example:
<ul>
<li>If a user subsequently calls unlink ("a"), then the user cannot
name the directory "b" and the link "c" anymore, but how can the
file system decide that?
</ul>
<p>This problem can be solved by detecting cycles, or by
computing which files are reachable from "/" and
reclaiming all the ones that aren't. Unix takes a simpler
approach: avoid cycles by disallowing users to create links for
directories. If there are no cycles, then reference counts can be
used to see if a file is still referenced. In the inode maintain a
field for counting references (nlink in xv6's dinode). link
increases the reference count, and unlink decreases the count; if
the count reaches zero the inode and disk blocks can be reclaimed.
<p>How to handle symbolic links across file systems (i.e., from one
mounted file system to another)? Since inodes are not unique across
file systems, we cannot create a link across file systems; the
directory entry only contains an inode number, not the inode number
and the name of the disk on which the inode is located. To handle
this case, Unix provides a second type of link, called a soft link.
<p>Soft links are a special file type (e.g., T_SYMLINK). If namei
encounters an inode of type T_SYMLINK, it resolves the name in
the symlink file to an inode, and continues from there. With
symlinks one can create cycles and they can point to non-existing
files.
<p>The design of the name system can have security implications. For
example, if you test whether a name exists and then use the name, an
adversary can change the binding from name to object between the
test and the use. Such problems are called TOCTTOU (time of check to
time of use) problems.
<p>An example of TOCTTOU is as follows. Let's say root runs a script
every night to remove files in /tmp. This gets rid of the files
that editors might have left behind but that will never be used again.
An adversary can exploit this script as follows:
<pre>
Root                              Attacker
                                  mkdir("/tmp/etc")
                                  creat("/tmp/etc/passwd")
readdir("/tmp")
lstat("/tmp/etc")
readdir("/tmp/etc")
                                  rename("/tmp/etc", "/tmp/x")
                                  symlink("/etc", "/tmp/etc")
unlink("/tmp/etc/passwd")
</pre>
Lstat checks that /tmp/etc is not a symbolic link, but by the time root
runs unlink, the attacker has had time to put a symbolic link in the
place of /tmp/etc, pointing at a password file of the adversary's choice.
<p>This problem could have been avoided if every user or process group
had its own private /tmp, or if access to the shared one was
mediated.
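<p>One standard mitigation along these lines (a suggestion of this sketch,
not something the notes prescribe) is to avoid predictable names in shared
directories entirely: mkstemp(3) picks a fresh name and opens it with
O_EXCL in one step, so there is no window between the existence check
and the open:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

// Create a uniquely named temporary file atomically. mkstemp() both
// chooses an unused name and opens it exclusively, so an attacker
// cannot pre-plant a symlink under that name.
int make_private_tmp(char *path_template) {
    int fd = mkstemp(path_template);  // template must end in "XXXXXX"
    if (fd >= 0)
        unlink(path_template);  // drop the name; the fd keeps the file alive
    return fd;
}
```

After the unlink, the file is reachable only through the descriptor,
so no name-based race against it remains.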
<h2>V6 code examples</h2>
<p> namei (sheet 46) is the core of the Unix naming system. namei can
be called in several ways: NAMEI_LOOKUP (resolve a name to an inode
and lock the inode), NAMEI_CREATE (resolve a name, but lock the parent
inode), and NAMEI_DELETE (resolve a name, lock the parent inode, and
return the offset in the directory). The reason that namei is
complicated is that we want to atomically test whether a name exists
and remove/create it if it does; otherwise, two concurrent processes
could interfere with each other and the directory could end up in an
inconsistent state.
<p>Let's trace open("a", O_RDWR), focusing on namei:
<ul>
<li>5263: we will look at creating a file in a bit.
<li>5277: call namei with NAMEI_LOOKUP
<li>4629: if the path name starts with "/", look up the root inode (1).
<li>4632: otherwise, use inode for current working directory.
<li>4638: consume a run of "/" characters, for example in "/////a////b"
<li>4641: if we are done with NAMEI_LOOKUP, return inode (e.g.,
namei("/")).
<li>4652: if the inode in which we are searching for the name isn't a
directory, give up.
<li>4657-4661: determine length of the current component of the
pathname we are resolving.
<li>4663-4681: scan the directory for the component.
<li>4682-4696: the entry wasn't found. If we are at the end of the
pathname and NAMEI_CREATE is set, lock the parent directory and return a
pointer to the start of the component. In all other cases, unlock the
inode of the directory, and return 0.
<li>4701: if NAMEI_DELETE is set, return locked parent inode and the
offset of the to-be-deleted component in the directory.
<li>4707: lookup inode of the component, and go to the top of the loop.
</ul>
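<p>The component-at-a-time loop above can be sketched in C. This is
modeled loosely on xv6's skipelem() helper, with simplifications (e.g.
components longer than DIRSIZ are truncated here):

```c
#include <string.h>

#define DIRSIZ 14  // maximum directory-entry name length, as in v6/xv6

// Copy the next path element into name and return the remainder,
// consuming runs of '/' on both sides -- modeled on xv6's skipelem():
//   skipelem("/////a////b", name) leaves name = "a", returns "b".
// Returns 0 when there are no more elements.
static const char *skipelem(const char *path, char *name) {
    while (*path == '/')
        path++;                 // consume a run of '/'
    if (*path == 0)
        return 0;               // nothing left, e.g. namei("/")
    const char *s = path;
    while (*path != '/' && *path != 0)
        path++;
    int len = path - s;
    if (len >= DIRSIZ)
        len = DIRSIZ - 1;       // truncate long components (simplification)
    memcpy(name, s, len);
    name[len] = 0;
    while (*path == '/')
        path++;                 // consume trailing run of '/'
    return path;
}
```

namei's loop then calls this repeatedly, looking each component up in
the current directory inode until the path is exhausted.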
<p>Now let's look at creating a file in a directory:
<ul>
<li>5264: if the last component doesn't exist, but first part of the
pathname resolved to a directory, then dp will be 0, last will point
to the beginning of the last component, and ip will be the locked
parent directory.
<li>5266: create an entry for last in the directory.
<li>4772: mknod1 allocates a new named inode and adds it to an
existing directory.
<li>4776: ialloc: scan the inode blocks, find an unused entry, and write
it. (if lucky, 1 read and 1 write.)
<li>4784: fill out the inode entry, and write it. (another write)
<li>4786: write the entry into the directory (if lucky, 1 write)
</ul>
Why must the parent directory be locked? If two processes try to
create the same name in the same directory, only one should succeed
and the other should receive an error (file exists).
<p>Link, unlink, chdir, mount, umount could have taken file
descriptors instead of their path arguments. In fact, this would get
rid of some possible race conditions (some of which have security
implications, e.g. TOCTTOU). However, this would require that the
current working directory be remembered by the process, and UNIX didn't
have good ways of maintaining static state shared among all processes
belonging to a given user. The easiest way to create shared state
is to place it in the kernel.
<p>We have one piece of code in xv6 that we haven't studied: exec.
With all the groundwork we have done, this code can be easily
understood (see sheet 54).
</body>
Security
-------------------
I. 2 Intro Examples
II. Security Overview
III. Server Security: Offense + Defense
IV. Unix Security + POLP
V. Example: OKWS
VI. How to Build a Website
I. Intro Examples
--------------------
1. Apache + OpenSSL 0.9.6a (CAN 2002-0656)
- SSL = More security!
unsigned int j;
p=(unsigned char *)s->init_buf->data;
j= *(p++);
s->session->session_id_length=j;
memcpy(s->session->session_id,p,j);
- the result: an Apache worm
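   - The missing check, sketched in C. Names like SESSION_ID_MAX and
     copy_session_id are invented for illustration; OpenSSL's actual fix
     differs in detail. The point: j comes off the wire, the destination
     buffer is fixed-size, so the copy length must be validated first.

```c
#include <string.h>

#define SESSION_ID_MAX 32  // assumed fixed size of the session_id buffer

// The attacker controls both the length byte j and the payload bytes,
// but session_id is a fixed-size buffer -- so reject oversized lengths
// before the memcpy instead of trusting j.
// Returns the number of bytes copied, or -1 if the length is rejected.
int copy_session_id(unsigned char *session_id,
                    const unsigned char *p, unsigned int j) {
    if (j > SESSION_ID_MAX)
        return -1;           // the check missing from the vulnerable code
    memcpy(session_id, p, j);
    return (int)j;
}
```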
2. SparkNotes.com 2000:
- New profile feature that displays "public" information about users
but bug that made e-mail addresses "public" by default.
- New program for getting that data:
http://www.sparknotes.com/getprofile.cgi?id=1343
II. Security Overview
----------------------
What Is Security?
- Protecting your system from attack.
What's an attack?
- Stealing data
- Corrupting data
- Controlling resources
- DOS
Why attack?
- Money
- Blackmail / extortion
- Vendetta
- intellectual curiosity
- fame
Security is a Big topic
- Server security -- today's focus. There's some machine sitting on the
Internet somewhere, with a certain interface exposed, and attackers
want to circumvent it.
- Why should you trust your software?
- Client security
- Clients are usually servers, so they have many of the same issues.
- Slight simplification: people across the network cannot typically
initiate connections.
- Has a "fallible operator":
- Spyware
- Drive-by-Downloads
- Client security turns out to be much harder -- GUI considerations,
look inside the browser and the applications.
- Systems community can more easily handle server security.
- We think mainly of servers.
III. Server Security: Offense and Defense
-----------------------------------------
- Show picture of a Web site.
Attacks | Defense
----------------------------------------------------------------------------
1. Break into DB from net | 1. FW it off
2. Break into WS on telnet | 2. FW it off
3. Buffer overrun in Apache | 3. Patch apache / use better lang?
4. Buffer overrun in our code | 4. Use better lang / isolate it
5. SQL injection | 5. Better escaping / don't interpret code.
6. Data scraping. | 6. Use a sparse UID space.
7. PW sniffing | 7. ???
8. Fetch /etc/passwd and crack | 8. Don't expose /etc/passwd
PW |
9. Root escalation from apache | 9. No setuid programs available to Apache
10. XSS |10. Filter JS and input HTML code.
11. Keystroke recorded on sys- |11. Client security
admin's desktop (planetlab) |
12. DDOS |12. ???
Summary:
 - That we want private data to be available to the right people makes
   this problem hard in the first place. Internet servers are there
   for a reason.
- Security != "just encrypt your data;" this in fact can sometimes
make the problem worse.
- Best to prevent break-ins from happening in the first place.
- If they do happen, want to limit their damage (POLP).
- Security policies are difficult to express / package up neatly.
IV. Design According to POLP (in Unix)
---------------------------------------
- Assume any piece of a system can be compromised, by either bad
programming or malicious attack.
- Try to limit the damage done by such a compromise (along the lines
of the 4 attack goals).
<Draw a picture of a server process on Unix, w/ other processes>
What's the goal on Unix?
 - Keep processes that don't have to communicate from communicating:
- limit FS, IPC, signals, ptrace
- Strip away unneeded privilege
- with respect to network, FS.
- Strip away FS access.
How on Unix?
- setuid/setgid
- system call interposition
- chroot (away from setuid executables, /etc/passwd, /etc/ssh/..)
<show Code snippet>
How do you write chroot'ed programs?
- What about shared libraries?
- /etc/resolv.conf?
- Can chroot'ed programs access the FS at all? What if they need
to write to the FS or read from the FS?
- Fd's are *capabilities*; can pass them to chroot'ed services,
   thereby opening new files on their behalf.
- Unforgeable - can only get them from the kernel via open/socket, etc.
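 - A minimal sketch of descriptor passing with the standard SCM_RIGHTS
   mechanism (helper names are made up; error handling is minimal). The
   kernel installs a fresh descriptor for the same open file in the
   receiver, which is exactly the capability hand-off described above:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send one file descriptor over a Unix-domain socket. One dummy data
// byte is required; the fd rides in the SCM_RIGHTS control message.
int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof ctrl);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl, .msg_controllen = sizeof ctrl };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

// Receive a file descriptor sent by send_fd; returns -1 on error.
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl, .msg_controllen = sizeof ctrl };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

   A chroot'ed service that receives such an fd can use the file even
   though it could never have opened it by name.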
Unix Shortcomings (round 1)
- It's bad to run as root!
- Yet, need root for:
- chroot
- setuid/setgid to a lower-privileged user
- create a new user ID
- Still no guarantee that we've cut off all channels
- 200 syscalls!
- Default is to give most/all privileges.
- Can "break out" of chroot jails?
- Can still exploit race conditions in the kernel to escalate privileges.
Sidebar
- setuid / setuid misunderstanding
- root / root misunderstanding
- effective vs. real vs. saved set-user-ID
V. OKWS
-------
- Taking these principles as far as possible.
- C.f. Figure 1 From the paper..
- Discussion of which privileges are in which processes
<Table of how to hack, what you get, etc...>
- Technical details: how to launch a new service
- Within the launcher (running as root):
<on board:>
// receive FDs from logger, pubd, demux
fork ();
chroot ("/var/okws/run");
chdir ("/coredumps/51001");
setgid (51001);
setuid (51001);
exec ("login", fds ... );
- Note no chroot -- why not?
- Once launched, how does a service get new connections?
- Note the goal - minimum tampering with each other in the
case of a compromise.
Shortcoming of Unix (2)
- A lot of plumbing involved with this system. FDs flying everywhere.
- Isolation still not fine enough. If a service gets taken over,
can compromise all users of that service.
VI. Reflections on Building Websites
---------------------------------
- OKWS interesting "experiment"
- Need for speed; also, good gzip support.
- If you need compiled code, it's a good way to go.
- RPC-like system a must for backend communication
- Connection-pooling for free
Biggest difficulties:
- Finding good C++ programmers.
- Compile times.
- The DB is still always the problem.
Hard to Find good Alternatives
- Python / Perl - you might spend a lot of time writing C code /
integrating with lower level languages.
- Have to worry about DB pooling.
 - Java -- most viable, and is getting better. Scary that you can't
   peer inside.
- .Net / C#-based system might be the way to go.
=======================================================================
Extra Material:
Capabilities (From the Eros Paper in SOSP 1999)
- "Unforgeable pair made up of an object ID and a set of authorized
operations (an interface) on that object."
- c.f. Dennis and van Horn. "Programming semantics for multiprogrammed
computations," Communications of the ACM 9(3):143-154, Mar 1966.
- Thus:
<object ID, set of authorized OPs on that object>
- Examples:
"Process X can write to file at inode Y"
"Process P can read from file at inode Z"
- Familiar example: Unix file descriptors
- Why are they secure?
- Capabilities are "unforgeable"
- Processes can get them only through authorized interfaces
- Capabilities are only given to processes authorized to hold them
- How do you get them?
- From the kernel (e.g., open)
- From other applications (e.g., FD passing)
- How do you use them?
- read (fd), write(fd).
- How do you revoke them once granted?
- In Unix, you do not.
- In some systems, a central authority ("reference monitor") can revoke.
- How do you store them persistently?
- Can have circular dependencies (unlike an FS).
- What happens when the system starts up?
- Revert to checkpointed state.
- Often capability systems chose a single-level store.
- Capability systems, a historical perspective:
  - KeyKOS, Eros, Coyotos (university research)
- Never saw any applications
- IBM Systems (System 38, later AS/400, later 'i Series')
- Commercially viable
- Problems:
- All bets are off when a capability is sent to the wrong place.
- Firewall analogy?
<html>
<head>
<title>Plan 9</title>
</head>
<body>
<h1>Plan 9</h1>
<p>Required reading: Plan 9 from Bell Labs</p>
<h2>Background</h2>
<p>Had moved away from the ``one computing system'' model of
Multics and Unix.</p>
<p>Many computers (`workstations'), self-maintained, not a coherent whole.</p>
<p>Pike and Thompson had been batting around ideas about a system glued together
by a single protocol as early as 1984.
Various small experiments involving individual pieces (file server, OS, computer)
tried throughout 1980s.</p>
<p>Ordered the hardware for the ``real thing'' at the beginning of 1989,
built up WORM file server, kernel, throughout that year.</p>
<p>Some time in early fall 1989, Pike and Thompson were
trying to figure out a way to fit the window system in.
On the way home from dinner, both independently realized that
they needed to be able to mount a user-space file descriptor,
not just a network address.</p>
<p>Around Thanksgiving 1989, spent a few days rethinking the whole
thing, added bind, new mount, flush, and spent a weekend
making everything work again. The protocol at that point was
essentially identical to the 9P in the paper.</p>
<p>In May 1990, tried to use system as self-hosting.
File server kept breaking, had to keep rewriting window system.
Dozen or so users by then, mostly using terminal windows to
connect to Unix.</p>
<p>Paper written and submitted to UKUUG in July 1990.</p>
<p>Because it was an entirely new system, could take the
time to fix problems as they arose, <i>in the right place</i>.</p>
<h2>Design Principles</h2>
<p>Three design principles:</p>
<p>
1. Everything is a file.<br>
2. There is a standard protocol for accessing files.<br>
3. Private, malleable name spaces (bind, mount).
</p>
<h3>Everything is a file.</h3>
<p>Everything is a file (more everything than Unix: networks, graphics).</p>
<pre>
% ls -l /net
% lp /dev/screen
% cat /mnt/wsys/1/text
</pre>
<h3>Standard protocol for accessing files</h3>
<p>9P is the only protocol the kernel knows: other protocols
(NFS, disk file systems, etc.) are provided by user-level translators.</p>
<p>Only one protocol, so easy to write filters and other
converters. <i>Iostats</i> puts itself between the kernel
and a command.</p>
<pre>
% iostats -xvdfdf /bin/ls
</pre>
<h3>Private, malleable name spaces</h3>
<p>Each process has its own private name space that it
can customize at will.
(Full disclosure: can arrange groups of
processes to run in a shared name space. Otherwise how do
you implement <i>mount</i> and <i>bind</i>?)</p>
<p><i>Iostats</i> remounts the root of the name space
with its own filter service.</p>
<p>The window system mounts a file system that it serves
on <tt>/mnt/wsys</tt>.</p>
<p>The network is actually a kernel device (no 9P involved)
but it still serves a file interface that other programs
use to access the network.
Easy to move out to user space (or replace) if necessary:
<i>import</i> network from another machine.</p>
<h3>Implications</h3>
<p>Everything is a file + can share files =&gt; can share everything.</p>
<p>Per-process name spaces help move toward ``each process has its own
private machine.''</p>
<p>One protocol: easy to build custom filters to add functionality
(e.g., reestablishing broken network connections).
<h3>File representation for networks, graphics, etc.</h3>
<p>Unix sockets are file descriptors, but you can't use the
usual file operations on them. Also far too much detail that
the user doesn't care about.</p>
<p>In Plan 9:
<pre>dial("tcp!plan9.bell-labs.com!http");
</pre>
(Protocol-independent!)</p>
<p>Dial more or less does:<br>
write to /net/cs: tcp!plan9.bell-labs.com!http
read back: /net/tcp/clone 204.178.31.2!80
write to /net/tcp/clone: connect 204.178.31.2!80
read connection number: 4
open /net/tcp/4/data
</p>
<p>Details don't really matter. Two important points:
protocol-independent, and ordinary file operations
(open, read, write).</p>
<p>Networks can be shared just like any other files.</p>
<p>Similar story for graphics, other resources.</p>
<h2>Conventions</h2>
<p>Per-process name spaces mean that even full path names are ambiguous
(<tt>/bin/cat</tt> means different things on different machines,
or even for different users).</p>
<p><i>Convention</i> binds everything together.
On a 386, <tt>bind /386/bin /bin</tt>.
<p>In Plan 9, always know where the resource <i>should</i> be
(e.g., <tt>/net</tt>, <tt>/dev</tt>, <tt>/proc</tt>, etc.),
but not which one is there.</p>
<p>Can break conventions: on a 386, <tt>bind /alpha/bin /bin</tt>, just won't
have usable binaries in <tt>/bin</tt> anymore.</p>
<p>Object-oriented in the sense of having objects (files) that all
present the same interface and can be substituted for one another
to arrange the system in different ways.</p>
<p>Very little ``type-checking'': <tt>bind /net /proc; ps</tt>.
Great benefit (generality) but must be careful (no safety nets).</p>
<h2>Other Contributions</h2>
<h3>Portability</h3>
<p>Plan 9 still is the most portable operating system.
Not much machine-dependent code, no fancy features
tied to one machine's MMU, multiprocessor from the start (1989).</p>
<p>Many other systems are still struggling with converting to SMPs.</p>
<p>Has run on MIPS, Motorola 68000, Nextstation, Sparc, x86, PowerPC, Alpha, others.</p>
<p>All the world is not an x86.</p>
<h3>Alef</h3>
<p>New programming language: convenient, but difficult to maintain.
Retired when author (Winterbottom) stopped working on Plan 9.</p>
<p>Good ideas transferred to C library plus conventions.</p>
<p>All the world is not C.</p>
<h3>UTF-8</h3>
<p>Thompson invented UTF-8. Pike and Thompson
converted Plan 9 to use it over the first weekend of September 1992,
in time for X/Open to choose it as the Unicode standard byte format
at a meeting the next week.</p>
<p>UTF-8 is now the standard character encoding for Unicode on
all systems and interoperating between systems.</p>
<h3>Simple, easy to modify base for experiments</h3>
<p>Whole system source code is available, simple, easy to
understand and change.
There's a reason it only took a couple days to convert to UTF-8.</p>
<pre>
49343 file server kernel
181611 main kernel
78521 ipaq port (small kernel)
20027 TCP/IP stack
15365 ipaq-specific code
43129 portable code
1326778 total lines of source code
</pre>
<h3>Dump file system</h3>
<p>Snapshot idea might well have been ``in the air'' at the time.
(<tt>OldFiles</tt> in AFS appears to be independently derived,
use of WORM media was common research topic.)</p>
<h3>Generalized Fork</h3>
<p>Picked up by other systems: FreeBSD, Linux.</p>
<h3>Authentication</h3>
<p>No global super-user.
Newer, more Plan 9-like authentication described in later paper.</p>
<h3>New Compilers</h3>
<p>Much faster than gcc, simpler.</p>
<p>8s to build acme for Linux using gcc; 1s to build acme for Plan 9 using 8c (but running on Linux)</p>
<h3>IL Protocol</h3>
<p>Now retired.
For better or worse, TCP has all the installed base.
IL didn't work very well on asymmetric or high-latency links
(e.g., cable modems).</p>
<h2>Idea propagation</h2>
<p>Many ideas have propagated out to varying degrees.</p>
<p>Linux even has bind and user-level file servers now (FUSE),
but still not per-process name spaces.</p>
</body>
<title>Scalable coordination</title>
<html>
<head>
</head>
<body>
<h1>Scalable coordination</h1>
<p>Required reading: Mellor-Crummey and Scott, Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors, TOCS, Feb 1991.
<h2>Overview</h2>
<p>Shared memory machines are a bunch of CPUs sharing physical memory.
Typically each processor also maintains a cache (for performance),
which introduces the problem of keeping the caches coherent. If processor 1
writes a memory location whose value processor 2 has cached, then
processor 2's cache must be updated in some way. How?
<ul>
<li>Bus-based schemes. Any CPU can access any memory
equally (a "dance hall" architecture). Use "snoopy" protocols: each CPU's cache
listens to the memory bus. With a write-through architecture, a cache invalidates its
copy when it sees a write. Or one can have an "ownership" scheme with a write-back
cache (e.g., Pentium caches have MESI bits---modified, exclusive,
shared, invalid). If the E bit is set, the CPU caches the line exclusively and can do
write back. But the bus places limits on scalability.
<li>More scalability w. NUMA schemes (non-uniform memory access). Each
CPU comes with fast "close" memory. Slower to access memory that is
stored with another processor. Use a directory to keep track of who is
caching what. For example, processor 0 is responsible for all memory
starting with address "000", processor 1 is responsible for all memory
starting with "001", etc.
<li>COMA - cache-only memory architecture. Each CPU has local RAM,
treated as cache. Cache lines migrate around to different nodes based
on access pattern. Data only lives in cache, no permanent memory
location. (These machines aren't too popular any more.)
</ul>
<h2>Scalable locks</h2>
<p>This paper is about cost and scalability of locking; what if you
have 10 CPUs waiting for the same lock? For example, what would
happen if xv6 runs on an SMP with many processors?
<p>What's the cost of a simple spinning acquire/release? Algorithm 1
*without* the delays, which is like xv6's implementation of acquire
and release (xv6 uses XCHG instead of test_and_set):
<pre>
each of the 10 CPUs gets the lock in turn
meanwhile, remaining CPUs in XCHG on lock
lock must be X in cache to run XCHG
otherwise all might read, then all might write
so bus is busy all the time with XCHGs!
can we avoid constant XCHGs while lock is held?
</pre>
<p>test-and-test-and-set
<pre>
only run expensive TSL if not locked
spin on ordinary load instruction, so cache line is S
acquire(l)
while(1){
while(l->locked != 0) { }
if(TSL(&l->locked) == 0)
return;
}
</pre>
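<p>The pseudocode above can be written with C11 atomics; a minimal
sketch, with atomic_exchange playing the role of TSL/XCHG:

```c
#include <stdatomic.h>

// Test-and-test-and-set: spin with ordinary loads (the lock's cache
// line stays Shared), and only attempt the expensive atomic exchange
// once the lock looks free.
typedef struct { atomic_int locked; } ttaslock;

void ttas_acquire(ttaslock *l) {
    for (;;) {
        while (atomic_load(&l->locked) != 0)
            ;  // local spin: no bus traffic while the line stays cached
        if (atomic_exchange(&l->locked, 1) == 0)  // the TSL/XCHG step
            return;  // we won the race
    }
}

void ttas_release(ttaslock *l) {
    atomic_store(&l->locked, 0);
}
```

With a single thread this degenerates to a plain spinlock; the benefit
only shows up when several CPUs are spinning on the same lock.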
<p>suppose 10 CPUs are waiting, let's count cost in total bus
transactions
<pre>
CPU1 gets lock in one cycle
sets lock's cache line to I in other CPUs
9 CPUs each use bus once in XCHG
then everyone has the line S, so they spin locally
  CPU1 releases the lock
  CPU2 gets the lock in one cycle
    8 CPUs each use bus once...
  So 10 + 9 + 8 + ... + 1 = 55 transactions, O(n^2) in # of CPUs!
Look at "test-and-test-and-set" in Figure 6
</pre>
<p> Can we have <i>n</i> CPUs acquire a lock in O(<i>n</i>) time?
<p>What is the point of the exponential backoff in Algorithm 1?
<pre>
Does it buy us O(n) time for n acquires?
Is there anything wrong with it?
may not be fair
exponential backoff may increase delay after release
</pre>
<p>What's the point of the ticket locks, Algorithm 2?
<pre>
one interlocked instruction to get my ticket number
then I spin on now_serving with ordinary load
release() just increments now_serving
</pre>
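<p>A minimal C11 sketch of the ticket lock described above (the field
names follow the paper's next_ticket/now_serving):

```c
#include <stdatomic.h>

// Ticket lock: one atomic fetch-and-increment to take a ticket, then
// spin with ordinary loads on now_serving; release just advances
// now_serving, handing the lock to the next ticket in FIFO order.
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticketlock;

void ticket_acquire(ticketlock *l) {
    unsigned me = atomic_fetch_add(&l->next_ticket, 1);  // my place in line
    while (atomic_load(&l->now_serving) != me)
        ;  // ordinary load: no atomic instruction while waiting
}

void ticket_release(ticketlock *l) {
    atomic_fetch_add(&l->now_serving, 1);  // serve the next waiter
}
```

An atomic increment is used in release for simplicity; since only the
holder ever advances now_serving, a plain store of the next value would
also work.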
<p>why is that good?
<pre>
+ fair
+ no exponential backoff overshoot
  + no spinning on an atomic instruction (ordinary loads of now_serving)
</pre>
<p>but what's the cost, in bus transactions?
<pre>
while lock is held, now_serving is S in all caches
release makes it I in all caches
  then each waiter uses a bus transaction to get the new value
so still O(n^2)
</pre>
<p>What's the point of the array-based queuing locks, Algorithm 3?
<pre>
a lock has an array of "slots"
waiter allocates a slot, spins on that slot
release wakes up just next slot
so O(n) bus transactions to get through n waiters: good!
anderson lines in Figure 4 and 6 are flat-ish
they only go up because lock data structures protected by simpler lock
but O(n) space *per lock*!
</pre>
<p>Algorithm 5 (MCS), the new algorithm of the paper, uses
compare_and_swap:
<pre>
int compare_and_swap(addr, v1, v2) {
int ret = 0;
// stop all memory activity and ignore interrupts
if (*addr == v1) {
*addr = v2;
ret = 1;
}
// resume other memory activity and take interrupts
return ret;
}
</pre>
<p>What's the point of the MCS lock, Algorithm 5?
<pre>
constant space per lock, rather than O(n)
one "qnode" per thread, used for whatever lock it's waiting for
lock holder's qnode points to start of list
lock variable points to end of list
acquire adds your qnode to end of list
then you spin on your own qnode
release wakes up next qnode
</pre>
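<p>A minimal C11 sketch of the MCS lock described above. Only the
uncontended path is exercised below; under contention a waiter's qnode
is linked behind its predecessor's and spun on locally:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// MCS lock: constant space per lock; each waiter spins only on its own
// qnode, so a release invalidates one cache line, not N of them.
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

typedef struct { _Atomic(qnode *) tail; } mcslock;

void mcs_acquire(mcslock *l, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);  // join the end of the queue
    if (prev != NULL) {
        atomic_store(&prev->next, me);            // link myself behind prev
        while (atomic_load(&me->locked))
            ;                                     // spin on my own qnode only
    }
}

void mcs_release(mcslock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expect = me;
        // No visible successor: if the tail is still me, the queue is empty.
        if (atomic_compare_exchange_strong(&l->tail, &expect, NULL))
            return;
        // A waiter swapped in but hasn't linked itself yet; wait for it.
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);           // hand the lock over
}
```

Each thread supplies its own qnode (typically on its stack), which is
what makes the per-lock space constant.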
<h2>Wait-free or non-blocking data structures</h2>
<p>The previous implementations all block threads when there is
contention for a lock. Other atomic hardware operations allow one
to build wait-free data structures. For example, one
can implement an insert of an element into a shared list that doesn't
block a thread. Such versions are called wait free.
<p>A linked list with locks is as follows:
<pre>
Lock list_lock;
insert(int x) {
element *n = new Element;
n->x = x;
acquire(&list_lock);
n->next = list;
list = n;
release(&list_lock);
}
</pre>
<p>A wait-free implementation is as follows:
<pre>
insert (int x) {
element *n = new Element;
n->x = x;
do {
n->next = list;
} while (compare_and_swap (&list, n->next, n) == 0);
}
</pre>
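<p>A runnable C11 version of this insert, using
atomic_compare_exchange_weak in place of the pseudocode's
compare_and_swap (on failure it reloads the observed head into old,
so the retry loop always links against the current head):

```c
#include <stdatomic.h>
#include <stdlib.h>

// Lock-free push onto a singly linked list: retry the CAS until no
// other thread changed the head between reading it and swinging it to n.
typedef struct element { int x; struct element *next; } element;

_Atomic(element *) list = NULL;  // shared list head

void insert(int x) {
    element *n = malloc(sizeof *n);
    n->x = x;
    element *old = atomic_load(&list);
    do {
        n->next = old;  // on CAS failure, old has been reloaded for us
    } while (!atomic_compare_exchange_weak(&list, &old, n));
}
```

No thread ever holds a lock, so a thread that is preempted mid-insert
cannot stall the others.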
<p>How many bus transactions with 10 CPUs inserting one element in the
list? Could you do better?
<p><a href="http://www.cl.cam.ac.uk/netos/papers/2007-cpwl.pdf">This
paper by Fraser and Harris</a> compares lock-based implementations
versus corresponding non-blocking implementations of a number of data
structures.
<p>It is not possible to make every operation wait-free, and there are
times when we will need an implementation of acquire and release.
Research on non-blocking data structures is active; the last word
hasn't been said on this topic yet.
</body>
<html>
<head>
<title>XFI</title>
</head>
<body>
<h1>XFI</h1>
<p>Required reading: XFI: software guards for system address spaces.
<h2>Introduction</h2>
<p>Problem: how to use untrusted code (an "extension") in a trusted
program?
<ul>
<li>Use untrusted jpeg codec in Web browser
<li>Use an untrusted driver in the kernel
</ul>
<p>What are the dangers?
<ul>
<li>No fault isolation: the extension modifies trusted code unintentionally
<li>No protection: extension causes a security hole
<ul>
<li>The extension has a buffer overrun problem
<li>The extension calls trusted program's functions that it shouldn't
<li>The extension calls a trusted program's function that it is allowed to
call, but supplies "bad" arguments
<li>The extension calls privileged hardware instructions (when extending the
kernel)
<li>The extension reads data out of the trusted program that it shouldn't.
</ul>
</ul>
<p>Possible solution approaches:
<ul>
<li>Run extension in its own address space with minimal
privileges. Rely on hardware and operating system protection
mechanism.
<li>Restrict the language in which the extension is written:
<ul>
<li>Packet filter language. The language is limited in its capabilities,
and it is easy to guarantee "safe" execution.
<li>Type-safe language. Language runtime and compiler guarantee "safe"
execution.
</ul>
<li>Software-based sandboxing.
</ul>
<h2>Software-based sandboxing</h2>
<p>Sandboxer. A compiler or binary-rewriter sandboxes all unsafe
instructions in an extension by inserting additional instructions.
For example, every indirect store is preceded by a few instructions
that compute and check the target of the store at runtime.
<p>Verifier. When the extension is loaded into the trusted program, the
verifier checks whether the extension is appropriately sandboxed (e.g.,
are all indirect stores sandboxed? does it call any privileged
instructions?). If not, the extension is rejected. If yes, the
extension is loaded and can run. As the extension runs, the
instructions that sandbox unsafe instructions check whether each unsafe
instruction is used in a safe way.
<p>The verifier must be trusted, but the sandboxer need not be. We can do
without the verifier if the trusted program can establish that the
extension has been sandboxed by a trusted sandboxer.
<p>The paper refers to this setup as an instance of proof-carrying code.
<h2>Software fault isolation</h2>
<p><a href="http://citeseer.ist.psu.edu/wahbe93efficient.html">SFI</a>
by Wahbe et al. explored how to use sandboxing for fault-isolating
extensions; that is, using sandboxing to ensure that stores and jumps
stay within a specified memory range (i.e., that they don't overwrite or
jump into addresses in the trusted program unchecked). They
implemented SFI for a RISC processor, which simplifies things since
memory can be written only by store instructions (other instructions
modify registers). In addition, they assumed that there were plenty
of registers, so that a few can be dedicated to sandboxing code.
<p>The extension is loaded into a specific range (called a segment)
within the trusted application's address space. The segment is
identified by the upper bits of the addresses in the
segment. Separate code and data segments are necessary to prevent an
extension from overwriting its own code.
<p>An unsafe instruction on the MIPS is an instruction that jumps or
stores to an address that cannot be statically verified to be within
the correct segment. Most control-transfer operations, such as
program-counter-relative branches, can be statically verified. Stores to
static variables often use an immediate addressing mode and can be
statically verified. Indirect jumps and indirect stores are unsafe.
<p>To sandbox those instructions the sandboxer could generate the
following code for each unsafe instruction:
<pre>
DR0 <- target address
R0 <- DR0 >> shift-register; // load in R0 segment id of target
CMP R0, segment-register; // compare to segment id to segment's ID
BNE fault-isolation-error // if not equal, branch to trusted error code
STORE using DR0
</pre>
In this code, DR0, shift-register, and segment-register
are <i>dedicated</i>: they cannot be used by the extension code. The
verifier must check that the extension doesn't use these registers. R0
is a scratch register, but doesn't have to be dedicated. The
dedicated registers are necessary because otherwise the extension could
load DR0 and jump to the STORE instruction directly, skipping the
check.
<p>This implementation costs 4 registers, and 4 additional instructions
for each unsafe instruction. One could do better, however:
<pre>
DR0 <- target address & and-mask-register // mask segment ID from target
DR0 <- DR0 | segment register // insert this segment's ID
STORE using DR0
</pre>
This code just sets the right segment-ID bits. It doesn't catch
illegal addresses; it just ensures that illegal addresses stay within
the segment, harming the extension but no other code. Even if the
extension jumps to the second instruction of this sandbox sequence,
nothing bad will happen (because DR0 will already contain the correct
segment ID).
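<p>The masking sequence can be modeled in C. The segment layout here
(an 8-bit segment ID in the top bits of a 32-bit address) is an
assumption for illustration:

```c
#include <stdint.h>

// Sandbox a store target by forcing its upper bits to the segment ID:
// mask off the target's segment bits, then OR in the extension's own
// segment. An out-of-segment address is not caught -- it is silently
// redirected into the extension's segment, so only the extension can
// be harmed.
#define SEG_SHIFT 24                        // segment ID = top 8 bits
#define AND_MASK  ((1u << SEG_SHIFT) - 1)   // keeps the low 24 bits

uint32_t sandbox(uint32_t target, uint32_t segment_id) {
    return (target & AND_MASK) | (segment_id << SEG_SHIFT);
}
```

Note the operation is idempotent, which is why jumping into the middle
of the two-instruction sequence is harmless.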
<p>Optimizations include:
<ul>
<li>use guard zones for <i>store value, offset(reg)</i>
<li>treat SP as dedicated register (sandbox code that initializes it)
<li>etc.
</ul>
<h2>XFI</h2>
<p>XFI extends SFI in several ways:
<ul>
<li>Handles fault isolation and protection
<li>Uses control-flow integrity (CFI) to get good performance
<li>Doesn't use dedicated registers
<li>Uses two stacks (a scoped stack and an allocation stack); only the
allocation stack can be corrupted by buffer-overrun attacks. The
scoped stack cannot be corrupted via computed memory references.
<li>Uses a binary rewriter.
<li>Works for the x86
</ul>
<p>x86 is challenging, because of its limited number of registers and its
variable-length instructions. The SFI technique won't work with the x86
instruction set. For example, if the binary contains:
<pre>
25 CD 80 00 00 # AND eax, 0x80CD
</pre>
and an adversary can arrange to jump to the second byte, then the
adversary performs a Linux system call, since INT 0x80 has the binary
representation CD 80. Thus, XFI must control the execution flow.
<p>XFI policy goals:
<ul>
<li>Memory-access constraints (like SFI)
<li>Interface restrictions (extension has fixed entry and exit points)
<li>Scoped-stack integrity (calling stack is well formed)
<li>Simplified instructions semantics (remove dangerous instructions)
<li>System-environment integrity (ensure certain machine model
invariants, such as x86 flags register cannot be modified)
<li>Control-flow integrity: execution must follow a static, expected
control-flow graph. (enter at beginning of basic blocks)
<li>Program-data integrity (certain global variables in extension
cannot be accessed via computed memory addresses)
</ul>
<p>The binary rewriter inserts guards to ensure these properties. The
verifier checks whether the appropriate guards are in place. The primary
mechanisms used are:
<ul>
<li>CFI guards on computed control-flow transfers (see figure 2)
<li>Two stacks
<li>Guards on computed memory accesses (see figure 3)
<li>Module header has a section that contains access permissions for
regions
<li>Binary rewriter, which performs intra-procedure analysis, and
generates guards, code for stack use, and verification hints
<li>Verifier checks specific conditions per basic block. Hints specify
the verification state for the entry to each basic block, and at
exit of basic block the verifier checks that the final state implies
the verification state at entry to all possible successor basic
blocks. (see figure 4)
</ul>
<p>Can XFI protect against the attack discussed in last lecture?
<pre>
unsigned int j;
p = (unsigned char *)s->init_buf->data;
j = *(p++);
s->session->session_id_length = j;
memcpy(s->session->session_id, p, j);
</pre>
Where will <i>j</i> be located?
<p>How about the following one from the paper <a href="http://research.microsoft.com/users/jpincus/beyond-stack-smashing.pdf"><i>Beyond stack smashing:
recent advances in exploiting buffer overruns</i></a>?
<pre>
void f2b(void *arg, size_t len) {
    char buf[100];
    long val = ..;
    long *ptr = ..;
    extern void (*f)();
    memcpy(buf, arg, len);   /* overrun if len > 100 */
    *ptr = val;
    f();
    ...
    return;
}
</pre>
What code can <i>(*f)()</i> call? Code that the attacker inserted?
Code in libc?
<p>How about an attack that uses <i>ptr</i> in the above code to
overwrite a method's address in a class's dispatch table with the
address of a support function?
<p>How about <a href="http://research.microsoft.com/~shuochen/papers/usenix05data_attack.pdf">data-only attacks</a>? For example, attacker
overwrites <i>pw_uid</i> in the heap with 0 before the following
code executes (when downloading /etc/passwd and then uploading it with a
modified entry).
<pre>
FILE *getdatasock( ... ) {
seteuid(0);
setsockopt ( ...);
...
seteuid(pw->pw_uid);
...
}
</pre>
<p>How much does XFI slow down applications? How many more
instructions are executed? (see Tables 1-4)
</body>
<title>L1</title>
<html>
<head>
</head>
<body>
<h1>OS overview</h1>
<h2>Overview</h2>
<ul>
<li>Goal of course:
<ul>
<li>Understand operating systems in detail by designing and
implementing a minimal OS
<li>Hands-on experience with building systems ("Applying 6.033")
</ul>
<li>What is an operating system?
<ul>
<li>a piece of software that turns the hardware into something useful
<li>layered picture: hardware, OS, applications
<li>Three main functions: fault-isolate applications, abstract the
hardware, manage the hardware
</ul>
<li>Examples:
<ul>
<li>OS-X, Windows, Linux, *BSD, ... (desktop, server)
<li>PalmOS Windows/CE (PDA)
<li>Symbian, JavaOS (Cell phones)
<li>VxWorks, pSOS (real-time)
<li> ...
</ul>
<li>OS Abstractions
<ul>
<li>processes: fork, wait, exec, exit, kill, getpid, brk, nice, sleep,
trace
<li>files: open, close, read, write, lseek, stat, sync
<li>directories: mkdir, rmdir, link, unlink, mount, umount
<li>users + security: chown, chmod, getuid, setuid
<li>interprocess communication: signals, pipe
<li>networking: socket, accept, snd, recv, connect
<li>time: gettimeofday
<li>terminal:
</ul>
<li>Sample Unix System calls (mostly POSIX)
<ul>
<li> int read(int fd, void*, int)
<li> int write(int fd, void*, int)
<li> off_t lseek(int fd, off_t, int [012])
<li> int close(int fd)
<li> int fsync(int fd)
<li> int open(const char*, int flags [, int mode])
<ul>
<li> O_RDONLY, O_WRONLY, O_RDWR, O_CREAT
</ul>
<li> mode_t umask(mode_t cmask)
<li> int mkdir(char *path, mode_t mode);
<li> DIR *opendir(char *dirname)
<li> struct dirent *readdir(DIR *dirp)
<li> int closedir(DIR *dirp)
<li> int chdir(char *path)
<li> int link(char *existing, char *new)
<li> int unlink(char *path)
<li> int rename(const char*, const char*)
<li> int rmdir(char *path)
<li> int stat(char *path, struct stat *buf)
<li> int mknod(char *path, mode_t mode, dev_t dev)
<li> int fork()
<ul>
<li> returns childPID in parent, 0 in child; only
difference
</ul>
<li>int getpid()
<li> int waitpid(int pid, int* stat, int opt)
<ul>
<li> pid==-1: any; opt==0||WNOHANG
<li> returns pid or error
</ul>
<li> void _exit(int status)
<li> int kill(int pid, int signal)
<li> int sigaction(int sig, struct sigaction *, struct sigaction *)
<li> int sleep (int sec)
<li> int execve(char* prog, char** argv, char** envp)
<li> void *sbrk(int incr)
<li> int dup2(int oldfd, int newfd)
<li> int fcntl(int fd, F_SETFD, int val)
<li> int pipe(int fds[2])
<ul>
<li> writes on fds[1] will be read on fds[0]
<li> when last fds[1] closed, read on fds[0] returns EOF
<li> when last fds[0] closed, write on fds[1] raises SIGPIPE/fails
with EPIPE
</ul>
<li> int fchown(int fd, uid_t owner, gid_t group)
<li> int fchmod(int fd, mode_t mode)
<li> int socket(int domain, int type, int protocol)
<li> int accept(int socket_fd, struct sockaddr*, int* namelen)
<ul>
<li> returns new fd
</ul>
<li> int listen(int fd, int backlog)
<li> int connect(int fd, const struct sockaddr*, int namelen)
<li> void* mmap(void* addr, size_t len, int prot, int flags, int fd,
off_t offset)
<li> int munmap(void* addr, size_t len)
<li> int gettimeofday(struct timeval*)
</ul>
</ul>
<p>See the <a href="../reference.html">reference page</a> for links to
the early Unix papers.
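<p>As a tiny example of the core calls above, here is a sketch of a copy loop built from only read and write (the buffer size is arbitrary, and partial writes are ignored for brevity):
<pre>
#include &lt;unistd.h&gt;

/* Sketch: copy fd `in` to fd `out` using only read and write;
 * returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[512];
    long total = 0;
    for (;;) {
        long n = read(in, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;   /* end of file */
        write(out, buf, n);
        total += n;
    }
}
</pre>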
<h2>Class structure</h2>
<ul>
<li>Lab: minimal OS for x86 in an exokernel style (50%)
<ul>
<li>kernel interface: hardware + protection
<li>libOS implements fork, exec, pipe, ...
<li>applications: file system, shell, ..
<li>development environment: gcc, bochs
<li>lab 1 is out
</ul>
<li>Lecture structure (20%)
<ul>
<li>homework
<li>45min lecture
<li>45min case study
</ul>
<li>Two quizzes (30%)
<ul>
<li>mid-term
<li>final during exam week
</ul>
</ul>
<h2>Case study: the shell (simplified)</h2>
<ul>
<li>interactive command execution and a programming language
<li>Nice example that uses various OS abstractions. See <a
href="../readings/ritchie74unix.pdf">Unix
paper</a> if you are unfamiliar with the shell.
<li>Final lab is a simple shell.
<li>Basic structure:
<pre>
while (1) {
printf ("$");
readcommand (command, args); // parse user input
if ((pid = fork ()) == 0) { // child?
exec (command, args, 0);
} else if (pid > 0) { // parent?
wait (0); // wait for child to terminate
} else {
perror ("Failed to fork\n");
}
}
</pre>
<p>The split of creating a process with a new program in fork and exec
is mostly a historical accident. See the <a
href="../readings/ritchie79evolution.html">assigned paper</a> for today.
<li>Example:
<pre>
$ ls
</pre>
<li>why call "wait"? to wait for the child to terminate and collect
its exit status. (if child finishes, child becomes a zombie until
parent calls wait.)
<li>I/O: file descriptors. Child inherits open file descriptors
from parent. By convention:
<ul>
<li>file descriptor 0 for input (e.g., keyboard). read_command:
<pre>
read (0, buf, bufsize)
</pre>
<li>file descriptor 1 for output (e.g., terminal)
<pre>
write (1, "hello\n", strlen("hello\n"))
</pre>
<li>file descriptor 2 for error (e.g., terminal)
</ul>
<li>How does the shell implement:
<pre>
$ls > tmp1
</pre>
just before exec insert:
<pre>
close (1);
fd = open ("tmp1", O_CREAT|O_WRONLY); // fd will be 1!
</pre>
<p>The kernel will return the first free file descriptor, 1 in this case.
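<p>An equivalent, more explicit way to set up the redirection is dup2, which avoids relying on the kernel handing back descriptor 1 (a sketch; the path and target descriptor are illustrative, not the shell's actual code):
<pre>
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;

/* Sketch: make descriptor `target` refer to `path`, as the shell
 * does for fd 1 just before exec. Returns target, or -1 on error. */
int redirect_fd(const char *path, int target)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (fd != target) {
        dup2(fd, target);   /* target now refers to the file */
        close(fd);          /* drop the extra descriptor */
    }
    return target;
}
</pre>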
<li>How does the shell implement sharing an output file:
<pre>
$ls 2> tmp1 > tmp1
</pre>
replace last code with:
<pre>
close (1);
close (2);
fd1 = open ("tmp1", O_CREAT|O_WRONLY); // fd will be 1!
fd2 = dup (fd1);
</pre>
both file descriptors share the file offset
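<p>The shared offset can be demonstrated directly (a sketch; the path is illustrative):
<pre>
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;

/* Sketch: fd1 and fd2 are dup'd, so they share one file offset;
 * the second write appends after the first instead of overwriting. */
int shared_offset_demo(const char *path)
{
    int fd1 = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    int fd2 = dup(fd1);
    write(fd1, "ab", 2);
    write(fd2, "cd", 2);              /* lands at offset 2, not 0 */
    int end = (int) lseek(fd1, 0, SEEK_CUR);
    close(fd1);
    close(fd2);
    return end;                       /* 4 if the offset is shared */
}
</pre>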
<li>how do programs communicate?
<pre>
$ sort file.txt | uniq | wc
</pre>
or
<pre>
$ sort file.txt > tmp1
$ uniq tmp1 > tmp2
$ wc tmp2
$ rm tmp1 tmp2
</pre>
or
<pre>
$ kill -9
</pre>
<li>A pipe is a one-way communication channel. Here is an example
where the parent is the writer and the child is the reader:
<pre>
int fdarray[2];
if (pipe(fdarray) < 0) panic ("error");
if ((pid = fork()) < 0) panic ("error");
else if (pid > 0) {
close(fdarray[0]);
write(fdarray[1], "hello world\n", 12);
} else {
close(fdarray[1]);
n = read (fdarray[0], buf, MAXBUF);
write (1, buf, n);
}
</pre>
<li>How does the shell implement pipelines (i.e., cmd 1 | cmd 2 |..)?
We want to arrange that the output of cmd 1 is the input of cmd 2.
The way to achieve this goal is to manipulate stdout and stdin.
<li>The shell creates a process for each command in
the pipeline, hooks up their stdin and stdout correctly,
and waits for the last process of the
pipeline to exit. A sketch of the core modifications to our shell for
setting up a pipe is:
<pre>
int fdarray[2];
if (pipe(fdarray) < 0) panic ("error");
if ((pid = fork ()) == 0) { // child (left end of pipe)
close (1);
tmp = dup (fdarray[1]); // fdarray[1] is the write end, tmp will be 1
close (fdarray[0]); // close read end
close (fdarray[1]); // close fdarray[1]
exec (command1, args1, 0);
} else if (pid > 0) { // parent (right end of pipe)
close (0);
tmp = dup (fdarray[0]); // fdarray[0] is the read end, tmp will be 0
close (fdarray[0]);
close (fdarray[1]); // close write end
exec (command2, args2, 0);
} else {
printf ("Unable to fork\n");
}
</pre>
<li>Why close the read end and the write end? Multiple reasons: it
maintains the invariant that every process starts with 3 file
descriptors, and reading from an empty
pipe blocks the reader, while reading from a closed pipe returns end of
file.
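<p>These end-of-file semantics can be checked with a few lines of C (a sketch):
<pre>
#include &lt;unistd.h&gt;

/* Sketch: after the last write end is closed, reads drain the
 * buffered data and then return 0 (end of file), not block. */
int pipe_eof_demo(void)
{
    int fds[2];
    char buf[8];
    if (pipe(fds) < 0)
        return -1;
    write(fds[1], "hi", 2);
    close(fds[1]);                          /* last write end closed */
    int n1 = read(fds[0], buf, sizeof buf); /* 2: the buffered bytes */
    int n2 = read(fds[0], buf, sizeof buf); /* 0: end of file */
    close(fds[0]);
    return n1 * 10 + n2;
}
</pre>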
<li>How do you background jobs?
<pre>
$ compute &
</pre>
<li>How does the shell implement "&", backgrounding? (Don't call wait
immediately).
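<p>One way the shell might implement this is to make the wait conditional on "&amp;" (a sketch, not the actual shell code; the helper name is invented):
<pre>
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

/* Sketch: fork+exec a command; wait only when it runs in the
 * foreground. Returns the exit status, or 0 for background jobs. */
int run_command(const char *cmd, char *const argv[], int background)
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(cmd, argv);
        _exit(127);                /* exec failed */
    }
    if (background)
        return 0;                  /* reap later, e.g. with WNOHANG */
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
</pre>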
<li>More details in the shell lecture later in the term.
</body>
<title>High-performance File Systems</title>
<html>
<head>
</head>
<body>
<h1>High-performance File Systems</h1>
<p>Required reading: soft updates.
<h2>Overview</h2>
<p>A key problem in designing file systems is how to obtain good
performance on file system operations while providing consistency.
By consistency, we mean that file system invariants are maintained
on disk. These invariants include that if a file is created, it
appears in its directory, etc. If the file system data structures are
consistent, then it is possible to rebuild the file system to a
correct state after a failure.
<p>To ensure consistency of on-disk file system data structures,
modifications to the file system must respect certain rules:
<ul>
<li>Never point to a structure before it is initialized. An inode must
be initialized before a directory entry references it. A block must
be initialized before an inode references it.
<li>Never reuse a structure before nullifying all pointers to it. An
inode pointer to a disk block must be reset before the file system can
reallocate the disk block.
<li>Never reset the last pointer to a live structure before a new
pointer is set. When renaming a file, the file system should not
remove the old name for an inode until after the new name has been
written.
</ul>
The paper calls these dependencies update dependencies.
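<p>The first rule can be mimicked with a tiny in-memory sketch (the block numbers, block contents, and helpers here are all invented for illustration):
<pre>
#include &lt;string.h&gt;

/* Sketch with an in-memory "disk": create_file() must write the
 * inode block before the directory block that references it,
 * mirroring the first update rule. */
enum { NBLOCKS = 8, BSIZE = 16 };
static char disk[NBLOCKS][BSIZE];

static void write_block(int b, const char *data) /* "synchronous" */
{
    memcpy(disk[b], data, BSIZE);
}

void create_file(int inode_block, int dir_block)
{
    write_block(inode_block, "inode: allocate");  /* rule 1: init first */
    write_block(dir_block,   "dirent -> inode");  /* then reference it  */
}
</pre>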
<p>xv6 ensures these rules by writing every block synchronously, and
by ordering the writes appropriately. By synchronous, we mean
that a process waits until the current disk write has
completed before continuing with execution.
<ul>
<li>What happens if power fails after line 4776 in mknod1? Did we lose the
inode forever? No: we have a separate program (called fsck) that
can rebuild the disk structures correctly and can put the inode back on
the free list.
<li>Does the order of writes in mknod1 matter? Say, what if we wrote
directory entry first and then wrote the allocated inode to disk?
This violates the update rules and is not a good plan. If a
failure happens after the directory write, then on recovery we have
a directory pointing to an unallocated inode, which may now be
allocated by another process for another file!
<li>Can we turn the writes (i.e., the ones invoked by iupdate and
wdir) into delayed writes without creating problems? No, because
the buffer cache might write them back to the disk in an incorrect
order. It has no information to decide in what order to write them.
</ul>
<p>xv6 is a nice example of the tension between consistency and
performance. To get consistency, xv6 uses synchronous writes,
but these writes are slow, because they perform at the rate of a
seek instead of the rate of the maximum data transfer rate. The
bandwidth to a disk is reasonably high for large transfers (around
50 MB/s), but latency is high, because of the cost of moving the
disk arm(s) (the seek latency is about 10 msec).
<p>This tension is an implementation-dependent one. The Unix API
doesn't require that writes are synchronous. Updates don't have to
appear on disk until a sync, fsync, or open with O_SYNC. Thus, in
principle, the UNIX API allows delayed writes, which are good for
performance:
<ul>
<li>Batch many writes together in a big one, written at the disk data
rate.
<li>Absorb writes to the same block.
<li>Schedule writes to avoid seeks.
</ul>
<p>Thus the question: how to delay writes and achieve consistency?
The paper provides an answer.
<h2>This paper</h2>
<p>The paper surveys some of the existing techniques and introduces a
new one to achieve the goal of performance and consistency.
<p>
<p>Techniques possible:
<ul>
<li>Equip system with NVRAM, and put buffer cache in NVRAM.
<li>Logging. Often used in UNIX file systems for metadata updates.
LFS is an extreme version of this strategy.
<li>Flusher-enforced ordering. All writes are delayed. The flusher
is aware of dependencies between blocks, but this alone doesn't work
because circular dependencies need to be broken by writing blocks out.
</ul>
<p>Soft updates is the solution explored in this paper. It doesn't
require NVRAM, and performs as well as the naive strategy of keeping all
dirty blocks in main memory. Compared to logging, it is unclear whether
soft updates is better. The default BSD file system uses soft
updates, but most Linux file systems use logging.
<p>Soft updates is a sophisticated variant of flusher-enforced
ordering. Instead of maintaining dependencies at the block level, it
maintains dependencies at the file-structure level (per inode, per
directory, etc.), reducing circular dependencies. Furthermore, it
breaks any remaining circular dependencies by undoing changes before
writing the block and then redoing them after the write completes.
<p>Pseudocode for create:
<pre>
create (f) {
allocate inode in block i (assuming inode is available)
add i to directory data block d (assuming d has space)
mark d as dependent on i, and create undo/redo record
update directory inode in block di
mark di as dependent on d
}
</pre>
<p>Pseudocode for the flusher:
<pre>
flushblock (b)
{
lock b;
for all dependencies that b is relying on
"remove" that dependency by undoing the change to b
mark the dependency as "unrolled"
write b
}
write_completed (b) {
remove dependencies that depend on b
reapply "unrolled" dependencies that b depended on
unlock b
}
</pre>
<p>Apply flush algorithm to example:
<ul>
<li>A list of two dependencies: directory->inode, inode->directory.
<li>Lets say syncer picks directory first
<li>Undo directory->inode changes (i.e., unroll <A,#4>)
<li>Write directory block
<li>Remove met dependencies (i.e., remove inode->directory dependency)
<li>Perform redo operation (i.e., redo <A,#4>)
<li>Select inode block and write it
<li>Remove met dependencies (i.e., remove directory->inode dependency)
<li>Select directory block (it is dirty again!)
<li>Write it.
</ul>
<p>A file operation that is important for file-system consistency
is rename. Rename conceptually works as follows:
<pre>
rename (from, to)
unlink (to);
link (from, to);
unlink (from);
</pre>
<p>Rename is often used by programs to make a new version of a file
the current version. Committing to a new version must happen
atomically. Unfortunately, without transaction-like support,
atomicity is impossible to guarantee, so a typical file system
provides weaker semantics for rename: if to already exists, an
instance of to will always exist, even if the system should crash in
the middle of the operation. Does the above implementation of rename
guarantee these semantics? (Answer: no.)
<p>If rename is implemented as unlink, link, unlink, then it is
difficult to guarantee even the weak semantics. Modern UNIXes provide
rename as a file system call:
<pre>
update dir block for to point to from's inode // write block
update dir block for from to free entry // write block
</pre>
<p>fsck may need to correct refcounts in the inode if the file
system fails during rename. For example, a crash after the first
write followed by fsck should set the refcount to 2, since both from
and to are pointing at the inode.
<p>This semantics is sufficient, however, for an application to ensure
atomicity. Before the call, there is a from and perhaps a to. If the
call is successful, following the call there is only a to. If there
is a crash, there may be both a from and a to, in which case the
caller knows the previous attempt failed, and must retry. The
subtlety is that if you now follow the two links, the "to" name may
link to either the old file or the new file. If it links to the new
file, that means that there was a crash and you just detected that the
rename operation was composite. On the other hand, the retry
procedure can be the same for either case (do the rename again), so it
isn't necessary to discover how it failed. The function follows the
golden rule of recoverability, and it is idempotent, so it lays all
the needed groundwork for use as part of a true atomic action.
<p>With soft updates, rename becomes:
<pre>
rename (from, to) {
i = namei(from);
add to "to" directory data block td a reference to inode i
mark td as dependent on block i
update "to" directory inode in block tdi
mark tdi as dependent on td
remove from "from" directory data block fd the reference to inode i
mark fd as dependent on tdi
update "from" directory inode in block fdi
mark fdi as dependent on fd
}
</pre>
<p>No synchronous writes!
<p>What needs to be done on recovery? (Inspect every statement in
rename and see what inconsistencies could exist on the disk; e.g.,
refcnt of an inode could be too high.) None of these inconsistencies require
fixing before the file system can operate; they can be fixed by a
background file system repairer.
<h2>Paper discussion</h2>
<p>Do soft updates perform any useless writes? (A useless write is a
write that will be immediately overwritten.) (Answer: yes.) Fix the
syncer to be careful about which block it starts with. Fix cache
replacement to select the LRU block with no pending dependencies.
<p>Can a log-structured file system implement rename better? (Answer:
yes, since it can get the refcnts right).
<p>Discuss all graphs.
</body>
<title>Lecture 5</title>
<html>
<head>
</head>
<body>
<h2>Address translation and sharing using page tables</h2>
<p> Reading: <a href="../readings/i386/toc.htm">80386</a> chapters 5 and 6<br>
<p> Handout: <b> x86 address translation diagram</b> -
<a href="x86_translation.ps">PS</a> -
<a href="x86_translation.eps">EPS</a> -
<a href="x86_translation.fig">xfig</a>
<br>
<p>Why do we care about x86 address translation?
<ul>
<li>It can simplify s/w structure by placing data at fixed known addresses.
<li>It can implement tricks like demand paging and copy-on-write.
<li>It can isolate programs to contain bugs.
<li>It can isolate programs to increase security.
<li>JOS uses paging a lot, and segments more than you might think.
</ul>
<p>Why aren't protected-mode segments enough?
<ul>
<li>Why did the 386 add translation using page tables as well?
<li>Isn't it enough to give each process its own segments?
</ul>
<p>Translation using page tables on x86:
<ul>
<li>paging hardware maps linear address (la) to physical address (pa)
<li>(we will often interchange "linear" and "virtual")
<li>page size is 4096 bytes, so there are 1,048,576 pages in a 2^32-byte address space
<li>why not just have a big array with each page #'s translation?
<ul>
<li>table[20-bit linear page #] => 20-bit phys page #
</ul>
<li>386 uses 2-level mapping structure
<li>one page directory page, with 1024 page directory entries (PDEs)
<li>up to 1024 page table pages, each with 1024 page table entries (PTEs)
<li>so la has 10 bits of directory index, 10 bits table index, 12 bits offset
<li>What's in a PDE or PTE?
<ul>
<li>20-bit phys page number, present, read/write, user/supervisor
</ul>
<li>cr3 register holds physical address of current page directory
<li>puzzle: what do PDE read/write and user/supervisor flags mean?
<li>puzzle: can supervisor read/write user pages?
<li>Here's how the MMU translates an la to a pa:
<pre>
uint
translate (uint la, bool user, bool write)
{
uint pde, pte;
pde = read_mem (%CR3 + 4*(la >> 22));
access (pde, user, write);
pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
access (pte, user, write);
return (pte & 0xfffff000) + (la & 0xfff);
}
// check protection. pxe is a pte or pde.
// user is true if CPL==3
void
access (uint pxe, bool user, bool write)
{
if (!(pxe & PG_P))
=> page fault -- page not present
if (!(pxe & PG_U) && user)
=> page fault -- no access for user
if (write && !(pxe & PG_W))
if (user)
=> page fault -- not writable
else if (!(pxe & PG_U))
=> page fault -- not writable
else if (%CR0 & CR0_WP)
=> page fault -- not writable
}
</pre>
<li>CPU's TLB caches vpn => ppn mappings
<li>if you change a PDE or PTE, you must flush the TLB!
<ul>
<li>by re-loading cr3
</ul>
<li>turn on paging by setting the CR0_PG bit of %cr0
</ul>
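<p>The 10/10/12 split of a linear address can be written down as C macros (a sketch using JOS-style names):
<pre>
#include &lt;stdint.h&gt;

/* Sketch: split a 32-bit linear address into its three parts. */
#define PDX(la)   (((uint32_t)(la) >> 22) & 0x3FF)  /* page directory index */
#define PTX(la)   (((uint32_t)(la) >> 12) & 0x3FF)  /* page table index */
#define PGOFF(la) ((uint32_t)(la) & 0xFFF)          /* offset within page */
</pre>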
Can we use paging to limit what memory an app can read/write?
<ul>
<li>user can't modify cr3 (requires privilege)
<li>is that enough?
<li>could user modify page tables? after all, they are in memory.
</ul>
<p>How we will use paging (and segments) in JOS:
<ul>
<li>use segments only to switch privilege level into/out of kernel
<li>use paging to structure process address space
<li>use paging to limit process memory access to its own address space
<li>below is the JOS virtual memory map
<li>why map both kernel and current process? why not 4GB for each?
<li>why is the kernel at the top?
<li>why map all of phys mem at the top? i.e. why multiple mappings?
<li>why map page table a second time at VPT?
<li>why map page table a third time at UVPT?
<li>how do we switch mappings for a different process?
</ul>
<pre>
4 Gig --------> +------------------------------+
| | RW/--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
: . :
: . :
: . :
|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
| | RW/--
| Remapped Physical Memory | RW/--
| | RW/--
KERNBASE -----> +------------------------------+ 0xf0000000
| Cur. Page Table (Kern. RW) | RW/-- PTSIZE
VPT,KSTACKTOP--> +------------------------------+ 0xefc00000 --+
| Kernel Stack | RW/-- KSTKSIZE |
| - - - - - - - - - - - - - - -| PTSIZE
| Invalid Memory | --/-- |
ULIM ------> +------------------------------+ 0xef800000 --+
| Cur. Page Table (User R-) | R-/R- PTSIZE
UVPT ----> +------------------------------+ 0xef400000
| RO PAGES | R-/R- PTSIZE
UPAGES ----> +------------------------------+ 0xef000000
| RO ENVS | R-/R- PTSIZE
UTOP,UENVS ------> +------------------------------+ 0xeec00000
UXSTACKTOP -/ | User Exception Stack | RW/RW PGSIZE
+------------------------------+ 0xeebff000
| Empty Memory | --/-- PGSIZE
USTACKTOP ---> +------------------------------+ 0xeebfe000
| Normal User Stack | RW/RW PGSIZE
+------------------------------+ 0xeebfd000
| |
| |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
. .
. .
. .
|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
| Program Data & Heap |
UTEXT --------> +------------------------------+ 0x00800000
PFTEMP -------> | Empty Memory | PTSIZE
| |
UTEMP --------> +------------------------------+ 0x00400000
| Empty Memory | PTSIZE
0 ------------> +------------------------------+
</pre>
<h3>The VPT </h3>
<p>Remember how the X86 translates virtual addresses into physical ones:
<p><img src=pagetables.png>
<p>CR3 points at the page directory. The PDX part of the address
indexes into the page directory to give you a page table. The
PTX part indexes into the page table to give you a page, and then
you add the low bits in.
<p>But the processor has no concept of page directories, page tables,
and pages being anything other than plain memory. So there's nothing
that says a particular page in memory can't serve as two or three of
these at once. The processor just follows pointers:
<pre>
pd = lcr3();
pt = *(pd+4*PDX);
page = *(pt+4*PTX);
</pre>
<p>Diagrammatically, it starts at CR3, follows three arrows, and then stops.
<p>If we put a pointer into the page directory that points back to itself at
index Z, as in
<p><img src=vpt.png>
<p>then when we try to translate a virtual address with PDX and PTX
equal to V, following three arrows leaves us at the page directory.
So that virtual page translates to the page holding the page directory.
In Jos, V is 0x3BD, so the virtual address of the VPD is
(0x3BD&lt;&lt;22)|(0x3BD&lt;&lt;12).
<p>Now, if we try to translate a virtual address with PDX = V but an
arbitrary PTX != V, then following three arrows from CR3 ends
one level up from usual (instead of two as in the last case),
which is to say in the page tables. So the set of virtual pages
with PDX=V form a 4MB region whose page contents, as far
as the processor is concerned, are the page tables themselves.
In Jos, V is 0x3BD so the virtual address of the VPT is (0x3BD&lt;&lt;22).
<p>So because of the "no-op" arrow we've cleverly inserted into
the page directory, we've mapped the pages being used as
the page directory and page table (which are normally virtually
invisible) into the virtual address space.
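<p>The resulting fixed virtual addresses follow directly from the slot number V (a sketch; the text above uses V = 0x3BD for JOS):
<pre>
#include &lt;stdint.h&gt;

/* Sketch: with self-mapping slot V in the page directory, the 4MB
 * window of page tables (PDX = V) and the page directory page itself
 * (PDX = PTX = V) land at addresses computable from V alone. */
uint32_t vpt_base(uint32_t v) { return v << 22; }
uint32_t vpd_addr(uint32_t v) { return (v << 22) | (v << 12); }
</pre>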
</body>
#!/usr/bin/perl
my @lines = <>;
my $text = join('', @lines);
my $title;
if($text =~ /^\*\* (.*?)\n/m){
$title = $1;
$text = $` . $';
}else{
$title = "Untitled";
}
$text =~ s/[ \t]+$//mg;
$text =~ s/^$/<br><br>/mg;
$text =~ s!\b([a-z0-9]+\.(c|s|pl|h))\b!<a href="src/$1.html">$1</a>!g;
$text =~ s!^(Lecture [0-9]+\. .*?)$!<b><i>$1</i></b>!mg;
$text =~ s!^\* (.*?)$!<h2>$1</h2>!mg;
$text =~ s!((<br>)+\n)+<h2>!\n<h2>!g;
$text =~ s!</h2>\n?((<br>)+\n)+!</h2>\n!g;
$text =~ s!((<br>)+\n)+<b>!\n<br><br><b>!g;
$text =~ s!\b\s*--\s*\b!\&ndash;!g;
$text =~ s!\[([^\[\]|]+) \| ([^\[\]]+)\]!<a href="$1">$2</a>!g;
$text =~ s!\[([^ \t]+)\]!<a href="$1">$1</a>!g;
$text =~ s!``!\&ldquo;!g;
$text =~ s!''!\&rdquo;!g;
print <<EOF;
<!-- AUTOMATICALLY GENERATED: EDIT the .txt version, not the .html version -->
<html>
<head>
<title>$title</title>
<style type="text/css"><!--
body {
background-color: white;
color: black;
font-size: medium;
line-height: 1.2em;
margin-left: 0.5in;
margin-right: 0.5in;
margin-top: 0;
margin-bottom: 0;
}
h1 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.4em;
}
h2 {
text-indent: 0in;
text-align: left;
margin-top: 2em;
font-weight: bold;
font-size: 1.2em;
}
--></style>
</head>
<body bgcolor=#ffffff>
<h1>$title</h1>
<br><br>
EOF
print $text;
print <<EOF;
</body>
</html>
EOF
<title>Homework: xv6 and Interrupts and Exceptions</title>
<html>
<head>
</head>
<body>
<h1>Homework: xv6 and Interrupts and Exceptions</h1>
<p>
<b>Read</b>: xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S. Skim
lapic.c, ioapic.c, and picirq.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of the lecture.
<p>
<b>Introduction</b>
<p>Try to understand
xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S.
You will need to consult:
<p>Chapter 5 of <a href="../readings/ia32/IA32-3.pdf">IA-32 Intel
Architecture Software Developer's Manual, Volume 3: System programming
guide</a>; you can skip sections 5.7.1, 5.8.2, and 5.12.2. Be aware
that terms such as exceptions, traps, interrupts, faults and aborts
have no standard meaning.
<p>Chapter 9 of the 1987 <a href="../readings/i386/toc.htm">i386
Programmer's Reference Manual</a> also covers exception and interrupt
handling in IA32 processors.
<p><b>Assignment</b>:
In xv6, set a breakpoint at the beginning of <code>syscall()</code> to
catch the very first system call. What values are on the stack at
this point? Turn in the output of <code>print-stack 35</code> at that
breakpoint with each value labeled as to what it is (e.g.,
saved <code>%ebp</code> for <code>trap</code>,
<code>trapframe.eip</code>, etc.).
<p>
<b>This completes the homework.</b>
</body>
<title>Homework: Intro to x86 and PC</title>
<html>
<head>
</head>
<body>
<h1>Homework: Intro to x86 and PC</h1>
<p>Today's lecture is an introduction to the x86 and the PC, the
platform for which you will write an operating system. The assigned
book is a reference for the x86 assembly programming that you will
do in this course.
<p><b>Assignment</b> Make sure to do exercise 1 of lab 1 before
coming to lecture.
</body>
<title>Homework: x86 MMU</title>
<html>
<head>
</head>
<body>
<h1>Homework: x86 MMU</h1>
<p>Read chapters 5 and 6 of
<a href="../readings/i386/toc.htm">Intel 80386 Reference Manual</a>.
These chapters explain
the x86 Memory Management Unit (MMU),
which we will cover in lecture today and which you need
to understand in order to do lab 2.
<p>
<b>Read</b>: bootasm.S and setupsegs() in proc.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member by the beginning of lecture.
<p>
<p><b>Assignment</b>: Try to understand setupsegs() in proc.c.
What values are written into <code>gdt[SEG_UCODE]</code>
and <code>gdt[SEG_UDATA]</code> for init, the first user-space
process?
(You can use Bochs to answer this question.)
</body>
<html>
<head>
<title>Homework: Files and Disk I/O</title>
</head>
<body>
<h1>Homework: Files and Disk I/O</h1>
<p>
<b>Read</b>: bio.c, fd.c, fs.c, and ide.c
<p>
This homework should be turned in at the beginning of lecture.
<p>
<b>File and Disk I/O</b>
<p>Insert a print statement in bwrite so that you get a
print every time a block is written to disk:
<pre>
cprintf("bwrite sector %d\n", sector);
</pre>
<p>Build and boot a new kernel and run these four commands at the shell:
<pre>
echo &gt;a
echo &gt;a
rm a
mkdir d
</pre>
(You can try <tt>rm d</tt> if you are curious, but it should look
almost identical to <tt>rm a</tt>.)
<p>You should see a sequence of bwrite prints after running each command.
Record the list and annotate it with the calling function and
what block is being written.
For example, this is the <i>second</i> <tt>echo &gt;a</tt>:
<pre>
$ echo >a
bwrite sector 121 # writei (data block)
bwrite sector 3 # iupdate (inode block)
$
</pre>
<p>Hint: the easiest way to get the name of the
calling function is to add a string argument to bwrite,
edit all the calls to bwrite to pass the name of the
calling function, and just print it.
You should be able to reason about what kind of
block is being written just from the calling function.
<p>You need not write the following up, but try to
understand why each write is happening. This will
help your understanding of the file system layout
and the code.
<p>
<b>This completes the homework.</b>
</body>
<title>Homework: intro to xv6</title>
<html>
<head>
</head>
<body>
<h1>Homework: intro to xv6</h1>
<p>This lecture is the introduction to xv6, our re-implementation of
Unix v6. Read the source code in the assigned files. You won't have
to understand the details yet; we will focus on how the first
user-level process comes into existence after the computer is turned
on.
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<p><b>Assignment</b>:
<br>
Fetch and un-tar the xv6 source:
<pre>
sh-3.00$ wget http://pdos.csail.mit.edu/6.828/2007/src/xv6-rev1.tar.gz
sh-3.00$ tar xzvf xv6-rev1.tar.gz
xv6/
xv6/asm.h
xv6/bio.c
xv6/bootasm.S
xv6/bootmain.c
...
$
</pre>
Build xv6:
<pre>
$ cd xv6
$ make
gcc -O -nostdinc -I. -c bootmain.c
gcc -nostdinc -I. -c bootasm.S
ld -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
objdump -S bootblock.o > bootblock.asm
objcopy -S -O binary bootblock.o bootblock
...
$
</pre>
Find the address of the <code>main</code> function by
looking in <code>kernel.asm</code>:
<pre>
% grep main kernel.asm
...
00102454 &lt;mpmain&gt;:
mpmain(void)
001024d0 &lt;main&gt;:
10250d: 79 f1 jns 102500 &lt;main+0x30&gt;
1025f3: 76 6f jbe 102664 &lt;main+0x194&gt;
102611: 74 2f je 102642 &lt;main+0x172&gt;
</pre>
In this case, the address is <code>001024d0</code>.
<p>
Run the kernel inside Bochs, setting a breakpoint
at the beginning of <code>main</code> (i.e., the address
you just found).
<pre>
$ make bochs
if [ ! -e .bochsrc ]; then ln -s dot-bochsrc .bochsrc; fi
bochs -q
========================================================================
Bochs x86 Emulator 2.2.6
(6.828 distribution release 1)
========================================================================
00000000000i[ ] reading configuration from .bochsrc
00000000000i[ ] installing x module as the Bochs GUI
00000000000i[ ] Warning: no rc file specified.
00000000000i[ ] using log file bochsout.txt
Next at t=0
(0) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
&lt;bochs&gt;
</pre>
Look at the registers and the stack contents:
<pre>
&lt;bochs&gt; info reg
...
&lt;bochs&gt; print-stack
...
&lt;bochs&gt;
</pre>
Which part of the stack printout is actually the stack?
(Hint: not all of it.) Identify all the non-zero values
on the stack.<p>
<b>Turn in:</b> the output of print-stack with
the valid part of the stack marked. Write a short (3-5 word)
comment next to each non-zero value explaining what it is.
<p>
Now look at kernel.asm for the instructions in main that read:
<pre>
10251e: 8b 15 00 78 10 00 mov 0x107800,%edx
102524: 8d 04 92 lea (%edx,%edx,4),%eax
102527: 8d 04 42 lea (%edx,%eax,2),%eax
10252a: c1 e0 04 shl $0x4,%eax
10252d: 01 d0 add %edx,%eax
10252f: 8d 04 85 1c ad 10 00 lea 0x10ad1c(,%eax,4),%eax
102536: 89 c4 mov %eax,%esp
</pre>
(The addresses and constants might be different on your system,
and the compiler might use <code>imul</code> instead of the <code>lea,lea,shl,add,lea</code> sequence.
Look for the move into <code>%esp</code>).
<p>
Which lines in <code>main.c</code> do these instructions correspond to?
<p>
Set a breakpoint at the first of those instructions
and let the program run until the breakpoint:
<pre>
&lt;bochs&gt; vb 0x8:0x10251e
&lt;bochs&gt; s
...
&lt;bochs&gt; c
(0) Breakpoint 2, 0x0010251e (0x0008:0x0010251e)
Next at t=1157430
(0) [0x0010251e] 0008:0x0010251e (unk. ctxt): mov edx, dword ptr ds:0x107800 ; 8b1500781000
(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b ; ea5be000f0
&lt;bochs&gt;
</pre>
(The first <code>s</code> command is necessary
to single-step past the breakpoint at <code>main</code>; otherwise <code>c</code>
will not make any progress.)
<p>
Inspect the registers and stack again
(<code>info reg</code> and <code>print-stack</code>).
Then step past those seven instructions
(<code>s 7</code>)
and inspect the registers and stack again.
Convince yourself that the stack has changed correctly.
<p>
<b>Turn in:</b> answers to the following questions.
Look at the assembly for the call to
<code>lapic_init</code> that occurs after the
stack switch. Where does the
<code>bcpu</code> argument come from?
What would have happened if <code>main</code>
stored <code>bcpu</code>
on the stack before those seven assembly instructions?
Would the code still work? Why or why not?
<p>
</body>
</html>
<title>Homework: Locking</title>
<html>
<head>
</head>
<body>
<h1>Homework: Locking</h1>
<p>
<b>Read</b>: spinlock.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework at the beginning of lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Assignment</b>:
In this assignment we will explore some of the interaction
between interrupts and locking.
<p>
Make sure you understand what would happen if the kernel executed
the following code snippet:
<pre>
struct spinlock lk;
initlock(&amp;lk, "test lock");
acquire(&amp;lk);
acquire(&amp;lk);
</pre>
(Feel free to use Bochs to find out. <code>acquire</code> is in <code>spinlock.c</code>.)
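<p>A toy user-space model of why the second <code>acquire</code> is fatal
(the field and the error-return convention here are assumptions for this
sketch; the real acquire uses an atomic exchange and panics rather than
returning an error):
<pre>
struct spinlock { int locked; };

// Returns 0 on success, -1 where the real acquire would panic --
// without that check it would spin forever waiting on itself.
int acquire_(struct spinlock *lk)
{
    if (lk->locked)
        return -1;      // lock already held by this processor
    lk->locked = 1;     // the real code uses an atomic xchg
    return 0;
}

int main(void)
{
    struct spinlock lk = { 0 };
    int first  = acquire_(&lk);   // succeeds
    int second = acquire_(&lk);   // would deadlock: the caller holds lk
    return !(first == 0 && second == -1);
}
</pre>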
<p>
An <code>acquire</code> ensures interrupts are off
on the local processor using <code>cli</code>,
and interrupts remain off until the <code>release</code>
of the last lock held by that processor
(at which point they are enabled using <code>sti</code>).
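<p>The rule can be sketched with a per-processor count of held locks (the
names below are assumptions for this sketch; xv6 keeps a similar counter
for each CPU):
<pre>
static int nlock;             // locks held by this processor
static int ints_enabled = 1;  // stand-in for the IF flag

void acquire_(void) { ints_enabled = 0; nlock++; }          // cli on any acquire
void release_(void) { if (--nlock == 0) ints_enabled = 1; } // sti on the last release

int main(void)
{
    acquire_();                  // take two locks
    acquire_();
    release_();
    if (ints_enabled) return 1;  // still holding one: interrupts stay off
    release_();
    if (!ints_enabled) return 2; // the last release re-enables them
    return 0;
}
</pre>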
<p>
Let's see what happens if we turn on interrupts while
holding the <code>ide</code> lock.
In <code>ide_rw</code> in <code>ide.c</code>, add a call
to <code>sti()</code> after the <code>acquire()</code>.
Rebuild the kernel and boot it in Bochs.
Chances are the kernel will panic soon after boot; try booting Bochs a few times
if it doesn't.
<p>
<b>Turn in</b>: explain in a few sentences why the kernel panicked.
You may find it useful to look up the stack trace
(the sequence of <code>%eip</code> values printed by <code>panic</code>)
in the <code>kernel.asm</code> listing.
<p>
Remove the <code>sti()</code> you added,
rebuild the kernel, and make sure it works again.
<p>
Now let's see what happens if we turn on interrupts
while holding the <code>kalloc_lock</code>.
In <code>kalloc()</code> in <code>kalloc.c</code>, add
a call to <code>sti()</code> after the call to <code>acquire()</code>.
You will also need to add
<code>#include "x86.h"</code> at the top of the file after
the other <code>#include</code> lines.
Rebuild the kernel and boot it in Bochs.
It will not panic.
<p>
<b>Turn in</b>: explain in a few sentences why the kernel didn't panic.
What is different about <code>kalloc_lock</code>
as compared to <code>ide_lock</code>?
<p>
You do not need to understand anything about the details of the IDE hardware
to answer this question, but you may find it helpful to look
at which functions acquire each lock, and then at when those
functions get called.
<p>
(There is a very small but non-zero chance that the kernel will panic
with the extra <code>sti()</code> in <code>kalloc</code>.
If the kernel <i>does</i> panic, make doubly sure that
you removed the <code>sti()</code> call from
<code>ide_rw</code>. If it continues to panic and the
only extra <code>sti()</code> is in <code>kalloc.c</code>,
then mail <i>6.828-staff&#64;pdos.csail.mit.edu</i>
and think about buying a lottery ticket.)
<p>
<b>Turn in</b>: Why does <code>release()</code> clear
<code>lock-&gt;pcs[0]</code> and <code>lock-&gt;cpu</code>
<i>before</i> clearing <code>lock-&gt;locked</code>?
Why not wait until after?
</body>
</html>
<html>
<head>
<title>Homework: Naming</title>
</head>
<body>
<h1>Homework: Naming</h1>
<p>
<b>Read</b>: namei in fs.c, fd.c, sysfile.c
<p>
This homework should be turned in at the beginning of lecture.
<p>
<b>Symbolic Links</b>
<p>
As you read namei and explore its varied uses throughout xv6,
think about what steps would be required to add symbolic links
to xv6.
A symbolic link is simply a file with a special type (e.g., T_SYMLINK
instead of T_FILE or T_DIR) whose contents contain the path being
linked to.
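<p>As a toy model of the idea (every name here is an assumption for
illustration, not xv6 code), resolution simply re-runs the lookup on the
stored target path, with a depth limit so that a cycle of links cannot
loop forever:
<pre>
enum { T_FILE = 1, T_DIR, T_SYMLINK };

struct entry { const char *name; int type; const char *target; };

// A fake two-entry directory: "b" is a symlink to "a".
static struct entry fs[] = {
    { "a", T_FILE,    0   },
    { "b", T_SYMLINK, "a" },
};

static int streq(const char *a, const char *b)
{
    while (*a && *a == *b) { a++; b++; }
    return *a == *b;
}

static struct entry *lookup(const char *name)
{
    int i;
    for (i = 0; i < 2; i++)
        if (streq(fs[i].name, name))
            return &fs[i];
    return 0;
}

// namei-style resolution: follow symlinks, give up after 10 hops.
static struct entry *resolve(const char *name)
{
    struct entry *e = lookup(name);
    int depth = 0;
    while (e && e->type == T_SYMLINK) {
        if (++depth > 10)
            return 0;           // too many links, likely a cycle
        e = lookup(e->target);
    }
    return e;
}

int main(void)
{
    return !(resolve("b") == lookup("a"));  // link followed to the file
}
</pre>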
<p>
Turn in a short writeup of how you would change xv6 to support
symlinks. List the functions that would have to be added or changed,
with short descriptions of the new functionality or changes.
<p>
<b>This completes the homework.</b>
<p>
The following is <i>not required</i>. If you want to try implementing
symbolic links in xv6, here are the files that the course staff
had to change to implement them:
<pre>
fs.c: 20 lines added, 4 modified
syscall.c: 2 lines added
syscall.h: 1 line added
sysfile.c: 15 lines added
user.h: 1 line added
usys.S: 1 line added
</pre>
Also, here is an <i>ln</i> program:
<pre>
#include "types.h"
#include "user.h"

int
main(int argc, char *argv[])
{
  int (*ln)(char*, char*);

  ln = link;
  if(argc &gt; 1 &amp;&amp; strcmp(argv[1], "-s") == 0){
    ln = symlink;
    argc--;
    argv++;
  }
  if(argc != 3){
    printf(2, "usage: ln [-s] old new (%d)\n", argc);
    exit();
  }
  if(ln(argv[1], argv[2]) &lt; 0){
    printf(2, "%s failed\n", ln == symlink ? "symlink" : "link");
    exit();
  }
  exit();
}
</pre>
</body>
<title>Homework: Threads and Context Switching</title>
<html>
<head>
</head>
<body>
<h1>Homework: Threads and Context Switching</h1>
<p>
<b>Read</b>: swtch.S and proc.c (focus on the code that switches
between processes, specifically <code>scheduler</code> and <code>sched</code>).
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework during lecture. Please
write up your answers to the exercises below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Introduction</b>
<p>
In this homework you will investigate how the kernel switches between
two processes.
<p>
<b>Assignment</b>:
<p>
Suppose a process that is running in the kernel
calls <code>sched()</code>, which ends up jumping
into <code>scheduler()</code>.
<p>
<b>Turn in</b>:
Where is the stack that <code>sched()</code> executes on?
<p>
<b>Turn in</b>:
Where is the stack that <code>scheduler()</code> executes on?
<p>
<b>Turn in:</b>
When <code>sched()</code> calls <code>swtch()</code>,
does that call to <code>swtch()</code> ever return? If so, when?
<p>
<b>Turn in:</b>
Why does <code>swtch()</code> copy %eip from the stack into the
context structure, only to copy it from the context
structure to the same place on the stack
when the process is re-activated?
What would go wrong if <code>swtch()</code> just left the
%eip on the stack and didn't store it in the context structure?
<p>
Surround the call to <code>swtch()</code> in <code>scheduler()</code> with calls
to <code>cons_putc()</code> like this:
<pre>
cons_putc('a');
swtch(&amp;cpus[cpu()].context, &amp;p-&gt;context);
cons_putc('b');
</pre>
<p>
Similarly,
surround the call to <code>swtch()</code> in <code>sched()</code> with calls
to <code>cons_putc()</code> like this:
<pre>
cons_putc('c');
swtch(&amp;cp-&gt;context, &amp;cpus[cpu()].context);
cons_putc('d');
</pre>
<p>
Rebuild your kernel and boot it in Bochs.
With a few exceptions,
you should see a regular four-character pattern repeated over and over.
<p>
<b>Turn in</b>: What is the four-character pattern?
<p>
<b>Turn in</b>: The very first characters are <code>ac</code>. Why does
this happen?
<p>
<b>Turn in</b>: Near the start of the last line you should see
<code>bc</code>. How could this happen?
<p>
<b>This completes the homework.</b>
</body>
</html>
<title>Homework: sleep and wakeup</title>
<html>
<head>
</head>
<body>
<h1>Homework: sleep and wakeup</h1>
<p>
<b>Read</b>: pipe.c
<p>
<b>Hand-In Procedure</b>
<p>
You are to turn in this homework at the beginning of lecture. Please
write up your answers to the questions below and hand them in to a
6.828 staff member at the beginning of lecture.
<p>
<b>Introduction</b>
<p>
Recall that in lecture 7 we discussed locking for a linked list
implementation. The insert code was:
<pre>
struct list *l;
l = list_alloc();
l->next = list_head;
list_head = l;
</pre>
and if we run the insert on multiple processors simultaneously with no locking,
this ordering of instructions can cause one of the inserts to be lost:
<pre>
CPU1                        CPU2

struct list *l;
l = list_alloc();
l->next = list_head;
                            struct list *l;
                            l = list_alloc();
                            l->next = list_head;
                            list_head = l;
list_head = l;
</pre>
(Even though the instructions can happen simultaneously, we
write out orderings where only one CPU is "executing" at a time,
to avoid complicating things more than necessary.)
<p>
In this case, the list element allocated by CPU2 is lost from
the list by CPU1's update of list_head.
Adding a lock that protects the final two instructions makes
the read and write of list_head atomic, so that this
ordering is impossible.
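<p>The fixed insert can be sketched like this (the toy flag lock below
stands in for xv6's acquire and release; a real spinlock needs an atomic
exchange, which a plain flag does not provide):
<pre>
struct list { struct list *next; };

static struct list *list_head;
static int list_lock;    // 0 = free; toy stand-in for a spinlock

void acquire_(void) { while (list_lock) ; list_lock = 1; }
void release_(void) { list_lock = 0; }

// With the read and the write of list_head inside one critical
// section, no other CPU can slip its own insert in between.
void insert(struct list *l)
{
    acquire_();
    l->next = list_head;
    list_head = l;
    release_();
}

int main(void)
{
    static struct list a, b;
    insert(&a);
    insert(&b);
    return !(list_head == &b && b.next == &a && a.next == 0);
}
</pre>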
<p>
The reading for this lecture is the implementation of sleep and wakeup,
which are used for coordination between different processes executing
in the kernel, perhaps simultaneously.
<p>
If there were no locking at all in sleep and wakeup, it would be
possible for a sleep and its corresponding wakeup, executing
simultaneously on different processors, to miss each other:
the wakeup would find no process to wake up, yet the process calling
sleep would go to sleep anyway, never to be woken. Obviously this is
something we'd like to avoid.
<p>
Read the code with this in mind.
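<p>One bad interleaving can be re-enacted deterministically in a few
lines (the names are assumptions; this single-threaded sketch only
models the ordering, not real concurrency):
<pre>
static int sleeping;   // is the process asleep on the channel?
static int woken;

void wakeup_(void) { if (sleeping) { sleeping = 0; woken = 1; } }
void sleep_(void)  { sleeping = 1; }

int main(void)
{
    // The wakeup runs after the sleeper has checked its condition
    // but before it has actually gone to sleep:
    wakeup_();   // finds no sleeper -- the wakeup is lost
    sleep_();    // goes to sleep anyway...
    return !(sleeping == 1 && woken == 0);  // ...never to be woken
}
</pre>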
<p>
<br><br>
<b>Questions</b>
<p>
(Answer and hand in.)
<p>
1. How does the proc_table_lock help avoid this problem? Give an
ordering of instructions (like the above example for linked list
insertion)
that could result in a wakeup being missed if the proc_table_lock were not used.
You need only include the relevant lines of code.
<p>
2. sleep is also protected by a second lock, its second argument,
which need not be the proc_table_lock. Look at the example in ide.c,
which uses the ide_lock. Give an ordering of instructions that could
result in a wakeup being missed if the ide_lock were not used.
(Hint: this should not be the same as your answer to question 1; the
two locks serve different purposes.)<p>
<br><br>
<b>This completes the homework.</b>
</body>
</html>