summaryrefslogtreecommitdiff
path: root/docs/core-academy/ca06.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/core-academy/ca06.md')
-rw-r--r--docs/core-academy/ca06.md418
1 files changed, 418 insertions, 0 deletions
diff --git a/docs/core-academy/ca06.md b/docs/core-academy/ca06.md
new file mode 100644
index 0000000..6eb1062
--- /dev/null
+++ b/docs/core-academy/ca06.md
@@ -0,0 +1,418 @@
+---
+description: "Core Academy lesson on Vere's memory management covering the loom allocator, noun memory layout, reference counting, structural sharing, the road system, metacircularity, paging, and snapshots."
+layout:
+ title:
+ visible: true
+ description:
+ visible: false
+ tableOfContents:
+ visible: true
+ outline:
+ visible: true
+ pagination:
+ visible: true
+---
+
+# 7. Vere II: The Loom
+
+_In this lesson we explain the memory allocator: the loom, noun memory layout, reference counting, structural sharing, raods, and metacircularity. We'll also see how paging and snapshots work, including page faults, memory protection, the guard page, and demand paging._
+
+## The Loom <a href="#the-loom" id="the-loom"></a>
+
+Vere's main memory model is called the _loom_. (Presumably this is from the roads shuttling back and forth, and perhaps mixing a metaphor.) A contiguous block of memory, formerly 2GB but now specified by the runtime flag, is allocated for the loom. This is the noun arena, and to work on it we need to use special `u3`-specific allocators (`u3a`).
+
+One standard contemporary memory model afforded by an operating system has a [heap](https://en.wikipedia.org/wiki/Memory_management#HEAP) for manual dynamic memory allocation (C `malloc()`) and a [stack](https://en.wikipedia.org/wiki/Stack-based_memory_allocation) for local variable data (last-in-first-out, C `alloca()` but also implicit). The heap grows from the bottom up and the stack from the top down. (The alternative model today is to use SunOS-style `mmap()` to allow virtual memory paging anywhere in memory.)
+
+```
+ 0 brk ffff
+ | heap | stack |
+ |------------#################################+++++++++++++|
+ | | |
+ 0 sp ffff
+```
+
+* `brk` is the [`brk()` System V call](https://utcc.utoronto.ca/~cks/space/blog/unix/SbrkVersusMmap), which marked the limit of the heap arena.
+* `sp` is the stack pointer.
+
+`u3` differs from this model in one particular: by permitting the heap and stack to point either way, we can efficiently nest pairs of stack and heap. We call such a pair (and their free memory arena) a _road_. The outermost road is the _surface road_, and inner roads are created in alternating directions when dependent calculations are embarked upon.
+
+When a new inner road is created, it switches direction from the outer road. This puts its heap up against the outer road's stack, and its stack up against the outer road's heap. But when the road is terminated, its stack is freed while its heap becomes part of the outer road's stack.
+
+A conventional heap-low-stack-high road is a north road:
+
+```
+ 0 rut hat ffff
+ | | | |
+ |~~~~~~~~~~~~-------##########################+++++++$~~~~~|
+ | | | |
+ 0 cap mat ffff
+```
+
+while a reversed heap-high-stack-low road is a south road:
+
+```
+ 0 mat cap ffff
+ | | | |
+ |~~~~~~~~~~~~$++++++##########################--------~~~~~|
+ | | | |
+ 0 hat rut ffff
+```
+
+* `cap` is the top of the stack.
+* `mat` is the bottom of the stack (`ffff` in the example surface road).
+* `rut` is the bottom of the heap arena (not `0` because of immutable storage).
+* `hat` is the top of the heap arena.
+* `~` is deep storage (immutable).
+* `-` is durable storage (heap).
+* `+` is temporary storage (stack).
+* `$` is the allocation frame.
+* `#` is free memory.
+
+The motivation for the road model is that you need not update refcounts in senior memory. This diminishes the downsides of reference counting. (The upside of refcounting is deterministic finalization: automatic memory management comes from the properties of the computation itself not from external preemptive events. Refcounting gives eager determinism.)
+
+When we need to process an event or perform any kind of complicated calculation, we process it using an inner road. Because the roads alternate direction, any data from an inner heap that needs to be preserved must be copied back out. The advantages, tho, are:
+
+1. The surface road is left read-only by `u3` and thus clean. Thus when snapshotting, clean pages are kept clean.
+2. An inner calculation can be aborted without affecting the surface.
+3. The surface is not fragmented because the inner results are copied in when necessary.
+
+Vere ends up in operation with nested roads during a computation:
+
+```
+ 0 ffff
+ |------- ++++++++++++| surface road
+ |~~~~~~~$+++++++++ ---------~~~~~~~~~~~~| inner road 1
+ |~~~~~~~~~~~~~~~~~--------- +++++$~~~~~~~~~~~~~~~~~~~~~| inner road 2
+ | ##&## | free memory
+
+```
+
+The Vere interpreter runs in a road, and you can check if you're on the surface road or in an inner road. (Most of the time you should just assume you're on an inner road.) This distinction matters, such as `c3_assert()`, which produces an exception with stack trace on an inner road, but kills the process on an outer road.
+
+The current road is `u3R`, a global. Within Arvo, a new road is currently begun in the following cases:
+
+* every event
+* every read from the namespace (Arvo scry gate into vane doesn't, userspace `.^` dotket does)
+* every call to `+mink` (and `+mock`, etc.)
+
+Any work that will be done where you only want to keep one portion of it is a good candidate for a road.
+
+The loom is organized into pages, each 16 KB in size. There is a guard page `&` in the middle of free memory to make sure that the stack and heap do not overwrite each other. The guard page is adjusted in `u3e_ward()` when necessary:
+
+> When a fault is detected in the guard page, the guard page is recentered in the free space of the current road. if the guard page cannot be recentered, then memory exhaustion has occurred.
+
+There is a hope to add both a raw hint to suggest to the system to use a new road and a bump-allocation mode to permit turning off refcounting on inner roads when practical. (Cf. [#6805](https://github.com/urbit/urbit/issues/6805#issuecomment-1754208392).)
+
+#### `u3a` Allocator
+
+To work with `u3` memory, use the `u3a` memory allocation functions:
+
+* `u3a_malloc()`
+* `u3a_free()`
+* `u3a_realloc()`
+
+You should never call `malloc()` in the loom (but can, of course, in the Vere layer above `u3`).
+
+> Of course, we don't always know how large our atom will be. Therefore, the standard way of building large atoms is to allocate a block of raw space with `u3i_slab_init()`, then chop off the end with `u3i_slab_malt()` (which does the measuring itself) or `u3i_slab_mint()` in case you've measured it yourself.
+
+Keep in mind that atoms do not retain leading zeros.
+
+The reference counters introduced last time, `u3a_gain()`=`u3k()` and `u3a_lose()`=`u3z()`, are also part of `u3a`. However, other than these you typically use `u3a` indirectly through `u3i` and `u3r`/`u3x`.
+
+Some details of the allocator [are in flux right now](https://github.com/urbit/urbit/issues/6805#issuecomment-1754208392): “As an experiment, \[\~master-morzod has] rewritten the serf to a) stop allocating events, effects, and IPC messages on the home road, and b) keep the Arvo kernel on an inner road for as long as possible (i.e. until we need to save/pack/meld/\&c.).”
+
+* [“Land of Nouns”, section “`u3`: the road model”](../runtime/nouns.md#u3-the-road-model)
+* [“API overview by prefix”, section “`u3a`: allocation functions”](../runtime/api.md#u3a-allocation-functions)
+
+## The King (Urth) <a href="#the-king-urth" id="the-king-urth"></a>
+
+The king process is in charge of:
+
+* IPC
+* Event log
+* Unix effects including I/O
+* Stateless Nock interpreter (the “ghost ship” ivory pill material from `ca05`)
+
+(We discussed the serf/Mars in `ca05`.)
+
+### Event Log & Snapshotting <a href="#event-log-snapshotting" id="event-log-snapshotting"></a>
+
+The event log is the ordered list of all of the Arvo events (completed moves) resulting in the present state of Arvo. Events are handled at two levels: the event log, which is written consistently at each event, and the snapshot of the loom, which allows rapid recovery of the current state.
+
+> In practice, event logs become large and unwieldy over time. Periodically a snapshot of the permanent state is taken, so the entire event log needn't be replayed on reboot. You're still able to rebuild your state down to the last keystroke. This is due to the practice of persistence.\
+> Persistence, in the context of storing data in a computer system, means that data is stored in a non-volatile manner and that input must be recorded before the output result is performed. Thus, every event must be written to disk - or must be _persisted_ - before the event effects actually take place.
+
+The snapshot of the loom allows the last few events from the event log to be replayed to recover the present state.
+
+In fact, with current usage patterns (\~2023.10.12), there's a problem:
+
+> Every few minutes, the runtime applies a patch to its on-disk snapshot. This pauses the process, so the size is important. In particular, for many large ships (\~nibset-napwyn, \~wicdev-wisryt, \~natnex-ronret), this is around 600MB. If the maximum acceptable pause is 1 second, this requires a disk which can handle 600MB/s of throughput (or maybe twice that, because it writes the patchfile, then applies it?), which is extremely high. ([#6805](https://github.com/urbit/urbit/issues/6805))
+
+#### `noun/error.h`
+
+Before we look at the particulars of the event log and snapshotting code, let's take a brief detour to see how assertion errors are handled.
+
+```c
+# define u3_assert(x) \
+ do { \
+ if (!(x)) { \
+ fflush(stderr); \
+ fprintf(stderr, "\rAssertion '%s' " \
+ "failed in %s:%d\r\n", \
+ #x, __FILE__, __LINE__); \
+ u3m_bail(c3__oops); \
+ abort(); \
+ } \
+ } while(0)
+```
+
+An error here then triggers into `u3m_bail()`, the primary crash handler.
+
+* Review `u3m_bail` in `noun/manage.c`.
+
+#### Event Log
+
+What is an event on disk? Urbit maintains an [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) transactional database for key–value pairs, with a `META` table for metadata and an `EVENTS` table for event number–event data pairs, sequentially ordered.
+
+An event is a `u3_fact`, a `struct` including the timestamp and event `ovum`:
+
+```c
+/* u3_fact: completed event
+*/
+ typedef struct _u3_fact {
+ c3_d eve_d; // event number
+ c3_l mug_l; // kernel mug after
+ u3_noun job; // (pair date ovum)
+ struct _u3_fact* nex_u; // next in queue
+ } u3_fact;
+```
+
+Event log replay thus refers to retrieving the sequence of events from the pier's database instance and playing back each sequential event.
+
+* Examine `u3_mars_play()` for details of how playback works.
+ * `_mars_play_batch()`
+ * `_mars_poke_play()`
+ * `u3v_poke_raw()` and we're back to a conventional Arvo poke as we saw in `ca05`.
+
+The introduction of epochs will enable finer-grained system recovery when necessary:
+
+> Historically, Vere has stored a single event log and snapshot. To facilitate replay across different binary versions more convenient and less error-prone, an improved design is the "epoch" system.\
+> In the epoch system, Vere breaks up the event log into "epoch"s, where an epoch represents a snapshot and some events after that snapshot.\
+> An epoch lives in its own folder, named after the first event in that epoch.\
+> In addition to storing a snapshot and a log of events, each epoch folder also stores a version file indicating which version of Vere originally ran these events -- this makes replay across different binary versions much easier, especially in the case of a jet mismatch in an old binary.
+
+#### Snapshots
+
+> Replay is how Vere computes the state of a ship's Arvo instance from the event log after a ship reboots. In order to avoid replaying the entire event log, Replay takes a snapshot of the current state of the ship approximately once every ten minutes. Then when a ship reboots, Replay loads the most recent snapshot and replays events from the event log up to the most recent event.
+
+`u3e_save()` saves the loom (snapshots), often called via `u3m_save()`.
+
+#### Demand Paging
+
+> Demand paging refers to the ability to load only needed pages of memory into RAM, leaving other pages on disk, to reduce memory use.
+
+Pages are marked as clean (`PROT_READ`) or dirty (`PROT_READ|PROT_WRITE`) or guard (`PROT_NONE`). The access pattern assumes all pages are accessed from the outside inwards (another advantage of the loom model).
+
+* [`urbit/vere` #402](https://github.com/urbit/vere/pull/402) (merged)
+* [`urbit/vere` #410](https://github.com/urbit/vere/pull/410) (merged)
+
+### Unix I/O <a href="#unix-io" id="unix-io"></a>
+
+The king is responsible for the I/O operations of the communicating vanes: Ames, Behn, Clay, Dill, Eyre, Iris, Khan, Lick. (The other two vanes, Gall and Jael, are landlocked and only interact within Urbit.)
+
+We will cover the I/O drivers in a later lesson `ca11` after we have covered the major vanes which need to interface with the host OS.
+
+### IPC <a href="#ipc" id="ipc"></a>
+
+The Urbit runtime has two categories of IPC:
+
+1. King/serf interprocess communication
+2. Vane-driven interprocess communication
+3. `%khan`/`conn.c`-based sockets
+4. `%lick`-based communications
+
+In general, [POSIX IPC](https://www.geeksforgeeks.org/inter-process-communication-ipc/) is “a mechanism that allows processes to communicate with each other and synchronize their actions.” This can be done by sharing memory directly between the processes or by passing messages. Vere does a little of both: the loom is the shared memory arena, and sometimes messages are used.
+
+For instance, in Vere's pier management (mainly `vere/pier.c`), the [lord](https://github.com/urbit/vere/blob/ea3eeee0d5efc198c279f2c916b73fc8df283af6/pkg/vere/lord.c#L313) coordinates the king and the serf through messages. Like Arvo, the king and the serf thus need to have the right API shape to connect to each other. The lord coordinates using `writ` to pass a value from the king to the serf, and `plea` to pass a value from the serf to the king. (These are rather like Arvo passes and gifts, but can be initiated from either side rather than just the pass/give pattern.) Whimsically, this is defined in Hoon inside a C comment in `vere/lord.c`:
+
+```hoon
+|%
+:: +writ: from king to serf
+::
++$ writ
+ $% $: %live
+ $% [%cram eve=@]
+ [%exit cod=@]
+ [%save eve=@]
+ [%meld ~]
+ [%pack ~]
+ == ==
+ [%peek mil=@ sam=*]
+ [%play eve=@ lit=(list ?((pair @da ovum) *))]
+ [%work mil=@ job=(pair @da ovum)]
+ ==
+:: +plea: from serf to king
+::
++$ plea
+ $% [%live ~]
+ [%ripe [pro=%1 hon=@ nok=@] eve=@ mug=@]
+ [%slog pri=@ tank]
+ [%flog cord]
+ $: %peek
+ $% [%done dat=(unit (cask))]
+ [%bail dud=goof]
+ == ==
+ $: %play
+ $% [%done mug=@]
+ [%bail eve=@ mug=@ dud=goof]
+ == ==
+ $: %work
+ $% [%done eve=@ mug=@ fec=(list ovum)]
+ [%swap eve=@ mug=@ job=(pair @da ovum) fec=(list ovum)]
+ [%bail lud=(list goof)]
+ == ==
+ ==
+--
+```
+
+The procedure for IPC needs to establish communications, which is a follow-on from the king starting the serf. (Vere is single-threaded but runs two processes.)
+
+`$writ` from king to serf:
+
+* `%live` is a request to start up the serf.
+* `%peek` is a request for data from Arvo in the serf.
+* `%play` is a request to the serf to play an event (in an event playback).
+* `%work` is a request to the serf to carry out a computation in Arvo.
+
+`$plea` from serf to king:
+
+* `%live` tells if the serf is alive.
+* `%ripe` tracks the serf startup state.
+* `%slog` is an output request.
+* `%flog` is a debug output request.
+ * `%peek` is a response to the king with a scry result.
+ * `%play` is a response to an event playback.
+ * `%work` is a response to an injected `ovum` event.
+
+Other parts of IPC depend on the atom-framing implementations in `vere/newt.c`, another key part of king–serf IPC. `newt.c` produces noun blobs that have a five-byte header and a variable-length payload. The header has a one-byte version tag, typically `0x0`, followed by a four-byte little-endian message byte count. The payload is the `+jam`med noun. This is used by `urbit eval` for `stdin` computed against the ivory pill ghost ship, for instance:
+
+```sh
+$ echo "(add 1 41)" | urbit eval
+loom: mapped 2048MB
+lite: arvo formula 2a2274c9
+lite: core 4bb376f0
+lite: final state 4bb376f0
+eval (run):
+42
+```
+
+`eval` supports several options for processing Hoon nouns as input to or output from `conn.c`:
+
+* `-j`, `--jam`: output result as a jammed noun
+* `-c`, `--cue`: read input as a jammed noun
+* `-n`, `--newt`: write output / read input as a newt-encoded jammed noun, when paired with `-j` or `-c` respectively
+* `-k`: treat the input as the jammed noun input of a `%fyrd` request to `conn.c`; if the result is a `goof`, pretty-print it to `stderr` instead of returning it
+
+In `vere/newt.c`, see particularly:
+
+* `u3_newt_send()` transmits a jammed noun (using `u3s_jam_xeno()`, for instance) to a task buffer for `libuv`. (Recall that `libuv` is the main event loop driver for the king process.)
+* `u3_newt_read()` pulls out the jammed noun from the buffer.
+
+Khan and Lick both use `newt.c`.
+
+* Demonstrate invoking a statement at the CLI with the `urbit` executable.
+
+#### Example: `|meld` Trace
+
+* Let's walk through the lifecycle of a command-line initiated `./zod/.run meld`.
+ * `vere/main.c`
+ * `noun/urth.c`
+* Compare the lifecycle of `|meld`.
+ * `/gen/hood/meld`
+ * `/lib/helm/kiln`
+ * `/lib/hood/kiln`
+ * `/sys/clay`
+ * `vere/lord.c`
+ * `vere/serf.c`
+
+We will discuss Khan and Lick in detail during the next lesson, but here's a quick recap of their functionality.
+
+#### Khan
+
+> Khan is the "control plane" and thread-runner vane. Its main purpose is to allow external applications to run [threads](../../urbit-os/base/threads/README.md) via a Unix Socket and receive the result.
+
+A [socket](https://en.wikipedia.org/wiki/Unix_domain_socket) is an endpoint for data communications. In the runtime, it is implemented by [`conn.c`](https://github.com/urbit/vere/blob/develop/pkg/vere/io/conn.c), the runtime counterpart to `%khan`.
+
+* [`%khan` Overview](../../urbit-os/kernel/khan/README.md)
+
+#### Lick
+
+> Lick manages IPC ports, and the communication between Urbit applications and POSIX applications via these ports. Other vanes and applications ask Lick to open an IPC port, notify it when something is connected or disconnected, and transfer data between itself and the Unix application.
+
+* [`%lick` Overview](../../urbit-os/kernel/lick/README.md)
+
+## Debugging the Runtime <a href="#debugging-the-runtime" id="debugging-the-runtime"></a>
+
+To conclude today's material, we would like to briefly demonstrate several debugging principles with VereAres.
+
+### `printf` <a href="#printf" id="printf"></a>
+
+`fprintf`-based output should be done using `fprintf()` to `stderr`. Use both and to achieve line feed (move cursor down one line) and carriage return (move it to the left). You can also use `u3l_log` which does not require `\r`, but should not be used in cases where the I/O drivers have not yet been initialized or can no longer be relied upon, e.g. crashing or shutdown.
+
+### `gdb` <a href="#gdb" id="gdb"></a>
+
+> For C, make heavy use of `gdb`. `lldb` is far worse than `gdb` for debugging Urbit, so it's worth developing on a Linux box even if that means `ssh`ing into a server. (\~wicdev-wisryt)
+
+`gdb` works best when you build with debugging symbols.
+
+```sh
+bazel build :urbit --compilation_mode=dbg
+
+- OR -
+
+bazel build :urbit --copt=-DU3_CPU_DEBUG
+
+- OR -
+
+bazel build :urbit --copt=-DU3_MEM_DEBUG
+```
+
+When using GDB, before attaching to the process you should set the following:
+
+```gdb
+set follow-fork-mode child
+handle SIGSEGV nostop noprint
+```
+
+If you are debugging jets or the serf, then you want to attach to the serf at `urbit-worker`.
+
+```sh
+gdb --args ./bazel-bin/pkg/vere/urbit --args zod
+
+- OR -
+
+gdb attach <PID>
+```
+
+```gdb
+break jet_file:jet_name
+```
+
+### Valgrind <a href="#valgrind" id="valgrind"></a>
+
+[Valgrind](https://valgrind.org/) is a memory profiling tool for diagnosing memory usage and memory leaks. As a memory tracking tool, Valgrind uses several times more memory than the native application requires.
+
+The possible expedient to make it viable is:
+
+* Decrease the loom size with `--loom` (e.g. `--loom 29` or something similarly constrained).
+
+[`urbit/urbit` #5161](https://github.com/urbit/urbit/issues/5161) has some discussion about using Valgrind. (The memory leak in question was ultimately resolved by [#5614](https://github.com/urbit/urbit/pull/5614), which presents an instructive insight.)
+
+### Changing the Serf <a href="#changing-the-serf" id="changing-the-serf"></a>
+
+The process call to invoke the serf is hard-coded in the king. To run Sword (née Ares) as a serf, for instance, you need to change the `protocol` entry in `vere/lord.c:u3_lord_init()` to the Sword executable as a full-path string literal.
+
+## Exercises <a href="#exercises" id="exercises"></a>
+
+* Turn on `LORD_TRACE_CUE` and boot a new fake ship.
+* Examine the new `/ted/runtime-version` thread which is in the latest release (412 K). How does it work? Trace the types and values back through to see how the values are recovered from the runtime.