diff options
Diffstat (limited to 'docs/core-academy/ca05.md')
-rw-r--r-- | docs/core-academy/ca05.md | 287 |
1 files changed, 287 insertions, 0 deletions
diff --git a/docs/core-academy/ca05.md b/docs/core-academy/ca05.md new file mode 100644 index 0000000..2a0c9ba --- /dev/null +++ b/docs/core-academy/ca05.md @@ -0,0 +1,287 @@ +--- +description: "Core Academy lesson on Vere runtime architecture covering the u3 noun system, king/serf processes, the Mars/Urth split, event log management, jet dashboard, and the Arvo/Vere interface." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 6. Vere I: u3 and the Serf + +*This lesson covers the Arvo/Vere interface, including Arvo's `+peek` and `+poke` arms. We also examine the Mars/Urth split, the event log, snapshots, and the jet dashboard.* + +_We need some C background to fluently read the runtime source. If you are not familiar with C, here are some good resources:_ + +- [Beginner C resource list](https://stackoverflow.com/questions/562303/the-definitive-c-book-guide-and-list) +- [Brian Kernighan & Dennis Richie, _The C Programming Language_](https://en.wikipedia.org/wiki/The_C_Programming_Language) +- [Ben Klemens, _21st Century C: Tips from the New School_](https://www.amazon.com/dp/1449327141) +- [Lawrence Angrave, _System Programming](https://github.com/angrave/SystemProgramming/wiki) + + +## Runtime Structure and Responsibilities {#runtime-structure-and-responsibilities} + +If the vision of Urbit is to implement `[2 [0 3] 0 2]` as a frozen lifecycle function, then it needs some scaffolding on any real system. Real computers have memories, chipset architectures, operating system conventions, and other affordances (and limitations). Conventionally, an operating system takes explicit care of such niceties, which is one reason why calling Urbit an “OS” has been controversial. The runtime interpreter is designed to take Nock as a specification and produce a practical computer of it. + +Today, there are two primary Nock executable runtimes: Vere and Sword (née Ares). (Jaque, the JVM experiment, and King Haskell have fallen by the wayside.) + +- Vere is written in C and is the standard runtime for Arvo. +- Sword is written in Rust and aims to solve some deep theoretical challenges to producing efficient Nock on a contemporary chipset. Sword is under development by [Zorp](https://zorp.io) and formerly with contributions from [Tlon Corporation](https://tlon.io), and the [Urbit Foundation](https://urbit.org). + +We will take Vere as the normative runtime for Core Academy. + +As we mentioned last time in the boot sequence lesson, the runtime spawns the king (`king.c`) and indirectly the serf (`serf.c`) processes. These both run for the lifetime of the Urbit process. + +There are two competing frames for how to structure the Urbit process: king/serf and urth/mars. + +King v. serf separates the Nock and Arvo material from the I/O and event log material. It has the advantage that (per the whitepaper), “The serf only ever talks to the king, while the king talks with both the serf and Unix.” + +The king process is in charge of: + +- IPC +- Event log +- Unix effects including I/O +- Stateless Nock interpreter + +The serf process is the Nock runtime and bears responsible for: + +- Nock virtual machine (tracking current state of Arvo as a noun and `+poke`ing it with nouns) +- Bytecode interpretation +- Jet dashboard +- Snapshotting +- Noun allocation for Arvo + +The Mars/Urth split reframes the worker process so that it includes the event log with the current serf responsibility (“Mars”), thus enabling online event log management and truncation. + +### The Structure of Vere’s Source {#the-structure-of-veres-source} + +Vere is provided in the [`urbit/vere`](https://github.com/urbit/vere) repo. It is built from the `pkg/` directory and contains the following top-level folders: + +``` +. +├── c3 +├── ent +├── noun +├── ur +├── urcrypt +└── vere +``` + +- `/c3` contains the types and definitions to enable [the `c3` logical system](/reference/runtime/c). + + `c3` is the set of C conventions which Vere enforces. These include well-specified integer types, tooling for loobeans (instead of booleans), and motes (`#define`s for short Urbit words). “The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax.” There are no Urbit-specific requirements for C3, which could otherwise just be a general-purpose C discipline. + + Like aura bitwidth markers, C documents programmer intent but does not generally enforce it. Most of the parts of `c3` are simply lapidary terms for C99 types. + + - Scan the files in `/c3`. + +- `/ent` provides entropy for the runtime. Entropy is derived from [`/dev/urandom`](https://en.wikipedia.org/wiki//dev/random), which is a special file that provides pseudorandom numbers derived from system noise. `/dev/urandom` produces [machine randomness as close to true randomness as possible](https://www.2uo.de/myths-about-urandom/#before-linux-48), including seeds like network latency and keystroke latency to seed the cryptographically secure pseudo-random number generator (CSPRNG). + +- `/noun` is the gorilla, containing `u3` (the noun library) and the jets. We'll go into it in detail with the system architecture in a moment in Section `u3`. + +- `/ur`, is like `/ent` a single-purpose library, in this case for bitstreams and serialization. + +- `/urcrypt` is a C library to standardize cryptographic calls across a number of libraries. + + This library is a dependency for both Vere and Ares, and is in the process of being moved into a standalone repo. + +- `/vere` contains the runtime architecture itself, the king and the serf and related tooling, as independent from `u3`. + +| file | purpose | +| -- | -- | +| `auto.c` | I/O drivers | +| `benchmarks.c` | performance tests | +| `dawn.c` | key validation for bootstrapping | +| `disk.c` | database reads and writes for event log | +| `foil.c` | file synching | +| `king.c` | main runtime loop | +| `lord.c` | manage IPC between king and serf | +| `main.c` | setup and entrypoint for runtime execution | +| `mars.c` | Mars event log replay (see Mars/Urth split above) | +| `newt.c` | noun blob messages | +| `pier.c` | manage pier (files on host OS disk) | +| `save.c` | save events to pier | +| `serf.c` | the serf itself | +| `time.c` | Unix/Urbit time operations | +| `vere.h` | shared Vere-specific `struct`s | +| `ward.c` | lifecycle management for structures | + +#### `u3` + +##### Nouns + +A noun is either an atom or a cell. However, we have to decide what this implementation looks like in a language like C, that prefers arrays and pointers. `u3` is the noun library, which features Urbit-specific memory operations, tracing and profiling tools, and so forth. + +A `u3_noun` is a 32-bit `c3_w` = `uint32_t`. The first bits indicate what kind of value the noun is and thus how to approach it: + +| Bit 31 | Bit 30 | Meaning | +| -- | -- | -- | +| `1` | `1` | Indirect cell (`pom`) | +| `1` | `0` | Indirect atom (`pug`) | +| `0` | ·— | Direct atom (`cat`) | + +An indirect noun is a `dog`. For indirect nouns, bits 29–0 are a word pointer into the loom. In addition, `0xffff.ffff` is `u3_none`, which is “not a noun”. + +A common pattern is to extract values from a noun into C-typed values, carry out the manipulation, and then wrap them back into the noun. Furthermore, the value from an arbitrary atom may in fact be a bignum, and so [GMP](https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library) is used to manage these values. + +- Examine `/noun/jets/a/add.c`, in particular `u3qa_add`. + +One of the painful parts of working with `u3` is the reference counting system. [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) is an expedient to handle tracking the number of pointers to an object in memory so that the memory can be freed at the appropriate time. Since C doesn't provide reference counting support in the language, we must manually track these and free the value only when the refcount goes to zero. The relevant functions are `u3k` to gain a refcount and `u3z` to lose one. + +There are also two different protocols for reference counting, used by different parts of the system: + +- `transfer` semantics relinquishes a refcount of any sent values. Most functions behave this way, which means that you don't have to think about de-allocating values if they've been sent elsewhere. +- `retain` semantics hold onto the refcount even if the value is sent elsewhere. The functions which use `retain` semantics tend to inspect or query nouns rather than make or modify nouns. + +> The `u3` convention is that, unless otherwise specified, **all functions have transfer semantics** - with the exception of the prefixes: `u3r`, `u3x`, `u3z`, `u3q` and `u3w`. Also, within jet directories `a` through `f` (but not `g`), internal functions retain (for historical reasons). + +- Compare `u3ka_add` and `u3qa_add`. + +`u3` is designed to make some guarantees for the programmer. It's not Urbit itself, but it's designed to be an implementation platform for Urbit. Thus: + +- Every event is logged internally before it enters `u3`. +- A permanent state noun maintains a single reference. +- Any event can be aborted without damaging the permanent state (“solid state”). +- We snapshot the permanent state and can prune logs. + +We will discuss the specifics of the memory model next week in `ca06` when we discuss the loom and the road model. + +- [“Land of Nouns”](/reference/runtime/nouns); note particularly the section `u3: reference protocols`, labeled ``THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION.`` Read that if nothing else. + +##### Library + +The contents of `/noun` constitute the `u3` noun library. Functions are organized by file and prefix into certain namespaces by operation. Because `u3` is a library, we can't cleanly separate it into serf/king components, although certain modules do have close identification with one or the other. + +| prefix | purpose | `.h` | `.c` | +| -- | -- | -- | -- | +| `u3a_` | allocation | [`allocate.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/allocate.h) | [`allocate.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/allocate.c) | +| `u3e_` | persistence | [`events.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/events.h) | [`events.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/events.c) | +| `u3h_` | hashtables | [`hashtable.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/hashtable.h) | [`hashtable.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/hashtable.c) | +| `u3i_` | noun construction | [`imprison.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/imprison.h) | [`imprison.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/imprison.c) | +| `u3j_` | jet control | [`jets.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/jets.h) | [`jets.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/jets.c) | +| `u3l_` | logging | [`log.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/log.h) | [`log.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/log.c) | +| `u3m_` | system management | [`manage.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/manage.h) | [`manage.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/manage.c) | +| `u3n_` | nock computation | [`nock.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/nock.h) | [`nock.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/nock.c) | +| `u3o_` | command-line options | [`options.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/options.h) | [`options.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/options.c) | +| `u3r_` | noun access (error returns) | [`retrieve.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/retrieve.h) | [`retrieve.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/retrieve.c) | +| `u3s_` | noun serialization | [`serial.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/serial.h) | [`serial.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/serial.c) | +| `u3t_` | profiling | [`trace.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/trace.h) | [`trace.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/trace.c) | +| `u3u_` | urth (memory management) | [`urth.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/urth.h) | [`urth.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/urth.c) | +| `u3v_` | arvo | [`vortex.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/vortex.h) | [`vortex.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/vortex.c) | +| `u3x_` | noun access (error crashes) | [`xtract.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/xtract.h) | [`xtract.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/xtract.c) | +| `u3z_` | memoization | [`zave.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/zave.h) | [`zave.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/zave.c) | +| `u3k[a-g]` | jets (transfer, C args) | `jets/k.h` | `jets/[a-g]/*.c` | +| `u3q[a-g]` | jets (retain, C args) | `jets/q.h` | `jets/[a-g]/*.c` | +| `u3w[a-g]` | jets (retain, nock core) | `jets/w.h` | `jets/[a-g]/*.c` | + +- `u3a` defines memory allocation functions. These are used throughout, but we'll discuss it a bit more when we talk about the king. You will quickly run into reference counting features, like `u3k` (`u3a_gain()`) to gain a refcount and `u3z` (`u3a_lose()`) to lose one. +- `u3e` manages the loom. +- `u3h` provides fast custom hashing for the runtime. +- `u3i` puts a value (expected to be a `c3` type) into a noun. (Look at this one now.) +- `u3l` supports logging. +- `u3m` manages the system: boots `u3`, makes a pier, handles crashes, etc. +- `u3n` implements the Nock bytecode interpreter. +- `u3o` parses the manifold command-line options of Urbit and writes them into globals. +- `u3r` extracts a value from a noun, with a `u3_weak` on failure. (Look at this one now.) +- `u3s` implements noun serialization (`+jam` and `+cue`). +- `u3t` provides tracing for crashes. +- `u3u` offers memory management tooling (deduplication and memory mapping). +- `u3v` supports Arvo interaction. +- `u3x` extracts a value from a noun., with a crash on failure +- `u3z` supports `~+` siglus rune memoization. + +If you work much in Vere, you will get used to seeing these. There are basically two broad categories of functions: single-use functions (like starting a pier, `u3m_pier`) and utility functions (like writing a value to a noun, `u3i_word`). + +- Return to `/noun/jets/a/add.c` and look at `u3wa_add` and `u3ka_add`. + + +- [“C Runtime System”](/reference/runtime/runtime) +- [“Land of Nouns”](/reference/runtime/nouns) + + +## The Serf {#the-serf} + +The serf process is the Nock runtime and bears responsible for: + +- Nock virtual machine (tracking current state of Arvo as a noun and `+poke`ing it with nouns) +- Bytecode interpretation +- Jet dashboard + +If you examine `/vere/serf.c`, you can get a feel for how it is organized. See e.g. `u3_serf_work` and callees. + +### Arvo Noun Management {#arvo-noun-management} + +- `/vere/vortex.c`, e.g. `u3v_peek`, `u3v_wish`, and `u3v_poke_sure`. + +### Nock Bytecode Interpreter (`u3n`) {#nock-bytecode-interpreter-u3n} + +- `/noun/nock.c`, e.g. `u3n_nock_on`, `u3n_slam_on` (calling convention for gates). + +The end result of the Hoon compilation process is Nock code as a noun. This noun is evaluated by the runtime, but it is not actually directly run as such. Instead, the runtime builds an efficient bytecode stream and executes that instead to complete the calculation. + +The Nock bytecode for any expression can be obtained using the `%xray` raw hint. + +```hoon +> ~> %xray =+(2 [- +(-)]) +{[litb 2] snol head swap head bump ault halt} +[2 3] + +> ~> %xray =+(2 [(add - -) +(-)]) +{[litb 2] snol [fask 4095] [kicb 1] snoc head swap [fabk 6] swap [fabk 6] auto musm [kicb 0] swap head bump ault halt} +[4 3] +``` + +The Nock bytecode is defined in the `OPCODES` macro in `/noun/nock.c` and evaluated by `_n_burn` in that same folder. The `OPCODES` `#define` uses the [X macro](https://en.wikipedia.org/wiki/X_Macro), which is a bit of C deep lore. + +As a consequence of the architecture of Vere today, we see a lot of expensive call overhead. For instance, when you wrap an `%xray` hint around a core, you don't get the core itself—instead you get the formula that invokes the code. + +```hoon +> ~> %xray (met 3 (jam .)) +{[fask 1023] [kicb 3] snol head swap tail [lilb 3] swap tail [fask 1023] [kicb 2] snol head swap tail musm [kicb 1] auto musm [ticb 0] halt} +984.339 +``` + +Since many things are computed in virtual Nock, `+mock`, we have bail/trace/bounded computation at the price of slow virtualization. + +One objective of Sword (née Ares), subject knowledge analysis, is to improve on Nock bytecode generation. This is being implemented into Vere as well. + +### Jet Dashboard (`u3j`) {#jet-dashboard-u3j} + +As we summarized when first introducing jets in `ca00`, the runtime manages jets, including re-running them when playing back the event log history. + +> The jet dashboard is the system in the runtime that registers, validates, and runs jets: specific pieces of Nock code reimplemented in C for performance. + +The jet dashboard maintains three jet state systems: + +1. `cold` state results from the logical execution history of the pier and consists of nouns. `cold` jet state registers jets as they are found. `cold` state ignore restarts. +2. `hot` state is the global jet dashboard and describes the actual set of jets loaded into the pier for the current running process. Calls to `hot` state result from Nock Nine invocations of a core and an axis. `hot` state is thus tied to process restart. +3. `warm` lists dependencies between `cold` and `hot` state. `warm` state can be cleared at any time and is cleared on restart. + +The jet dashboard (`u3j`, `/noun/jets.c`) will not be explored in detail in Core Academy, but we do want to look at a couple of actual jets. + +- [“API Overview by Prefix: `u3j` jets”](/reference/runtime/api#u3j-jets) +- [~timluc-miptev, “Jets in the Urbit Runtime”](https://github.com/timlucmiptev/docs-runtime/blob/master/jets1_current.md) + +### Jets {#jets} + +- Examine `/noun/jets/b/lent.c`, `/noun/jets/b/turn.c`, `/noun/jets/c/turn.c`, `/noun/jets/e/rs.c`, `/noun/jets/e/slaw.c`. + +Many Urbit contributors may find jet composition to be their first serious encounter with the runtime. On the bright side, jetting is a fairly constrained and well-understood space. However, it has a complex interface for unpacking calls and nouns, including reference counting requirements. + +- `u3w` functions are the main entry point (as identified in `/noun/tree.c`). These unpack and sanity-check the sample, then call either `u3q` or `u3k` variants of the jet. The unpacking axes are hard-coded in `/noun/xtract.h`. +- By convention, `u3q` and `u3w` functions have `transfer` semantics. +- `u3k` functions have `retain` semantics, so they are responsible to `u3z` free their values after the computation completes. +- `u3_none` (`0x7fff.ffff`) is NOT the same as `u3_nul`. A jet that returns `u3_none` punts the value back to the Hoon/Nock version. + +- [“Writing Jets”](/reference/runtime/jetting) +- [~timluc-miptev, “Jets in the Urbit Runtime”](https://github.com/timlucmiptev/docs-runtime/blob/master/jets1_current.md) + +### Snapshotting {#snapshotting} + +We'll cover snapshotting in the next lesson, `ca06`. |