Diffstat (limited to 'docs/core-academy')
-rw-r--r-- | docs/core-academy/README.md | 30 |
-rw-r--r-- | docs/core-academy/ca00.md | 812 |
-rw-r--r-- | docs/core-academy/ca01.md | 995 |
-rw-r--r-- | docs/core-academy/ca02.md | 2585 |
-rw-r--r-- | docs/core-academy/ca03.md | 525 |
-rw-r--r-- | docs/core-academy/ca04.md | 626 |
-rw-r--r-- | docs/core-academy/ca05.md | 287 |
-rw-r--r-- | docs/core-academy/ca06.md | 418 |
-rw-r--r-- | docs/core-academy/ca07.md | 567 |
-rw-r--r-- | docs/core-academy/ca08.md | 626 |
-rw-r--r-- | docs/core-academy/ca09.md | 502 |
-rw-r--r-- | docs/core-academy/ca10.md | 1107 |
-rw-r--r-- | docs/core-academy/ca11.md | 657 |
-rw-r--r-- | docs/core-academy/ca12.md | 454 |
-rw-r--r-- | docs/core-academy/ca13.md | 525 |
15 files changed, 10716 insertions, 0 deletions
diff --git a/docs/core-academy/README.md b/docs/core-academy/README.md new file mode 100644 index 0000000..a78aabf --- /dev/null +++ b/docs/core-academy/README.md @@ -0,0 +1,30 @@ +--- +description: "Course for developers who have finished Hoon School to learn Arvo kernel development, covering each part of Arvo, Zuse, Hoon, and the runtime for core Urbit OS development." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# Core Academy + +Core Academy prepares [Hoon](../../hoon/why-hoon.md) developers to become [Arvo](../../urbit-os/what-is-urbit-os.md) kernel developers. While only [Hoon School](../hoon-school/) is strictly necessary to follow this material, readers should be capable of writing fairly complex applications in Hoon. Some knowledge of C and/or Rust is also necessary for some of the runtime material. + +This written course is adapted from the teaching notes for the 2023 cohort of Core Academy, its first and to-date only live cohort. The course explores each part of Arvo, Zuse, Hoon, and the runtime to contextualize how Urbit works today and where further development is needed. + +_The lesson notes represent a snapshot of Urbit core development at Zuse 412k and Vere 2.12. Some details have changed since that release._ + +Video recordings of the 2023 Core Academy are [available to watch here](https://www.youtube.com/playlist?list=PLYGEMSwLguIGgrEGwxu2AAbESpfF_LRKx). The order in which vanes are covered differs between this written material and the recorded lectures, but aside from some Q\&A in the lectures it's all the same information. 
+ +## Preliminary material <a href="#preliminary-material" id="preliminary-material"></a> + +* [Hoon School, Text Parsing III](../hoon-school/Q2-parsing.md) +* [Hoon School, Generic and Variant Cores](../hoon-school/R-metals.md) +* [The Engine Pattern](../../hoon/engine-pattern.md) diff --git a/docs/core-academy/ca00.md b/docs/core-academy/ca00.md new file mode 100644 index 0000000..a11054e --- /dev/null +++ b/docs/core-academy/ca00.md @@ -0,0 +1,812 @@ +--- +description: "Core Academy lesson on Nock evaluation covering all Nock rules 0-12, binary tree addressing, hints and jetting, scry operations, virtualized Nock with +mock, and the Urbit lifecycle function." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 1. Evaluating Nock + +_This lesson explains each Nock rule and how it relates to Hoon expressions. We cover hints (Nock 11), jetting, and how Nock 12 works to effect a scry._ + +Urbit is a machine for writing and evaluating Nock code. Much of the characteristic nature of Hoon derives directly from the affordances of Nock. Perhaps the most fundamental design decision undergirding Nock is that Nock orients the world around binary trees rather than linear arrays. + +``` +Nock 4K + +A noun is an atom or a cell. An atom is a natural number. A cell is an ordered pair of nouns. + +Reduce by the first matching pattern; variables match any noun. 
+ +nock(a) *a +[a b c] [a [b c]] + +?[a b] 0 +?a 1 ++[a b] +[a b] ++a 1 + a +=[a a] 0 +=[a b] 1 + +/[1 a] a +/[2 a b] a +/[3 a b] b +/[(a + a) b] /[2 /[a b]] +/[(a + a + 1) b] /[3 /[a b]] +/a /a + +#[1 a b] a +#[(a + a) b c] #[a [b /[(a + a + 1) c]] c] +#[(a + a + 1) b c] #[a [/[(a + a) c] b] c] +#a #a + +*[a [b c] d] [*[a b c] *[a d]] + +*[a 0 b] /[b a] +*[a 1 b] b +*[a 2 b c] *[*[a b] *[a c]] +*[a 3 b] ?*[a b] +*[a 4 b] +*[a b] +*[a 5 b c] =[*[a b] *[a c]] + +*[a 6 b c d] *[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]] +*[a 7 b c] *[*[a b] c] +*[a 8 b c] *[[*[a b] a] c] +*[a 9 b c] *[*[a c] 2 [0 1] 0 b] +*[a 10 [b c] d] #[b *[a c] *[a d]] + +*[a 11 [b c] d] *[[*[a c] *[a d]] 0 3] +*[a 11 b c] *[a c] + +*a *a +``` + +(To avoid confusion in Nock versions versus Nock rules, this lesson will refer to Nock rules with a written number, e.g. “Nock Four” refers to `*[a 4 b]` rather than to Nock 4K.) + +Many aspects of Hoon's nature and Urbit's operating model are natural consequences of Nock's structure and preferences. Hoon acts as a sort of (very glorified) macro assembler over Nock, providing runes and idioms for Nock patterns and a type system which reduces to Nock. However, it is possible to do things in pure Nock which Hoon restricts; this is one motivation for vase mode. + +In this lesson, we will begin our study of the Urbit kernel by looking at the wellsprings of Urbit. Nock has evolved substantially; the earliest public record is Nock 13K. However, as of Nock 4K no further revisions are expected to be necessary and the system is very cold. + +For clarity, this document will refer to rules with written-out numbers (Nock Four) and versions with Arabic digits (Nock 4K). It will also prefer the shorthand `s` for subject (instead of `a`) and `f`, `g`, etc. for formula (instead of `b`, `c`, etc.). 
+ +* [Nock Definition](../../nock/definition.md) + +## The Nock Preface <a href="#the-nock-preface" id="the-nock-preface"></a> + +The Nock specification needs to lay some interpretive ground before the rules can be defined. These definitions allow us to settle on an unambiguous interpretation of the rules. Importantly, most of the syntax defined here is unique to Nock and does not bear the same meaning in Hoon, with the exception of `[` sel and `]` ser as bounding cells. + +``` +nock(a) *a +[a b c] [a [b c]] + +?[a b] 0 +?a 1 ++[a b] +[a b] ++a 1 + a +=[a a] 0 +=[a b] 1 + +/[1 a] a +/[2 a b] a +/[3 a b] b +/[(a + a) b] /[2 /[a b]] +/[(a + a + 1) b] /[3 /[a b]] +/a /a + +#[1 a b] a +#[(a + a) b c] #[a [b /[(a + a + 1) c]] c] +#[(a + a + 1) b c] #[a [/[(a + a) c] b] c] +#a #a + +*[a [b c] d] [*[a b c] *[a d]] +``` + +### Evaluation and Structure <a href="#evaluation-and-structure" id="evaluation-and-structure"></a> + +#### `nock(a)` → `*a` + +Nock is an evaluated language, and we write the evaluation of this function as `*`. `*` represents an attempt to systematically apply the rules of Nock in an inside-out fashion to the given noun. `*` runs on a valid `[argument function]` pair; i.e. the `function` is the `formula` and the `argument` is the `subject`. + +It is perhaps an oddity of `*` that it “spins forever” instead of merely returning the noun on evaluation. + +#### `[a b c]` → `[a [b c]]` + +For convenience, the pretty-printer elides rightward-branching tuples. + +#### `?[a b]` → `0` + +#### `?a` → `1` + +This represents the ability to tell a cell from an atom. At this point in the preface we actually don't know yet if `0` is `TRUE` or `FALSE`, merely that they are distinct. + +#### `+[a b]` → `+[a b]` + +#### `+a` → `1 + a` + +The increment operation is not distributed across cell entries. (However, note that if `*` evaluation leads to the operated-upon cell reducing to an atom then this will proceed.) 
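
The structural conventions introduced so far (right-association of tuples, the `?` cell test, and the `+` increment) can be modelled in a few lines. This is a minimal sketch in Python rather than Hoon, and it is not how any Urbit runtime represents nouns; the names `noun`, `wut`, and `lus` and the choice of Python integers and 2-tuples are assumptions made purely for illustration.

```python
# Assumption: atoms are non-negative Python ints and cells are 2-tuples.

def noun(*items):
    """Right-associate a tuple, so noun(a, b, c) == (a, (b, c))."""
    if len(items) == 1:
        item = items[0]
        # A nested list or tuple is itself right-associated.
        return noun(*item) if isinstance(item, (list, tuple)) else item
    return (noun(items[0]), noun(*items[1:]))


def wut(n):
    """?[a b] -> 0 and ?a -> 1: distinguish a cell from an atom."""
    return 0 if isinstance(n, tuple) else 1


def lus(n):
    """+a -> 1 + a for an atom; +[a b] never reduces, modelled here as an error."""
    if isinstance(n, tuple):
        raise ValueError("cannot increment a cell")
    return n + 1


if __name__ == "__main__":
    print(noun(1, 2, 3))          # the tuple [1 2 3] as nested pairs
    print(wut(noun(1, 2)), wut(7))
    print(lus(41))
```

Raising an exception for `+[a b]` is a design choice of the sketch: the specification simply reduces the expression to itself, which an implementation can only treat as a crash.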
+ +#### `=[a a]` → `0` + +#### `=[a b]` → `1` + +Now we can compare equality with the `=` operator that accepts a cell. (No specification of behavior applied to an atom is made here, but Nock is presumptively crash-only.) + +#### `/[1 a]` → `a` + +#### `/[2 a b]` → `/[2 [a b]]` → `a` + +#### `/[3 a b]` → `/[3 [a b]]` → `b` + +These three statements define the basic approach to binary tree addressing: namely that there is a root of the tree at `1`, a head at `2`, and a tail at `3`. + +#### `/[(a + a) b]` → `/[2 /[a b]]` + +#### `/[(a + a + 1) b]` → `/[3 /[a b]]` + +So what do these do? These define the subsequent layers of the tree recursively. Thus the head (`2`) of the cell at `a` has the address 2×`a`, and the tail (`3`) of the cell at `a` has the address 2×`a`+1. + +#### `#[1 a b]` → `a` + +#### `#[(a + a) b c]` → `#[a [b /[(a + a + 1) c]] c]` + +#### `#[(a + a + 1) b c]` → `#[a [/[(a + a) c] b] c]` + +These statements define what it means to replace part of one noun with another noun. “Replace address `x` of `z` with `y`.” They require a similar sort of tree definition to the slot addressing. + +The last definition, distribution, is reserved for later discussion. + +## The Simple Rules: Nock Zero to Five <a href="#the-simple-rules-nock-zero-to-five" id="the-simple-rules-nock-zero-to-five"></a> + +Nock is separated into simple rules which span its Turing-complete expressibility and compound rules which give it more terseness. The simple Nock rules span Nock Zero to Nock Five: + +``` +*[a 0 b] /[b a] +*[a 1 b] b +*[a 2 b c] *[*[a b] *[a c]] +*[a 3 b] ?*[a b] +*[a 4 b] +*[a b] +*[a 5 b c] =[*[a b] *[a c]] +``` + +A good entry point to the study of these basic Nock rules can be made through their Hoon equivalents. Hoon directly exposes several Nock rules as the [`.` dot](../../hoon/rune/dot.md) runes. 
+ +| Nock Rule | Nock Expression | Hoon Equivalent | Comment | +| ---------- | ------------------ | --------------- | -------------------------- | +| Nock Zero | `/[b a]` | (some wings) | Address at slot | +| Nock One | `b` | (any data) | Constant | +| Nock Two | `*[*[a b] *[a c]]` | `.*` dottar | Evaluate | +| Nock Three | `?*[a b]` | `.?` dotwut | Distinguish cell from atom | +| Nock Four | `+*[a b]` | `.+` dotlus | Increment | +| Nock Five | `=[*[a b] *[a c]]` | `.=` dottis | Test for equality | + +The following paragraphs summarize these Nock rules. Two runes and one gate will be particularly useful as we analyze Nock: + +* We will evaluate Nock statements in Hoon using [`.*` dottar](../../hoon/rune/dot.md#dottar) or Nock Two. +* We can produce the Nock equivalent of a Hoon statement using [`!=` zaptis](../../hoon/rune/zap.md#zaptis), which takes a single `hoon` for its sample. +* We can run a virtualized Nock (with crash data) in [`+mock`](../../hoon/stdlib/4n.md#mock). An optional hint handler can be supplied but we will typically ignore it. + +### Nock Zero: Slot <a href="#nock-zero-slot" id="nock-zero-slot"></a> + +#### `*[s 0 f]` → `/[f s]` + +Nock Zero locates a noun at address `f` in subject `s`. + +```nock + . + / \ + / \ + / \ + . . + / \ / \ +1 2 3 4 + +[[1 2] [3 4]] + +> .*([[1 2] [3 4]] [0 2]) +[1 2] + +> .*([[1 2] [3 4]] [0 4]) +1 + +> .*([[1 2] [3 4]] [0 8]) +dojo: hoon expression failed + +> (mock [[[1 2] [3 4]] 0 7] ,~) +[%0 p=4] + +> (mock [[[1 2] [3 4]] 0 8] ,~) +[%2 p=~] +``` + +* Lark notation +* Wing search path +* more on these in `csl02` + +It is common to see `[… 0 6]` in Nock expressions generated from Hoon. What is at address 6 that we'd like to grab? + +`..arm` is a Nock Zero lookup, for instance. + +```hoon +> !=(..add) +[0 2.047] +``` + +One of the perennial controversies around Nock has arisen from Nock Zero's binary tree nature. From a processor standpoint, arbitrary data lookup does not take advantage of cache behavior. 
However, Nock bytecode and subject knowledge analysis (SKA) both work to mitigate this situation. + +### Nock One: Constant <a href="#nock-one-constant" id="nock-one-constant"></a> + +#### `*[s 1 f]` → `f` + +Nock One simply yields its formula as a noun. + +For instance, a numeric value in Nock can refer to a rule, a slot address, or a numeric value. Nock One would clarify that we intend the latter, a numeric constant. + +```hoon + . + / \ + / \ + / \ + . . + / \ / \ +1 2 3 4 + +[[1 2] [3 4]] + +> .*([[1 2] [3 4]] [1 7]) +7 + +> .*([[1 2] [3 4]] [1 [7 8 9]]) +[7 8 9] +``` + +### Nock Two: Evaluate <a href="#nock-two-evaluate" id="nock-two-evaluate"></a> + +#### `*[s 2 f g]` → `*[*[s f] *[s g]]` + +Nock Two effectively modifies the subject against which a second formula is evaluated. Run formula `f` against subject `s` to yield noun `sf`. Run formula `g` against subject `s` to yield noun `sg`. Run formula `sg` against subject `sf`. + +Nock Two is useful when constructing a formula explicitly, but most of the time you are actually just fetching a formula from your context instead—thus Nock Nine, when we arrive to it. (Thus we seem to use bare Nock Two in practice less frequently than one may at first expect.) + +For now, let's build a simple example using only rules Zero and One. + +```hoon + . + / \ +1 2 + +[1 2] + +> .*([1 2] [2 [0 2] [1 [0 1]]]) +1 +``` + +* First, the formula `.*([1 2] [0 2])` yields the single atom `1`. +* Then the second formula `.*([1 2] [1 [0 1]])` results in `[0 1]`. +* Finally the formula `.*(1 [0 1])` yields `1`. + +You can think of Nock Two as a way of storing a subprocedure in a subject, then accessing it for evaluation. + +### Nock Three: Distinguish <a href="#nock-three-distinguish" id="nock-three-distinguish"></a> + +#### `*[s 3 f]` → `?*[s f]` + +Does the formula as applied to the subject resolve to a cell? + +```hoon +> .*([[1 2] [3 4]] [3 0 1]) +0 + +> .*([[1 2] [3 4]] [3 0 4]) +1 +``` + +I.e., is the noun at slot 1 a cell? (Yes.) 
Is the noun at slot 4 a cell? (No.) + +### Nock Four: Increment <a href="#nock-four-increment" id="nock-four-increment"></a> + +#### `*[s 4 f]` → `+*[s f]` + +Increment the value. This is the only arithmetic operation. + +```hoon +> .*(5 [4 0 1]) +6 +``` + +Note that typically Nock Four is paired with a constant lookup for the formula. You could construct other patterns by hand; what is this doing? + +```hoon +> (mock [5 4 3 0 1] ,~) +[%0 p=2] +``` + +### Nock Five: Equality <a href="#nock-five-equality" id="nock-five-equality"></a> + +#### `*[s 5 f g]` → `=[*[s f] *[s g]]` + +Are the two nouns, as resolved against the subject, identical? (Compare the result in Nock, not Hoon.) This refers to the structure of the noun, which gives you insight into its behavior in Hoon. + +```hoon +> .*([[1 2] [1 2]] [5 [0 2] [0 3]]) +0 + +> .*([[1 2] [3 4]] [5 [0 2] [0 3]]) +1 + +> .*([[1 2] [3 4]] [5 [0 5] [4 0 4]]) +0 +``` + +### Distribution <a href="#distribution" id="distribution"></a> + +#### `*[s [f g] h]` → `[*[s f g] *[s h]]` + +While not a Nock rule per se, the distributive behavior of Nock (“implicit cons”) requires some attention before we examine the compound rules. + +Formulas have a number corresponding to a Nock rule at their head. However, you can also put a cell at the head of a formula. What does this mean? + +```hoon +> .*([[1 2] [3 4]] [[0 3] [4 0 5]]) +[[3 4] 3] +``` + +It means that each entry is itself a formula. So what this means is that `[0 3]` resolves against `[[1 2] [3 4]]` and `[4 0 5]` resolves against `[[1 2] [3 4]]` before they together yield a cell of the results `[[3 4] 3]`. + +In general, we can string many formulas together in a cell to resolve them all sequentially against the same subject. 
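
The simple rules and the distribution rule fit together into a very small evaluator. The following is a rough sketch in Python, not the course's Hoon and not the way Vere or Sword evaluate Nock; the function names `slot` and `nock`, and the representation of atoms as integers and cells as 2-tuples, are assumptions made for illustration. It covers only Nock Zero to Five plus implicit cons, and any malformed formula simply raises an exception.

```python
# Assumption: atoms are non-negative Python ints and cells are 2-tuples.

def slot(axis, tree):
    """/[axis tree]: 1 is the root; 2n and 2n+1 are the head and tail of n."""
    if not isinstance(axis, int) or axis < 1:
        raise ValueError("invalid axis")
    if axis == 1:
        return tree
    parent = slot(axis // 2, tree)
    if not isinstance(parent, tuple):
        raise ValueError("axis runs past an atom")
    return parent[axis % 2]  # even axis -> head, odd axis -> tail


def nock(subject, formula):
    """Evaluate *[subject formula] for distribution and Nock Zero to Five."""
    if not isinstance(formula, tuple):
        raise ValueError("a formula must be a cell")
    head, tail = formula
    if isinstance(head, tuple):  # distribution: a cell of two formulas
        return (nock(subject, head), nock(subject, tail))
    if head == 0:  # slot lookup in the subject
        return slot(tail, subject)
    if head == 1:  # constant
        return tail
    if head == 2:  # evaluate a computed formula against a computed subject
        f, g = tail
        return nock(nock(subject, f), nock(subject, g))
    if head == 3:  # cell test: 0 for a cell, 1 for an atom
        return 0 if isinstance(nock(subject, tail), tuple) else 1
    if head == 4:  # increment
        value = nock(subject, tail)
        if isinstance(value, tuple):
            raise ValueError("cannot increment a cell")
        return value + 1
    if head == 5:  # structural equality: 0 if equal, 1 otherwise
        f, g = tail
        return 0 if nock(subject, f) == nock(subject, g) else 1
    raise ValueError("rule not covered by this sketch")


if __name__ == "__main__":
    subject = ((1, 2), (3, 4))
    print(nock(subject, (0, 2)))
    print(nock(subject, ((0, 3), (4, (0, 5)))))
```

The formulas are written fully nested here: `[4 0 5]` becomes `(4, (0, 5))`, following the right-association convention from the preface.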
+ +## The Compound Rules: Nock Six to Eleven <a href="#the-compound-rules-nock-six-to-eleven" id="the-compound-rules-nock-six-to-eleven"></a> + +| Nock Rule | Nock Expression | Hoon Equivalent | Comment | +| ----------- | --------------------------------------- | --------------- | ------------------ | +| Nock Six | `*[a *[[c d] 0 *[[2 3] 0 *[a 4 4 b]]]]` | `?:` wutcol | Conditional branch | +| Nock Seven | `*[*[a b] c]` | `=>` tisgar | Compose | +| Nock Eight | `*[[*[a b] a] c]` | `=+` tislus | Extend | +| Nock Nine | `*[*[a c] 2 [0 1] 0 b]` | | Invoke | +| Nock Ten | `#[b *[a c] *[a d]]` | `%=` centtis | Replace at address | +| Nock Eleven | `*[[*[a c] *[a d]] 0 3]`, `*[a c]` | `~` sig runes | Hint | + +### Nock Six: Conditional Branch <a href="#nock-six-conditional-branch" id="nock-six-conditional-branch"></a> + +#### `*[s 6 f g h]` → `*[s *[[g h] 0 *[[2 3] 0 *[s 4 4 f]]]]` + +This expression is straightforward to understand, but the logic of its structure is interesting to follow through. What is being said? If formula `f` evaluated against the subject `s` evaluates `TRUE` (`0`), then the result of `g` evaluated against the subject is produced. If it evaluates to `FALSE` (`1`), then the result of `h` evaluated against the subject is produced. Otherwise, crash. + +* First, at right we evaluate `*[s 4 4 f]`. This means evaluate `f` against `s` then increment it twice. This selects either `0+2` = `2` or `1+2` = `3`. These are slot addresses into Rule Zero immediately before this expression. +* `[2 3]` is the lookup subject for the `0`. Why? This enforces that we have `2` (`0`) or `3` (`1`), not any other possible answer from `f`. +* Now we use that `2` or `3` as a Rule Zero slot address into `g` or `h`. + +The funkiness of the equation is because we have to make sure that the result is `0` or `1`; we must crash on another value or on a cell. + +* How would you write `?.` wutdot? +* How would you write `?~` wutsig? +* What do `?^` wutket and `?@` wutpat expand to? 
(Nock Six of Nock Three) + +### Nock Seven: Compose <a href="#nock-seven-compose" id="nock-seven-compose"></a> + +#### `*[s 7 f g]` → `*[*[s f] g]` + +Composition describes the evaluation of one formula against the subject, then using that result as the subject of the next formula. + +``` +:: With the subject set to a cell of our and now, return the head. +> != => [our now] - +[7 [[0 12] 0 26] 0 2] +``` + +(Incidentally, although we'll get to this later in `ca02`, what are `our` and `now`?) + +### Nock Eight: Extend <a href="#nock-eight-extend" id="nock-eight-extend"></a> + +#### `*[s 8 f g]` → `*[[*[s f] s] g]` + +Nock Eight allows you to pin a value into the subject. It is very similar to Nock Seven. + +The runes `=+` tislus, `=-` tishep, `=/` tisfas, `=;` tismic, and `=|` tisbar are all instances of Nock Eight. Runes like `=+` and `=-` followed immediately by `-`, which refers to the value just added to the subject, compile to Nock Eight followed by Nock Zero. + +```hoon +> =+ 5 + - +5 + +> !=(=+(5 -)) +[8 [1 5] 0 2] +``` + +Faces are not even necessary in Hoon: you can refer to a value using lark notation `-` after it has been defined. `+>` frequently refers to a helper core in Gall, for instance. (While lark notation is often unsatisfactory, it can be more expressive in simple relational statements like these.) + +### Nock Nine: Invoke <a href="#nock-nine-invoke" id="nock-nine-invoke"></a> + +#### `*[s 9 a f]` → `*[*[s f] 2 [0 1] 0 a]` + +The underlying rationale for Nock Nine is to invoke a closure or compute over an association of code and data. This is used throughout Urbit in invocation (think gate-building gates, on which more in `ca01`). In plain language, produce a noun containing code and data (a core) by evaluating formula `f` against the subject `s` and then evaluate the formula in it at slot `a`. + +An arm in any core is accessed through a Nock Nine. For instance, here we have a call to `+dec`, the decrement gate. 
+ +``` +> !=((dec 1)) +[8 [9 2.398 0 2.047] 9 2 10 [6 7 [0 3] 1 1] 0 2] +``` + +Compare the actual Hoon: + +```hoon +++ dec + ~/ %dec + :: unsigned decrement by one. + |= a=@ + ~_ leaf+"decrement-underflow" + ?< =(0 a) + =+ b=0 + :: decremented integer + |- ^- @ + ?: =(a +(b)) b + $(b +(b)) +``` + +What's actually being said here? What we see in `+dec` that we don't see in the Nock is a Nock Four increment operator. That's a hint that the Nock code resulting from `(dec 1)` isn't the equivalent of the gate at all. Check the battery of `+dec` to verify this (it's quite long). + +No, something else is going on with this function invocation with Nock Nine. Based on what we've seen so far of the Nock rules: + +* The subject of the outer Nock Nine call is `[8 [9 2.398 0 2.047]]`. The subject is extended using Nock Eight by retrieving the axis at constant `2.047` within the current subject, then grabbing the noun at slot `2.398` within that noun. + * Extend the namespace with the lookup of the `battery` of the core at `+2047` in the subject. (`[9 2 …` is a very common idiom in Nock, resulting from invoking the battery of a core, but here it's not at `2` it's more complexly buried in the subject tree.) +* The formula is `[2 10 [6 7 [0 3] 1 1] 0 2]`. + * Use the battery of that (second Nock Nine) as the subject for evaluating an `if`-`else` Nock Six. + * That has a constant `1` fired inside of a lookup of the `+3` (`payload`) via Nock Seven (which is the sample). + * Finally the slot `2` is accessed which refers to the battery in the subject as received at this point. + +In general, watch for `[0 2]` to grab the battery and `[0 3]` to grab the payload. + +There's another way around this, which will also be illuminating: build it from the ground up. + +```hoon +[8 [1 0] 8 [1 6 [5 [0 7] 4 0 6] [0 6] 9 2 [0 2] [4 0 6] 0 7] 9 2 0 1] +``` + +* Why do these look different from each other? The first one is locating a battery somewhere else in the subject and pulling it in. 
The second is explicitly constructing the operation in Nock and invoking it at the same point. + +In another example, see what happens with `(add 2 2)` in Nock: + +``` +> !=((add 2 2)) +[8 [9 36 0 2.047] 9 2 10 [6 [7 [0 3] 1 2] 7 [0 3] 1 2] 0 2] +``` + +Two Nock Nines again, one to retrieve the `battery` of `+add` and another to invoke it. + +The core nature of Hoon fits hand-in-glove with Nock Nine. `+arm` is always an arm, and a wing that references it _directly_ always performs a Nock Nine for the lookup. + +(In the Vere runtime, Nock Nine corresponds to `u3v_poke`, in which a `fun` gets a `gate` to evaluate. More in `ca05`.) + +### Nock Ten: Replace at Address <a href="#nock-ten-replace-at-address" id="nock-ten-replace-at-address"></a> + +#### `*[s 10 [f g] h]` → `#[f *[s g] *[s h]]` + +* `%-` cenhep resolves to Nock Nine of Nock Ten: it uses Nock Ten to replace the sample of a core, then runs Nock Nine to fire the arm at axis 2. + +```hoon +> != %- add [3 4] +[8 [9 36 0 2.047] 9 2 10 [6 7 [0 3] 1 3 4] 0 2] +``` + +* [Nock examples implemented by hand](https://github.com/urbit/examples/blob/master/wip/lib/nock.hoon) + +### Nock Eleven: Hint <a href="#nock-eleven-hint" id="nock-eleven-hint"></a> + +#### `*[s 11 [f g] h]` → `*[[*[s g] *[s h]] 0 3]` + +#### `*[s 11 f g]` → `*[s g]` + +The fundamental idea of a hint is that it can provide an arbitrary annotation for a computation without changing the result. In practice, this is used to signal to the runtime that it should do something operationally which Nock doesn't know about. + +The first kind of hint is called a dynamic hint. The head of the cell is discarded and the tail is calculated against the current subject, then thrown away. (What is an example of this in practice? `~&` sigpam.) Put another way: the value in a computation is available to the runtime to use as a message. 
+ +```hoon +> !=(~&('hello' ~)) +[ 11 + [ 1.735.355.507 + [1 0] + 8 + [9 2.558 0 127] + 9 + 2 + 10 + [ 6 + 7 + [0 3] + 8 + [9 158 0 127] + 9 + 2 + 10 + [6 7 [0 3] 1 [1.836.020.833 116 0] 478.560.413.032] + 0 + 2 + ] + 0 + 2 + ] + 1 + 0 +] +``` + +which in Pseudo-Hoon is: + +``` +> !=(~&('hello' ~)) +[ 11 + [ %slog ~ + 8 + [9 2.558 %127] + [9 2 10 + [ 6 + 7 + %3 + 8 + [9 158 %127] + [9 2 10 + [6 7 %3 [%atom 't' ~] 'hello'] %2 + ] + ] + ] + %2 + ] + ~ +] +``` + +The second kind of hint is a static hint and is simply directly specified. The runtime may pick this up, and if it is a registered hint then it may trigger an appropriate side effect. Put otherwise: the value is passed as a key to a hashmap in the interpreter which can do what it needs, then discard the value. + +```hoon +> !=(~>(%bout ~)) +[11 1.953.853.282 1 0] +``` + +#### Practical Hints + +The most fundamental way to send a hint to the runtime is to simply `~>` siggar a raw hint tag with data. For instance, this is used in the update sequence to `%slog` (`printf`) a formatted text block. (Note that raw hints use a special `.` dot notation.) + +```hoon +~>(%slog.0^leaf/"clay: rebuilding {<syd>} after kernel update" .) +``` + +There are a number of ways of signaling a Nock Eleven hint to the runtime. Several have dedicated runes in the `~` sig family, but all of these convenience runes do ultimately unwrap to raw hints and we'll examine them that way. `~>` siggar lets you directly specify a raw hint. Some of these are not in active use; the important ones for you to know at this point have been marked with a `*`. + +* \* `%bout` is used to time calculations. +* \* `%fast` corresponds to jet hinting with `~%` sigcen and `~/` sigfas. (See also \~rivpyl-sidfyl's notes.) +* `%germ` is marked as unused in `nock.c` but corresponds to `~=` sigtis noun duplicate testing. +* `%hand` is mentioned but it's unclear what it does. +* `%hela` is used to unify `%slog` traces from all roads. 
(See the runtime lessons for details on the road.) +* `%hunk` tracks a failed scry in a stack trace. +* `%live` corresponds to `~$` sigbuc profiling hit counter. +* `%lose` tracks elided frames in a stack trace. +* \* `%mean` corresponds to `~|` sigbar and `~_` sigcab debug printing. +* `%meme` prints memory usage. +* `%memo` corresponds to `~+` siglus caching. +* `%nara` is used to `%slog` the deepest road's trace. (See the runtime lessons for details on the road.) +* \* `%slog` prints a value directly, and corresponds to `~&` sigpam and `~|` sigzap debug printing . +* `%sole` is superseded, but was for global deduplication. +* `%spot` drives stack traces. +* `%xray` prints bytecode. +* Enumerate the raw hints used with `~>` siggar. Provide an example of each. + +``` +> !=(~>(%slog [0 leaf/"ride: parsing"])) +[ 11 + 1.735.355.507 + [1 0] + [1 1.717.658.988] + 7 + [0 1] + 8 + [1 1 114 105 100 101 58 32 112 97 114 115 105 110 103 0] + 9 + 2 + 0 + 1 +] +``` + +```hoon +> => 42 ~> %memo.[42] ~> %slog.[3 '42'] 42 +42 +> => 42 ~> %memo.[42] ~> %slog.[3 '42'] 42 +42 +``` + +#### Jet-Accelerated Code + +One “problem” with Nock is that although it's formally correct and Turing-complete, that doesn't mean that it's computationally efficient for every scenario. How can we solve the problem correctly but quickly in a situation for which this calculation is a bottleneck? We treat the Nock code as a specification of behavior rather than an assertion of method. Jet-accelerated code ("jets" for short) solves our speed problem by allowing the programmer to specify an equivalent algorithm which may (should) be much more efficient than the pure Nock version. + +(This is analogous to C's use of the `#pragma` preprocessor command to specify to its compiler how to treat certain code.) 
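
Treating the slow code as a specification and the fast code as an interchangeable implementation can be illustrated with a toy dispatcher. This is only a sketch in Python under stated assumptions: the registry `JETS`, the `run` function, and its `check` flag are hypothetical names invented here, and nothing below reflects how the Vere jet dashboard is actually structured. The slow path counts upward the way the `+dec` gate shown earlier does.

```python
def slow_dec(a):
    """Specification: count up from 0 until b + 1 == a, as the pure code does."""
    if a == 0:
        raise ValueError("decrement-underflow")
    b = 0
    while b + 1 != a:
        b += 1
    return b


def fast_dec(a):
    """Hypothetical jet: the same result, computed with native subtraction."""
    if a == 0:
        raise ValueError("decrement-underflow")
    return a - 1


# Hypothetical registry mapping a hint label to its fast implementation.
JETS = {"dec": fast_dec}


def run(label, spec, arg, check=False):
    """Run the jet registered under `label` if there is one, else the spec."""
    jet = JETS.get(label)
    if jet is None:
        return spec(arg)
    result = jet(arg)
    # Optionally run the specification too and insist that the two agree.
    if check and result != spec(arg):
        raise RuntimeError("jet mismatch")
    return result


if __name__ == "__main__":
    print(run("dec", slow_dec, 1000))
    print(run("dec", slow_dec, 1000, check=True))
```

The point of the design is that callers never learn which path ran: as long as the two implementations agree, the result is the same noun either way.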
+ +We mark certain blocks of code with hints, which suggest to the runtime (Nock interpreter) that an equivalent algorithm may exist in the runtime and that the jet dashboard should be checked for a comparable registration. Such jet dispatch then runs the equivalent fast code and supplies the resulting noun back to the compiled Nock code as if it had arisen directly from the Hoon expression. (The jet dashboard handles jet state in the runtime. We don't need to delve into it to understand jet dispatch now, but see the appendix for more details.) + +Jets are hashed and checked before they are compiled, and there is an option to run both the Nock and the jet and check that they match. One consequence of jetted code is that you can theoretically have a jet mismatch, wherein the jet produces a different result than the Hoon expression. This is considered grounds for removing a jet from the runtime since it destroys the deterministic reproducibility of Urbit state. + +Today, jets are compiled and linked directly into the Urbit runtime. They are only supplied for code in `/sys`. There is not a technical reason for this (any core can be jetted), but it is currently a code hygiene practice to keep things manageable and secure. + +* [“Writing Jets”](../runtime/jetting.md) +* [\~rovnys-ricfer, \~lagrev-nocfep, “Dev-Week: Core School Preview”](https://www.youtube.com/watch?v=i1GRoY3QDdU) +* [\~ritpub-sipsyl, “Notes on Jets”](https://hackmd.io/@zorp/HyVWew3ch) + +### The Fake Rule: Nock Twelve <a href="#the-fake-rule-nock-twelve" id="the-fake-rule-nock-twelve"></a> + +| Nock Rule | Nock Expression | Hoon Equivalent | Comment | +| ----------- | --------------- | --------------- | ----------------- | +| Nock Twelve | `*[a 12 b]` | `.^` dotket | Scry in namespace | + +A scry is a pure function mapping from a path to a response. A scry is a referentially transparent way of referring to data. A `.^` dotket scry resolves via an artificial Nock instruction, Nock Twelve. 
De facto, we are attempting to solve the problem that sometimes information is necessary to complete a calculation which may not be present in the subject. The `12` instruction is picked up by the virtual Nock implementation within Arvo and the appropriate value inserted. The scry pattern is read-only and synchronous. + +While a scry request does violate function purity, it is reproducible due to the scry namespace, in which immutable values are bound by convention at fixed endpoints. The scry namespace is defined by a unique path for each resource. + +``` +/~zod/1/2/c/x/~2023.8.31..12.00.00..0000/base/sys/arvo/hoon +/host/rift/life/vane/request-type/revision-number/desk/file-path/mark +``` + +When in userspace, we work with a stripped-down version of the full scry namespace, which is supplied by Arvo. Thus scries are a bit mangled in userspace. + + + +The scry interface defined by Arvo and each vane is not intrinsic to the runtime, but is an OS-level convention. (The values `eny` and `now` are not scries, but atoms provided by Arvo—and thither dynamically by the runtime from the system. More on these in `ca02`.) + +* [“Additional Guides: Remote Scry”](../userspace/remote-scry.md) + +For instance, consider a namespace scry into Clay: + +``` +> !=(.^(* %cx /===/gen/cat/hoon)) +[ 12 + [1 139 1.853.189.998] + 1 + 30.819 + 1.685.027.454 + 1.702.060.386 + 161.311.343.539.592.098.933.172.536.971.821.663.896.217.238.133.980.260.425.806.462 + 7.234.919 + 7.627.107 + 1.852.796.776 + 0 +] +``` + +Refactored back into Hoon atoms, we can see: + +``` +[ 12 + [1 139 %noun] + 1 + %cx + ~.~zod + ~.base + ~.~2023.7.31..20.39.12..1abd + ~.gen + ~.cat + ~.hoon + ~ +] +``` + +### Virtualized Nock (`+mock`) <a href="#virtualized-nock-mock" id="virtualized-nock-mock"></a> + +How does scrying actually work? Hoon can run both Nock and Hoon within itself, and in particular has Nock Twelve implemented in the [`+mink`](../../hoon/stdlib/4n.md#mink) emulator. 
(`+mink` underlies `+mock`, which we've been using.) `+mink` takes a subject and a formula like any Nock interpreter, and also a gate called any time a formula contains Nock Twelve at its head. Userspace code such as Gall agents is run using `+mink` and thus has the ability to fill out nouns using calls into Arvo to construct its nouns. + +`+mock` and friends work by coordinating with the runtime to virtualize Nock: + +> With the power of `u3`, we nest arbitrary layers of `mock` without any particular performance cost. Moreover, we simply treat Nock proper as a special case of `mock`. (More precisely, the internal VM loop is `+mink` and the error compiler is `+mook`. But we call the whole sandbox system `mock`.)\ +> The nice thing about `mock` functions is that (by executing within `u3m_soft_run()`, which as you may recall uses a nested road) they provide both exceptions and the namespace operator - `.^` in Hoon, which becomes operator `11` in `mock`. + +For instance, we have been running `.*` dottar at the Dojo prompt. In fact, this does produce Nock code—but then that Nock code is evaluated by whatever the interpreter is at that level. In userspace, this is `+mock`; in kernelspace, the runtime evaluates the Nock code directly (not a metacircular interpreter). (What consequence does this have for scries?) + +This works in userspace because of the metacircular interpreter, but would not work in kernelspace: + +```hoon +> ;;(@t .*(. !=(.^(* %cx /===/gen/cat/hoon)))) +':: ConCATenate file listings\0a\0a::\0a:::: /hoon/cat/gen\0a ::\0a/? 
310\ +0a/+ pretty-file, show-dir\0a::\0a::::\0a ::\0a:- %say\0a|= [^ [arg=(list +path)] vane=?(%g %c)]\0a=- tang+(flop `tang`(zing -))\0a%+ turn arg\0a|= pax +=path\0a^- tang\0a=+ ark=.^(arch (cat 3 vane %y) pax)\0a?^ fil.ark\0a ?: =( +%sched -:(flop pax))\0a [>.^((map @da cord) (cat 3 vane %x) pax)<]~\0a [leaf ++(spud pax) (pretty-file .^(noun (cat 3 vane %x) pax))]\0a?- dir.ark + :: handle ambiguity\0a ~\0a [rose+[" " `~ +]^~[leaf+"~" (smyt pax)]]~\0a::\0a [[@t ~] ~ ~]\0a $(pax (welp pax /[p.n.dir +.ark]))\0a::\0a *\0a =- [palm+[": " ``~]^-]~\0a :~ rose+[" " `~]^~[leaf+" +*" (smyt pax)]\0a `tank`(show-dir vane pax dir.ark)\0a ==\0a==\0a' +``` + +## The Urbit Lifecycle Function <a href="#the-urbit-lifecycle-function" id="the-urbit-lifecycle-function"></a> + +The core expression in all of Urbit is the lifecycle function. + +``` +[2 [0 3] 0 2] +``` + +This is replay. The idea of Urbit is a computer whose entire lifecycle is defined by a single small fixed frozen function. We will revisit this function when we examine Arvo. + +You can find it instantiated directly in `/sys/arvo.hoon` in expressions like `[%9 2 %10 [6 %0 3] %0 2]`. + +## Exercises <a href="#exercises" id="exercises"></a> + +* Write your own Nock interpreter in Hoon. +* Write your own jet, following the [tutorial](../runtime/jetting.md). +* Optional: Compose a Nock addition gate from scratch. ([Solution](https://github.com/urbit/examples/blob/master/wip/lib/nock.hoon)) + +## Appendix: The Nock Interpreter <a href="#appendix-the-nock-interpreter" id="appendix-the-nock-interpreter"></a> + +As hinted at by our jetting discussion above, Nock is not directly evaluated by the runtime. There are three key parts of the Nock interpreter: + +1. Bytecode interpreter +2. Jet dashboard +3. Allocator + +This appendix summarizes some of the considerations and architecture that the current runtime interpreters (Vere and Sword, née Ares) must make in some respect. 
+ +### Nock Bytecode <a href="#nock-bytecode" id="nock-bytecode"></a> + +The VM doesn't actually execute Nock directly from its noun form. Instead it produces a Nock-based bytecode. Producing computationally efficient bytecode from Nock is one key to building a practical Urbit runtime, and has been a focus of the Sword (née Ares) project. The details of this in Vere are in `vere/pkg/noun/nock.c`. + +``` +> ~>(%xray (add 2 3)) +{[fask 1023] [kicb 1] snol head swap tail [lilb 2] swap tail [lilb 3] auto musm [ticb 0] halt} +``` + +```hoon +> ~>(%xray =+(2 [- -])) +{[litb 2] snol head swap head ault halt} +``` + +The `%xray` hint on a core doesn't show the core, but it shows the formula that invokes the core. + +``` +> ~> %xray.[0 %outer] +=| i=@ +|- ^- @ +~> %xray.[0 %inner] +?:(=(i ^~((bex 0))) ~ $(i +(i))) +outer: {lit0 snol [libk i:0] snol [ticb 0] halt} + inner + {[fabk 6] sam1 [sbin 3] lil0 [sbip 8] copy swap [fabk 6] bump musm [ticb 0] halt} +``` + +Core Academy lessons will not further explore the Nock bytecode interpreter. + +* [`u3n`: Nock Execution](../runtime/api.md#u3n-nock-execution) +* (no docs exist for the Vere bytecode convention) +* [Sword (née Ares), “Codegen Bootstrapping”](https://github.com/urbit/ares/blob/status/docs/codegen-bootstrap.md?plain=1) + +### Jet Dashboard <a href="#jet-dashboard" id="jet-dashboard"></a> + +> The jet dashboard is the system in the runtime that registers, validates, and runs jets: specific pieces of Nock code reimplemented in C for performance. + +The main part we have to be aware of as developers is jet registration and hinting. However, deeper parts of the jet dashboard do affect performance. + +The jet dashboard maintains three jet state systems: + +1. `cold` state results from the logical execution history of the pier and consists of nouns. `cold` jet state registers jets as they are found. `cold` state ignores restarts. +2. 
`hot` state is the global jet dashboard and describes the actual set of jets loaded into the pier for the current running process. Calls to `hot` state result from Nock Nine invocations of a core and an axis. `hot` state is thus tied to process restart. +3. `warm` lists dependencies between `cold` and `hot` state. `warm` state can be cleared at any time and is cleared on restart. + +The jet dashboard will not be explored in detail in Core Academy. + +* [“API Overview by Prefix: `u3j` jets”](../runtime/api.md#u3j-jets) + +### Allocator <a href="#allocator" id="allocator"></a> + +The allocator is responsible for memory management. The Vere runtime has some peculiarities, such as the loom/road structure, which are not directly tied to the nature of Nock as a computable language. + +Persistence means that we can store events and the event log, and that we can produce the state derived from them as a snapshot. + +The allocator will be examined more in the runtime lessons on Vere. + +* [Vere, “Land of Nouns: `u3` the road model”](../runtime/nouns.md#u3-the-road-model) +* [Sword (née Ares), “New Mars Persistence”](https://github.com/urbit/ares/blob/status/docs/persistence.md) diff --git a/docs/core-academy/ca01.md b/docs/core-academy/ca01.md new file mode 100644 index 0000000..c225119 --- /dev/null +++ b/docs/core-academy/ca01.md @@ -0,0 +1,995 @@ +--- +description: "Core Academy lesson on the Hoon compiler covering vase-mode programming with +slap/+slop algebra, the compilation pipeline from text to AST to Nock, type system mechanics, and Ford file building." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 2. Building Hoon + +*This lesson explains vase-mode Hoon, including the arms `+slap`, `+slop`, `+slot`, and `+slam`. 
We examine the compiler stack, and how Ford uses vase mode to process file inclusions.* + +## Vase-Mode Hoon {#vase-mode-hoon} + +A `vase` is a pair `[type noun]` used to describe data of a type unknown at compile time. (For instance, this can result from compiling and running Hoon code.) We use vases throughout the kernel, such as to run userspace agents in Gall or to upgrade Arvo itself. + +What do we mean when we talk about `[type noun]`? Nouns are straightforward but what is a `$type`? + +### `$type` {#type} + +Hoon uses the word “type” a few different ways casually (mold, mark, &c.). `$type` is also a core definition of Hoon, a way of specifying a set of nouns, such as all atoms or all cells. One challenge in interpreting a vase in the Dojo is that the prettyprinter obscures these. Take the type spear of a gate like `+add`, for instance: + +```hoon +> -:!>(add) +#t/<1.otf [[a=@ b=@] <33.sam 1.pnw %139>]> + +> -<:!>(add) +%hold +``` + +**`/sys/hoon.hoon`** + +```hoon ++$ type $~ %noun :: + $@ $? %noun :: any nouns + %void :: no noun + == :: + $% [%atom p=term q=(unit @)] :: atom / constant + [%cell p=type q=type] :: ordered pair + [%core p=type q=coil] :: object + [%face p=$@(term tune) q=type] :: namespace + [%fork p=(set type)] :: union + [%hint p=(pair type note) q=type] :: annotation + [%hold p=type q=hoon] :: lazy evaluation + == :: +``` + +- `%noun` is the set of all nouns. +- `%void` is the empty set (no noun belongs to it), and won't occur in practice. +- `%atom` spans the set of all atoms. +- `%cell` contains ordered pairs. + +The other types are more complex: + +- `%core` is the descriptor type for a core. Besides the `$type`, it uses a `$coil`, which is a tuple of variance information, context, and chapters (limbs). +- `%face` spans the same set of nouns as the type it wraps, but attaches a face (name) to it. +- `%fork` is a union, or choice over options. +- `%hint` is an annotation for the compiler (Nock Eleven). 
+- `%hold` types are lazily evaluated, such as a recursive type (like the `+list` mold builder). + - `%hold` types are why the compiler can have trouble with lists at runtime, such as needing to distinguish a `lest` or the TMI problem with `+snag` &c. + - A `%hold` type is a “finite subtype” of an infinite type. Hoon doesn't actually know about these directly, just in that it can be lazy about evaluating recursions. + - They result from arms in cores because the `hoon` of the arm is played against the core type as the subject type to get the result. This permits wetness and variance, since the core can be modified to have a sample of a different type. + > One can "evaluate" a hold by asking the compiler to "play" the hoon against the subject type, meaning to infer what type of value would result from running that hoon against a value of the subject type. For a recursive type, this result type refers to the same hold, usually in one or more of the cases of a `%fork`. (~rovnys-ricfer) + +For instance, let's evaluate a value of the type `(list @ud)`. + +```hoon +> =a `(list @ud)`~[1 2 3] + +> !>(a) +[#t/it(@ud) q=[1 2 3 0]] + +> -<:!>(a) +%hold + +> -<:!>(?~(a 0 a)) +%fork + +> ->-:!>(?~(a 0 a)) +#t/@ud +[%atom p=%ud q=~] + +> ->+<:!>(?~(a 0 a)) +l=[#t/[i=@ud t=it(@ud)] l=~ r=~] +``` + +The prettyprinter really wants to gussy these up for you, so you have to play some tricks to see the value unexpurgated: + +```hoon +> ;;($>(%atom type) -:!>(3)) +[%atom p=%ud q=~] + +> ;;($>(?(%core %hold) type) -:!>(dec)) +[ %hold +... +] + +> =+ r=~(repo ut ~(repo ut -:!>(dec))) ?>(?=(%core -.r) r) +[ %core + #t/[a=@ <33.sam 1.pnw %139>] + q + [ p=[p=~ q=%dry r=%gold] + ... + ] +] + +> ?? 4 + [%atom %ud ~] +4 +``` + +A `%hold` can be resolved by one step using `(~(play ut type) hoon)`, which type-infers what the result of running that `hoon` would be against a subject of that `type`. 
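As a minimal sketch of that one-step resolution (assuming a Dojo whose standard library matches the one used in this lesson; the face `typ` is only illustrative), a `%hold` can be unpacked by hand and its stored `hoon` played against its stored subject type:

```hoon
::  typ is asserted to be a %hold, then its hoon (q.typ) is played
::  against its subject type (p.typ); -: takes the head tag of the result
> =/(typ -:!>(*(list @)) ?>(?=([%hold *] typ) -:(~(play ut p.typ) q.typ)))
```

The head tag this prints can be compared with the `%fork` that `+repo:ut` reports for the same list type.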
+ +```hoon +> =- ?>(?=(%fork -<) -) ~(repo ut -:!>(*(list @))) +[%fork p=[#t/%~ l=[#t/[i=@ t=it(@)] l=~ r=~] r=~]] +``` + +Whenever we talk about `$type`, this is it. This may in some ways feel odd to those coming from other languages, since we can—and often will—talk about the “type of the subject” for instance. That means this head of the vase, or the type used to interpret the associated noun. + +### Building Vases {#building-vases} + +The most common way to get a vase is using the `!>` zapgar rune. + +```hoon +> !>('Hello Mars') +[#t/@t q=545.182.085.650.269.906.691.400] +``` + +A vase can also be manually constructed, e.g. + +```hoon +> `vase`[`type`[%atom %ud `4] 4] +[#t/%4 q=4] +``` + +(This is the approach that the Hoon compiler will use internally when it processes structures.) + +### Eliminating Vases {#eliminating-vases} + +> An elimination form for a vase is something that converts a vase to a statically typed value. + +There are three common ways to eliminate a vase back into a value: + +- `!<` zapgal is an unsafe form which accepts a mold and a vase. (It's unsafe because it doesn't guarantee the value has the type, an evil vase.) + +```hoon +> !<(@ !>(4)) +4 + +> !<(path !>(/===)) +/~zod/base/~2023.8.30..19.57.38..6b94 +``` + +- `;;` micmic is a safer coercion which applies the `spec` to the `hoon` and yields a mold-coerced form (structurally nests). + +```hoon +> ;;(@ 4) +4 + +> ;;(path /===) +/~zod/base/~2023.8.30..19.58.47..d09f +``` + +- A mold is a straightforward way to eliminate the vase, à la `;;` micmic. + +### Vase Algebra {#vase-algebra} + +Given a vase that contains an expression of data and its type, how can we use it besides to just eliminate it back to a static value? We will use vases to build Hoon code, so let's get good at combining and manipulating vases. 
Facility with vase mode is one of [the bright-line ideas](https://www.joelonsoftware.com/2005/12/29/the-perils-of-javaschools-2/) for kernel work, so read through these concepts several times and ask dumb questions. + +The [`+slap`/`+slop` vase algebra](/guides/additional/vases#slap'n'slop-vase-algebra) provides a framework for working through Hoon expressions at an abstract level. + +#### `+slap`: Envase Hoon + +[`+slap`](/hoon/reference/stdlib/5c#slap) runs a `hoon` against a `vase` and produces a `vase` of the result. + +```hoon +++ slap + |= [vax=vase gen=hoon] ^- vase :: untyped vase .* + =+ gun=(~(mint ut p.vax) %noun gen) + [p.gun .*(q.vax q.gun)] +``` + +First this gate compiles the parsed Hoon expression to Nock against the type of a passed subject using `+mint:ut` (on which more later). Then it envases the product type and the actual evaluation of that Nock against the subject (value) passed in. + +Since any Nock formula is resolved against a subject, `+slap` is Hoon's answer to raw `.*` Nock Two evaluation. The type of the subject contains the information about the arms, faces, and raw subject axes. + +```hoon +> => [one=1 two=2 tri=3] + != [one two] +[[0 2] 0 6] +[[0 2] [0 6]] +``` + +Examples: + +```hoon +> (slap !>(3) (ream '.')) +[#t/@ud q=3] + +> (slap !>(3) (ream '+(.)')) +[#t/@ q=4] + +> (slap !>(3) (ream '[33 44]')) +[#t/[@ud @ud] q=[33 44]] + +> (slap !>(3) (ream '%foo')) +[#t/%foo q=7.303.014] + +> (slap !>([33 44]) (ream '.')) +[#t/[@ud @ud] q=[33 44]] + +> (slap !>([33 44]) (ream '-')) +[#t/@ud q=33] + +> (slap !>([33 44]) (ream '+')) +[#t/@ud q=44] + +> (slap !>([foo=33 bar=44]) (ream '.')) +[#t/[foo=@ud bar=@ud] q=[33 44]] + +> (slap !>([foo=33 bar=44]) (ream 'foo')) +[#t/@ud q=33] + +> (slap !>([foo=33 bar=44]) (ream 'bar')) +[#t/@ud q=44] + +> (slap !>([foo=33 bar=44]) (ream '+(foo)')) +[#t/@ q=34] +``` + +#### `+slop`: Combine Vases + +[`+slop`](/hoon/reference/stdlib/5c#slop) combines a `cell` of `vase`s into a `vase` of a `cell`. 
+ +```hoon +++ slop :: cons two vases + |= [hed=vase tal=vase] + ^- vase + [[%cell p.hed p.tal] [q.hed q.tal]] +``` + +Examples: + +```hoon +> (slop !>(3) !>(4)) +[#t/[@ud @ud] q=[3 4]] + +> (slop !>(foo=3) !>(4)) +[#t/[foo=@ud @ud] q=[3 4]] + +> (slop !>(foo=3) !>(bar=4)) +[#t/[foo=@ud bar=@ud] q=[3 4]] + +> (slop !>(foo=3) !>([bar=4 baz=5])) +[#t/[foo=@ud bar=@ud baz=@ud] q=[3 4 5]] + +> (slop !>(foo=%foo) !>([bar=[4 5] baz=%baz])) +[#t/[foo=%foo bar=[@ud @ud] baz=%baz] q=[7.303.014 [4 5] 8.020.322]] +``` + +#### Vase Algebra Operations + +With `+slap` and `+slop`, we have the ability to build higher-level operators. Each value is a `vase` and the fundamental operators are `+slap` and `+slop`. + +``` + +slap + / \ + +slop h2 + / \ + v1 +slop + / \ + v2 +slap + / \ + v3 h1 + +(slap (slop v1 (slop v2 (slap v3 h1))) h2) +``` + +A concrete example in Hoon: + +```hoon +=/ v1=vase !>(%foo) +=/ v2=vase !>(%bar) +=/ v3=vase !>(%baz) +=/ h1=hoon (ream '%qux') +=/ h2=hoon (ream '[%result .]') +:: +%+ slap + %+ slop v1 + %+ slop v2 + (slap v3 h1) +h2 + +:: result +[#t/[%result %foo %bar %qux] q=[128.009.175.786.866 7.303.014 7.496.034 7.894.385]] +``` + +Compilation and execution take place using vase mode, including many operators built of `+slap` and `+slop`. For instance, this is how Ford imports library cores when a file is built to code. + +#### `+slam`: Slam Gate with Sample + +[`+slam`](/hoon/reference/stdlib/5c#slam) accepts a gate as a `vase` and its sample as a second `vase`, and produces a `vase` of the result. + +(This is an older implementation that's cleaner to interpret.) 
+ +```hoon +++ slam + |= [gat=vase arg=vase] + ^- vase + (slap (slop gat arg) !,(*hoon (- +))) +``` + +Example: + +```hoon +> ;;($>(%hold type) -:(slam !>(dec) !>(5))) +[ %hold + #t/<1.hkg [a=@ <33.sam 1.pnw %139>]> + q + [ %sgcb + p + [ p=[%rock p=%tas q=1.717.658.988] + q + [ %knit + p + ~[100 101 99 114 101 109 101 110 116 45 117 110 100 101 114 102 108 111 119] + ] + ] + q + [ %wtgl + p=[%dtts p=[%sand p=%ud q=0] q=[%wing p=~[%a]]] + q + [ %tsls + p=[%ktts p=term=%b q=[%sand p=%ud q=0]] + q + [ %brhp + p + [ %kthp + p=[%base p=[%atom p=~.]] + q + [ %wtcl + p=[%dtts p=[%wing p=~[%a]] q=[%dtls p=[%wing p=~[%b]]]] + q=[%wing p=~[%b]] + r=[%cnts p=~[%$] q=[i=[p=~[%b] q=[%dtls p=[%wing p=~[%b]]]] t=~]] + ] + ] + ] + ] + ] + ] +] +``` + +- How do we get the result back out? (Elimination mode, such as `!<`.) + +#### `+slot`: Retrieve Noun + +[`+slot`](/hoon/reference/stdlib/5c#slot) can be used to decompose vases. + +```hoon +++ slot :: got axis in vase + |= [axe=@ vax=vase] ^- vase + [(~(peek ut p.vax) %free axe) .*(q.vax [0 axe])] +``` + +`(slot 2 vase)` and `(slot 3 vase)` decompose the `vase` back into constituent types and values. + +#### Tutorial: Finding the Sample + +- How can you retrieve the type of the sample from a vase for a gate? + +```hoon +> =+ -:!>(+6:add) + ?>(?=([%cell *] -) -) +p=[%cell #t/a=@ #t/b=@] + +> =+ -:!>(+6:|=([p=@tas q=@da r=^] ~)) + ?>(?=([%cell *] -) -) +p=[%cell #t/p=@tas #t/[q=@da r=[* *]]] +``` + +- How can you examine the faces of the sample? + +```hoon +> =+ -:!>(+12:|=([p=@tas q=@da r=^] ~)) + ?>(?=([%face *] -) -) +p=[%face p=%p #t/@tas] +``` + +Some places to examine vase mode: + +- Dojo uses vase mode to evaluate statements. +- ~rovnys-ricfer produces a simplified Gall using vase mode in [this video](https://drive.google.com/file/d/10SaE5doCfdeqc2j945t8GvGvexKIBBZq/view). 
+- [Hoon, “Vases”](/guides/additional/vases) +- [~rovnys-ricfer, “Hoon Vases”](https://rovnys.cataphract.us/vases) (older version but has some diagrams that aren't in the docs version) + + +## The Compiler Stack {#the-compiler-stack} +### The Lifecycle of Hoon (`+ride` or Die) {#the-lifecycle-of-hoon-ride-or-die} + +Hoon code begins life as text, presumably a `cord` `@t` value. If we would like to go from text interpretable as Hoon code to executable Nock, what does that look like? + +There are a few different paths Hoon can take to run the gauntlet: + +1. A text file containing Hoon code, processed via the `%hoon` mark. +2. A text `cord` containing Hoon code, processed via the `+ream` arm. +3. A noun interpretable as a Hoon AST. + +We need to go from Hoon code to a Hoon abstract syntax tree (AST), then from the AST to Nock. We can build and describe this process at several layers of granularity. For instance, `+ride` accomplishes this in one step. + +[`+ride`](/hoon/reference/stdlib/5d#ride) accepts a pair of `$type` and an atom (really a `cord`) of text containing Hoon code. + +```hoon +++ ride + |= [typ=type txt=@] + ^- (pair type nock) + ~> %slog.[0 leaf/"ride: parsing"] + =/ gen (ream txt) + ~> %slog.[0 leaf/"ride: compiling"] + ~< %slog.[0 leaf/"ride: compiled"] + (~(mint ut typ) %noun gen) +``` + +Line by line: + +- `|= [typ=type txt=@]` + - Accept a subject `$type` and a `cord`. +- `^- (pair type nock)` + - Produce a pair of the product `$type` and the compiled `$nock`. A `nock` is a cell of numbers interpretable as a Nock formula. Notably, the Nock rules are tagged as constants (such as `%0`), which makes a `$nock` more readable than a raw Nock noun. +- `~> %slog.[0 leaf/"ride: parsing"]` + - Issue a raw hint to output a starting message. +- `=/ gen (ream txt)` + - `+ream` the `cord`, which parses the Hoon expression into a Hoon abstract syntax tree (AST). +- `~> %slog.[0 leaf/"ride: compiling"]` + - Issue a raw hint to output a continuation message. 
+- `~< %slog.[0 leaf/"ride: compiled"]` + - Issue a raw hint on the product (so it prints after the compilation completes). +- `(~(mint ut typ) %noun gen)` + - Do the shovel work of compiling the parsed AST into Nock code against the subject type `typ`, with `%noun` as the product constraint (so any product type is accepted). + +As another example, [`+make`](/hoon/reference/stdlib/5d#make) shows the process in capsule form for converting a `cord` of Hoon code into executable `$nock` Nock code. `+make` doesn't explicitly require or deal with `$type` like `+ride` does. + +```hoon +++ make + |= txt=@ + q:(~(mint ut %noun) %noun (ream txt)) +``` + +Given a Hoon expression, convert it to `$nock`. + +```hoon +> (make '~[1 2 3]') +[%1 p=[1 2 3 0]] +``` + +### Text `cord` → AST `$hoon` {#text-cord-ast-hoon} + +#### `+ream` + +To take a `cord` containing Hoon code and transform it to an AST, use [`+ream`](/hoon/reference/stdlib/5d#ream): + +```hoon +> (ream '(add 1 2)') +[%cncl p=[%wing p=~[%add]] q=~[[%sand p=%ud q=1] [%sand p=%ud q=2]]] + +> (ream '-:!>(5)') +[%tsgl p=[%cnts p=~[[%.y p=2]] q=~] q=[%zpgr p=[%sand p=%ud q=5]]] +``` + +`+ream` simply applies the `+vest` parser, which is built from `+vast`: + +```hoon +++ ream :: parse cord to hoon + |= txt=@ + ^- hoon + (rash txt vest) +``` + +Any irregular syntax is changed into its basic Hoon form. However, at this point no desugaring has taken place; equivalent forms may still have different AST representations as `$hoon`: + +```hoon +> (ream '~[1 2 3]') +[%clsg p=~[[%sand p=%ud q=1] [%sand p=%ud q=2] [%sand p=%ud q=3]]] + +> (ream '[1 2 3 ~]') +[%cltr p=~[[%sand p=%ud q=1] [%sand p=%ud q=2] [%sand p=%ud q=3] [%bust p=%null]]] +``` + +You can also clearly see how Hoon supplements "pure" values with metadata to establish context for the values. + +#### `+vast` + +`+ream` is a wrapper for the main parser `+vast`, the core that contains the Hoon parsing rules. It starts with an attempt to parse in tall form, which will fall back to wide form as necessary. 
A `+gay` is a way to ignore `+gap` plural whitespace on either side of the text (file). + +```hoon +++ vest + ~/ %vest + |= tub=nail + ^- (like hoon) + %. tub + %- full + (ifix [gay gay] tall:vast) +``` + +The `+vang` wrapper lets you turn off debugging info and doccords in parsing. + +```hoon +> (rash 'goo' tall:(vang | /)) +[%wing p=~[%goo]] + +> (rash 'goo' tall:(vang & /)) +[%dbug p=[p=/ q=[p=[p=1 q=1] q=[p=1 q=4]]] q=[%wing p=~[%goo]]] +``` + +#### `+open:ap` + +A parsed Hoon AST has not yet been desugared. Many Hoon runes are simply convenience wrappers over a few fundamental runes. (For instance, most (all?) `%` cen runes reduce to `%~` censig.) `+open:ap` unwraps these one layer at a time until fundamental runes are reached in the Hoon AST. + +```hoon +> (ream '(add 1 2)') +[%cncl p=[%wing p=~[%add]] q=[i=[%sand p=%ud q=1] t=[i=[%sand p=%ud q=2] t=~]]] + +> ~(open ap (ream '(add 1 2)')) +[ %cnsg + p=~[%$] + q=[%wing p=~[%add]] + r=[i=[%sand p=%ud q=1] t=[i=[%sand p=%ud q=2] t=~]] +] + +> ~(open ap ~(open ap (ream '(add 1 2)'))) +[ %cntr + p=~[%$] + q=[%wing p=~[%add]] + r + [ i=[p=~[[%.n p=0 q=~] [%.y p=12]] q=[%sand p=%ud q=1]] + t=[i=[p=~[[%.n p=0 q=~] [%.y p=13]] q=[%sand p=%ud q=2]] t=~] + ] +] + +> ~(open ap ~(open ap ~(open ap (ream '(add 1 2)')))) +[ %tsls + p=[%wing p=~[%add]] + q + [ %cnts + p=~[%$ [%.y p=2]] + q + [ i + [p=~[[%.n p=0 q=~] [%.y p=12]] q=[%tsgr p=[%$ p=3] q=[%sand p=%ud q=1]]] + t + [ i + [ p=~[[%.n p=0 q=~] [%.y p=13]] + q=[%tsgr p=[%$ p=3] q=[%sand p=%ud q=2]] + ] + t=~ + ] + ] + ] +] + +> ~(open ap ~(open ap ~(open ap ~(open ap (ream '(add 1 2)'))))) +[ %tsgr + p=[p=[%wing p=~[%add]] q=[%$ p=1]] + q + [ %cnts + p=~[%$ [%.y p=2]] + q + [ i + [p=~[[%.n p=0 q=~] [%.y p=12]] q=[%tsgr p=[%$ p=3] q=[%sand p=%ud q=1]]] + t + [ i + [ p=~[[%.n p=0 q=~] [%.y p=13]] + q=[%tsgr p=[%$ p=3] q=[%sand p=%ud q=2]] + ] + t=~ + ] + ] + ] +] +``` + +- [Hoon, “Constants (Atoms and Strings)”](/hoon/reference/rune/constants) + +- Examine `+open:ap` in 
`/sys/hoon.hoon` to see how desugaring proceeds. + +### AST `$hoon` → Nock `$nock` {#ast-hoon-nock-nock} + +All Hoon code is ultimately compiled by `+mint:ut`, although there can be many paths there. `+mint:ut` compiles a Hoon AST into a pair of the type and the Nock. What is [`+ut`](/hoon/reference/stdlib/5c#ut)? It's the Hoon compiler backend, containing all of the arms necessary to actually process a `$hoon` into a `$nock`. + +One simple approach is to look at the 2013 Hoon compiler, which presents a relatively uncluttered version: + +- [`+mint:ut` (2013)](https://github.com/urbit/archaeology/blob/6b2ce202207b9bb3f4e65fc1ea9a2fb434396dd4/urb/zod/arvo/hoon.hoon#L7698) + +In all cases, the parent door `+ut` receives the sample of `$type` when an arm is invoked. The significant arms include: + +##### `+mull:ut` + +`+mull` is an assertion for wet gates. At each call site of a wet gate, the compiler must check that the gate, when compiled against the type of the actual sample, would produce the same Nock as it does for its formal sample. + +- `sut` is the subject type +- `gol` is a product constraint (result type must nest in `gol`, passed around to support accurate stack traces) +- `dox` is the formal subject type + +`+mull` is conceptually equivalent to saying, “compile this expression against the actual subject type `sut`, compile it again against the formal type `dox`, assert that they produce the same Nock”. + +In practice, `+mull` traverses both subject types at the same time, short-circuiting in a couple of scenarios, and simply crashing if the (conceptual) result would be different. + +##### `+find:ut` + +`+find` is a wing resolution arm. It tries Nock Zero first to see if the wing is a leg, then tries Nock Nine if that fails. It also needs to know the kind of access requested (`%read`, `%rite`, `%both`, `%free`), which matters for core variance; see [`+slab`](/hoon/reference/stdlib/5c#slab) as well, which also uses `+fond` under the hood. 
+ +```hoon +> (slab %read %$ -:!>(add)) +%.y + +> (slab %read %a -:!>(add)) +%.y + +> (slab %read %b -:!>(add)) +%.y + +> (slab %read %c -:!>(add)) +%.n +``` + +##### `+nest:ut` + +`+nest` provides a structural test on whether two `$type`s nest properly. It is called via `+nice` so that the `need`/`have` error messages can be presented neatly. + +##### `+mint:ut` + +`+mint` is quite long, but deserves some attention. + +- `+mint` itself takes a pair of `typ` and `hoon`, a Hoon AST. +- `+mint` produces a pair of `type` and `nock`. +- It features a lot of `+nice` (`+nest`) checks. +- Each branch in the main `switch` operates on an AST tag to convert it recursively to `$nock`. + +```hoon +> (~(mint ut %noun) %noun (ream '~[1 2 3]')) +[#t/[@ud @ud @ud %~] q=[%1 p=[1 2 3 0]]] +``` + +This can be evaluated as `$nock` then using `.*` dottar: + +```hoon +> .*(. (make '~[1 2 3]')) +[1 2 3 0] +``` + +`+mint:ut` is never called on its own in the compiler. It's used to generate a Nock formula then to run it against the subject to make it useful, e.g. for an agent running a formula against the standard library plus imports. 
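A minimal sketch of that compile-then-run pattern at the Dojo, reusing the sample vase from the `+slap` examples (the faces `vax` and `gun` are illustrative): compile an expression against the type of a vase with `+mint:ut`, then run the resulting formula against the vase's noun with `.*`.

```hoon
::  p.vax is the subject type, q.vax the subject noun;
::  gun is the [type nock] pair produced by +mint:ut
> =/(vax !>([foo=33 bar=44]) =/(gun (~(mint ut p.vax) %noun (ream '+(foo)')) [p.gun .*(q.vax q.gun)]))
```

This mirrors the body of `+slap`, so the product should be the same type-and-noun pair that `(slap !>([foo=33 bar=44]) (ream '+(foo)'))` produces.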
+ +```hoon +> (~(mint ut -:!>(.)) %noun ~(open ap (ream '(add 1 2)'))) +[ #t/@ + q + [ %8 + p=[%9 p=36 q=[%0 p=2.047]] + q + [ %9 + p=2 + q + [ %10 + p=[p=6 q=[p=[%7 p=[%0 p=3] q=[%1 p=1]] q=[%7 p=[%0 p=3] q=[%1 p=2]]]] + q=[%0 p=2] + ] + ] + ] +] + +> !=((add 1 2)) +[8 [9 36 0 2.047] 9 2 10 [6 [7 [0 3] 1 1] 7 [0 3] 1 2] 0 2] + +> ;;($>(?(%hold %core) type) -:(~(mint ut -:!>(.)) %noun ~(open ap (ream '(add 1 2)')))) +[ %hold + #t/<1.otf [[a=@ b=@] <33.sam 1.pnw %139>]> + q + [ %kthp + p=[%base p=[%atom p=~.]] + q + [ %wtcl + p=[%dtts p=[%sand p=%ud q=0] q=[%wing p=~[%a]]] + q=[%wing p=~[%b]] + r + [ %cnts + p=~[%$] + q + [ i=[p=~[%a] q=[%cncl p=[%wing p=~[%dec]] q=[i=[%wing p=~[%a]] t=~]]] + t=[i=[p=~[%b] q=[%dtls p=[%wing p=~[%b]]]] t=~] + ] + ] + ] + ] +] + +> ;;($>(?(%hold %core) type) -:(~(mint ut -:!>(.)) %noun ~(open ap (rash 'add' tall:(vang | /))))) +``` + +In the case of a gate, the formula that `+mint` returns builds a 3-tuple. The head is the battery of the core; the middle is the default sample, or Nock One of the bunted sample; the tail is `[0 1]`, to take the subject and put it in the `context` of the core. + +```hoon +> +:(~(mint ut -:!>(~)) %noun !,(*hoon |=(@ +<))) +q=[%8 p=[%1 p=0] q=[p=[%1 p=[0 6]] q=[%0 p=1]]] +``` + +The Hoon compiler is not an optimizing compiler. It can recognize and replace some simple expressions, like Nock Seven of two Nock Zeros into a single Nock Zero. + +(As an aside, note that Hoon only uses `#` hax in a rune as a placeholder for an experimental rune, or a rune that is involved in the first stage of a multi-stage upgrade process, like swapping two runes and needing a temporary rune to use in the intermediate version of the language.) + +Ted points out that the nature of Hoon as an ergonomic harness to generate Nock code is here very apparent. There is no additional runtime support material of the kind that C, C++, or Rust would inject; there is no extra Nock glue. 
Hoon is a good assembly language (which is really what it's for)—so one future of Hoon is to strip out features and make it more concrete bare-bones then build things on top of it. + +- [Hoon 141 Compiler Documentation](https://docs.google.com/document/d/1C0k0AY2vsFu5fNfp0nKiU0GsKThkLMLdHBReqAXNy_w/edit) + +Questions: + +- If you create a Gall agent with the wrong number of arms (i.e. add an arm), what goes wrong? Where does this error arise in the compilation process? + +#### `$nock` + +While there are no surprises in `$nock` as a representation of Nock nouns, the use of constants for the rules and the deferral of formulae as subnocks is very readable. + +```hoon ++$ nock $^ [p=nock q=nock] :: autocons + $% [%1 p=*] :: constant + [%2 p=nock q=nock] :: compose + [%3 p=nock] :: cell test + [%4 p=nock] :: increment + [%5 p=nock q=nock] :: equality test + [%6 p=nock q=nock r=nock] :: if, then, else + [%7 p=nock q=nock] :: serial compose + [%8 p=nock q=nock] :: push onto subject + [%9 p=@ q=nock] :: select arm and fire + [%10 p=[p=@ q=nock] q=nock] :: edit + [%11 p=$@(@ [p=@ q=nock]) q=nock] :: hint + [%12 p=nock q=nock] :: grab data from sky + [%0 p=@] :: axis select + == :: +``` + +### Files with Imports (`+ford`) {#files-with-imports-ford} + +`+ford` is the code builder arm (formerly vane) which handles producing code from a file on a desk, including library imports. Ford uses the `+slap`/`+slop` algebra to produce the subject type and value. + +For instance, with `/-` wuthep, Ford builds the `/sur` file then `+slop`s that `vase` with `%zuse` to yield the new build subject, the new `vase`. That combined vase is the subject when Ford `+slap`s the file contents. This illustrates the production of a high-level AST using the `+slap` of `vase` and `hoon` dependent on the `+slop` of two or more `vase`s. + +Clay maintains some state to trigger Ford builds, for instance if the agent is rebuilt after a `|commit`. 
Thus Gall can subscribe to the next revision of any query that can be sent to Clay. As of December 2022, Clay maintains an official state about which agents are supposed to be running from each desk, then eagerly runs the build. + +- [~rovnys-ricfer, “Ford Pinto Spec”](https://gist.github.com/belisarius222/dc2aae8230a22a84d389c19cb613a1d2) +- [~rovnys-ricfer, “Ford Fusion”](https://urbit.org/blog/ford-fusion) + +#### `-build-file` + +The `-build-file` and `-build-dependency` threads are the simplest way to invoke `+ford`. Ultimately these resolve through `/lib/strandio`: + +**`/lib/strandio.hoon`** + +```hoon +:: +:: +build-file: build the source file at the specified $beam +:: +++ build-file + |= [[=ship =desk =case] =spur] + =* arg +< + =/ m (strand ,(unit vase)) + ^- form:m + ;< =riot:clay bind:m + (warp ship desk ~ %sing %a case spur) + ?~ riot + (pure:m ~) + ?> =(%vase p.r.u.riot) + (pure:m (some !<(vase q.r.u.riot))) +``` + +We will consider marks and tubes in the next lesson. + +### Tutorial: Trace a Rune AST {#tutorial-trace-a-rune-ast} + +The [`|$` barbuc](/hoon/reference/rune/bar#-barbuc) rune is used to build a mold builder gate. (As such, it's fairly abstract.) Essentially, it is sugar for a certain use of [`|*` bartar](/hoon/reference/rune/bar#-bartar): + +```hoon +|$ [a b] +body + +|* [a=$~(* $-(* *)) b=$~(* $-(* *))] +^: +body +``` + +The formal AST for `%brbc` is `[%brbc sample=(lest term) body=spec]`. How is this actually built? + +Let's look at the sample first. The sample is a `(lest term)`, or a non-empty list of identifiers. Notice, however, that the actual value passed in (like `[a b]`) is _not_ a list at all. Several runes that accept a `list` type (like `:~` colsig) handle adding the terminal `~` in the first parsing step. + +The definition of `%brbc` is a little dense but it's doing the same thing, so the sample isn't explicitly a `lest` but becomes one in the first pass of building the rune AST. 
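To check this at the Dojo, one can parse a small `|$` expression and then desugar it by one step. This is only a sketch, assuming a standard library like the one used in this lesson; the mold body `[p=a q=b]` is an illustrative choice.

```hoon
::  parse only: the AST node should carry the %brbc tag
> (ream '|$  [a b]  [p=a q=b]')

::  one desugaring step with +open:ap
> ~(open ap (ream '|$  [a b]  [p=a q=b]'))
```

The first expression should show a `%brbc` node whose sample is already a null-terminated list of terms; the second should show the corresponding `%brtr` form.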
+ +In `hoon.hoon`, find the following line: + +``` +[%brbc sample=(lest term) body=spec] :: |$ +``` + +Next, find where it is parsed in `+vast`, the main parsing core. This will tell you how the rune children are parsed. + +```hoon +++ expression + %- stew + ^. stet ^. limo + :~ :- '|' + ;~ pfix bar + %- stew + ^. stet ^. limo + :~ ['$' (rune buc %brbc exqe)] + == + == + == +``` + +In this case, the rune children are handled using `+exqe`, which automatically turns the sample into a list of names: + +```hoon +++ exqe |.(;~(goop lynx loan)) :: list of names then spec +``` + +In this expression, there are three new names: + +1. `+goop` is a separated `list` with doccord compatible docs included. +2. `+lynx` is a parser for `a` or `[a b c]` or `a b c ==`. +3. `+loan` parses a `spec`. + +So by the time the sample is processed, it has already acquired `list`-type from `+exqe`. + +Next, let's look at the direct handling of the result of `+exqe`: + +```hoon +[%brbc *] =- ?~ - !! + :+ %brtr + [%bccl -] + |- + ?. ?=([%gist *] body.gen) + [%ktcl body.gen] + [%note p.body.gen $(body.gen q.body.gen)] + %+ turn `(list term)`sample.gen + |= =term + ^- spec + =/ tar [%base %noun] + [%bcts term [%bcsg tar [%bchp tar tar]]] +``` + +Desugared and annotated: + +```hoon +=+ :: first parse the sample into a lest + %+ turn `(list term)`sample.gen + |= =term ^- spec + [%bcts term [%bcsg [%base %noun] [%bchp [%base %noun] [%base %noun]]]] + :: + :: if the foregoing turned list is empty, crash + ?~ - !! 
+ :: otherwise, produce the desugared |* + :+ %brtr + :: this is the cell of the sample in the first arm + [%bccl -] + |- + ?: ?=([%gist *] body.gen) + :: if it's a gist, then include the doccords note + [%note p.body.gen $(body.gen q.body.gen)] + :: otherwise just include the body + [%ktcl body.gen] +``` + +Look for the structure: + +```hoon +|* [a=$~(* $-(* *)) b=$~(* $-(* *))] +^: +body +``` + +#### Exercise + +- Carry out the foregoing analysis for another sugar rune, such as `?~`, `:~`, `=.`, or `=/`. + +### Evaluating Nock {#evaluating-nock} + +As mentioned in `ca00`, Nock is dispatched to the Nock interpreter in the runtime, where it is evaluated as a mixture of Nock bytecode and runtime jets. Different parts of the system can have different Nock interpreters; for instance, in userspace, the `+mink` metacircular interpreter is used so that crashes can be handled. + +### Hoon and Nock Nine {#hoon-and-nock-nine} + +In a sense, Hoon is a macro assembly language that targets Nock as its machine code. One of the most common patterns in Hoon is how a gate is defined as a core, then invoked into a particular instance by replacing the sample then evaluating the statement. (This happens in gate-building gates but elsewhere as well.) + +``` +> != %-(add [3 4]) +[8 [9 36 0 2.047] 9 2 10 [6 7 [0 3] 1 3 4] 0 2] +``` + +Reading the formula from the outside in: `[9 36 …]` first fires the `add` arm of the standard library core to get the `add` gate, which Nock Eight pins to the head of the subject. The inner formula then takes that gate (`[0 2]`), uses Nock Ten to replace its sample at axis `6` with the constant `[3 4]`, and fires the `$` arm at axis `2` with a second Nock Nine. + +You can connect how Nock thinks of a double Nock Nine invocation to how Hoon has a `%hold` then resolves it into a `%core`. For instance, let's look inside of `+add`, where we can get the `%hold` and then see the AST (on which more later). Here the `%hold` means that the “actual” type of `+add` has not yet been calculated: although it is a core, it will have to be built with a particular sample for it to result in a `%core`. (In other words, a `%hold` is a lazily evaluated type.) 
+ +```hoon +> +2:!>(add) +#t/<1.otf [[a=@ b=@] <33.rnj 1.pnw %139>]> + +> +4:!>(add) +%hold + +> +10:!>(add) +#t/<33.rnj 1.pnw %139> + +> +22:!>(add) +%sgfs + +> +46:!>(add) +i='add' +> +47:!>(add) + q +[ %note + p + [ %help + p + [ cuff=~ + crib + [ summary='unsigned addition' + details=~[~[[p=%.y q='a: augend'] [p=%.y q='b: addend']]] + ] + ] + ] + q + [ %brts + p + [ %bccl + p + [ i=[%bcts p=term=%a q=[%base p=[%atom p=~.]]] + t=[i=[%bcts p=term=%b q=[%base p=[%atom p=~.]]] t=~] + ] + ] + q + [ %kthp + p=[%base p=[%atom p=~.]] + q + [ %wtcl + p=[%dtts p=[%sand p=%ud q=0] q=[%wing p=~[%a]]] + q=[%wing p=~[%b]] + r + [ %cnts + p=~[%$] + q + [ i=[p=~[%a] q=[%cncl p=[%wing p=~[%dec]] q=[i=[%wing p=~[%a]] t=~]]] + t=[i=[p=~[%b] q=[%dtls p=[%wing p=~[%b]]]] t=~] + ] + ] + ] + ] + ] +] +``` + +```hoon +> +20:!>(*add) +%core + +> +23:!>(*add) + q +[ p=[%base p=[%atom p=~.]] + q + [ %wtcl + p=[%dtts p=[%sand p=%ud q=0] q=[%wing p=~[%a]]] + q=[%wing p=~[%b]] + r + [ %cnts + p=~[%$] + q + [ i=[p=~[%a] q=[%cncl p=[%wing p=~[%dec]] q=[i=[%wing p=~[%a]] t=~]]] + t=[i=[p=~[%b] q=[%dtls p=[%wing p=~[%b]]]] t=~] + ] + ] + ] +] +``` + + +## Exercises {#exercises} + +- Exercise: The original `;<` micgal macro was unhygienic; that is, it introduced a spurious `$` each time it was called. This could be worked around using `=* foo $` after a `|-`, but was inconvenient. The original AST expansion of the macro was `[%cnls [%cnhp q [%ktcl p]] r [%brts p s]]:gen`; this was improved to `[%cnls [%cnhp q [%ktcl p]] r [%brts p [%tsgr $+3 s]]]:gen`. Explain how this works. + - `[%tsgr $+3 s]` is `=>(+ s)`, i.e., the body of the generated gate `s` should not be evaluated against the generated gate but against the payload at axis `+3`; this will be both the sample and context, where the sample is the newly bound value `.p` and the context is the subject against which the `;<` was expressed. Now it works just like `=/`. 
+- Exercise: Write your own [`+slam`](/hoon/reference/stdlib/5c#slam) using `+slap` and `+slop`. +- Exercise: Implement a rune. The easiest rune to implement is a five-tuple; let's call it `:#` colhax. The parser (`+vast`) and the AST processor (`+open:ap`) need to be modified for this rune to work. Formally this is a language change, but you should be able to just reload `/sys/hoon.hoon` automatically and have it update in place. Alternatively, implement the `;.` micdot rune. diff --git a/docs/core-academy/ca02.md b/docs/core-academy/ca02.md new file mode 100644 index 0000000..fa7f53f --- /dev/null +++ b/docs/core-academy/ca02.md @@ -0,0 +1,2585 @@ +--- +description: "Core Academy lesson on Urbit's core architecture. Subject-oriented programming, limb resolution mechanics, nested core structures, and the Hoon standard library organization." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 3. The Core Stack + +*This lesson goes over the nested core structures of `/sys/hoon.hoon`, `/sys/zuse.hoon`, and `/sys/lull.hoon`. We explain how subject search and limb resolution work.* + +## Subject-Oriented Programming {#subject-oriented-programming} +### Subject Search and Limb Resolution {#subject-search-and-limb-resolution} + +A face is a label for an axis in a tree. The main use for a face is to label a slot (axis) within a noun of a corresponding type. Without faces, you would have to refer to all data by numeric axis. Faces are a Hoon convention, and Nock knows nothing about labels or faces: they exist only in the type and are stripped out of the compiled Nock. In fact, it's possible but cumbersome to construct Hoon programs without labels. + +From a `$type` perspective, a face results from `^=` kettis (`foo=bar`) modifying the enclosed expression to be wrapped in `[%face %foo original-type]`. 
+ +```hoon +> =/ a b=c=5 + c.b.a +5 + +> =/ a b=c=5 + ([%face type] -:!>(a)) + +> =/ a b=c=5 + ([%face *] -:!>(a)) +[1.701.011.814 98 1.701.011.814 99 1.836.020.833 25.717 0] +``` + +An arm name is not the same thing as a face. `+add` is the name of an arm in the standard subject. When that arm is fired, the result is the `+add` gate, which is then `+slam`med by swapping out the sample with an argument and firing the `$` arm. + +Wings are expressions that compile to an axis. A `$wing` is a `(list limb)`, or basically a `path` to a value in the subject. We can compose wings: + +1. Relatively, using lark syntax (`+>`). +2. Absolutely, using numeric syntax (`+6`, `&6`, `|6`). +3. By name, using faces and arm names. + +```hoon ++$ limb $@ term :: wing element + $% [%& p=axis] :: by geometry + [%| p=@ud q=(unit term)] :: by name + == :: ++$ axis @ :: tree address ++$ wing (list limb) :: search path +``` + +Wings are parsed by `+rope` (`+ven` for lark syntax; `+lus`+`+pam`+`+bar` for numeric syntax). `.` dot wing resolution is conducted by the `+ax:musk` door in the Hoon compiler (invoked by `+open:ap`). This resolves a wing against a sample-supplied subject. Wings resolve depth-first (in other words, from the outermost “closest” match towards the inner cores). + +```hoon +> (spec +:(ream '$:(a=@ b=@)')) +[ %bccl + p + [ i=[%bcts p=term=%a q=[%base p=[%atom p=~.]]] + t=[i=[%bcts p=term=%b q=[%base p=[%atom p=~.]]] t=~] + ] +] + +:: Produce a bunt of the given spec. +> ~(example ax (spec +:(ream '$:(a=@ b=@)'))) +[p=[%ktts p=term=%a q=[%sand p=%$ q=0]] q=[%ktts p=term=%b q=[%sand p=%$ q=0]]] + +:: Produce a normalizing gate (mold) for a given spec, as an AST. +> ~(factory ax (spec +:(ream '$:(a=@ b=@)'))) +[ %brcl + p + [ %ktsg + p +... +``` + +A leading `^` skips a match: each `^` in front of a name tells the search to pass over one more matching name before resolving. In the compiler, this corresponds to the skip count `p=@ud` of a by-name `$limb`. 
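A minimal Dojo sketch of skipping at the value level, assuming a made-up subject in which the same face is bound three times:

```hoon
::  the subject is a cell with three values all named `a`
> =>([a=1 a=2 a=3] a)
1

::  one ^ passes over the first match
> =>([a=1 a=2 a=3] ^a)
2

::  two ^ pass over the first two matches
> =>([a=1 a=2 a=3] ^^a)
3
```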
+ +```hoon +> (ream '$') +[%wing p=~[%$]] + +> (ream '^$') +[%wing p=~[[%.n p=1 q=[~ %$]]]] + +> (ream '^^$') +[%wing p=~[[%.n p=2 q=[~ %$]]]] +``` + +```hoon +> (spec +:(ream '*spec')) +[%like p=~[%spec] q=~] +``` + +`.` dot is Hoon-native syntax (not rune sugar) for a wing resolution search path. `:` col is a shorthand for `=<` tisgal, and generally results in a longer Hoon AST than the `.` dot expression would. (The Nock formula may well come out the same.) + +```hoon +> =/ a [b=42] + !,(*hoon b.a) +[%wing p=~[%b %a]] + +> =/ a [b=42] + !,(*hoon b:a) +[%tsgl p=[%wing p=~[%b]] q=[%wing p=~[%a]]] + +> =/ a [b=42] + !=(b:a) +[0 2] + +> =/ a [b=42] + !=(b.a) +[0 2] +``` + +Since the subject of a core is the core itself, `..add` resolves to the core containing `+add` (which is Layer 1) and thus exposes mutual visibility between all arms in the core. + +We have made some noise in the past about arms and legs. With everything under your belt at this point, you are equipped to really understand the difference: + +- A _leg_ is a noun accessible in the current subject using a Nock Zero call. Thus a value like `=/ pi .3.1415926` would be a leg. +- An _arm_ is a noun which requires a Nock Nine call. Thus `++ pi .3.1415926` would be an arm even though it is an atom _simpliciter_. + +When the compiler dereferences a limb, it either finds an arm (in the battery of a core) or a leg (anywhere else). An arm must be computed against the whole core (Nock Nine), whereas a leg is simply retrieved (Nock Zero). + +Arms are only pulled by name. If you retrieve them by axis or lark syntax then they are treated as raw nouns. The name of an arm is not a face. + +```hoon +> => + |% + ++ $ .+ 100 + -- + +2 +[4 1 100] + +> => + |% + ++ $ .+ 100 + -- + $ +101 +``` + +### Structure Mode {#structure-mode} + +Most Hoon is written in value mode, meaning that sugar syntax like `[]` resolves to a `:` col family rune. However, `$spec` values are written in structure mode. 
+ +```hoon +> !,(*hoon [a=@ b=@]) +[ %cltr + p + [ i=[%ktts p=term=%a q=[%base p=[%atom p=~.]]] + t=[i=[%ktts p=term=%b q=[%base p=[%atom p=~.]]] t=~] + ] +] + +> !,(*hoon $:(a=@ b=@)) +[ %ktcl + p + [ %bccl + p + [ i=[%bcts p=term=%a q=[%base p=[%atom p=~.]]] + t=[i=[%bcts p=term=%b q=[%base p=[%atom p=~.]]] t=~] + ] + ] +] +``` + +The Hoon parser can be switched from one to the other using a leading `,` com. + +```hoon +> !,(*hoon ,[a=@ b=@]) +[ %ktcl + p + [ %bccl + p + [ i=[%bcts p=term=%a q=[%base p=[%atom p=~.]]] + t=[i=[%bcts p=term=%b q=[%base p=[%atom p=~.]]] t=~] + ] + ] +] +``` + +The root type of a structure mode quantity is a `$spec`: + +```hoon ++$ spec :: structure definition + $~ [%base %null] :: + $% [%base p=base] :: base type + [%dbug p=spot q=spec] :: set debug + [%gist p=[%help p=help] q=spec] :: formal comment + [%leaf p=term q=@] :: constant atom + [%like p=wing q=(list wing)] :: reference + [%loop p=term] :: hygienic reference + [%made p=(pair term (list term)) q=spec] :: annotate synthetic + [%make p=hoon q=(list spec)] :: composed spec + [%name p=term q=spec] :: annotate simple + [%over p=wing q=spec] :: relative to subject + :: :: + [%bcgr p=spec q=spec] :: $>, filter: require + [%bcbc p=spec q=(map term spec)] :: $$, recursion + [%bcbr p=spec q=hoon] :: $|, verify + [%bccb p=hoon] :: $_, example + [%bccl p=[i=spec t=(list spec)]] :: $:, tuple + [%bccn p=[i=spec t=(list spec)]] :: $%, head pick + [%bcdt p=spec q=(map term spec)] :: $., read-write core + [%bcgl p=spec q=spec] :: $<, filter: exclude + [%bchp p=spec q=spec] :: $-, function core + [%bckt p=spec q=spec] :: $^, cons pick + [%bcls p=stud q=spec] :: $+, standard + [%bcfs p=spec q=(map term spec)] :: $/, write-only core + [%bcmc p=hoon] :: $;, manual + [%bcpm p=spec q=hoon] :: $&, repair + [%bcsg p=hoon q=spec] :: $~, default + [%bctc p=spec q=(map term spec)] :: $`, read-only core + [%bcts p=skin q=spec] :: $=, name + [%bcpt p=spec q=spec] :: $@, atom pick + [%bcwt p=[i=spec 
t=(list spec)]] :: $?, full pick + [%bczp p=spec q=(map term spec)] :: $!, opaque core +``` + +A `spec` produces a mold, thus a `%core` with a `$` arm, rather than e.g. a `%cell`. + +```hoon +> -<:!>(,[a=@ b=@]) +%core + +> -<:!>([a=@ b=@]) +%cell + +> (,[%const *] [%const 10]) +[%const 10] + +> $:,[%const *] +[%const 0] + +> -:,[%const *] +[ 8 + [[6 [5 [0 12] 1 500.152.823.651] [1 500.152.823.651] 0 0] 0 13] + 8 + [5 [0 14] 0 2] + 0 + 6 +] + +> +6:,[%const *] +[500.152.823.651 0] + +> $:@ +0 + +> !<(@ (slam !>(@) !>(5))) +5 +``` + +### Cores {#cores} + +In the AST, a `%core` consists of a lot of information about the behavior of various components. + +```hoon ++$ type ... + $: %core + $= p + $: p=(unit term) + q=?(%wet %dry) + r=?(%gold %iron %lead %zinc)) + == + $= q + $: $= p + $: p=(unit term) + q=?(%wet %dry) + r=?(%gold %iron %lead %zinc)) + == + $= q type + $= r (pair seminoun (map term tome)) + == ++$ tome (pair what (map term hoon)) +``` + +The core variance is repeated because of the dry/wet gate distinction. [Core variance](/guides/core/hoon-school/R-metals) starts to make more sense once you've popped the cover off of cores this way. + +> “Suppose this core was actually _compiled_ using the modified payload instead of the one it was originally built with? Would the Nock formula we generated for the original template actually work for the modified `payload`?” + +What we're saying, in other words, is that if you produced Nock using a wet gate via more than one input, would that Nock end up the same? If so, then for a wet gate it's valid. Wetness is handled at three points in the compiler: + +1. `+hemp` dispatches the Nock formula generation slightly differently, turning off `vet` (sample nesting) in `+mint:ut`. +2. `+mint:ut` when it builds `|@` barpat wet doors and `|*` bartar wet gates. +3. `+dext:crop:ar` when `%core` types are handled, enforcing the condition that for `%wet` gates `=(q.r.q.sut q.r.q.ref)`, that the formula results are the same. 
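To make the wet/dry distinction concrete, here is a small Dojo sketch. The names `dry` and `wet` are hypothetical, chosen only for this illustration. Both gates have the same body and the same sample mold; only the rune differs.

```hoon
::  dry gate: the argument is checked against the sample type `*`,
::  and the result keeps that type, so the cord prints as a raw atom
> =/  dry  |=(a=* a)
  (dry 'hi')
26.984

::  wet gate: the body is re-checked against the argument's own type,
::  so the `@t` aura survives the call
> =/  wet  |*(a=* a)
  (wet 'hi')
'hi'
```

The Nock for the two bodies is the same; what changes is the type information the compiler carries through the call site, which is why the compiler can insist that the formula come out identical.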
+ +Variance matters when comparing structural nesting. For instance, the main Gall agent type should permit checking the type of the door since it will be used as examples for building actual agent cores, but should not be reliant on things like the sample. Thus in `/sys/lull`, `+agent` is marked as `%iron` using `^|` ketbar. + +In `/sys/lull`, several shared representations like vane interfaces and `+http` are marked as `%lead` using `^?` ketwut. Bivariance here permits any kind of nesting, useful for examples for types. + +In `+deem:nest:ut` we can see how the `%read`/`%rite` permissions are directly set. + +```hoon +++ deem + |= [mel=vair ram=vair] + ^- ? + ?. |(=(mel ram) =(%lead mel) =(%gold ram)) | + ?- mel + %lead & + %gold meet + %iron dext(sut (peek(sut ref) %rite 2), ref (peek %rite 2)) + %zinc dext(sut (peek %read 2), ref (peek(sut ref) %read 2)) + == +``` + +Likewise in `+peel:ut`: + +```hoon +++ peel + |= [way=vial met=?(%gold %iron %lead %zinc)] + ^- [sam=? con=?] + ?: ?=(%gold met) [& &] + ?- way + %both [| |] + %free [& &] + %read [?=(%zinc met) |] + %rite [?=(%iron met) |] + == +``` + +Those permissions sets are the ones actually used in core behavior checks. In kernelspace, you are not strictly limited by the core type system—but you will have to manually construct handlers for other wetness/metallic behaviors and extend things to get the behavior you are aiming for. + +- [Hoon School, “Generic and Variant Cores”](/guides/core/hoon-school/R-metals) + +#### Aside: Constructing Gates + +Gates are special `$`-armed instances of doors. 
It's interesting to see how that particular sausage is made in `+mint:ut`: + +```hoon +[%brts *] :+ %brcb p.gen + =- [~ [[%$ ~ -] ~ ~]] + (~(put by *(map term hoon)) %$ q.gen) +[%brcb *] :+ %tsls [%kttr p.gen] + :+ %brcn ~ + %- ~(run by r.gen) + |= =tome + :- p.tome + %- ~(run by q.tome) + |= =hoon + ?~ q.gen hoon + [%tstr [p.i.q.gen ~] q.i.q.gen $(q.gen t.q.gen)] +``` + +### Arvo-Supplied Values {#arvo-supplied-values} + +Arvo values such as `our`, `eny`, and `now` are simply supplied at axes in the subject (rather than being scries). (This is why they must be explicitly provided for in generators.) Compare the following Nock results on a fakeship ~zod. + +``` +> !=(=(0 ~zod)) +[5 [1 0] 1 0] + +> !=(=(0 our)) +[5 [1 0] 0 12] +``` + +In the latter, `our` refers to a slot in the subject which needs to be looked up (at `12`) and replaced into the final evaluated noun. + +- `our` is at `12` +- `now` is at `26` +- `eny` is at `27` + +### Dynamic Dispatch {#dynamic-dispatch} + +> Static dispatch (or early binding) happens when I know at compile time which function body will be executed when I call a method. In contrast, dynamic dispatch (or run-time dispatch or virtual method call or late binding) happens when I defer that decision to run time. + +The conventional behavior of Urbit's Hoon language is to statically dispatch against limb labels known at compile time. It's somewhat difficult to get around this in userspace; for instance, to retrieve a list of faces in a core and selectively run against those that exist. Why? As we showed a moment ago, a face or an arm name is a compile-time construct that stands in for an axis in the subject. + +With a subject and the `slap`/`slop` algebra, we can effect dynamic (runtime) dispatch for an interactive interface via `slam`. For instance, Dojo does this for every input. (Cf. [ll. 530–539 in `/app/dojo.hoon`](https://github.com/urbit/urbit/blob/7a205c4b76e1871235dd6f4f4caa1cb6dc5b8c6c/pkg/arvo/app/dojo.hoon#L530).) 
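As a rough sketch of the difference, the three Dojo expressions below compute the same sum. The first is ordinary static dispatch. The second and third go through vases, so the gate (or the source text naming it) is only a runtime value. Using `..zuse` as the subject for `+slap` is an assumption made for this example; any vase whose subject contains `+add` would do.

```hoon
::  static: +add is resolved to an axis at compile time
> (add 3 4)
7

::  dynamic: the gate and its sample arrive as vases
> !<(@ (slam !>(add) !>([3 4])))
7

::  dynamic: the expression is text until it is parsed and compiled at run time
> !<(@ (slap !>(..zuse) (ream '(add 3 4)')))
7
```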
+ +### `-test` Thread {#test-thread} + +- The `/ted/test` thread invokes arms beginning with `test` in the context of the subject provided in the core. How does it do this? + + +## Kelvin Versioning {#kelvin-versioning} + +The innermost core of Hoon is the root marker for the language version. Not every part of a system should be subject to kelvin versioning: userspace generally will not be, and even in the `%base` desk many portions will not be subject to kelvin versioning. + +> Deep, onion-like layering is essential. A thin layer has no room to grow. A good example of this principle is the difference between Urbit and Lisp machines. Both Nock and Lisp are very simple axiomatic definitions of computing. But practical Lisp systems expand by extending the model, whereas Urbit layers over a frozen axiom system. +> — [~sorreg-namtyv, ~ravmel-ropdyl, “Towards a Frozen Operating System”](https://urbit.org/blog/toward-a-frozen-operating-system) + +The parts of the system subject to kelvin versioning are: + +- Nock, `%4` (liquid helium, 4.15 K). +- Hoon, `%137` (about liquid krypton, 115.8 K). (`%140` in [December 2020](https://groups.google.com/a/urbit.org/g/dev/c/2H28uz4T6xU/m/coJSpg4aAgAJ)). +- Arvo, `%236` (about liquid mercury, 234.4 K). (`%240` in December 2020). +- Lull, `%322`, (about gaseous water, 373.1 K). (`%330` in December 2020). +- Zuse, `%410`, which in a sense represents the most important kelvin for userspace developers since it's what they peg releases against. (`%420` in December 2020.) + +What parts are subject to kelvin versioning? Essentially, the things we see as _platform_: as you can see, the language, the event handler, and parts of the standard library. + +What results in a kelvin change? Not every release, even a change in a system file, motivates a kelvin decrement. The rule of thumb is that something which changes the specification of the platform burns a kelvin. 
In practice, although there are many kelvins yet to burn, it is more straightforward to bundle breaking changes together. This is both frugal of platform changes and generous to userspace developers. + +Formally, `/?` faswut is used to pin a version number; in practice, it is not enforced at the compiler level. + +### Telescoping Kelvins {#telescoping-kelvins} + +> The rules of telescoping are simple: +> 1. If tool B sits on platform A, either both A and B must be at absolute zero, or B must be warmer than A. +> 2. Whenever the temperature of A (the platform) declines, the temperature of B (the tool) must also decline. +> 3. B must state the version of A it was developed against. A, when loading B, must state its own current version, and the warmest version of itself with which it's backward-compatible. +> — [~sorreg-namtyv, ~ravmel-ropdyl, “Towards a Frozen Operating System”](https://urbit.org/blog/toward-a-frozen-operating-system) + +Thus if you introduced a tool into kernelspace which relies on Nock alone, you could version it at anything above 4. If it relies on Hoon, then it should be above 139. And preferably a fair bit above—fat onion rings are tastier than paper-thin ones. + + +## The Structure of Kernelspace {#the-structure-of-kernelspace} + +The kernel is constructed of nested cores from the innermost `/sys/hoon` definitions out to `/sys/zuse`. All of userspace runs outside of these cores. + +Although Arvo (`ca03`) is the operational core of Urbit, we actually require a boot process (see the boot lesson) building on the definition of Hoon itself. Thus we begin with `hoon.hoon`, `zuse.hoon`, and `lull.hoon` today before proceeding into Arvo proper. + +### Core 0 {#core-0} + +The first core consists of the Hoon version tag, currently `%139`. Since there are no documentation references to this core, we call it `0`, the “version stub”. + +```hoon +=> %139 => +:: :: +:::: 0: version stub :: + :: :: +~% %k.139 ~ ~ :: +|% +++ hoon-version + +-- => +``` + +1. 
This resolves down to `++ hoon-version %139` in a circuitous way. +2. The `~%` sigcen tag starts a jet registration tree. Unlike other jet registrations we have seen and will see later, this one is the root jet registration, meaning it has no parent and exports no named formulas, although it contains all of `hoon.hoon`. + + Since we refer to a “parent core” and imply a “child core”, we need to clarify something a bit counterintuitive about Urbit's subject-oriented nature. We say that the child core contains the parent core, through its `context`; and we refer to the parent core as the “inner” core, the child being “outer”. In fact, the parent/inner core is a leg in the child/outer core. + + Compare the expansion of `~/` sigfas: `~%(p +7 ~ q)`. + +Thus, all things considered, Core 0 `layer-0` is the innermost core of all of Urbit. It appears at the rightmost side when the prettyprinter shows a core: + +```hoon +> add +<1.otf [[a=@ b=@] <33.sam 1.pnw %139>]> + +> +7:add +<33.sam 1.pnw %139> + +> +7:+7:add +%139 +``` + +### Core 1 {#core-1} + +The next core contains arithmetic. Since each core can only access limbs present in its `payload` (`+3`), and in particular its `context` (`+7`), each core builds outwards on its predecessors, in this case on a foundation of straightforward integer arithmetic. + +```hoon +++ add +++ dec +++ div +++ dvr +++ gte +++ gth +++ lte +++ lth +++ max +++ min +++ mod +++ mul +++ sub +``` + +The next block is for binary tree calculations: + +```hoon +++ cap +++ mas +++ peg +``` + +Then we have some standard definitions of types and values for mold building and handling types like `unit`s. + +```hoon ++$ bite ++$ bloq +++ each ++$ gate +++ list +++ lone +++ lest ++$ mold +++ pair +++ pole +++ qual +++ quip +++ step +++ trap +++ tree +++ trel +++ unit +``` + +- Several of these, like `trel` and `qual`, are hardly used even in `hoon.hoon` but standardize named faces. 
+- We particularly draw your attention to `pole`, which is a faceless `list`. This has more recently shown up in contexts where it is helpful to replace supplied faces with your own, as in `+on-peek` `?+` wutlus statements. +- `each` allows you to discriminate between values on type using a flag. (This is useful when returning structures out of a parser for instance, like `(each manx marl)` where `manx` is a structure and `marl` is a `list`.) + +```hoon +> ((each @ ^) [%& 6]) +[%.y p=6] + +> ((each @ ^) [%| [6 7]]) +[%.n p=[6 7]] +``` + +### Core 2 {#core-2} + +Many practical tools live in `layer-2`, including functional tools, maps, sets, list operators, and string and formatted text operators. + +##### Unit Logic + +First up, the `unit` tools. While `unit`s are often just stripped off in userspace, there is a full-featured algebra handling `unit`s. (I'm of the opinion that these are probably underutilized because it can be hard to reason correctly with `unit`s.) + +```hoon +++ biff :: apply +++ bind :: argue +++ bond :: replace +++ both :: all the above +++ clap :: combine +++ clef :: compose +++ drop :: enlist +++ fall :: default +++ flit :: make filter +++ hunt :: first of units +++ lift :: lift mold (fmap) +++ mate :: choose +++ need :: demand +++ some :: lift (pure) +``` + +- In particular, check out the definitions of `+biff`, `+bond`, `+flit`, and `+lift`, which apply wet gates and deferred traps. 
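For readers who mostly strip `unit`s by hand, a short Dojo sketch of the two most common helpers may be useful. The explicit `(unit @)` casts are there only to keep the example types unambiguous. The last expression is shown without output and should produce a `unit` wrapping `4`.

```hoon
::  +fall: supply a default for the empty case
> (fall `(unit @)`~ 5)
5

> (fall `(unit @)`[~ 3] 5)
3

::  +need: demand a value, crashing on ~
> (need `(unit @)`[~ 3])
3

::  +bind: apply a gate inside the unit, leaving ~ alone
> (bind `(unit @)`[~ 3] |=(a=@ +(a)))
```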
+ +##### List Logic + +```hoon +++ snoc +++ lure +++ fand :: all indices +++ find :: first index +++ flop :: reverse +++ gulf :: range inclusive +++ homo :: homogenize +++ join +++ bake +++ lent :: length +++ levy +++ lien :: some of +++ limo :: listify +++ murn :: maybe transform +++ oust :: remove +++ reap :: replicate +++ rear :: last item of list +++ reel :: right fold +++ roll :: left fold +++ scag :: prefix +++ skid :: separate +++ skim :: only +++ skip :: except +++ slag :: suffix +++ snag :: index +++ snip :: drop tail off list +++ sort :: quicksort +++ spin :: stateful turn +++ spun :: internal spin +++ swag :: slice +++ turn +++ weld :: concatenate +++ snap :: replace item +++ into :: insert item +++ welp :: faceless weld +++ zing :: promote +``` + +- You are likely familiar with all of these except `+lure`, which is a list builder that's unused in the system. +- Note that `+sort` (quicksort) turns off the stack trace because feedback from such a crash is liable to be a mess. + +```hoon +++ sort !. :: quicksort + ~/ %sort + |* [a=(list) b=$-([* *] ?)] + => .(a ^.(homo a)) + |- ^+ a + ?~ a ~ + =+ s=(skid t.a |:(c=i.a (b c i.a))) + %+ weld + $(a p.s) + ^+ t.a + [i.a $(a q.s)] +``` + +- Optional exercise: Implement [another sorting algorithm](https://en.wikipedia.org/wiki/Sorting_algorithm) besides quicksort in Hoon applicable to `list`s. 
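Whatever algorithm you pick for the exercise, it is reasonable to match the interface of `+sort`: a `list` and a comparison gate of type `$-([* *] ?)`. A quick Dojo check of the expected behavior:

```hoon
::  ascending order with +lth as the comparator
> (sort `(list @)`~[3 1 2] lth)
~[1 2 3]

::  descending order with +gth
> (sort `(list @)`~[3 1 2] gth)
~[3 2 1]
```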
+ +##### Bitwise Logic + +```hoon +++ bex :: binary exponent +++ can :: assemble +++ cat :: concatenate +++ cut :: slice +++ end :: tail +++ fil :: fill bloqstream +++ lsh :: left-shift +++ met :: measure +++ rap :: assemble variable +++ rep :: assemble fixed +++ rev +++ rip :: disassemble +++ rsh :: right-shift +++ run :: +turn into atom +++ rut :: +turn into list +++ sew :: stitch into +++ swp :: naive rev bloq order +++ xeb :: binary logarithm +++ fe :: modulo bloq + ++ dif :: difference + ++ inv |=(b=@ (sub (dec out) (sit b))) :: inverse + ++ net |= b=@ ^- @ :: flip byte endianness + ++ out (bex (bex a)) :: mod value + ++ rol |= [b=bloq c=@ d=@] ^- @ :: roll left + ++ ror |= [b=bloq c=@ d=@] ^- @ :: roll right + ++ sum |=([b=@ c=@] (sit (add b c))) :: wrapping add + ++ sit |=(b=@ (end a b)) :: enforce modulo +++ con :: binary or +++ dis :: binary and +++ mix :: binary xor +++ not |= [a=bloq b=@ c=@] :: binary not (sized) +``` + +- These provide bitwise operators for atoms. `+fe` is barely used but seems like it could be used to organize some of the logic. +- Optional exercise: Produce `+rip` and `+sew`. + +##### Insecure Hashing + +```hoon +++ muk :: standard murmur3 +++ mug :: mug with murmur3 +++ aor +++ dor +++ gor +++ mor +``` + +- These provide simple hashing and ordering algorithms. + +The [Murmur3 algorithm](https://en.wikipedia.org/wiki/MurmurHash) is a non-cryptographic hash function. `+muk` implements the 32-bit version. In pseudocode from Wikipedia: + +``` +algorithm Murmur3_32 is + // Note: In this version, all arithmetic is performed with unsigned 32-bit integers. + // In the case of overflow, the result is reduced modulo 2^32. 
+ input: key, len, seed + + c1 ← 0xcc9e2d51 + c2 ← 0x1b873593 + r1 ← 15 + r2 ← 13 + m ← 5 + n ← 0xe6546b64 + + hash ← seed + + for each fourByteChunk of key do + k ← fourByteChunk + + k ← k × c1 + k ← k ROL r1 + k ← k × c2 + + hash ← hash XOR k + hash ← hash ROL r2 + hash ← (hash × m) + n + + with any remainingBytesInKey do + remainingBytes ← SwapToLittleEndian(remainingBytesInKey) + // Note: Endian swapping is only necessary on big-endian machines. + // The purpose is to place the meaningful digits towards the low end of the value, + // so that these digits have the greatest potential to affect the low range digits + // in the subsequent multiplication. Consider that locating the meaningful digits + // in the high range would produce a greater effect upon the high digits of the + // multiplication, and notably, that such high digits are likely to be discarded + // by the modulo arithmetic under overflow. We don't want that. + + remainingBytes ← remainingBytes × c1 + remainingBytes ← remainingBytes ROL r1 + remainingBytes ← remainingBytes × c2 + + hash ← hash XOR remainingBytes + + hash ← hash XOR len + + hash ← hash XOR (hash >> 16) + hash ← hash × 0x85ebca6b + hash ← hash XOR (hash >> 13) + hash ← hash × 0xc2b2ae35 + hash ← hash XOR (hash >> 16) +``` + +##### Unsigned Powers + +```hoon +++ pow :: unsigned exponent +++ sqt :: unsigned sqrt/rem +``` + +##### Container Logic + +```hoon +++ jar :: map of lists +++ jug :: map of sets +++ map +++ qeu +++ set +++ in :: set engine + ++ all :: logical AND + ++ any :: logical OR + ++ apt :: check correctness + ++ bif :: splits a by b + ++ del :: b without any a + ++ dif :: difference + ++ dig :: axis of a in b + ++ gas :: concatenate + ++ has + ++ int :: intersection + ++ put :: puts b in a, sorted + ++ rep :: reduce to product + ++ run :: apply gate to values + ++ tap :: convert to list + ++ uni :: union + ++ wyt :: size of set +++ by :: map engine + ++ all :: logical AND + ++ any :: logical OR + ++ bif :: splits a by b + 
++ del :: delete at key b + ++ dif :: difference + ++ dig :: axis of b key + ++ apt :: check correctness + ++ gas :: concatenate + ++ get :: grab value by key + ++ got :: need value by key + ++ gut :: fall value by key + ++ has :: key existence check + ++ int :: intersection + ++ jab + ++ mar :: add with validation + ++ put :: adds key-value pair + ++ rep :: reduce to product + ++ rib :: transform + product + ++ run :: apply gate to values + ++ rut :: apply gate to nodes + ++ tap :: listify pairs + ++ uni :: union, merge + ++ uno :: general union + ++ urn :: apply gate to nodes + ++ wyt :: depth of map + ++ key :: set of keys + ++ val :: list of vals +++ ja :: jar engine + ++ get :: gets list by key + ++ add :: adds key-list pair +++ ju :: jug engine + ++ del :: del key-set pair + ++ gas :: concatenate + ++ get :: gets set by key + ++ has :: existence check + ++ put :: add key-set pair +++ to :: queue engine + ++ apt :: check correctness + ++ bal + ++ dep :: max depth of queue + ++ gas :: insert list to que + ++ get :: head-rest pair + ++ nip :: removes root + ++ nap :: removes root + ++ put :: insert new tail + ++ tap :: adds list to end + ++ top :: produces head +++ malt :: map from list +++ molt :: map from pair list +++ silt :: set from list +++ ly :: list from raw noun +++ my :: map from raw noun +++ sy :: set from raw noun +++ nl :: noun-to-container ops +``` + +- Jars and jugs seem oddly specific, and are only invoked in a couple of special contexts in the base distribution. +- We don't see `+ly` used often. What are the difference in these constructed lists? + +```hoon +> -:!>((ly ~[1 2 3])) +#t/it(?(@ud ?(@ud ?(@ud #!)))) + +> -:!>((homo ~[1 2 3])) +#t/it(@ud) + +> -:!>((limo ~[1 2 3])) +#t/^#3.?([i=@ud t=#3] ^#2.?([i=@ud t=#2] it(@ud))) + +> -:!>(`(list @ud)`~[1 2 3]) +#t/it(@ud) +``` + +- `+ly` uses the crash type for an empty list. 
+ +```hoon +> (scag 0 ((list @) ~)) +~ + +> (scag 0 (ly ~)) +mull-grow +-find.b +find-fork +dojo: hoon expression failed +``` + +##### Serialization + +```hoon +++ cue :: unpack +++ jam :: pack +++ mat :: length-encode +++ rub :: length-decode +``` + +- `+jam` and `+cue` are critically important for noun communication operations. + +Here is an annotated version of `+jam`. The basic idea is to produce a serial noun (in order of head/tail): + +1. One bit marks cell or atom. +2. Next entry marks bit length of value. +3. Then the actual value. + +(`+cue` distinguishes the bit length from the value by unary until the first `0`.) + +```hoon +> `@ub`(jam ~) +0b10 +:: start at LSB, so `0` for atom, `1` for length, `0` for value (head-trimmed zero) + +> `@ub`(jam 1) +0b1100 +:: start at LSB, so `0` for atom, `01` + +> `@ub`(jam [0 0]) +0b10.1001 +:: start at LSB, so `01` for cell, then `0` for head atom, length `1`, value `0`, repeat + +> `@ub`(jam [0 1]) +0b1100.1001 + +> `@ub`(jam [1 0]) +0b1011.0001 +``` + +Functional programming combinators: + +```hoon +++ aftr :: pair after +++ cork :: compose forward +++ corl :: compose backwards +++ cury :: curry left +++ curr :: curry right +++ fore |*(a=$-(* *) |*(b=$-(* *) (pair a b))) :: pair before +++ head |*(^ ,:+<-) :: get head +++ same |*(* +<) :: identity +++ succ |=(@ +(+<)) :: successor +++ tail |*(^ ,:+<+) :: get tail +++ test |=(^ =(+<- +<+)) :: equality +++ lead |*(* |*(* [+>+< +<])) :: put head +++ late |*(* |*(* [+< +>+<])) :: put tail +``` + +##### Various Type Definitions + +```hoon +++ fn :: float, infinity, or NaN +++ dn :: decimal float, infinity, or NaN +++ rn :: parsed decimal float ++$ axis @ :: tree address ++$ bean ? :: 0=&=yes, 1=|=no ++$ flag ? ++$ char @t :: UTF8 byte ++$ cord @t :: UTF8, LSB first ++$ byts [wid=@ud dat=@] :: bytes, MSB first ++$ date [[a=? 
y=@ud] m=@ud t=tarp] :: parsed date ++$ knot @ta :: ASCII text ++$ noun * :: any noun ++$ path (list knot) :: like unix path ++$ pith (list iota) :: typed urbit path ++$ stud :: standard name ++$ tang (list tank) :: bottom-first error ++$ iota :: typed path segment ++$ tank ++$ tape (list @tD) :: utf8 string as list ++$ tour (list @c) :: utf32 clusters ++$ tarp [d=@ud h=@ud m=@ud s=@ud f=(list @ux)] :: parsed time ++$ term @tas :: ascii symbol ++$ wain (list cord) :: text lines ++$ wall (list tape) :: text lines +``` + +- Floating-point structs +- Paths +- Strings + +### Core 3 {#core-3} + +```hoon +++ egcd :: schneier's egcd +++ fo :: modulo prime + ++ dif + ++ exp + ++ fra + ++ inv + ++ pro + ++ sit + ++ sum +++ si :: signed integer + ++ abs |=(a=@s (add (end 0 a) (rsh 0 a))) :: absolute value + ++ dif |= [a=@s b=@s] :: subtraction + ++ dul |= [a=@s b=@] :: modulus + ++ fra |= [a=@s b=@s] :: divide + ++ new |= [a=? b=@] :: [sign value] to @s + ++ old |=(a=@s [(syn a) (abs a)]) :: [sign value] + ++ pro |= [a=@s b=@s] :: multiplication + ++ rem |=([a=@s b=@s] (dif a (pro b (fra a b)))) :: remainder + ++ sum |= [a=@s b=@s] :: addition + ++ sun |=(a=@u (mul 2 a)) :: @u to @s + ++ syn |=(a=@s =(0 (end 0 a))) :: sign test + ++ cmp |= [a=@s b=@s] :: compare +++ fl :: arb. 
precision fp + ++ rou + ++ rau + ++ add :: add; exact if e + ++ sub :: subtract; exact if e + ++ mul :: multiply + ++ div :: divide + ++ sqt :: square root + ++ lth :: less-than + ++ equ :: equals + ++ ibl + ++ uni + ++ xpd + ++ lug + ++ drg :: dragon4; get + ++ toj :: round to integer + ++ ned :: require ?=([%f *] a) + ++ shf :: a * 2^b; no rounding + ++ fli :: flip sign + ++ swr ?+(r r %d %u, %u %d) :: flipped rounding + ++ prc ?>((gth p 1) p) :: force >= 2 precision + ++ den d :: denorm+flush+inf exp + ++ emn v :: minimum exponent + ++ emx (sum:si emn (sun:si w)) :: maximum exponent + ++ spd [e=emn a=1] :: smallest denormal + ++ spn [e=emn a=(bex (dec prc))] :: smallest normal + ++ lfn [e=emx a=(fil 0 prc 1)] :: largest + ++ lfe (sum:si emx (sun:si prc)) :: 2^lfe is > than all + ++ zer [e=--0 a=0] + ++ rou :: round + ++ syn :: get sign + ++ abs :: absolute value + ++ add :: add + ++ ead :: exact add + ++ sub :: subtract + ++ mul :: multiply + ++ emu :: exact multiply + ++ div :: divide + ++ fma :: fused multiply-add + ++ sqt :: square root + ++ inv :: inverse + ++ sun :: uns integer to float + ++ san :: sgn integer to float + ++ lth :: less-than + ++ lte :: less-equal + ++ equ :: equal + ++ gte :: greater-equal + ++ gth :: greater-than + ++ drg :: float to decimal + ++ grd :: decimal to float + ++ toi :: round to integer @s + ++ toj :: round to integer fn +:: provided to ++rd, ++rs, ++rq, and ++rh +:: r=rounding mode: same as in ++fl +++ ff :: ieee 754 format fp + ++ sb (bex (^add w p)) :: sign bit + ++ me (dif:si (dif:si --1 b) (sun:si p)) :: minimum exponent + ++ pa + ++ sea :: @r to fn + ++ bit |= [a=fn] (bif (rou:pa a)) :: fn to @r w+ rounding + ++ bif :: fn to @r no rounding + ++ sig :: get sign + ++ exp :: get exponent + ++ add :: add + ++ sub :: subtract + ++ mul :: multiply + ++ div :: divide + ++ fma :: fused multiply-add + ++ sqt :: square root + ++ lth :: less-than + ++ lte :: less-equals + ++ equ :: equals + ++ gte :: greater-equals + ++ gth :: 
greater-than + ++ sun :: uns integer to @r + ++ san :: signed integer to @r + ++ toi :: round to integer + ++ drg :: @r to decimal float + ++ grd :: decimal float to @r +++ rlyd |= a=@rd ^- dn (drg:rd a) :: prep @rd for print +++ rlys |= a=@rs ^- dn (drg:rs a) :: prep @rs for print +++ rlyh |= a=@rh ^- dn (drg:rh a) :: prep @rh for print +++ rlyq |= a=@rq ^- dn (drg:rq a) :: prep @rq for print +++ ryld |= a=dn ^- @rd (grd:rd a) :: finish parsing @rd +++ ryls |= a=dn ^- @rs (grd:rs a) :: finish parsing @rs +++ rylh |= a=dn ^- @rh (grd:rh a) :: finish parsing @rh +++ rylq |= a=dn ^- @rq (grd:rq a) :: finish parsing @rq +++ rd :: double precision fp + ++ ma + ++ sea :: @rd to fn + ++ bit :: fn to @rd + ++ add ~/ %add :: add + ++ sub ~/ %sub :: subtract + ++ mul ~/ %mul :: multiply + ++ div ~/ %div :: divide + ++ fma ~/ %fma :: fused multiply-add + ++ sqt ~/ %sqt :: square root + ++ lth ~/ %lth :: less-than + ++ lte ~/ %lte :: less-equals + ++ equ ~/ %equ :: equals + ++ gte ~/ %gte :: greater-equals + ++ gth ~/ %gth :: greater-than + ++ sun |= [a=@u] ^- @rd (sun:ma a) :: uns integer to @rd + ++ san |= [a=@s] ^- @rd (san:ma a) :: sgn integer to @rd + ++ sig |= [a=@rd] ^- ? 
(sig:ma a) :: get sign + ++ exp |= [a=@rd] ^- @s (exp:ma a) :: get exponent + ++ toi |= [a=@rd] ^- (unit @s) (toi:ma a) :: round to integer + ++ drg |= [a=@rd] ^- dn (drg:ma a) :: @rd to decimal float + ++ grd |= [a=dn] ^- @rd (grd:ma a) :: decimal float to @rd +++ rs :: single precision fp + ++ ma + ++ sea :: @rs to fn + ++ bit :: fn to @rs + ++ add ~/ %add :: add + ++ sub ~/ %sub :: subtract + ++ mul ~/ %mul :: multiply + ++ div ~/ %div :: divide + ++ fma ~/ %fma :: fused multiply-add + ++ sqt ~/ %sqt :: square root + ++ lth ~/ %lth :: less-than + ++ lte ~/ %lte :: less-equals + ++ equ ~/ %equ :: equals + ++ gte ~/ %gte :: greater-equals + ++ gth ~/ %gth :: greater-than + ++ sun |= [a=@u] ^- @rs (sun:ma a) :: uns integer to @rs + ++ san |= [a=@s] ^- @rs (san:ma a) :: sgn integer to @rs + ++ sig |= [a=@rs] ^- ? (sig:ma a) :: get sign + ++ exp |= [a=@rs] ^- @s (exp:ma a) :: get exponent + ++ toi |= [a=@rs] ^- (unit @s) (toi:ma a) :: round to integer + ++ drg |= [a=@rs] ^- dn (drg:ma a) :: @rs to decimal float + ++ grd |= [a=dn] ^- @rs (grd:ma a) :: decimal float to @rs +++ rq :: quad precision fp + ++ ma + ++ sea :: @rq to fn + ++ bit :: fn to @rq + ++ add ~/ %add :: add + ++ sub ~/ %sub :: subtract + ++ mul ~/ %mul :: multiply + ++ div ~/ %div :: divide + ++ fma ~/ %fma :: fused multiply-add + ++ sqt ~/ %sqt :: square root + ++ lth ~/ %lth :: less-than + ++ lte ~/ %lte :: less-equals + ++ equ ~/ %equ :: equals + ++ gte ~/ %gte :: greater-equals + ++ gth ~/ %gth :: greater-than + ++ sun |= [a=@u] ^- @rq (sun:ma a) :: uns integer to @rq + ++ san |= [a=@s] ^- @rq (san:ma a) :: sgn integer to @rq + ++ sig |= [a=@rq] ^- ? 
(sig:ma a) :: get sign + ++ exp |= [a=@rq] ^- @s (exp:ma a) :: get exponent + ++ toi |= [a=@rq] ^- (unit @s) (toi:ma a) :: round to integer + ++ drg |= [a=@rq] ^- dn (drg:ma a) :: @rq to decimal float + ++ grd |= [a=dn] ^- @rq (grd:ma a) :: decimal float to @rq +++ rh :: half precision fp + ++ ma + ++ sea :: @rh to fn + ++ bit :: fn to @rh + ++ add ~/ %add :: add + ++ sub ~/ %sub :: subtract + ++ mul ~/ %mul :: multiply + ++ div ~/ %div :: divide + ++ fma ~/ %fma :: fused multiply-add + ++ sqt ~/ %sqt :: square root + ++ lth ~/ %lth :: less-than + ++ lte ~/ %lte :: less-equals + ++ equ ~/ %equ :: equals + ++ gte ~/ %gte :: greater-equals + ++ gth ~/ %gth :: greater-than + ++ tos :: @rh to @rs + ++ fos :: @rs to @rh + ++ sun |= [a=@u] ^- @rh (sun:ma a) :: uns integer to @rh + ++ san |= [a=@s] ^- @rh (san:ma a) :: sgn integer to @rh + ++ sig |= [a=@rh] ^- ? (sig:ma a) :: get sign + ++ exp |= [a=@rh] ^- @s (exp:ma a) :: get exponent + ++ toi |= [a=@rh] ^- (unit @s) (toi:ma a) :: round to integer + ++ drg |= [a=@rh] ^- dn (drg:ma a) :: @rh to decimal float + ++ grd |= [a=dn] ^- @rh (grd:ma a) :: decimal float to @rh +++ year :: date to @d +++ yore :: @d to date +++ yell :: tarp from @d +++ yule :: time atom +++ yall :: day / to day of year +++ yawn :: days since Jesus +++ yelp :: leap year +++ yo :: time constants + |% ++ cet 36.524 :: (add 24 (mul 100 365)) + ++ day 86.400 :: (mul 24 hor) + ++ era 146.097 :: (add 1 (mul 4 cet)) + ++ hor 3.600 :: (mul 60 mit) + ++ jes 106.751.991.084.417 :: (mul 730.692.561 era) + ++ mit 60 + ++ moh `(list @ud)`[31 28 31 30 31 30 31 31 30 31 30 31 ~] + ++ moy `(list @ud)`[31 29 31 30 31 30 31 31 30 31 30 31 ~] + ++ qad 126.144.001 :: (add 1 (mul 4 yer)) + ++ yer 31.536.000 :: (mul 365 day) +++ shad |=(ruz=@ (shax (shax ruz))) :: double sha-256 +++ shaf :: half sha-256 +++ sham :: 128bit noun hash +++ shas :: salted hash +++ shax :: sha-256 +++ shay :: sha-256 with length +++ shaw :: hash to nbits +++ shaz :: sha-512 +++ shal :: sha-512 
with length +++ shan :: sha-1 (deprecated) +++ og :: shax-powered rng + ++ rad :: random in range + ++ rads :: random continuation + ++ raw :: random bits + ++ raws :: random bits +++ sha :: correct byte-order + ++ flin |=(a=@ (swp 3 a)) :: flip input + ++ flim |=(byts [wid (rev 3 wid dat)]) :: flip input w= length + ++ flip |=(w=@u (cury (cury rev 3) w)) :: flip output of size + ++ meet |=(a=@ [(met 3 a) a]) :: measure input size + ++ sha-1 (cork meet sha-1l) + ++ sha-256 :(cork flin shax (flip 32)) + ++ sha-512 :(cork flin shaz (flip 64)) + ++ sha-256l :(cork flim shay (flip 32)) + ++ sha-512l :(cork flim shal (flip 64)) + ++ sha-1l +++ un :: =(x (wred (wren x))) + ++ wren :: conceal structure + ++ wred :: restore structure + ++ xafo |=([a=@ b=@] +((mod (add (dec b) a) 255))) + ++ xaro |=([a=@ b=@] +((mod (add (dec b) (sub 255 (mod a 255))) 255))) + ++ zaft :: forward 255-sbox + ++ zart :: reverse 255-sbox + ++ zyft :: forward 256-sbox + ++ zyrt :: reverse 256-sbox +++ ob + ++ fein + ++ fynd + ++ feis + ++ tail + ++ fee + ++ feen + ++ fe + ++ fen + ++ eff + ++ raku ++$ coin $~ [%$ %ud 0] :: print format ++$ dime [p=@ta q=@] :: ++$ edge [p=hair q=(unit [p=* q=nail])] :: parsing output ++$ hair [p=@ud q=@ud] :: parsing trace +++ like |* a=$-(* *) :: generic edge ++$ nail [p=hair q=tape] :: parsing input ++$ pint [p=[p=@ q=@] q=[p=@ q=@]] :: line+column range ++$ rule _|:($:nail $:edge) :: parsing rule ++$ spot [p=path q=pint] :: range in file ++$ tone $% [%0 product=*] :: success ++$ toon $% [%0 p=*] :: success +++ wonk |* veq=_$:edge :: product from edge +``` + +### Core 4 {#core-4} + +```hoon +++ po :: phonetic base + ++ ins ~/ %ins :: parse prefix + ++ ind ~/ %ind :: parse suffix + ++ tos ~/ %tos :: fetch prefix + ++ tod ~/ %tod :: fetch suffix +++ fa :: base58check +++ at :: basic printing +++ cass :: lowercase +++ cuss :: uppercase +++ crip |=(a=tape `@t`(rap 3 a)) :: tape to cord +++ mesc :: ctrl code escape +++ runt :: prepend repeatedly +++ sand :: atom 
sanity +++ sane :: atom sanity +++ ruth :: biblical sanity +++ trim :: tape split +++ trip :: cord to tape +++ teff :: length utf8 +++ taft :: utf8 to utf32 +++ tuba :: utf8 to utf32 tape +++ tufa :: utf32 to utf8 tape +++ tuft :: utf32 to utf8 text +++ wack :: knot escape +++ wick :: knot unescape +++ woad :: cord unescape +++ wood :: cord escape +++ wash :: render tank at width +++ re + ++ ram + ++ win + ++ rig + ++ wig +++ show :: XX deprecated! + ++ shep + ++ shop + ++ shol +++ last |= [zyc=hair naz=hair] :: farther trace +++ lust |= [weq=char naz=hair] :: detect newline +++ bend :: conditional comp +++ comp +++ fail |=(tub=nail [p=p.tub q=~]) :: never parse +++ glue :: add rule +++ less :: no first and second +++ pfix :: discard first rule +++ plug :: first then second +++ pose :: first or second +++ simu :: first and second +++ sfix :: discard second rule +++ bass :: leftmost base +++ boss :: rightmost base +++ cold :: replace w+ constant +++ cook :: apply gate +++ easy :: always parse +++ fuss +++ full :: has to fully parse +++ funk :: add to tape first +++ here :: place-based apply +++ inde |* sef=rule :: indentation block +++ ifix +++ jest :: match a cord +++ just :: XX redundant, jest +++ knee :: callbacks +++ mask :: match char in set +++ more :: separated, * +++ most :: separated, + +++ next :: consume a char +++ perk :: parse cube fork +++ pick :: rule for ++each +++ plus |*(fel=rule ;~(plug fel (star fel))) :: +++ punt |*([a=rule] ;~(pose (stag ~ a) (easy ~))) :: +++ sear :: conditional cook +++ shim :: match char in range +++ stag :: add a label +++ stet :: +++ stew :: switch by first char +++ slug :: +++ star :: 0 or more times +++ stir +++ stun :: parse several times +++ rash |*([naf=@ sab=rule] (scan (trip naf) sab)) +++ rose |* [los=tape sab=rule] +++ rush |*([naf=@ sab=rule] (rust (trip naf) sab)) +++ rust |* [los=tape sab=rule] +++ scan |* [los=tape sab=rule] +++ ace (just ' ') :: spACE +++ bar (just '|') :: vertical BAR +++ bas (just '\\') :: 
Back Slash (escaped) +++ buc (just '$') :: dollars BUCks +++ cab (just '_') :: CABoose +++ cen (just '%') :: perCENt +++ col (just ':') :: COLon +++ com (just ',') :: COMma +++ doq (just '"') :: Double Quote +++ dot (just '.') :: dot dot dot ... +++ fas (just '/') :: Forward Slash +++ gal (just '<') :: Greater Left +++ gar (just '>') :: Greater Right +++ hax (just '#') :: Hash +++ hep (just '-') :: HyPhen +++ kel (just '{') :: Curly Left +++ ker (just '}') :: Curly Right +++ ket (just '^') :: CareT +++ lus (just '+') :: pLUS +++ mic (just ';') :: seMIColon +++ pal (just '(') :: Paren Left +++ pam (just '&') :: AMPersand pampersand +++ par (just ')') :: Paren Right +++ pat (just '@') :: AT pat +++ sel (just '[') :: Square Left +++ ser (just ']') :: Square Right +++ sig (just '~') :: SIGnature squiggle +++ soq (just '\'') :: Single Quote +++ tar (just '*') :: sTAR +++ tic (just '`') :: backTiCk +++ tis (just '=') :: 'tis tis, it is +++ wut (just '?') :: wut, what? +++ zap (just '!') :: zap! bang! crash!! +++ alf ;~(pose low hig) :: alphabetic +++ aln ;~(pose low hig nud) :: alphanumeric +++ alp ;~(pose low hig nud hep) :: alphanumeric and - +++ bet ;~(pose (cold 2 hep) (cold 3 lus)) :: axis syntax - + +++ bin (bass 2 (most gon but)) :: binary to atom +++ but (cook |=(a=@ (sub a '0')) (shim '0' '1')) :: binary digit +++ cit (cook |=(a=@ (sub a '0')) (shim '0' '7')) :: octal digit +++ dem (bass 10 (most gon dit)) :: decimal to atom +++ dit (cook |=(a=@ (sub a '0')) (shim '0' '9')) :: decimal digit +++ dog ;~(plug dot gay) :: . 
number separator +++ dof ;~(plug hep gay) :: - @q separator +++ doh ;~(plug ;~(plug hep hep) gay) :: -- phon separator +++ dun (cold ~ ;~(plug hep hep)) :: -- (stop) to ~ +++ duz (cold ~ ;~(plug tis tis)) :: == (stet) to ~ +++ gah (mask [`@`10 ' ' ~]) :: newline or ace +++ gap (cold ~ ;~(plug gaq (star ;~(pose vul gah)))) :: plural space +++ gaq ;~ pose :: end of line +++ gaw (cold ~ (star ;~(pose vul gah))) :: classic white +++ gay ;~(pose gap (easy ~)) :: +++ gon ;~(pose ;~(plug bas gay fas) (easy ~)) :: long numbers \ / +++ gul ;~(pose (cold 2 gal) (cold 3 gar)) :: axis syntax < > +++ hex (bass 16 (most gon hit)) :: hex to atom +++ hig (shim 'A' 'Z') :: uppercase +++ hit ;~ pose :: hex digits +++ iny :: indentation block +++ low (shim 'a' 'z') :: lowercase +++ mes %+ cook :: hexbyte +++ nix (boss 256 (star ;~(pose aln cab))) :: +++ nud (shim '0' '9') :: numeric +++ prn ;~(less (just `@`127) (shim 32 256)) :: non-control +++ qat ;~ pose :: chars in blockcord +++ qit ;~ pose :: chars in a cord +++ qut ;~ simu soq :: cord +++ soz ;~(plug soq soq soq) :: delimiting ''' +++ sym :: symbol +++ mixed-case-symbol +++ ven ;~ (comp |=([a=@ b=@] (peg a b))) :: +>- axis syntax +++ vit :: base64 digit +++ vul %+ cold ~ :: comments +++ ab + ++ bix (bass 16 (stun [2 2] six)) + ++ fem (sear |=(a=@ (cha:fa a)) aln) + ++ haf (bass 256 ;~(plug tep tiq (easy ~))) + ++ hef %+ sear |=(a=@ ?:(=(a 0) ~ (some a))) + ++ hif (bass 256 ;~(plug tip tiq (easy ~))) + ++ hof (bass 0x1.0000 ;~(plug hef (stun [1 3] ;~(pfix hep hif)))) + ++ huf (bass 0x1.0000 ;~(plug hef (stun [0 3] ;~(pfix hep hif)))) + ++ hyf (bass 0x1.0000 ;~(plug hif (stun [3 3] ;~(pfix hep hif)))) + ++ pev (bass 32 ;~(plug sev (stun [0 4] siv))) + ++ pew (bass 64 ;~(plug sew (stun [0 4] siw))) + ++ piv (bass 32 (stun [5 5] siv)) + ++ piw (bass 64 (stun [5 5] siw)) + ++ qeb (bass 2 ;~(plug seb (stun [0 3] sib))) + ++ qex (bass 16 ;~(plug sex (stun [0 3] hit))) + ++ qib (bass 2 (stun [4 4] sib)) + ++ qix (bass 16 (stun [4 4] 
six)) + ++ seb (cold 1 (just '1')) + ++ sed (cook |=(a=@ (sub a '0')) (shim '1' '9')) + ++ sev ;~(pose sed sov) + ++ sew ;~(pose sed sow) + ++ sex ;~(pose sed sox) + ++ sib (cook |=(a=@ (sub a '0')) (shim '0' '1')) + ++ sid (cook |=(a=@ (sub a '0')) (shim '0' '9')) + ++ siv ;~(pose sid sov) + ++ siw ;~(pose sid sow) + ++ six ;~(pose sid sox) + ++ sov (cook |=(a=@ (sub a 87)) (shim 'a' 'v')) + ++ sow ;~ pose + ++ sox (cook |=(a=@ (sub a 87)) (shim 'a' 'f')) + ++ ted (bass 10 ;~(plug sed (stun [0 2] sid))) + ++ tep (sear |=(a=@ ?:(=(a 'doz') ~ (ins:po a))) til) + ++ tip (sear |=(a=@ (ins:po a)) til) + ++ tiq (sear |=(a=@ (ind:po a)) til) + ++ tid (bass 10 (stun [3 3] sid)) + ++ til (boss 256 (stun [3 3] low)) + ++ urs %+ cook + ++ urt %+ cook + ++ urx %+ cook + ++ voy ;~(pfix bas ;~(pose bas soq bix)) +++ ag + ++ ape |*(fel=rule ;~(pose (cold `@`0 (just '0')) fel)) + ++ bay (ape (bass 16 ;~(plug qeb:ab (star ;~(pfix dog qib:ab))))) + ++ bip =+ tod=(ape qex:ab) + ++ dem (ape (bass 1.000 ;~(plug ted:ab (star ;~(pfix dog tid:ab))))) + ++ dim (ape dip) + ++ dip (bass 10 ;~(plug sed:ab (star sid:ab))) + ++ dum (bass 10 (plus sid:ab)) + ++ fed %+ cook fynd:ob + ++ feq %+ cook |=(a=(list @) (rep 4 (flop a))) + ++ fim (sear den:fa (bass 58 (plus fem:ab))) + ++ hex (ape (bass 0x1.0000 ;~(plug qex:ab (star ;~(pfix dog qix:ab))))) + ++ lip =+ tod=(ape ted:ab) + ++ mot ;~ pose + ++ viz (ape (bass 0x200.0000 ;~(plug pev:ab (star ;~(pfix dog piv:ab))))) + ++ vum (bass 32 (plus siv:ab)) + ++ wiz (ape (bass 0x4000.0000 ;~(plug pew:ab (star ;~(pfix dog piw:ab))))) +++ mu + ++ zag [p=(end 4 (add top bot)) q=bot] + ++ zig [p=(end 4 (add top (sub 0x1.0000 bot))) q=bot] + ++ zug (mix (lsh 4 top) bot) +++ ne + ++ c (cut 3 [tig 1] key:fa) + ++ d (add tig '0') + ++ x ?:((gte tig 10) (add tig 87) d) + ++ v ?:((gte tig 10) (add tig 87) d) + ++ w ?:(=(tig 63) '~' ?:(=(tig 62) '-' ?:((gte tig 36) (add tig 29) x))) +++ co + ++ rear |=(rom=tape rend(rep rom)) + ++ rent ~+ `@ta`(rap 3 rend) + ++ 
rend + ++ a-co |=(dat=@ ((d-co 1) dat)) + ++ c-co (em-co [58 1] |=([? b=@ c=tape] [~(c ne b) c])) + ++ d-co |=(min=@ (em-co [10 min] |=([? b=@ c=tape] [~(d ne b) c]))) + ++ r-co + ++ s-co + ++ v-co |=(min=@ (em-co [32 min] |=([? b=@ c=tape] [~(v ne b) c]))) + ++ w-co |=(min=@ (em-co [64 min] |=([? b=@ c=tape] [~(w ne b) c]))) + ++ x-co |=(min=@ (em-co [16 min] |=([? b=@ c=tape] [~(x ne b) c]))) + ++ y-co |=(dat=@ ((d-co 2) dat)) + ++ z-co |=(dat=@ `tape`['0' 'x' ((x-co 1) dat)]) + ++ em-co + ++ ed-co + ++ ox-co + ++ ro-co +++ so + ++ bisk + ++ crub + ++ nuck + ++ nusk + ++ perd + ++ royl + ++ royl-rh (cook rylh ;~(pfix ;~(plug sig sig) (cook royl-cell royl-rn))) + ++ royl-rq (cook rylq ;~(pfix ;~(plug sig sig sig) (cook royl-cell royl-rn))) + ++ royl-rd (cook ryld ;~(pfix sig (cook royl-cell royl-rn))) + ++ royl-rs (cook ryls (cook royl-cell royl-rn)) + ++ royl-rn + ++ royl-cell + ++ tash + ++ twid + ++ when + ++ zust +++ scot +++ scow +++ slat |=(mod=@tas |=(txt=@ta (slaw mod txt))) +++ slav |=([mod=@tas txt=@ta] (need (slaw mod txt))) +++ slaw +++ slay +++ smyt :: pretty print path +++ spat |=(pax=path (crip (spud pax))) :: render path to cord +++ spud |=(pax=path ~(ram re (smyt pax))) :: render path to tape +++ stab |=(zep=@t `path`(rash zep stap)) :: parse cord to path +++ stap :: path parser +++ stip :: typed path parser + ++ swot |=(n=nail (;~(pfix fas (more fas spot)) n)) + ++ spot +++ pout +++ pave +++ mack +++ mink !. + ++ frag + ++ edit +++ mock +++ mook + ++ skip + ++ rend +++ mole +++ mong +++ mule +++ mure +++ mute +++ slum +++ soft +``` + +If you encounter a biblical name (`$abel`, `$onan`, etc.) then you're in the prettyprinter. + +```hoon ++$ abel typo :: original sin: type ++$ alas (list (pair term hoon)) :: alias list ++$ atom @ :: just an atom ++$ aura @ta :: atom format ++$ base :: base mold ++$ woof $@(@ [~ p=hoon]) :: simple embed ++$ chum $? 
lef=term :: jet name ++$ coil $: p=garb :: name, wet=dry, vary ++$ garb (trel (unit term) poly vair) :: core ++$ poly ?(%wet %dry) :: polarity ++$ foot $% [%dry p=hoon] :: dry arm, geometric ++$ link :: lexical segment ++$ cuff (list link) :: parsed lex segments ++$ crib [summary=cord details=(list sect)] :: ++$ help [=cuff =crib] :: documentation ++$ limb $@ term :: wing element ++$ null ~ :: null, nil, etc ++$ onyx (list (pair type foot)) :: arm activation ++$ opal :: limb match ++$ pica (pair ? cord) :: & prose, | code ++$ palo (pair vein opal) :: wing trace, match ++$ pock (pair axis nock) :: changes ++$ port (each palo (pair type nock)) :: successful match ++$ spec :: structure definition ++$ tent :: model builder ++$ tiki :: test case ++$ skin :: texture ++$ tome (pair what (map term hoon)) :: core chapter ++$ tope :: topographic type +++ hoot :: hoon tools + +$ beer $@(char [~ p=hoon]) :: simple embed + +$ mane $@(@tas [@tas @tas]) :: XML name+space + +$ manx $~([[%$ ~] ~] [g=marx c=marl]) :: dynamic XML node + +$ marl (list tuna) :: dynamic XML nodes + +$ mart (list [n=mane v=(list beer)]) :: dynamic XML attrs + +$ marx $~([%$ ~] [n=mane a=mart]) :: dynamic XML tag + +$ mare (each manx marl) :: node or nodes + +$ maru (each tuna marl) :: interp or nodes + +$ tuna :: maybe interpolation ++$ hoon :: hoon AST ++$ tyre (list [p=term q=hoon]) :: ++$ tyke (list (unit hoon)) :: ++$ nock $^ [p=nock q=nock] :: autocons ++$ note :: type annotation ++$ type $~ %noun :: ++$ tony :: ++tone done right ++$ tine :: partial noun ++$ tool $@(term tune) :: type decoration ++$ tune :: complex ++$ typo type :: old type ++$ vase [p=type q=*] :: type-value pair ++$ vise [p=typo q=*] :: old vase ++$ vial ?(%read %rite %both %free) :: co/contra/in/bi ++$ vair ?(%gold %iron %lead %zinc) :: in/contra/bi/co ++$ vein (list (unit axis)) :: search trace ++$ sect (list pica) :: paragraph ++$ whit :: prefix docs parse ++$ whiz cord :: postfix doc parse ++$ what (unit (pair cord (list 
sect))) :: help slogan/section ++$ wing (list limb) :: search path ++$ block ++$ result ++$ thunk ++$ seminoun ++$ stencil ++$ output ++$ doss ++$ moan :: sample metric ++$ hump +``` + +### Core 5 {#core-5} + +##### Parsing and Compiler + +```hoon +++ musk !. :: nock with block set + ++ abet + ++ araw + ++ apex + ++ combine + ++ complete + ++ fragment + ++ mutate + ++ require + ++ squash +++ bool `type`(fork [%atom %f `0] [%atom %f `1] ~) :: make loobean +++ cell :: make %cell type +++ core :: make %core type +++ hint +++ face :: make %face type +++ fork :: make %fork type +++ cove :: extract [0 *] axis +++ comb :: combine two formulas +++ cond :: ?: compile +++ cons :: make formula cell +++ fitz :: odor compatibility +++ flan :: loobean & +++ flip :: loobean negation +++ flor :: loobean | +++ hike + ++ contains + ++ parent + ++ sibling + ++ insert +++ jock +++ look +++ loot +++ ah :: tiki engine + ++ blue + ++ teal + ++ tele + ++ gray + ++ puce + ++ wthp |= opt=(list (pair spec hoon)) + ++ wtkt |=([sic=hoon non=hoon] (gray [%wtkt puce (blue sic) (blue non)])) + ++ wtls |= [gen=hoon opt=(list (pair spec hoon))] + ++ wtpt |=([sic=hoon non=hoon] (gray [%wtpt puce (blue sic) (blue non)])) + ++ wtsg |=([sic=hoon non=hoon] (gray [%wtsg puce (blue sic) (blue non)])) + ++ wthx |=(syn=skin (gray [%wthx (tele syn) puce])) + ++ wtts |=(mod=spec (gray [%wtts (teal mod) puce])) +++ ax + ++ autoname + ++ function + ++ interface + ++ home + ++ clear + ++ basal + ++ unfold + ++ unreel + ++ descend + ++ decorate + ++ pieces + ++ spore + ++ example + ++ factory + ++ analyze + ++ basic + ++ clear + ++ fetch + ++ fetch-wing + ++ choice + ++ switch + ++ relative +++ ap :: hoon engine + ++ grip + ++ name + ++ feck + :: not used at present; see comment at %csng in ++open +::++ hail + ++ half + ++ flay + ++ open + ++ open-mane + ++ open-mart + ++ rake ~>(%mean.'rake-hoon' (need reek)) + ++ reek + ++ rusk +++ ut + ++ clip + ++ ar !: + ++ fish + ++ gain + ++ lose + ++ blow + ++ bran + ++ 
burp + ++ busk + ++ buss + ++ crop + ++ dext + ++ sint + ++ cool + ++ duck ^-(tank ~(duck us sut)) + ++ dune |.(duck) + ++ dunk + ++ elbo + ++ ergo + ++ endo + ++ et + ++ play + ++ mint + ++ mull + ++ epla + ++ emin + ++ emul + ++ felt !! + ++ feel :: detect existence + ++ fond + ++ pony :: raw match + |% ++ here ?: =(0 p.heg) + ++ lose [%| %& p.heg] + ++ stop ?~(q.heg here lose) + ++ twin |= [hax=pony yor=pony] + ++ $ + ++ main + ++ next + ++ find + ++ fund + ++ fine + ++ fire + ++ fish + ++ fuse + ++ gain + ++ hemp + ++ laze + ++ chapter + ++ lose + ++ chip + ++ bake + ++ balk + ++ mile + ++ mine + +$ gol-type + ++ core-check + ++ chapters-check + ++ get-tomes + ++ get-arms + ++ arms-check + ++ get-arm-type + ++ nice + ++ mint + ++ nice + ++ grow + ++ moot + ++ mull + ++ beth + ++ nice + ++ grow + ++ meet |=(ref=type &((nest | ref) (nest(sut ref) | sut))) + ++ miss :: nonintersection + ++ dext + ++ sint + ++ mite |=(ref=type |((nest | ref) (nest(sut ref) & sut))) + ++ nest + ++ deem + ++ deep + ++ dext + ++ meet &(dext dext(sut ref, ref sut)) + ++ sint + ++ peek + ++ peel + ++ play + ++ redo :: refurbish faces + ++ dear :: resolve tool stack + ++ dext :: subject traverse + ++ done :: complete assembly + ++ sint :: reduce by reference + ++ repo + ++ rest + ++ sink + ++ mup |=(* (scot %p (mug +<))) + ++ take + ++ tack + ++ tend + ++ toss + ++ wrap +++ us :: prettyprinter + +$ cape [p=(map @ud wine) q=wine] :: + +$ wine :: + ++ dash + ++ deal |=(lum=* (dish dole lum)) + ++ dial + ++ many + ++ $ + ++ dish !: + ++ doge + ++ dole + ++ duck (dial dole) +++ cain sell :: $-(vase tank) +++ noah text :: $-(vase tape) +++ onan seer :: $-(vise vase) +++ levi :: $-([type type] ?) 
+++ text :: tape pretty-print +++ seem |=(toy=typo `type`toy) :: promote typo +++ seer |=(vix=vise `vase`vix) :: promote vise +++ sell +++ skol +++ slam :: slam a gate +++ slab :: test if contains +++ slap +++ slog :: deify printf +++ mean :: crash with trace +++ road +++ slew :: get axis in vase +++ slim :: identical to seer? +++ slit :: type of slam +++ slob :: superficial arm +++ sloe :: get arms in core +++ slop :: cons two vases +++ slot :: got axis in vase +++ slym :: slam w+o sample-type +++ sped :: reconstruct type +++ swat +++ vang +++ vast :: main parsing core + ++ gash %+ cook :: parse path + ++ gasp ;~ pose :: parse =path= etc. + ++ glam ~+((glue ace)) + ++ hasp ;~ pose :: path element + ++ limp %+ cook + ++ mota %+ cook + ++ docs + ++ apex + ++ apse + ++ leap :: whitespace w/o docs + ++ smol + ++ larg + ++ rant + ++ skip :: non-doccord comment + ++ null (cold ~ (star ace)) + ++ text (pick line code) + ++ teyt (pick line ;~(pfix step code)) + ++ line ;~(less ace (cook crip (star prn))) + ++ code ;~(pfix step ;~(less ace (cook crip (star prn)))) + ++ step ;~(plug ace ace) + ++ into + ++ en-link + ++ clad :: hoon doccords + ++ coat :: spec doccords + ++ scye :: with prefix doccords + ++ seam :: with doccords + ++ plex :: reparse static path + ++ phax + ++ posh + ++ poof :: path -> (list hoon) + ++ poon :: try to replace '='s + ++ poor + ++ porc + ++ rump + ++ rood + ++ reed + ++ stem + ++ slip |*(r=rule (stag %hoon r)) + ++ slot |*(r=rule (sear (soft iota) r)) + ++ spit + ++ rupl +``` + +##### Sail and XML Parsing + +```hoon + ++ sail :: xml template + ++ apex :: product hoon + ++ top-level :: entry-point + ++ inline-embed :: brace interpolation + ++ script-or-style :: script or style + ++ tuna-mode :: xml node(s) kind + ++ wide-top :: wide outer top + ++ wide-inner-top :: wide inner top + ++ wide-attrs :: wide attributes + ++ wide-tail :: wide elements + ++ wide-elems :: wide elements + ++ wide-paren-elems :: wide flow + ++ drop-top + ++ join-tops + ++ 
wide-quote :: wide quote + ++ quote-innards :: wide+tall flow + ++ bracketed-elem :: bracketed element + ++ wrapped-elems :: wrapped tuna + ++ a-mane :: mane as hoon + ++ en-class + ++ tag-head :: tag head + ++ tall-top :: tall top + ++ tall-attrs :: tall attributes + ++ tall-elem :: tall preface + ++ hopefully-quote :: prefer "quote" form + ++ script-style-tail :: unescaped tall tail + ++ tall-tail :: tall tail + ++ tall-kids :: child elements + ++ collapse-chars :: group consec chars + ++ cram :: parse unmark ++$ mane $@(@tas [@tas @tas]) :: XML name+space ++$ manx $~([[%$ ~] ~] [g=marx c=marl]) :: dynamic XML node ++$ marl (list manx) :: XML node list ++$ mars [t=[n=%$ a=[i=[n=%$ v=tape] t=~]] c=~] :: XML cdata ++$ mart (list [n=mane v=tape]) :: XML attributes ++$ marx $~([%$ ~] [n=mane a=mart]) :: dynamic XML tag +``` + +##### Compiler + +```hoon + ++ scad + ++ scat + ++ soil + ++ sump (ifix [kel ker] (stag %cltr (most ace wide))) + ++ norm :: rune regular form + ++ structure + ++ expression + ++ boog !: + ++ bola :: ++ arms + ;~ pfix (jest '++') + ++ boba :: +$ arms + ;~ pfix (jest '+$') + ++ lynx + ++ whap !: :: chapter + ++ glow + :: we only support ++ and +$ batch comments right now + ++ whip :: chapter declare + ++ wasp :: $brcb aliases + ++ wisp !: :: core tail + ++ toad :: untrap parser expr + ++ rune :: build rune + ++ runo :: rune plus + ++ runq :: wide or tall if tol + ++ butt |* zor=rule :: closing == if tall + ++ ulva |* zor=rule :: closing -- and tall + ++ glop ~+((glue mash)) :: separated by space + ++ gunk ~+((glue muck)) :: separated list + ++ goop ~+((glue mush)) :: separator list & docs + ++ hank (most mush loaf) :: gapped hoons + ++ hunk (most mush loan) :: gapped specs + ++ jump ;~(pose leap:docs gap) :: gap before docs + ++ loaf ?:(tol tall wide) :: hoon + ++ loll ?:(tol tall(doc |) wide(doc |)) :: hoon without docs + ++ loan ?:(tol till wyde) :: spec + ++ lore (sear |=(=hoon ~(flay ap hoon)) loaf) :: skin + ++ lomp ;~(plug sym (punt 
;~(pfix tis wyde))) :: typeable name + ++ mash ?:(tol gap ;~(plug com ace)) :: list separator + ++ muss ?:(tol jump ;~(plug com ace)) :: list w/ doccords + ++ muck ?:(tol gap ace) :: general separator + ++ mush ?:(tol jump ace) :: separator w/ docs + ++ teak %+ knee *tiki |. ~+ :: wing or hoon + ++ rack (most muss ;~(goop loaf loaf)) :: list [hoon hoon] + ++ ruck (most muss ;~(goop loan loaf)) :: list [spec hoon] + ++ rick (most mash ;~(goop rope loaf)) :: list [wing hoon] + ++ expa |.(loaf) :: one hoon + ++ expb |.(;~(goop loaf loaf)) :: two hoons + ++ expc |.(;~(goop loaf loaf loaf)) :: three hoons + ++ expd |.(;~(goop loaf loaf loaf loaf)) :: four hoons + ++ expe |.(wisp) :: core tail + ++ expf |.(;~(goop ;~(pfix cen sym) loaf)) :: %term and hoon + ++ expg |.(;~(gunk lomp loll loaf)) :: term/spec, two hoons + ++ exph |.((butt ;~(gunk rope rick))) :: wing, [wing hoon]s + ++ expi |.((butt ;~(goop loaf hank))) :: one or more hoons + ++ expj |.(;~(goop lore loaf)) :: skin and hoon + :: ++ expk |.(;~(gunk loaf ;~(plug loaf (easy ~)))):: list of two hoons + :: ++ expl |.(;~(gunk sym loaf loaf)) :: term, two hoons + ++ expm |.((butt ;~(gunk rope loaf rick))) :: several [spec hoon]s + ++ expn |. 
;~ gunk rope loaf :: wing, hoon, + ++ expo |.(;~(goop wise loaf loaf)) :: =; + ++ expp |.(;~(goop (butt rick) loaf)) :: [wing hoon]s, hoon + ++ expq |.(;~(goop rope loaf loaf)) :: wing and two hoons + ++ expr |.(;~(goop loaf wisp)) :: hoon and core tail + ++ exps |.((butt hank)) :: closed gapped hoons + ++ expt |.(;~(gunk wise rope loaf loaf)) :: =^ + ++ expu |.(;~(gunk rope loaf (butt hank))) :: wing, hoon, hoons + :: ++ expv |.((butt rick)) :: just changes + ++ expw |.(;~(goop rope loaf loaf loaf)) :: wing and three hoons + ++ expx |.(;~(goop ropa loaf loaf)) :: wings and two hoons + ++ expy |.(loaf(bug &)) :: hoon with tracing + ++ expz |.(;~(goop loan loaf loaf loaf)) :: spec and three hoons + ++ exqa |.(loan) :: one spec + ++ exqb |.(;~(goop loan loan)) :: two specs + ++ exqc |.(;~(goop loan loaf)) :: spec then hoon + ++ exqd |.(;~(goop loaf loan)) :: hoon then spec + ++ exqe |.(;~(goop lynx loan)) :: list of names then spec + ++ exqs |.((butt hunk)) :: closed gapped specs + ++ exqg |.(;~(goop sym loan)) :: term and spec + ::++ exqk |.(;~(goop loaf ;~(plug loan (easy ~)))):: hoon with one spec + ++ exqn |.(;~(gunk loan (stag %cltr (butt hank)))):: autoconsed hoons + ++ exqr |.(;~(gunk loan ;~(plug wasp wisp))) :: spec/aliases?/tail + ::++ exqw |.(;~(goop loaf loan)) :: hoon and spec + ++ exqx |.(;~(goop loaf loan loan)) :: hoon, two specs + ++ exqy |.(;~(goop loaf loan loan loan)) :: hoon, three specs + ++ exqz |.(;~(goop loaf (butt hunk))) :: hoon, n specs + ++ txhp |. %+ cook |= [a=tiki b=(list (pair spec hoon))] + ++ tkkt |. %+ cook |= [a=tiki b=hoon c=hoon] + ++ txls |. %+ cook |= [a=tiki b=hoon c=(list (pair spec hoon))] + ++ tkvt |. %+ cook |= [a=tiki b=hoon c=hoon] + ++ tksg |. %+ cook |= [a=tiki b=hoon c=hoon] + ++ txts |. %+ cook |= [a=spec b=tiki] + ++ txhx |. %+ cook |= [a=skin b=tiki] + ++ hinb |.(;~(goop bont loaf)) :: hint and hoon + ++ hinc |. 
:: optional =en, hoon + ++ hind |.(;~(gunk bonk loaf ;~(goop bonz loaf))) :: jet hoon "bon"s hoon + ++ hine |.(;~(goop bonk loaf)) :: jet-hint and hoon + ++ hinf |. :: 0-3 >s, two hoons + ++ hing |. :: 0-3 >s, three hoons + ++ bonk :: jet signature + ++ hinh |. :: 1/2 numbers, hoon + ++ bont ;~ (bend) :: term, optional hoon + ++ bony (cook |=(a=(list) (lent a)) (plus tis)) :: base 1 =en count + ++ bonz :: term-labelled hoons + ++ lang :: lung sample + ++ lung + ++ long + ++ lobo (most ;~(plug com ace) ;~(glam rope wide)) + ++ loon (most ;~(plug com ace) ;~(glam wide wide)) + ++ lute :: tall [] noun + ++ ropa (most col rope) + ++ rope :: wing form + ++ wise + ++ tall :: full tall form + ++ till :: mold tall form + ++ wede :: wide bulb + ++ wide :: full wide form + ++ wyde :: mold wide form + ++ wart + ++ wert +++ vest +++ vice +++ make :: compile cord to nock +++ rain :: parse with % path +++ ream :: parse cord to hoon +++ reck :: parse hoon file +++ ride :: end-to-end compiler ++$ mite (list @ta) :: mime type ++$ pass @ :: public key ++$ ring @ :: private key ++$ ship @p :: network identity ++$ shop (each ship (list @ta)) :: urbit/dns identity ++$ spur path :: ship desk case spur ++$ time @da :: galactic time +++ pi-heck +++ pi-noon :: sample trace +++ pi-mope :: add sample +++ pi-moth :: count sample +++ pi-mumm :: print sample +++ pi-tell :: produce dump +``` + +### Core 6 {#core-6} + +Hoon is the root of the whole system—you cannot parse and build Arvo or anything else without these definitions. As part of `ca01`, you examined how `$hoon` types are built and how the AST is implemented for a basic rune. + +Outside of the language-necessary components, the `%lull` core provides kernel-wide structures (essentially, a header file) and the `%zuse` core provides a kernel-appropriate standard library. `%zuse` organizes its cores into what it terms “engines”. 
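As a quick orientation to how these engines are addressed, the Dojo session below calls one arm inside the `+title` engine of `%zuse` (the engine is listed under Identity later in this lesson). This is a minimal sketch, assuming a running ship on a recent kernel; the ship names are only examples.

```hoon
::  an arm inside an engine is reached as arm:engine
> (clan:title ~zod)
%czar

> (clan:title ~marzod)
%king

> (clan:title ~sampel-palnet)
%duke
```

The same `arm:engine` addressing applies to nested engines, as in `+dust:chrono:userlib`.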
+ +#### `%lull` + +##### Models + +```hoon +:: +capped-queue: a +qeu with a maximum number of entries +++ capped-queue +:: +:: +clock: polymorphic cache type for use with the clock replacement algorithm +++ clock +:: +:: +mop, +on: ordered map engine +++ mop +++ on + ++ all :: apply logical AND on all values + ++ any :: apply logical OR on all values + ++ apt :: verify horz/vert orderings + ++ bap :: convert to list, right-to-left + ++ del :: delete key if it exists + ++ dip :: partial inorder traversal + ++ gas :: put a list of items + ++ get :: get value at key (unit) + ++ got :: get value at key (crash) + ++ has :: check for key existence + ++ lot :: take subset range + ++ nip :: remove root (internal) + ++ pop :: produce head and rest + ++ pry :: produce head or null + ++ put :: insert ordered item + ++ ram :: produce tail or null + ++ run :: apply gate to all values + ++ tab :: tabulate subset to max count + ++ tap :: convert to list, left-to-right + ++ uni :: unify two ordered maps + ++ wyt :: measure size +``` + +- [`+on`](/hoon/reference/zuse/2m) provides the services for `+mop` ordered maps. + +```hoon ++$ deco ?(~ %bl %br %un) :: text decoration ++$ json :: normal json value ++$ life @ud :: ship key revision ++$ rift @ud :: ship continuity ++$ mime (pair mite octs) :: mimetyped data ++$ octs (pair @ud @) :: octet-stream ++$ sock (pair ship ship) :: outgoing [our his] ++$ stub (list (pair stye (list @c))) :: styled unicode ++$ stye (pair (set deco) (pair tint tint)) :: decos/bg/fg ++$ styl %+ pair (unit deco) :: cascading style ++$ styx (list $@(@t (pair styl styx))) :: styled text ++$ tint $@ ?(%r %g %b %c %m %y %k %w %~) :: text color ++$ turf (list @t) :: domain, tld first +``` + +- Various types, notably [structured text support](/guides/additional/app-workbook/track7). 
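Returning to the `+on` engine listed above, the following Dojo sketch builds a small `+mop` and queries it. It assumes `+lte` as the comparator and plain atoms for both keys and values; the variable name `a` and the data are illustrative only.

```hoon
::  build an ordered map keyed by @, ordered with +lte
> =a (gas:((on @ @) lte) *((mop @ @) lte) (limo ~[3^30 1^10 2^20]))

::  look up one key; the result is a unit
> (get:((on @ @) lte) a 2)
[~ 20]

::  check for a missing key
> (has:((on @ @) lte) a 4)
%.n

::  measure the number of entries
> (wyt:((on @ @) lte) a)
3
```

Note that the comparator has to be supplied both when building the `+mop` mold and when building the `+on` core, since the ordering is not stored in the data itself.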
+ +``` +++ ethereum-types + ++ address @ux + +$ event-id [block=@ud log=@ud] + ++ events (set event-id) +:: +++ azimuth-types + ++ point + +$ dnses [pri=@t sec=@t ter=@t] + ++ diff-azimuth + ++ diff-point ++$ vane-task +:: +++ http ^? + +$ header-list + +$ method + +$ request + +$ response-header + +$ http-event + ++ get-header + ++ set-header + ++ delete-header + ++ unpack-header + +$ simple-payload +``` + +- Common structures: Ethereum, Azimuth, HTTP. + +##### Networking (Ames) + +After this point, `%lull` defines types and interfaces for interacting with vanes. We will skip lightly over these, but come back to them in the appropriate lessons. + +```hoon +:::: ++ames :: (1a) network +++ ames +``` + +##### Timekeeping (Behn) + +```hoon +:::: ++behn :: (1b) timekeeping +++ behn +``` + +##### Versioning (Clay) + +```hoon +:::: ++clay :: (1c) versioning +++ clay +``` + +##### Console (Dill) + +```hoon +:::: ++dill :: (1d) console +++ dill +``` + +##### HTTP Server (Eyre) + +```hoon +:::: ++eyre :: (1e) http-server +++ eyre +``` + +##### Extensions (Gall) + +```hoon +:::: ++gall :: (1g) extensions +++ gall ^? +``` + +##### HTTP Client (Iris) + +```hoon +:: %iris http-client interface +++ iris +``` + +##### Security (Jael) + +```hoon +:::: ++jael :: (1h) security +++ jael + ++ pki +``` + +##### Threads (Khan) + +```hoon +:::: ++khan :: (1i) threads +++ khan +``` + +##### IPC (Lick) + +```hoon +:::: ++lick :: (1j) IPC +++ lick +``` + +##### Computation + +Various definitions for cards, strands, and moves. + +```hoon +++ rand :: computation ++$ gift-arvo :: out result <-$ ++$ task-arvo :: in request ->$ ++$ note-arvo :: out request $-> ++$ sign-arvo :: in result $<- ++$ unix-task :: input from unix +``` + +#### `%zuse` + +##### Cryptography + +One of the most important components of `%zuse` is the crypto library. This supplies modular arithmetic (`+fu`) and several specific algorithms. 
(In general, signed arithmetic in Urbit uses different names (like `+sum` instead of `+add`) to prevent accidental confusion.) Some significant portions of this include: + +```hoon +++ number ^? + ++ fu :: modulo (mul p q) + ++ dif :: subtract + ++ exp :: exponent + ++ out :: garner's formula + ++ pro :: multiply + ++ sum :: add + ++ sit :: represent + ++ curt :: curve25519 +++ crypto :: (2b) cryptography + ++ aes + ++ keccak :: (2b7) keccak family + ++ keccak-224 + ++ keccak-256 + ++ keccak-384 + ++ keccak-512 + ++ keccak + ++ sha3-224 + ++ sha3-256 + ++ sha3-384 + ++ sha3-512 + ++ sha3 + ++ hmac :: (2b8) hmac family + ++ hmac-sha1 + ++ hmac-sha256 + ++ hmac-sha512 + ++ hmac-sha1t + ++ hmac-sha256t + ++ hmac-sha512t + ++ hmac-sha1l + ++ hmac-sha256l + ++ hmac-sha512l + ++ hmac + ++ secp + ++ secp256k1 + ++ sign :: schnorr signature + ++ verify :: schnorr verify + ++ blake + ++ blake2b + ++ argon2 + ++ ripemd + ++ ripemd-160 + ++ pbkdf + ++ hmac-sha1 + ++ hmac-sha256 + ++ hmac-sha512 + ++ hmac-sha1t + ++ hmac-sha256t + ++ hmac-sha512t + ++ hmac-sha1l + ++ hmac-sha256l + ++ hmac-sha512l + ++ hmac-sha1d + ++ hmac-sha256d + ++ hmac-sha512d + ++ pbkdf +``` + +Notes on the above: + +- [Modular arithmetic](https://en.wikipedia.org/wiki/Modular_arithmetic) is used in public key systems. +- [Curve25519](https://en.wikipedia.org/wiki/Curve25519) is elliptic-curve cryptography. +- [Advanced encryption standard](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) (AES) is a cipher for data encryption. There are a lot of modes available in `+aes`. +- [KECCAK](https://en.wikipedia.org/wiki/SHA-3) is a cryptographic family underlying SHA-3. +- A [Schnorr signature](https://en.wikipedia.org/wiki/Schnorr_signature) is a digital signature scheme. +- [BLAKE](https://en.wikipedia.org/wiki/BLAKE_%28hash_function%29) is a cryptographic hash function like KECCAK. +- [Argon2](https://en.wikipedia.org/wiki/Argon2) is a key derivation function. 
+- [RIPEMD](https://en.wikipedia.org/wiki/RIPEMD) cryptographic hash functions are used in Bitcoin. +- [Hash-based message authentication codes](https://en.wikipedia.org/wiki/HMAC) (HMAC) are used in shared-secret key exchange based on the [SHA-2](https://en.wikipedia.org/wiki/SHA-2) cryptographic hash algorithms, of which several are made available. (Variants of these also live in the [password-based key derivation function](https://en.wikipedia.org/wiki/PBKDF2) (PBKDF) arm.) + +##### Units + +After cryptography, there are a number of library utility functions. + +```hoon +++ unity :: (2c) unit promotion + ++ drop-list :: collapse unit list + ++ drop-map :: collapse unit map + ++ drop-pole :: collapse to tuple +``` + +- `unit`s are not used much outside of standard functions, but there are some convenience operators for them: + +```hoon +> (drop-list:unity `(list (unit @))`~[`1 `2 `3]) +[~ [i=1 t=~[2 3]]] + +> (drop-list:unity `(list (unit @))`~[`1 `2 ~]) +~ +``` + +##### Formatting Text and JSON Reparsing + +```hoon +++ format :: (2d) common formats + ++ to-wain :: cord to line list + ++ of-wain :: line list to cord + ++ of-wall :: line list to tape + ++ json-rn :: json to rn parser + ++ enjs :: json encoders + ++ frond :: object from k-v pair + ++ pairs :: object from k-v list + ++ tape :: string from tape + ++ wall :: string from wall + ++ ship :: string from ship + ++ numb :: number from unsigned + ++ sect :: s timestamp + ++ time :: ms timestamp + ++ path :: string from path + ++ tank :: tank as string arr + ++ dejs :: json reparser + ++ ar :: array as list + ++ as :: array as set + ++ at :: array as tuple + ++ bo :: boolean + ++ bu :: boolean not + ++ ci :: maybe transform + ++ cu :: transform + ++ di :: millisecond date + ++ du :: second date + ++ mu :: true unit + ++ ne :: number as real + ++ ni :: number as integer + ++ ns :: number as signed + ++ no :: number as cord + ++ nu :: parse number as hex + ++ of :: object as frond + ++ ot :: object as 
tuple + ++ ou :: object of units + ++ oj :: object as jug + ++ om :: object as map + ++ op :: parse keys of map + ++ pa :: string as path + ++ pe :: prefix + ++ sa :: string as tape + ++ sd :: string @ud as date + ++ se :: string as aura + ++ so :: string as cord + ++ su :: parse string + ++ uf :: unit fall + ++ un :: unit need + ++ ul :: null + ++ za :: full unit pole + ++ zl :: collapse unit list + ++ zp :: unit tuple + ++ zm :: collapse unit map + ++ klr :: styx/stub engine +++ cloy +``` + +- [`+enjs:format`](/hoon/reference/zuse/2d_1-5) supports noun-to-JSON conversions. +- [`+dejs:format`](/hoon/reference/zuse/2d_6) are the reparsers (see `+de:json:html` for the parser). Notably this is where many noun-to-text converters live. +- [`+dejs-soft`](/hoon/reference/zuse/2d_7) offers non-crashing versions of the `+dejs` arms (thus, returning `unit`s). + +##### Diffs + +Diff tools, using the [Hunt-McIlroy algorithm](https://en.wikipedia.org/wiki/Hunt%E2%80%93Szymanski_algorithm): + +```hoon +++ differ :: (2d) hunt-mcilroy + ++ berk :: invert diff patch + ++ loss :: longest subsequence + ++ lurk :: apply list patch + ++ lusk :: lcs to list patch +``` + +##### Web Text (HTML &c.) 
+ +```hoon +++ html :: (2e) text encodings + ++ mimes :: (2e1) MIME + ++ as-octs :: atom to octstream + ++ as-octt :: tape to octstream + ++ en-mite :: mime type to text + ++ base16 :: MSB hex strings + ++ base64 :: flexible le base64 encoding + ++ en-base58 :: Bitcoin base-58 address, encode + ++ de-base58 :: Bitcoin base-58 address, decode +``` + +More JSON, this time the parser: + +```hoon +++ html + ++ json :: (2e2) JSON + ++ en :: encode JSON to tape + ++ de :: parse cord to JSON + ++ abox :: array + ++ apex :: any value + ++ bool :: boolean + ++ esca :: escaped character + ++ expo :: exponent + ++ frac :: fraction + ++ jcha :: string character + ++ mayb :: optional + ++ numb :: number + ++ obje :: object list + ++ obox :: object + ++ pear :: key-value + ++ piec :: listify + ++ stri :: string + ++ spac :: whitespace + ++ unic :: escaped UTF16 + ++ utfe :: UTF-8 sequence + ++ wish :: with whitespace + ++ sune :: cord UTF-8 sanity + ++ sung :: char UTF-8 sanity + ++ teff :: UTF-8 length +``` + +- [`+json:html`](/hoon/reference/zuse/2e_2-3) tools are the JSON printer and parser (see `+dejs:format` for the reparsers). +- Since JSON does not have a single canonical form as text, these parsers provide an opportunity to see how to parse something structurally when whitespace doesn't matter. 
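The split between parsing (`+de:json:html`) and reparsing (`+dejs:format`) can be sketched in the Dojo as follows. This assumes that `+de:json:html` takes a cord and returns a `unit`, as in the Zuse version these notes target; the field names and the variable `jon` are illustrative only.

```hoon
::  step 1: parse JSON text into a $json noun (unwrapping the unit)
> =jon (need (de:json:html '{"name": "zod", "life": 3}'))

::  step 2: reparse the $json into a typed tuple of cord and integer
> ((ot:dejs:format ~[name+so:dejs:format life+ni:dejs:format]) jon)
['zod' 3]
```

Keeping the two steps separate means the text parser only has to know JSON syntax, while each application decides what typed noun it wants out of the `$json`.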
+ +```hoon +++ html + ++ en-xml :: xml printer + ++ apex :: top level + ++ attr :: attributes to tape + ++ escp :: escape for xml + ++ many :: nodelist to tape + ++ name :: name to tape + ++ clot :: self-closing tags + ++ de-xml :: xml parser + ++ apex :: top level + ++ attr :: attributes + ++ cdat :: CDATA section + ++ chrd :: character data + ++ comt :: comments + ++ decl :: ++decl:de-xml:html + ++ escp :: + ++ enty :: entity + ++ empt :: self-closing tag + ++ head :: opening tag + ++ many :: contents + ++ name :: tag name + ++ tail :: closing tag + ++ whit :: whitespace + ++ en-urlt :: url encode + ++ de-urlt :: url decode + ++ en-purl :: print purl + ++ de-purl :: url+header parser + ++ en-turf :: encode as TLD-last string + ++ de-turf :: decode from TLD-last string + ++ fuel :: parse urbit fcgi +``` + +##### Wires + +```hoon +++ wired :: wire formatting + ++ dray :: load tuple in path + ++ raid :: demand path odors +``` + +##### Identity + +Retrieve your mathematical sponsor, convert a number to a rank, etc. + +```hoon +++ title :: (2j) identity + ++ sein + ++ clan :: ship to rank + +$ rank ?(%czar %king %duke %earl %pawn) :: ship width class + ++ name :: identity + ++ saxo :: autocanon + ++ sein :: autoboss + ++ cite :: render ship + ++ saxo :: autocanon + ++ sein :: autoboss + ++ team :: her + ++ moon :: her moon +``` + +##### Millisecond Timing + +Some time-related tools (currently used for timing in Eyre): + +```hoon +++ milly :: (2k) milliseconds + ++ around :: relative msec + ++ about :: unit relative msec + ++ mill :: msec diff + ++ msec :: @dr to @ud ms + ++ mull :: unit msec diff +++ contain + ++ by-clock + ++ to-capped-queue +``` + +##### Userlib + +More userspace stuff. (At this point, `%zuse` is a grab bag of things that people have added over the years, and it's not clear who needs what or if it's even in contemporary use.) 
+ +```hoon +++ userlib :: (2u) non-vane utils + ++ chrono :: (2uB) time + ++ from-unix + ++ from-unix-ms + ++ dawn :: Jan 1 weekday + ++ daws :: date weekday + ++ deal :: to leap sec time + ++ lead :: from leap sec time + ++ dust :: print UTC format + ++ stud :: parse UTC format + ++ unm :: Urbit to Unix ms + ++ unt :: Urbit to Unix time + ++ yu :: UTC format constants + ++ space :: (2uC) file utils + ++ feel :: simple file write + ++ file :: simple file load + ++ foal :: high-level write + ++ fray :: high-level delete + ++ furl :: unify changes + ++ unix :: (2uD) unix line-list + ++ lune :: cord by unix line + ++ nule :: lines to unix cord + ++ scanf :: (2uF) exterpolation +``` + +- `+chrono:userlib` provides tools to print and parse basic UTC time statements. + +```hoon +> (dust:chrono:userlib (yore now)) +"Wed, 13 Sep 2023 20:9:38 +0000" + +> (stud:chrono:userlib '13 Sep 2023 20:9:38 +0000') +[~ [[a=%.y y=2.023] m=9 t=[d=13 h=20 m=9 s=38 f=~]]] + +> (stud:chrono:userlib 'Sep 13, 2023 20:9:38 +0000') +[~ [[a=%.y y=2.023] m=9 t=[d=13 h=20 m=9 s=38 f=~]]] +``` + +(Overheard memo to self: work this into whatever [`/lib/chronos`](https://github.com/sigilante/chronos) becomes.) + +- `+space:userlib` is used by the Hood tools like `|mv`. + +#### The Compilation Subject + +Ford uses `%zuse` (thus the full standard library) as the compilation subject for a `hoon` file. Typically a userspace file will produce a core (or, in the case of some generators, a head tag and a core). That core will contain the standard library in its context because `|` bar runes (the only runes that produce cores) return cores containing the original subject in their payloads. + +Ford also allows you to modify the compilation subject by imports. (This is why you have to import files at the top of a `hoon` file, and why you do it in a particular order.) Each import is prepended to the compilation subject, so in general your compilation subject will look like `[lib1 lib2 sur1 sur2 zuse]`. 
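The effect of prepending an import to the subject can be imitated by hand in the Dojo with `=>`. This is only an analogy under stated assumptions: `lib` and `answer` are hypothetical stand-ins for an imported library, and the Dojo subject stands in for `%zuse`.

```hoon
::  put a stand-in "library" at the head of the subject, as an import would,
::  then resolve one name from it and one from the standard library behind it
> =>  [lib=[answer=42] .]  [answer.lib (add 1 1)]
[42 2]
```

Both lookups succeed because name resolution searches the head of the subject first and then continues into the tail, which is why later imports can still see the standard library.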
+ +You can see this process in `+run-prelude:ford` in `/sys/vane/clay`. (`/` fas Ford runes are actually parsed in `+parse-pile` using `+pile-rule`. This is also where `/?` faswut is ignored.) In `ca10` we'll take a deep dive through Clay. + +##### `/lib/tiny` + +The whole standard library is included in every piece of userspace Hoon, unless you go out of your way to remove it from the subject. This is only rarely a good idea, but you *can* build a small working Hoon program against a minimalist subject. For instance, this is done for the naïve rollup smart contract code and Sword (née Ares) development using `/lib/tiny`. + +```hoon +> =>  tiny  (add 1 1) +2 + +> =>  tiny  (add:rs 1 1) +-find.rs +dojo: hoon expression failed +``` + + +## Exercises {#exercises} + +- Implement a custom aura, `@uo` (octal encoding). At one level, simply implementing an aura requires no overhead. However, the aura must have a unique parsed format for input, and should have a corresponding output. (The rules around this are laxer for more complex nouns like `set`s and `tree`s.) One format which would be compatible with the restrictions on atom syntax as well as not shadow any current atom types is `0o1234.5670` (`89abcdef` are not valid characters in octal). You can model heavily on `@ux` to implement this aura. (A [tutorial](../../hoon/aura-guide.md) is available for a degree–minute–second implementation which you can use as a guide.) +- `+sloe` is intended to receive a `$type` and return a list of the named arms in that type. Modeling on `+sloe`, produce a gate `+beau` which retrieves each face in the sample of a supplied gate argument and produces a list of them. 
+ +```hoon +> (sloe -:!>(..add)) +~[%sub %gte %mod %min %mul %gth %add %div %lth %dec %dvr %max %lte + %mas %cap %peg %list %lest %trap %mold %tree %bloq %step %pair + %gate %bite %each %lone %qual %unit %pole %quip %trel] + +> (sloe -:!>(add)) +~[%$] +``` + +Hint: + +```hoon +> ([%cell *] +<+<:~(repo ut ~(repo ut -:!>(add)))) +[ 1.819.043.171 + [1.701.011.814 97 1.836.020.833 0 0] + 1.701.011.814 + 98 + 1.836.020.833 + 0 + 0 +] +``` diff --git a/docs/core-academy/ca03.md b/docs/core-academy/ca03.md new file mode 100644 index 0000000..a1d1a44 --- /dev/null +++ b/docs/core-academy/ca03.md @@ -0,0 +1,525 @@ +--- +description: "Core Academy lesson on Arvo's event handling architecture covering the main sequence, moves and events, ducts and wires, vane interactions, scry operations, and Arvo's ACID transaction guarantees." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 4. Arvo I: The Main Sequence + +*This lesson covers the structure of a mature Arvo. We'll also look at event handling: wires and ducts, running vanes, vane scries, builds, and kernel upgrades.* + +> If you ever find yourself on solid ground, protect that with your life. Learn how it works and how to use it well. Then build a strong structure on top of that. Don't let it decompose into a layer of mud. (_Precepts Discussion_, C1) + +A good place to begin our study of [Arvo](https://www.youtube.com/watch?v=QtFPdBUl7XQ) is to consider the design intent of Urbit as a system: + +> Arvo is designed to avoid the usual state of complex event networks: event spaghetti. We keep track of every event's cause so that we have a clear causal chain for every computation. At the bottom of every chain is a Unix I/O event, such as a network request, terminal input, file sync, or timer event. 
We push every step in the path the request takes onto the chain until we get to the terminal cause of the computation. Then we use this causal stack to route results back to the caller. + +You are likely familiar with the high-level concept of Arvo as an event handler and the main ratchet of the state machine. + +$$ +L: \text{History} \rightarrow \text{State} +$$ + +Before we embark into Arvo, let's define some terms: + +- A _move_ is a “cause and action”. There is a formal `$move` type that specifies the associated call stack and action. +- An _event_ is a completed move. The event results in an updated state, or subject for future computations. A completed event is recorded immutably in the event log history. Arvo itself doesn't know about the event log and history—it is amnesiac in the sense that it only computes against its current state. + +> Arvo is a pure function f(logs) of its event log, so formally Arvo is just a function run against an event log. A naive implementation has very bad asymptotics; processing each new event is O(n) in the number of historical events. Choose the function g(state,log) such that f(logs ++ log) = g(f(logs),log). Then, as long as you keep the state in memory, processing each new event is constant in the number of previous events. This still requires O(n) restart from disk, but you can also periodically (and non-blockingly) write a checkpoint of the state to disk, so that restart from disk is only linear in the number of events since the last checkpoint. (_Precepts Discussion_ A11) + +- [“Arvo Overview”](/reference/arvo/overview) + +### Guarantees {#guarantees} + +1. Solid-state. Specifically, Urbit is a _solid-state interpreter_. What is meant by that? If you search "solid-state interpreter", then Urbit is the answer, so it's not much help! “Solid state” refers to Urbit's grounded ability to acquire and specify state, as a unique, auditable, and reproducible basis for computing. 
An event, once committed to the log, is immutable and permanent, never lost. + + Arvo's solid-statefulness differs from every major operating system in that no required information in the state of the OS is stored in RAM alone. (Incidentally, of course, the runtime uses RAM quite a lot, and a move being processed may be lost with a sudden shutdown.) This is one reason that Urbit is particularly write-heavy on hard drives, a consideration that hosting providers must take into account. + +2. Interpreter. Urbit is an “interpreter” in the same sense that the Java Virtual Machine is an interpreter, only Nock serves the role of the Java bytecode. This means that it can receive a noun and update itself _in situ_. + + Furthermore, Hoon is compiled and built continually throughout the lifetime of the process. + +3. Atomic. Events either complete in full or have no effect at all: + +> An interrupted event never happened. The computer is deterministic; an event is a transaction; the event log is a log of successful transactions. In a sense, replaying this log is not Turing complete. The log is an existence proof that every event within it terminates. + +4. Consistency. Every update (completed event) leads to a new valid state. + +5. Isolation. Events are performed sequentially, meaning that the effects are isolated. (See the “Breadth-First Move Ordering” section below.) + +6. Durability. Completed events are permanent and immutable. No events will be reversed. But: + +> It is easy to think that "completed transaction will survive permanently" along with "the state of Arvo is pure function of its event log" implies that nothing can ever be deleted. This is not quite true. Clay is our referentially transparent file system, which could naively be thought to mean that since data must be immutable, files cannot be deleted. However, Clay can replace a file with a "tombstone" that causes Clay to crash whenever it is accessed. 
Referential transparency only guarantees that there won't be new data at a previously accessed location - not that it will still be available. + + +## Arvo’s Structure {#arvos-structure} + +The beating heart of Arvo is a Nock expression, formally `[%2 [%0 3] %0 2]`. This means: take the noun at `+3` of the current subject as the new subject, take the noun at `+2` as the formula, and evaluate that formula against that subject. In other words, to process the next event. + +This is presented explicitly by the arm `+aeon`, for instance, in the `+eden` lifecycle formula generator. (The inline commentary there is worth reading in full; we'll see `+eden` again in [*Arvo II: The Boot Sequence*](ca04.md).) Elsewhere it results operationally. + +In most respects, though, Arvo is an event processor. Most of the event processing and routing machinery is contained in `/sys/arvo`, with some plumbing on the vane-side as well. + +```hoon ++$ move [=duct =ball] :: CAUSE & ACTION + :: ++$ duct (list wire) :: CAUSAL HISTORY + :: ++$ ball (wite [vane=term task=maze] maze) :: see below ++$ ball :: DYNAMIC KERNEL + :: ACTION + $? [%hurl [%error-tag stack-trace] wite=pass-or-gift] :: action failed + [%pass wire=/vane-name/etc note=[vane=%vane-name task=[%.y p=vase]]] + :: advance + :: request + [%slip note=[vane=%vane-name task=[%.y p=vase]]] :: lateral; + :: make a request + :: as though you're + :: a different vane + [%give gift=[%.y vase]] :: retreat; response + :: ++$ card (cask) :: tagged untyped + :: event + :: +++ cask |$ [a] (pair mark a) :: marked data + :: builder + :: ++$ meta (pair) :: meta-vase + :: ++$ maze (each vase meta) :: vase or meta-vase +``` + +(There is some entanglement with particular vanes, which undercuts the simplicity of the system. 
It would be formally nice to refactor `/sys/arvo` into a pure statement of the event log without reference to vanes, which should be largely hypothetical to an event loop aside from routing labels. Cf. [~wicrum-wicrun on “Urbit is a ball of mud”](https://gist.github.com/wicrum-wicrun/b308b298721ccbe0de9c6c73a6ee4242#urbit-is-a-ball-of-mud). On the other hand, perhaps that loses efficiency. A question for a cooler kelvin.) + +- A `$move` is a cause and effect, simply a request to complete some task in a computation on behalf of the causal stack. Think of this as message data and history metadata. A move sends an action to a location along a call stack. +- A `$duct` is a causal history, or the auditable chain of causes that leads to the current computation. +- A `$ball` is an action. This is the part we conventionally think of as a computation: one of a `%hurl` failure, a `%pass` advance, a `%slip` lateral move, or a `%give` return back down the causal chain. +- A `$card` is a tagged untyped event. (Notably, it is not `$card:agent:gall`.) A card is an event of action. Cards can be arbitrarily complicated depending on the vane and message. +- A `+cask` is a marked data builder, commonly used to transmit data over the network (since vases are local only). +- A `+meta` meta-vase is an untyped vase (e.g. a vase of vase). We'll see more of these in their `$maze` form. + +If you read up on the runtime, at some point it used to refer to an “Arvo-shaped noun”. That means that the runtime expects to receive a core with certain arms, as this is how it will systematically interact with the noun it hosts. + +Arvo defines four standard arms for vanes and the binary runtime to use: + +- `+load` is used in kernel upgrades, allowing Arvo to update itself in-place, canonically at `+4`. (Formerly a `+come` arm was defined to assist in this procedure, targeting `$type` changes.) 
+- `+wish` accepts a cord, then compiles and evaluates it against `%zuse`; this is instrumentation for runtime access, canonically at `+10`. (See `ca02`.) +- `+peek` grants read-only access to a vane; this is called a _scry_, canonically at `+22`. +- `+poke` accepts `$move`s and processes them; this is the only arm that actually alters Arvo’s state, canonically at `+23`. + +Each arm possesses the same structure, which means that as the Urbit OS kernel grows and changes the main event dispatcher can remain the same. For instance, when the build vane `%ford` was incorporated into `%clay`, no brain surgery was needed on Arvo to make this possible and legible. Only the affected vanes (and any calls to `%ford`) needed to change. + +There's another wrinkle to the “simplicity” of Arvo: there are actually four Arvos in `/sys/arvo`: + +1. The larval core, used in building Arvo for the first time. We'll discuss this with the boot process in [*Arvo II: The Boot Sequence*](ca04.md). +2. The structural interface core, the primary mature core, which is in operation for most of a ship's lifecycle and contains Arvo's state. It primarily calls out to the next two cores. +3. The implementation core, the actual operational core carrying out particular computations and dispatches. +4. The Section 3bE core, containing helper functions for Arvo that are sequestered for security and to maintain the four-arm prototype of Arvo. E.g., how to parse a scry path, how to negotiate versions, just Arvo's library code. Only pure functions, no state machines. + +Urbit hews to the principle that “stateless is better than stateful” (_Precepts_ A16). Most parts of the system are designed with the intent that they are stateless, or that any state is explicitly sequestered to a particular core. In Arvo, the state is the entire kernel in operation. There are persistent components (the Arvo state) and the ephemeral move worklists. 
+ +```hoon +:: persistent arvo state +:: +=/ pit=vase !>(..is) :: +=/ vil=vile (viol p.pit) :: cached reflexives +=| $: lac=_& :: laconic bit + eny=@ :: entropy + our=ship :: identity + bud=vase :: %zuse + vanes=(map term vane) :: modules + == :: +``` + +- What is `!>(..is)` doing here? It refers to the code above this point in Arvo as a `vase`. +- `vil` is a cache of specific types—`type`, `duct`, `path`, `vase`. This saves on recompilation. +- `lac` is the verbosity dial (`|verb`). +- `eny` is the entropy. +- `our` is the ship's identity. +- `bud` is the stdlib. +- `vanes` is a map from vane names to vanes (more in a moment). + +### `+wish` and `urbit eval` {#wish-and-urbit-eval} + +Is `urbit eval` just `+wish`? Yes, it is—but it's not, in practice, any of the Arvo cores in `/sys/arvo`. In the king (the main process you'd run `urbit eval` through), it's actually an Arvo-shaped wrapper in `/lib/vere` which also sees Azimuth and Ethereum data. This is the outermost layer of the ivory pill (see [*Arvo II: The Boot Sequence*](ca04.md)). + + +## Vanes & Move Handling {#vanes-move-handling} + +### Vanes {#vanes} + +Arvo is essentially an event router between heavyweight modules, with a stack discipline. (It's sort of like using a router to bind names to callback functions.) We need to consider two aspects of event processing: + +1. Kinds of moves +2. Move mechanics + +But first, what is Arvo routing moves between? Arvo contains in its state: + +```hoon +vanes=(map term vane) +``` + +where a `$vane` is `[=vase =worm]`. (On which more later.) That is, Arvo stores a map from each vane's name to its `vase` and a `worm` cache. The `+grow` arm expands a one-letter vane label into the full vane name: + +```hoon +++ grow + |= way=term + ?+ way way + %a %ames + %b %behn + %c %clay + %d %dill + %e %eyre + %g %gall + %i %iris + %j %jael + %k %khan + %l %lick + == +``` + +A vane is ultimately a `vase` describing an outer gate which produces a core. The vane's state is that gate that wraps the core that has the right standard arms. Why a `vase`? 
Because all of this is going to be done explicitly in vase mode. + +Eliding a lot of code, here is the skeleton of a vane, in this case Behn: + +```hoon +|= our=ship +=> |% + +$ behn-state + $: %2 + timers=(tree [key=@da val=(qeu duct)]) + unix-duct=duct + next-wake=(unit @da) + drips=drip-manager + == + -- +:: +=> +=| behn-state +=* state - +|= [now=@da eny=@uvJ rof=roof] +=* behn-gate . +^? +|% +:: +call: handle a +task:behn request +:: +++ call + |= $: hen=duct + dud=(unit goof) + wrapped-task=(hobo task) + == + ^- [(list move) _behn-gate] + * * * +:: +load: migrate an old state to a new behn version +:: +++ load + |= old=behn-state + ^+ behn-gate + * * * +:: +scry: view timer state +:: +++ scry + ^- roon + |= [lyc=gang pov=path car=term bem=beam] + ^- (unit (unit cage)) + * * * +:: +++ stay state +:: +++ take + |= [tea=wire hen=duct dud=(unit goof) hin=sign] + ^- [(list move) _behn-gate] + * * * +``` + +A vane exposes five standard arms: +- `+stay` produces the state of the vane (for instance, ahead of a state migration). +- `+load` migrates the state of the vane. +- `+call` handles an incoming task request. +- `+take` handles a response `sign`. +- `+scry` answers a read-only request against the vane's state. + +### Kinds of Moves {#kinds-of-moves} + +Arvo is an event loop, but until an event completes it is merely a move, a pair of cause and effect. Most of Arvo's work is done dispatching `$card`s to and from vanes, which do the actual computation. From userspace, we are accustomed to seeing `%pass` and `%give` Gall `$card:agent:gall`s. These are actually nerfed relative to Arvo `$move`s, which contain more information. (They are laundered through `+wind`.) + +A `%pass` `move` is analogous to a call: + +```hoon +[duct %pass return-path=path vane-name=@tD data=card] +``` + +Arvo pushes the return path (preceded by the first letter of the vane name) onto the `duct` and sends the given data, a `card`, to the vane we specified. Any response will come along the same `duct` with the `wire` `return-path`. 
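Since a `duct` is just a list of wires, the push on `%pass` and the matching pop on `%give` can be imitated with ordinary list operations in the Dojo. This is a sketch of the bookkeeping only, not Arvo's routing code; the wires and the names `hen` and `nex` are hypothetical.

```hoon
::  a hypothetical causal stack, most recent cause first
> =hen `(list path)`~[/d/term /a/unix]

::  %pass: push the return wire, prefixed by the passing vane's letter
> =nex `(list path)`[/g/use/example hen]

> (lent nex)
3

::  the wire on top is where a response will be delivered
> (snag 0 nex)
/g/use/example

::  %give: pop the top wire to recover the caller's duct
> (slag 1 nex)
~[/d/term /a/unix]
```

Because the callee only ever sees the duct it was handed, it can return a result without knowing anything about who called it.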
+ +A `%give` `move` is analogous to a return: + +```hoon +[duct %give data=card] +``` + +Arvo pops the top `wire` off the `duct` and sends the given `card` back to the caller. + +Each vane defines a protocol for interacting with other vanes (via Arvo) by defining four types of cards for its own namespace: tasks, gifts, notes, and signs. + +> When one vane is `%pass`ed a `card` in its `task` (defined in `zuse`), Arvo activates the `+call` gate with the `card` as its argument. To produce a result, the vane `%give`s one of the `card`s defined in its `gift`. If the vane needs to request something of another vane, it `%pass`es it a `note` `card`. When that other vane returns a result, Arvo activates the `+take` gate of the initial vane with one of the `card`s defined in its `sign`. + +In other words, there are only four ways for Arvo and vanes to see a move: + +1. as a request seen by the caller, which is a `note`. +2. that same request as seen by the callee, a `task`. +3. the response to that first request as seen by the callee, a `gift`. +4. the response to the first request as seen by the caller, a `sign`. + + + +What does a vane call look like? They are specific to each vane. The top-level type for a `note` looks like this: + +```hoon ++$ note-arvo :: out request $-> + $~ [%b %wake ~] + $% [%a task:ames] + [%b task:behn] + [%c task:clay] + [%d task:dill] + [%e task:eyre] + [%g task:gall] + [%i task:iris] + [%j task:jael] + [%k task:khan] + [%l task:lick] + [%$ %whiz ~] + [@tas %meta vase] + == +``` + +`/sys/vane/behn` defines the following interface in `/sys/lull`: + +```hoon +:: :::: +:::: ++behn :: (1b) timekeeping + :: :::: +++ behn ^? 
+ |% + +$ gift :: out result <-$ + $% [%doze p=(unit @da)] :: next alarm + [%wake error=(unit tang)] :: wakeup or failed + [%meta p=vase] + [%heck syn=sign-arvo] :: response to %huck + == + +$ task :: in request ->$ + $~ [%vega ~] :: + $% $>(%born vane-task) :: new unix process + [%rest p=@da] :: cancel alarm + [%drip p=vase] :: give in next event + [%huck syn=sign-arvo] :: give back + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + [%wait p=@da] :: set alarm + [%wake ~] :: timer activate + == + -- ::behn +``` + +From Behn's perspective, it receives a `task` and produces a `gift`. From the perspective of the vane on the other end, the same request and response are a `note` and a `sign`. So if you issue a call to Behn from Gall, the lifecycle looks like this: + +1. Agent sends vane-specific `card` to Gall as `card:agent:gall`. +2. Gall `%pass`es a `note` to Arvo through the `+poke` arm. +3. Arvo `%pass`es a `task` to Behn through the `+call` arm. +4. When Behn wakes up on the timer, it `%give`s a `gift` to Arvo through the `+poke` arm. +5. Arvo `%give`s a `sign` to Gall through the `+take` arm. +6. Gall passes the `sign` (as a `sign-arvo`) to the agent's `+on-arvo` arm. + +Besides a `%pass` (forward) or `%give` (reverse) move, there is also a `%slip` move. + +> A `%slip` `move` is a cousin of `%pass`. Any `card` that can be `%pass`ed can also be `%slip`ed, but while a `%pass` says to "push this `wire` onto the `duct` and transfer control to the receiving vane", a `%slip` transfers control to the receiving vane without altering the `duct`. Therefore, a `%give` in response to a `%slip` will go to the caller of the vane that sent the `%slip` rather than the vane that actually sent the `%slip`. `%slip`s are much more rare than `%pass`es and `%give`s. In general, `%slip` and `%pass` `move`s are both referred to as "passes" and it should be clear from the context if one means to refer only to `%pass`es and not `%slip`s or vice versa. Lastly, we note that `%slip` is a code smell and should nearly always be avoided.
It can result in unexpected behavior like receiving a gift from a vane you never passed a note to. + +In short, you `%slip` without pushing onto the `duct`, so that any response goes to the `wire` already on top of the `duct`. + +When is `%slip` preferred? It's not; it's typically a code smell. Since it returns not to you but to your caller, it violates layering and breaks abstraction. In practice, it's not used much anymore, but it shows up in a few sensitive places like the subscription lifecycle, `%init` tasks (which only happen once in the lifecycle of each ship), and in a Clay-initiated Arvo upgrade. (Joe argues this is probably legacy functionality, and can likely be cleaned up after breadth-first move ordering is completed.) + +The old docs refer to a `%unix` task, but this was always sort of ill-defined. `%lull`'s [`$unix-task`](https://github.com/urbit/urbit/blob/5ff4e88890ee17413b35d2bd648392e3236ad9c7/pkg/arvo/sys/lull.hoon#L3488) defines a subset of expected tasks, but the way vanes work now, the `$unix-task` predicate is no longer needed, so you don't think of it as a `%unix` task anymore. In actuality any arbitrary task can come in via `conn.c`. In practical terms `%aqua` (and likely `%pyro`) use this feature. + +#### Exercise + +- Produce a minimalist Arvo and vane system. This should look like three cores: an Arvo core which can handle the four basic move types; and two vanes (say `+lorem` and `+ipsum`) with minimalist `gift`/`task` interfaces and `+call`/`+take` arms. You can do this in a `%say` generator, for instance, or a `/lib`; don't worry about setting up a whole agent or modifying the kernel. (We'll build a vane later!) + +### Move Mechanics {#move-mechanics} + +Arvo basically only knows how to glue things together; in particular, it knows vanes by labels and simple interfaces alone. (See e.g. `+grow`.) Arvo's formal state (thus Urbit's formal state) is always just a gate which operates on an event to produce the next state.
+ +```hoon ++$ vane [=vase =worm] +``` + +A vane is a `vase` and a `worm`. The `vase` encompasses the `type` and the noun of the vane. Vanes are compiled with `%zuse` as their subject (thus the other inner cores as well). + +Vanes operate in vase mode. Any vane call is a `+slap` in vase mode. For instance, vanes can emit cards or metacards (a metacard being a vase of a card). The `+va` and `+wa` engines assist with operating vanes. + +##### Aside: Metacards & `;;` micmic + +The vane interface is normally strictly typed, but using a metavase it can punch a hole through the `type` system. (This used to be the only way to do vase reduction before `!<` zapgal was introduced: a vane had to pass to Arvo to get into double vase mode, which Arvo would collapse into one vase mode and hand back.) + +Urbit core developers used to be more cautious about molding data. The mold system in particular was a total system prior to the introduction of structure mode in 2018: if the input mismatched, the mold would bunt (so the correct type was always returned). (Joe describes the mold system at this point as “a pile of hacks”.) If you knew the `type`, you could mold a value quickly to make it a static value. + +Since molds would bunt on input mismatch, you needed to have a fixed-point assertion that guaranteed validation, thus `;;` micmic. After the structure mode (“spec mode”) changes, molds crash if they fail (rather than bunting). So the primary reason `;;` micmic still exists is the ease of entering parser mode: + +```hoon +> (,[%foo %bar] [%foo %bar]) +[%foo %bar] + +> ;;([%foo %bar] [%foo %bar]) +[%foo %bar] +``` + +Other than that, it is essentially superfluous in contemporary Hoon. + +#### `+va`: Vane Operations + +Vanes have a definite interface as described above. `+va` is the engine for interacting with them, i.e. the core that actually runs vanes. It accepts `$maze`s as input (like the `+call` arm, which advances the vane state).
`+va` is basically a workhorse to handle vane transactions and there's not a lot to say about it beyond that. + +E.g. `+plow` is the sole arm of an inner core, designed to “operate in time and space” on a vane. It is a gate which attaches a `rook` (meta-namespace) to an Arvo-side interface for a vane and evaluates the vane within that namespace. + +Although not in `+va`, a related gate is the `+look` arm in the Section 3bE Arvo core. Each vane receives a `$roof` in its function call which it can call when it needs to scry. `+look` converts it into a `+mink`-compatible version, acting as a bridge between `.^` dotket and the scry handler (`+look`) with access to the `roof` that came from the vane. This pattern imposes constraints on interpreters, but if you just gave agents a scry handler then it's a gate closed over the entire vane state, so one could access anything in any vane state. Thus `.^` dotket gives mediated access so you don't reveal the state of the entire system in the subject (untyped permissions-free access). This uses the `+wa` interface (below) but doesn't cache, because it's a stateful function: given a mold and a path, the Hoon compiler turns the mold into a type, and you get an untyped nest-check on the result. (This is where `scry-lost` comes from.) ~mastyr-bottec will ultimately replace this in-Arvo stateful cache with persistent memoization in the runtime. + +#### `+wa`: `worm` Cache + +All vane calls are fundamentally a `+slap` in vase mode, so formally Arvo runs the compiler all the time. To speed this up, there's a `worm` cache. Thus `+wa` is where first-time computations are initiated and then stored in the cache. (The `worm` cache is part of Arvo's state; where did that slip in through the definition we had above?) + +The `worm` cache is manual memoization in the dumbest possible way.
+ +```hoon ++$ worm + $: :: +nest, +play, and +mint + :: + nes=(set ^) + pay=(map (pair type hoon) type) + mit=(map (pair type hoon) (pair type nock)) + == +``` + +- `nes` is a set of pairs of types that nest. (I.e., if it's in the set then it nests.) +- `pay` is a map of `type` and `hoon` to `type`. +- `mit` is a map of `type` and `hoon` to `type` and `nock`. + +There are no faces so lark notation is used throughout `+wa`. + +Each call to a vane falls into one of three categories: + +1. Cached: produce `%.y` and `wa-cache`. +2. Not cached and failed `nest-check`: produce a manual `printf`. +3. Not cached and succeeded `nest-check`: produce `%.n` and `wa-cache`. + +In Arvo, a lot of pieces have to be done manually. For instance, `+open:wa` is a manual `%~` censig on a door. (Joe points out that `+wa` is currently a messy core because half of the operations are on `vase`s and half are on `maze`s.) You have to operate untyped throughout: + +- `+slur` calls `+neat` instead of `+nest`. +- `+neat` is either `vase` or `maze`, and calls `+nets` instead of `+nest` because it is untyped, then makes the `nest-check` with `+slum` (an untyped `+slam` working on a raw `nock`). +- E.g. when Arvo runs the `+scry` arm of a vane, it receives back a `vase` of `(unit (unit cage))` so there are multiple layers of nesting. This needs a sort of manual `!<` zapgal, not actually promoting into a type but still operating on it. (Both of these methods are hacks, but this is less dangerous (only true because vanes are trusted), whereas `!<` zapgal is a type hole implicating the whole system.) +- The other time we construct these is when we get raw events (any `$ovum`). + +Arvo maintains a `worm` cache for each vane, built when initializing. Ultimately this should be changed to runtime memoization instead. That will allow arbitrary associative memory (and presumably constant-time lookup), but these are currently double-`+mug` balanced treaps.
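
The manual memoization described above can be pictured with a small model. This is a sketch in Python rather than Hoon, and everything in it is hypothetical: `Worm`, `cached_mint`, and `cached_nest` only mirror the shape of the three tables (`nes`, `pay`, `mit`), and the `mint` and `nest` callbacks stand in for the real compiler.

```python
from dataclasses import dataclass, field

@dataclass
class Worm:
    nes: set = field(default_factory=set)    # pairs of types known to nest
    pay: dict = field(default_factory=dict)  # (type, hoon) -> type
    mit: dict = field(default_factory=dict)  # (type, hoon) -> (type, nock)

def cached_mint(worm, sut, gen, mint):
    """Compile `gen` against subject type `sut`, reusing a cached result."""
    key = (sut, gen)
    if key in worm.mit:
        return worm.mit[key], True       # cache hit: no compiler call
    out = mint(sut, gen)                 # cache miss: run the compiler once
    worm.mit[key] = out
    return out, False

def cached_nest(worm, sut, ref, nest):
    """Check that `ref` nests in `sut`; only successes are remembered."""
    if (sut, ref) in worm.nes:
        return True
    if nest(sut, ref):
        worm.nes.add((sut, ref))
        return True
    return False

# A stand-in compiler that records how often it is really invoked.
calls = []
def fake_mint(sut, gen):
    calls.append((sut, gen))
    return ("atom", ("nock", gen))

worm = Worm()
first, hit_first = cached_mint(worm, "noun", "(add 1 2)", fake_mint)
second, hit_second = cached_mint(worm, "noun", "(add 1 2)", fake_mint)
```

The second call returns the stored pair without invoking the stand-in compiler again. As in the `nes` description, a failed nest check is simply not recorded.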
+ +### Runtime Connexions {#runtime-connexions} + +Two functions are of particular interest in beginning to see how the runtime handles an event: `u3_serf_work`, which applies events and produces effects; and `u3v_poke_sure`, which injects an event and saves the new state if successful. + +We'll see more about this in `ca05` when we look at the structure of Vere. + +### Breadth-First Move Ordering {#breadth-first-move-ordering} + +Planned for the future is breadth-first move ordering ([current PR](https://github.com/urbit/urbit/pull/6775)). What does this mean? (Much of this section quotes [~wicdev-wisryt's #6041 PR](https://github.com/urbit/urbit/pull/6041) description.) + +Arvo currently orders moves by depth first. Visually, when evaluating depth-first an event may look like this: + +``` +["" %unix %belt /d/term/1 ~2022.10.27..06.32.09..30db] +["|" %pass [%dill %g] [[%deal [~zod ~zod] %hood %poke] /] ~[//term/1]] +["||" %give %gall [%unto %poke-ack] i=/dill t=~[//term/1]] +["||" %pass [%gall %g] [[%deal [~zod ~zod] %dojo %poke] /use/hood/0w2.efXKi/out/~zod/dojo/drum/phat/~zod/dojo] ~[/dill //term/1]] +["|||" %give %gall [%unto %poke-ack] i=/gall/use/hood/0w2.efXKi/out/~zod/dojo/drum/phat/~zod/dojo t=~[/dill //term/1]] +["|||" %give %gall [%unto %fact] i=/gall/use/hood/0w2.efXKi/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["||||" %give %gall [%unto %fact] i=/dill t=~[//term/1]] +["|||||" %give %dill %blit i=/gall/use/herm/0w2.efXKi/~zod/view/ t=~[/dill //term/1]] +["|||||" %give %dill %blit i=/gall/use/herm/0w2.efXKi/~zod/view/ t=~[/dill //term/1]] +["|||||" %give %dill %blit i=/gall/use/herm/0w2.efXKi/~zod/view/ t=~[/dill //term/1]] +["|||" %give %gall [%unto %fact] i=/gall/use/hood/0w2.efXKi/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["||||" %give %gall [%unto %fact] i=/dill t=~[//term/1]] +["|||||" %give %dill %blit i=/gall/use/herm/0w2.efXKi/~zod/view/ t=~[/dill //term/1]] +["|||" %give %gall [%unto %fact] 
i=/gall/use/hood/0w2.efXKi/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["|||" %give %gall [%unto %fact] i=/gall/use/hood/0w2.efXKi/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +``` + +The breadth-first equivalent looks like this: + +``` +["" %unix %belt /d/term/1 ~2022.11.2..00.18.15..ce5b] +["1" %pass [%dill %g] [[%deal [~zod ~zod] %hood %poke] /] ~[//term/1]] +["11" %give %gall [%unto %poke-ack] i=/dill t=~[//term/1]] +["12" %pass [%gall %g] [[%deal [~zod ~zod] %dojo %poke] /use/hood/0w2.Rh6DI/out/~zod/dojo/drum/phat/~zod/dojo] ~[/dill //term/1]] +["121" %give %gall [%unto %poke-ack] i=/gall/use/hood/0w2.Rh6DI/out/~zod/dojo/drum/phat/~zod/dojo t=~[/dill //term/1]] +["122" %give %gall [%unto %fact] i=/gall/use/hood/0w2.Rh6DI/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["123" %give %gall [%unto %fact] i=/gall/use/hood/0w2.Rh6DI/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["124" %give %gall [%unto %fact] i=/gall/use/hood/0w2.Rh6DI/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["125" %give %gall [%unto %fact] i=/gall/use/hood/0w2.Rh6DI/out/~zod/dojo/1/drum/phat/~zod/dojo t=~[/dill //term/1]] +["1221" %give %gall [%unto %fact] i=/dill t=~[//term/1]] +["1231" %give %gall [%unto %fact] i=/dill t=~[//term/1]] +["12211" %give %dill %blit i=/gall/use/herm/0w2.Rh6DI/~zod/view/ t=~[/dill //term/1]] +["12213" %give %dill %blit i=/gall/use/herm/0w2.Rh6DI/~zod/view/ t=~[/dill //term/1]] +["12215" %give %dill %blit i=/gall/use/herm/0w2.Rh6DI/~zod/view/ t=~[/dill //term/1]] +["12311" %give %dill %blit i=/gall/use/herm/0w2.Rh6DI/~zod/view/ t=~[/dill //term/1]] +``` + +These are the same 15 lines, and they're each at the same depth, but they're in a different order. Most code we write is agnostic to this order, because there are many circumstances where this order gets inverted compared to our expectations. 
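
As a rough illustration of the two orderings (a toy worklist in Python, not the code from either PR), suppose each move, once processed, emits a list of further moves. The move names and the `emits` table below are made up for the example.

```python
from collections import deque

def run(root, emits, breadth_first):
    """Process moves starting at `root`; return the order they ran in."""
    order = []
    work = deque([root])
    while work:
        move = work.popleft()
        order.append(move)
        children = emits.get(move, [])
        if breadth_first:
            # New moves wait behind everything already queued.
            work.extend(children)
        else:
            # New moves jump the queue, keeping their own relative order.
            work.extendleft(reversed(children))
    return order

# Hypothetical event: one poke emits an ack and two facts,
# and each fact in turn emits a blit.
emits = {
    "belt": ["poke"],
    "poke": ["ack", "fact-1", "fact-2"],
    "fact-1": ["blit-1"],
    "fact-2": ["blit-2"],
}
depth_order = run("belt", emits, breadth_first=False)
breadth_order = run("belt", emits, breadth_first=True)
```

Both runs process the same seven moves. Depth-first handles `blit-1` before `fact-2`, while breadth-first delivers both facts before either blit.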
+ +The most obvious difference is that the four `%fact`s from `%dojo` to `%hood` happen one right after the other instead of being mingled with other moves. If some of those intermingled moves invoked dojo (which would be a form of reentrancy) and caused dojo to emit more facts, those facts would be given to hood _before_ the %facts which were already on the stack to be sent to hood. If this is textual output, then it will be in reverse order. The breadth-first move order fixes this problem completely by running all the facts that were issued at the same time before processing the moves that those themselves produced. + +This leads to a very important principle: breadth-first ordering guarantees that moves will be processed in the order they are emitted. Depth-first ordering constantly violates this, and in the presence of reentrancy this can cause extremely unexpected results. + +(The [#6041 PR](https://github.com/urbit/urbit/pull/6041) elaborates on the motivation for the current depth-first ordering.) + +Depth-first ordering will be deprecated in favor of breadth-first ordering, which should make the issuance of multiple cards more transparent to reason about. + +> he actually emailed me the other day (i haven't communicated with him in years) to exhort me to reject the breadth-first move ordering pr, and we had a little back-and-forth. he still believes you can use moves essentially as function calls, and if you just add enough queues to random places, it'll all work out cleanly, like adding epicycles + +#### Exercise + +- Follow a move trace. Type `|verb` into Dojo, followed by `+ls %`. The system will respond with a verbose description of how the move was processed. (Since the system will produce more output in response to subsequent events, it's easiest to copy this into a text editor for review.) You can turn `|verb` off again as a toggle. There is also an [annotated move trace](/reference/arvo/tutorials/move-trace) you can read for more perspective. 
+ + +## OTAs {#otas} + +How does an over-the-air (OTA) update work? Essentially we transmit a noun to Arvo describing the new state and update (transition) rules. In practice, there are three possibilities when an upgrade command is sent to Arvo as a task using `+what`, picking the Hoon source out of the list there: + +1. No-op: empty list or the same version. +2. No Arvo change: just an upgrade of vanes or stdlib (`%lull`/`%zuse`). +3. New Arvo kernel, up to and including a language change. + +Any time you need your code to be flexible for future upgrades, you have to be inflexible about a few things—but as few as possible. The Arvo upgrade process is intended to constrain future Arvo as little as possible. + +Arvo knows about `/sys` and the relative precedence of `%hoon`/`%arvo`/`%lull`/`%zuse`/vanes so it can rebuild based on the topology of changes. If only `%lull`/`%zuse`/a vane are different, then you only need to build the new cores and provide them as inner cores for the new system. If, on the other hand, `%hoon` or `%arvo` are different, then we need to build the new `/sys/hoon`, compile the new `/sys/arvo`, then gather all persistent and ephemeral state plus the new upgrade state and hand that to the new `%arvo` core. (If Arvo itself changes, then it tries to get into the new world as quickly as possible.) + +This is an area where depth-first v. breadth-first move ordering makes a difference. Current DFMO means there is always a worklist to pass to the new Arvo, because there may still be moves to run. This acts as a constraint on new vanes because they may have to handle old-world moves still. BFMO means there will be more flexibility. + +Concretely, once you learn a new revision exists, you tell Clay to merge the desks. Clay talks to the publisher over Ames to retrieve the metadata, then requests the individual files using Fine (fee-NAY), the remote scry protocol. 
(In essence, this allows the subscribers to check out the desk updates to `%base` directly from the serving runtime rather than needing to request to the server's Arvo.) With that data, the commit process starts. This is handled by `+park` in Clay. + +`+park` has some state for the pending blob store &c. Then `+sys-update` is called, which sends the move to Arvo with all source code for the new commit. A blob is a content-addressed store of data. This is one place where `%slip` comes in handy: Clay `%slip`s a `%pork` to itself to trigger the continuation, so once Arvo is done upgrading, the next thing for Arvo to run is Clay with an empty `%pork` task—thus Clay goes right back into the same arm and continues where it left off in the commit process. + +One place complications arise is with `$type`, since a change to `$type` impacts vase mode handling. + +> Some subtleties regarding types arise when handling OTA updates, since they can potentially alter the type system. Put more concretely, the type of `type` may be updated. In that case, the update is an untyped Nock formula from the perspective of the old kernel, but ordinary typed Hoon code from the perspective of the new kernel. Besides this one detail, the only functionality of the Arvo kernel proper that is untyped are its interactions with the Unix runtime. + +Some particular details of an Arvo upgrade will be discussed later, since ultimately, OTA upgrades to Arvo have much in common with the boot process—next week's lesson, [*Arvo II: The Boot Sequence*](ca04.md), covers a new ship's boot process. + + +## Homework {#homework} + +- Annotate a move trace. Produce a move trace, such as from a generator invocation like `|pass [%d %text "foo"]`. Comment line-by-line on what is happening. (Some lines, like repeated `%hood` calls, can be grouped.) You can turn verbose logging off again using `|verb` to toggle. +- Produce a functional minimalist Arvo to boot on Vere. 
You need to make a gate that produces a core with four arms, each returning a gate with the correct type signature. (`/sys/arvo` can see all of `/sys/hoon`.) This should be built into a new boot pill, `baby.pill`. Phil walks through this whole process in [this video](https://www.youtube.com/watch?v=fOVhCx1a-9A). You could `%slog` input on a vane no-op. diff --git a/docs/core-academy/ca04.md b/docs/core-academy/ca04.md new file mode 100644 index 0000000..5374d59 --- /dev/null +++ b/docs/core-academy/ca04.md @@ -0,0 +1,626 @@ +--- +description: "Core Academy lesson on Arvo's boot sequence covering pill creation and loading, larval and adult phases, vane initialization, userspace bootstrapping, and the complete boot process." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 5. Arvo II: The Boot Sequence + +*This lesson covers Arvo's boot sequence: pills, the larval phase, initializing vanes, initializing userspace.* + +> The formal state of an Arvo instance is an event history, as a linked list of nouns from first to last. The history starts with a bootstrap sequence that delivers Arvo itself, first as an inscrutable kernel, then as the self-compiling source for that kernel. (Whitepaper) + +When Arvo starts for the first time, how does it work?
It tells you something of that process; as of Vere 2.12/Arvo 412 K, the boot sequence output looks like this: + +Fake ship ~fes: + +```hoon +~ +urbit 2.12 +boot: home is fes +loom: mapped 2048MB +lite: arvo formula 2a2274c9 +lite: core 4bb376f0 +lite: final state 4bb376f0 +boot: downloading pill https://bootstrap.urbit.org/urbit-v2.12.pill +boot: parsing %solid pill +dock: pace (live): configured at fes/.bin/pace +vere: binary copy succeeded +loom: mapped 2048MB +boot: protected loom +live: logical boot +boot: installed 661 jets +---------------- playback starting ---------------- +pier: replaying events 1-14 +arvo: metamorphosis +clay: kernel updated +clay: rebuilding %base after kernel update +gall: installing %acme +gall: installing %azimuth +gall: installing %dbug +gall: installing %dojo +gall: installing %eth-watcher +gall: installing %hood +drum: link [~fes %dojo] +kiln: boot +gall: installing %herm +gall: installing %lens +gall: installing %ping +gall: installing %spider +gall: installing %talk-ui +Not running %settings-store yet, got %poke +gall: installing %docket +gall: installing %treaty +gall: installing %hark-store +gall: installing %hark-system-hook +gall: installing %settings +gall: installing %settings-store +gall: installing %storage +gall: installing %reel +gall: installing %bait +gall: installing %vitals +gall: installing %growl +docket: fetching %http glob for %talk desk +docket: fetching %http glob for %garden desk +docket: fetching %http glob for %talk desk +docket: fetching %http glob for %garden desk +docket: fetching %http glob for %webterm desk +docket: fetching %http glob for %landscape desk +gall: installing %metadata-store +gall: installing %contact-store +gall: installing %chat-store +gall: installing %graph-store +gall: installing %group-store +%group-store: on-init +gall: installing %invite-store +gall: installing %s3-store +gall: installing %chat-hook +gall: installing %chat-view +gall: installing %clock +gall: installing 
%contact-hook +gall: installing %contact-pull-hook +gall: installing %contact-push-hook +gall: installing %contact-view +gall: installing %dm-hook +gall: installing %graph-pull-hook +gall: installing %graph-push-hook +gall: installing %group-pull-hook +gall: installing %group-push-hook +gall: installing %group-view +gall: installing %hark-chat-hook +gall: installing %hark-graph-hook +gall: installing %hark-group-hook +gall: installing %hark-invite-hook +gall: installing %invite-hook +gall: installing %invite-view +gall: installing %launch +gall: installing %metadata-hook +gall: installing %metadata-pull-hook +gall: installing %metadata-push-hook +gall: installing %observe-hook +gall: installing %sane +gall: installing %weather +gall: not running %file-server yet, got %poke +%group-store: on-peek on path /y/groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +docket: fetching %http glob for %groups desk +gall: installing %groups +gall: installing %chat +gall: installing %contacts +%contacts: on-init +gall: installing %heap +gall: installing %diary +gall: installing %hark +gall: installing %notify +gall: installing %groups-ui +gall: installing %grouper +%contacts: on-poke with mark %noun +%contacts: on-agent on wire /migrate, %poke-ack +[%agent-giving-on-system-duct %diary %fact] +pier: (14): play: done +---------------- playback complete ---------------- +vere: checking version compatibility +loom: image backup complete +lick init mkdir fes/.urb/dev +ames: live on 31592 (localhost only) +conn: listening on fes/.urb/conn.sock +lick: %born failure; +http: web interface live on http://localhost:8080 +http: loopback live on http://localhost:12321 +pier (25): live +docket: fetching %http glob for %garden desk +ames: metamorphosis +; ~zod is your neighbor +~fes:dojo> +``` + +Live ship comet: + +``` +~ +urbit 2.12 +boot: 
home is /home/neal/comet-412 +loom: mapped 2048MB +lite: arvo formula 2a2274c9 +lite: core 4bb376f0 +lite: final state 4bb376f0 +Downloading pill https://bootstrap.urbit.org/urbit-v2.12.pill +Mining a comet. May take up to an hour. +If you want to boot faster, get an Urbit identity. +Found comet ~mipped-pinlug-loshec-tastun--tabfen-bitwex-norsul-wanzod +boot: verifying keys +Getting sponsor +boot: retrieving galaxy table +boot: retrieving network domains +boot: retrieving keys for sponsor ~wanzod +boot: retrieving keys for sponsor ~zod +boot: parsing %solid pill +pace (live): configured at /home/neal/comet-412/.bin/pace +vere: binary copy succeeded +loom: mapped 2048MB +boot: protected loom +logical boot +boot: installed 661 jets +---------------- playback starting ---------------- +pier: replaying events 1-14 +arvo: metamorphosis +gall: not running %azimuth yet, got %poke +arvo: kernel updated +clay: rebuilding %base after kernel update +gall: installing %acme +gall: installing %azimuth +gall: installing %dbug +gall: installing %dojo +gall: installing %eth-watcher +gall: installing %hood +link [~mipped-pinlug-loshec-tastun--tabfen-bitwex-norsul-wanzod %dojo] +kiln: boot +gall: installing %herm +gall: installing %lens +gall: installing %ping +gall: installing %spider +gall: installing %talk-ui +gall: not running %settings-store yet, got %poke +gall: installing %docket +gall: installing %treaty +gall: installing %hark-store +gall: installing %hark-system-hook +gall: installing %settings +gall: installing %settings-store +gall: installing %storage +gall: installing %reel +gall: installing %bait +gall: installing %vitals +gall: installing %growl +docket: fetching %http glob for %talk desk +docket: fetching %http glob for %garden desk +docket: fetching %http glob for %talk desk +docket: fetching %http glob for %garden desk +docket: fetching %http glob for %webterm desk +docket: fetching %http glob for %landscape desk +gall: installing %metadata-store +gall: installing
%contact-store +gall: installing %chat-store +gall: installing %graph-store +gall: installing %group-store +%group-store: on-init +gall: installing %invite-store +gall: installing %s3-store +gall: installing %chat-hook +gall: installing %chat-view +gall: installing %clock +gall: installing %contact-hook +gall: installing %contact-pull-hook +gall: installing %contact-push-hook +gall: installing %contact-view +gall: installing %dm-hook +gall: installing %graph-pull-hook +gall: installing %graph-push-hook +gall: installing %group-pull-hook +gall: installing %group-push-hook +gall: installing %group-view +gall: installing %hark-chat-hook +gall: installing %hark-graph-hook +gall: installing %hark-group-hook +gall: installing %hark-invite-hook +gall: installing %invite-hook +gall: installing %invite-view +gall: installing %launch +gall: installing %metadata-hook +gall: installing %metadata-pull-hook +gall: installing %metadata-push-hook +gall: installing %observe-hook +gall: installing %sane +gall: installing %weather +gall: not running %file-server yet, got %poke +%group-store: on-peek on path /y/groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +%group-store: on-watch on path /groups +docket: fetching %http glob for %groups desk +gall: installing %groups +gall: installing %chat +gall: installing %contacts +%contacts: on-init +gall: installing %heap +gall: installing %diary +gall: installing %hark +gall: installing %notify +gall: installing %groups-ui +gall: installing %grouper +%contacts: on-poke with mark %noun +%contacts: on-agent on wire /migrate, %poke-ack +[%agent-giving-on-system-duct %diary %fact] +pier: (14): play: done +---------------- playback complete ---------------- +vere: checking version compatibility +loom: image backup complete +lick init mkdir /home/neal/comet-412/.urb/dev +ames: live on 52253 +conn: listening on
/home/neal/comet-412/.urb/conn.sock +lick: %born failure; +http: web interface live on http://localhost:8081 +http: loopback live on http://localhost:12322 +pier (25): live +docket: fetching %http glob for %garden desk +ames: czar zod.urbit.org: ip .35.247.119.159 +ames: metamorphosis +; ~zod is your neighbor +ames: czar at zod.ur +~mipped_wanzod:dojo> +``` + +At the 10,000' level, we can read the current boot process into a few discrete stages: + +1. Runtime startup +2. Boot sequence (pill) + 1. Arvo larval phase + 2. Arvo main sequence +3. Userspace startup (a lot of that output results from userspace slogs) + +In this lesson, we will examine each of these steps. + +## Runtime Startup {#runtime-startup} + +To start an Urbit ship for the first time, you have to provide a ship name and the corresponding private key. It's easiest to demonstrate this with a moon using the values obtained from `|moon`. After allocating memory, the logical boot process proceeds. + +The runtime spawns the king (`king.c`) and indirectly the serf (`serf.c`) processes. These will both run for the lifetime of the Urbit process. + +- The serf is the Nock runtime. It tracks the current state of Arvo as a noun, updating the state by poking it with nouns. It informs the king of the new state. + - Vere provides a standard serf, what was known formerly as the `urbit-worker` process. + - Sword (née Ares) can be used as a serf in its Nock interpreter capacity, but requires I/O driver support to function this way entirely for Urbit. +- The king manages snapshots of Arvo's state and interfaces with Unix. + - Vere is the only Urbit king currently. + - King Haskell was an alternative king process that was dropped for maintenance reasons. + +> The serf only ever talks to the king, while the king talks with both the serf and Unix. + +When the runtime begins, it drops into `vere/main.c` and checks the command-line options and commands. `main()` has to decide what it needs to do (i.e. 
the command) and then it accordingly sets global flags. If this is a first-time boot or a restart of a pier, then `main()` starts the king with `u3_king_commence()`. (In general, `main.c` isn't very Urbit-y, it's a fairly orthodox C startup file.) + +The first thing the king does is use `vere/dawn.c` to retrieve the state of Ethereum and the claimed ship's identity. If this can be verified, then the sponsor chain is retrieved and preparation for the Arvo bootstrap sequence is made. + +``` +boot: verifying keys +Getting sponsor +boot: retrieving galaxy table +boot: retrieving network domains +boot: retrieving keys for sponsor ~wanzod +boot: retrieving keys for sponsor ~zod +``` + +- See how a comet is mined in `vere/dawn.c:u3_dawn_come`. + +You can see `u3v_wish` present at several places, demonstrating the Arvo `+wish` evaluation arm. + +The boot sequence is set up in `vere/pier.c:u3_pier_boot()`, triggered by the king immediately after the `dawn.c` call. There are some runtime boilerplate issues to resolve, such as creating the `/.urb` folder for the event log and the loom. Snapshots are made and replays are checked, &c. + +Finally, the king hooks up the bootstrap from the supplied pill (`vere/king.c:_king_boot_ivory` → `noun/serial.c:u3s_cue_xeno`) and starts the main event loop (`uv_run`). `uv_run` is actually a loop handler from `libuv`, not a part of Urbit proper. It provides asynchronous I/O, which makes sense since every event in Urbit either comes from or results in a Unix system call. + +Regarding [`libuv`](https://github.com/libuv/libuv): + +> libuv's name and logo stand for "Unicorn Velociraptor", where: +> * U or Unicorn is a reference to universal and multi-platform. +> * V or Velociraptor is a reference to velocity and high-performance. + + +## Pill I {#pill-i} + +When you boot a ship, you need all the parts of the boot sequence that are not unique, as well as your private keys and up-to-date information about the PKI, and some entropy etc. 
The runtime provides some of this information. The pill is then the recipe for the bootstrap sequence. The bootstrap sequence is how you get to an Arvo kernel. Once you have an Arvo kernel, you can compute in the normal event timeline. + +A big part of the practical complexity is obtaining identity and keys from Azimuth. You need your own keys of course, but you need the public keys of anyone you need to talk to. So you start with the galaxy table (hard-coded) and can build the sponsorship chain by construction. Then you can get the rest from an Ethereum node. + +We also want to avoid booting into an invalid state. + +The pill contains: + +1. A list of Nock events to create an Arvo kernel. +2. A list of Arvo events to follow once the Arvo kernel has been created. +3. A list of userspace events to follow that setup. + +There are three main pill types: + +- An [**ivory pill**](https://github.com/urbit/urbit/blob/develop/pkg/arvo/gen/pill/ivory.hoon) is a runtime support pill compiled into the binary. It produces just the `%zuse` core for use by Vere's I/O process. (This prevents needing to redefine certain parts of the Hoon stdlib functionality in Vere.) + +```hoon +.ivory/pill +ivory %base +``` + +- A [**brass pill**](https://github.com/urbit/urbit/blob/develop/pkg/arvo/gen/pill/brass.hoon) is a complete bootstrap sequence including the vanes being recompiled against a target `%base` desk (the first argument). + +```hoon +.brass/pill +pill/brass %base +``` + +> A brass pill is a recipe for a complete bootstrap sequence, starting with a bootstrap Hoon compiler as a Nock formula. It compiles a Hoon compiler from source, then uses it to compile everything else in the kernel. (`~master-morzod`) + +For instance, the [developer pill](/blog/dev-pill) is produced as a brass pill. + +- A [**solid pill**](https://github.com/urbit/urbit/blob/develop/pkg/arvo/gen/pill/solid.hoon) is a kernel developer expedient, which doesn't recompile the vanes the way a brass pill does. 
+ +- A **baby pill** is a minimalist pill, like the one you produced for [*Arvo II: The Boot Sequence*](ca04.md)'s homework. (~wicdev-wisryt walks through the process of creating a baby pill [here](https://www.youtube.com/watch?v=fOVhCx1a-9A).) + + +## Bootstrapping {#bootstrapping} + +> Before we plug the newborn node into the network, we feed it a series of bootstrap or “larval” packets that prepare it for adult life as a packet transceiver on the public network. The larval sequence is private, solving the secret delivery problem, and can contain as much code as we like. (Whitepaper) + +If Nock is a frozen function from nouns, then we can define an OS. That OS, Arvo, guarantees that the state of your ship is a pure function of the things that have happened to it. The event log is a linked list of events, operated on by what the whitepaper calls a “functional BIOS”: `[2 [0 3] [0 2]]`. + +So the formula starts with the first event. What's in the first event? It performs the bootstrap of Arvo itself, then loops to take the events off one at a time. + +When we boot a ship, the runtime implements this directly using `u3v_boot`, which is given a list and runs the formula from a solid pill. + +An `$ovum` is a pair of `wire` (routing data) and `card`. A `$card` is a raw event datum, a pair of a `term` tag and an arbitrary noun. + +### The First Five Events {#the-first-five-events} + +The `+eden` core supplies the first five events to create the event series that will result in Arvo and its lifecycle function. + +- Event One, `+aeon`. Start the event loop. +- Event Two, `+boot`. Bootstrap an `arvo` kernel from source. +- Event Three, `+fate`. Produce the Hoon bootstrap compiler. +- Event Four, `+hoon`. Produce the compiler source. +- Event Five, `+arvo`. Produce the kernel source. 
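
The functional BIOS formula quoted earlier in this section can be unpacked by hand. The reduction below is a sketch that uses only the standard Nock rules for operators 2 and 0; `log`, `head`, and `tail` are just labels for the whole event list, its first event, and the remaining events.

```
*[log 2 [0 3] [0 2]]
  = *[*[log 0 3] *[log 0 2]]     Nock 2: evaluate both parts against log, then run the result
  = *[/[3 log] /[2 log]]         Nock 0: slot lookup
  = *[tail head]                 log is [head tail], so /[3 log] is tail and /[2 log] is head
```

In other words, the first event is run as a formula whose subject is the rest of the log, which is why the first event is written to expect the remaining events as its subject.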
+ +#### Event One: `+aeon` + +```hoon +++ aeon + ^- * + => *log=[boot=* tale=*] + != + =+ [arvo epic]=.*(tale.log boot.log) + |- ^- * + ?@ epic arvo + %= $ + epic +.epic + arvo .*([arvo -.epic] [%9 2 %10 [6 %0 3] %0 2]) + == +``` + +`+aeon` is the first function run on any ship. The gate on the outer edge of Arvo is retrieved, then all of the events are run in a list processing loop, invoking Arvo in each one. This is Hoon code to just produce raw Nock using `!=` zaptis. The `=>` tisgar asserts that we expect the subject (outside the Nock) to look like the first thing in the event log and the rest of the log. This is the first event of the log. The subject that it expects is the rest of the log (which will be evaluated using `[2 [0 3] [0 2]]`.) So `boot.log` is Event Two and `tale.log` is Events Three through Infinity. + +This produces `arvo`, the stateless kernel, and `epic`, the rest of the log. The formula in Event Two can take as many events as it needs from the sequence to construct `arvo`, then the incremental process can continue. If `epic` is an atom `~` sig (the null terminator), then `arvo` is ready. Arvo is the result of calling `arvo` on `-.epic`, the next event. + +#### Event Two: `+boot` + +```hoon +++ boot + ^- * + => *log=[fate=* hoon=@ arvo=@ epic=*] + != + :: + :: activate the compiler gate. the product of this formula + :: is smaller than the formula. so you might think we should + :: save the gate itself rather than the formula producing it. + :: but we have to run the formula at runtime, to register jets. + :: + :: as always, we have to use raw nock as we have no type. + :: the gate is in fact ++ride. + :: + ~> %slog.[0 leaf+"1-b"] + =/ compiler-gate .*(0 fate.log) + :: + :: compile the compiler source, producing (pair span nock). + :: the compiler ignores its input so we use a trivial type. 
+ :: + ~> %slog.[0 leaf+"1-c (compiling compiler, wait a few minutes)"] + =/ compiler-tool + ~> %bout + .*([compiler-gate noun/hoon.log] [%9 2 %10 [6 %0 3] %0 2]) + :: + :: switch to the second-generation compiler. we want to be + :: able to generate matching reflection nouns even if the + :: language changes -- the first-generation formula will + :: generate last-generation spans for `!>`, etc. + :: + ~> %slog.[0 leaf+"1-d"] + =. compiler-gate ~>(%bout .*(0 +.compiler-tool)) + :: + :: get the span (type) of the kernel core, which is the context + :: of the compiler gate. we just compiled the compiler, + :: so we know the span (type) of the compiler gate. its + :: context is at tree address `+>` (ie, `+7` or Lisp `cddr`). + :: we use the compiler again to infer this trivial program. + :: + ~> %slog.[0 leaf+"1-e"] + =/ kernel-span + ~> %bout + -:.*([compiler-gate -.compiler-tool '+>'] [%9 2 %10 [6 %0 3] %0 2]) + :: + :: compile the arvo source against the kernel core. + :: + ~> %slog.[0 leaf+"1-f"] + =/ kernel-tool + ~> %bout + .*([compiler-gate kernel-span arvo.log] [%9 2 %10 [6 %0 3] %0 2]) + :: + :: create the arvo kernel, whose subject is the kernel core. + :: + ~> %slog.[0 leaf+"1-g"] + ~> %bout + [.*(+>.compiler-gate +.kernel-tool) epic.log] + -- +``` + +This second event noun bootstraps a kernel from its source (`arvo`). + +#### Event Three: `+fate` + +The next noun (event) is the Hoon bootstrap compiler as a Nock formula (it is what `+boot` evaluates with `.*(0 fate.log)`). + +Details of this process are supplied in [`/lib/pill`](https://github.com/urbit/urbit/blob/develop/pkg/base-dev/lib/pill.hoon). + +#### Event Four: `+hoon` + +Next we produce the compiler source. + +Details of this process are supplied in [`/lib/pill`](https://github.com/urbit/urbit/blob/develop/pkg/base-dev/lib/pill.hoon). + +#### Event Five: `+arvo` + +Then we produce the kernel source. + +Details of this process are supplied in [`/lib/pill`](https://github.com/urbit/urbit/blob/develop/pkg/base-dev/lib/pill.hoon). 
+ +#### `/lib/pill` + +- To see how the events are created, let's take a look at the `+brass` arm in `/lib/pill`. This uses vase mode (see e.g. the `swat` which is a delayed `slap` in a trap) to produce the cores and events. + +### Lifecycle {#lifecycle} + +Once you have all of this, you have completed the lifecycle evaluation of the bootstrap sequence and can run the rest of the event log. + +Arvo enters the larval phase during the boot sequence and during certain OTAs. An OTA can just change `%lull`/`%zuse`/a vane, which doesn't touch Arvo; or it can change `/sys/hoon` or `/sys/arvo`, in which case it needs to handle an Arvo upgrade. From the big picture, we build the new Arvo, compile Hoon and Arvo, gather all persistent and ephemeral state plus any new upgrade state, and hand that to the new Arvo (new world). + +The larval core is the outermost core in `/sys/arvo`. When you first bootstrap, the core that the runtime talks to is the larval phase. The larval core is designed to accumulate preconditions and then metamorphose into the adult Arvo. It needs the current time, entropy, the identity, and the standard library. + +> Symmetry breaking---the event that defines the identity of the computer---is exempt from this requirement. Once identity is established, it can't be updated. If you want a new identity, create a new instance. (Whitepaper) + +The larval stage was introduced into the boot and upgrade sequence as a way to solve a practical problem in self-reference. If the ship is not known, then what should happen? If you bunt, then everything is ~zod. If you have a `(unit @p)`, then all code using `our` becomes cumbersome. So while identity is injected early into the kernel, during the larval stage that injection hasn't happened yet. Specifically, it takes place when Arvo acquires identity and entropy and sheds the larval core. This is called “breaking symmetry” because prior to this point every Urbit is identical. 
(This was not always true, as Joe Bryan notes [in this talk](https://youtu.be/V7XPSvVs5TQ?t=681) at 10:21ff.) The larval stage performs the following steps in order: + +- The standard library, `zuse`, is installed. +- Entropy is added +- Identity is added +- Metamorphosis into the next stage of Arvo + +```hoon +|% +++ load :: +4 + |= hir=heir + ?: ?=(%grub -.hir) + ~>(%mean.'arvo: larval reboot' !!) :: XX support + (^load hir) +:: +++ peek _~ :: +22 +++ poke :: +23 + |= [now=@da ovo=ovum] + ^- ^ + ~| poke/p.card.ovo + =/ wip + ~> %mean.'arvo: bad wisp' + ;;(wisp card.ovo) + :: + =. ..poke + ?- -.wip + %verb ..poke(lac ?~(p.wip !lac u.p.wip)) + %wack ..poke(eny `p.wip) + %what ..poke(gub (what gub p.wip)) + %whom ..poke(who ~|(%whom-once ?>(?=(~ who) `p.wip))) + :: + %wyrd ?. (sane:wyrd kel.p.wip) + ~>(%mean.'wyrd: insane' !!) + %- %+ need:wyrd kel.p.wip + ^- wynn + :* hoon/hoon-version + arvo/arvo + ?~ lul ~ + :- lull/;;(@ud q:(slap $:u.lul limb/%lull)) + ?~ zus ~ + [zuse/;;(@ud q:(slap $:u.zus limb/%zuse)) ~] + == + ..poke(ver `p.wip) + == + :: + :: upgrade once we've accumulated necessary state + :: + ?~ hir=(molt now gub) + [~ ..poke] + ~> %slog.[0 leaf+"arvo: metamorphosis"] + (load u.hir) +:: +++ wish :: +10 + |= txt=* + q:(slap ?~(zus pit $:u.zus) (ream ;;(@t txt))) +-- +``` + +- `+load`/`+4` passes through to the new world Arvo, because the larval stage is a trivial core wrapped around the mature core. +- `+wish`/`+10` evaluates as normal. +- `+peek`/`+22` produces the type of `~`; i.e., every scry blocks. +- `+poke`/`+23` handles the upgrade-related events. Larval events are actually quite simple: + +```hoon ++$ wisp + $% $>(?(%verb %what) waif) :: update from files (event from anywhere) + $>(?(%wack %wyrd) wasp) :: iterate entropy (event from runtime) + [%whom p=ship] :: acquire identity (frozen after boot) + == +``` + +#### `+le` + +`+le` is the Arvo event-loop engine. It provides an `+abet`-pattern-driven core for building worklists. 
+ +- `+peek` handles any reads into the larval stage block. +- `+poke` is how you compute an event, as normal. +- `+load` is how Arvo transfers state to a future version of itself. +- `+wish` is of course the `urbit eval` wrapper and generates no events. + +See also: + +- `+what`, the update engine, which handles a kernel update and continuation from the worklist. + +#### Metamorphosis + +Metamorphosis means producing the parent core, the mature Arvo. This takes place through `+load`, which hands itself forward. + +The larval stage upgrade mechanism is just a pass-through or it crashes. + +`+molt` takes care of the process in a practical way by gathering the known system information and packaging it for `+load`. + +When bootstrapping is done, the runtime strips off the gate to access the real Arvo core. (This is something of a manual trick.) This is how the structural interface Arvo is accessed, through the hardcoded arm addresses. + +- [~master-morzod, “Annotation on the Boot Process”](https://groups.google.com/a/urbit.org/g/dev/c/ESrqJb3Ol54/m/bns0S1QkBAAJ) +- [~master-morzod, ~lagrev-nocfep “Dev Chat: Joe Bryan on the Boot Sequence”](https://www.youtube.com/watch?v=V7XPSvVs5TQ) + +### Main Sequence (Mature Arvo) {#main-sequence-mature-arvo} + +Once the outer larval core has been shed, the system is back in the Arvo main sequence ([*Arvo II: The Boot Sequence*](ca04.md)). + + +### Vanes & Userspace {#vanes-userspace} + +`%lull` is compiled against `..part`, the first half of Arvo. + + +## Shutdown {#shutdown} + +In a general sense, Urbit is only aware of the world while it lives. But of course on a real machine, the ship will be shut down, migrated, and execute on different runtimes. + +- What happens when you run `|exit`? Trace out that process and back into the king for graceful shutdown. 
diff --git a/docs/core-academy/ca05.md b/docs/core-academy/ca05.md new file mode 100644 index 0000000..2a0c9ba --- /dev/null +++ b/docs/core-academy/ca05.md @@ -0,0 +1,287 @@ +--- +description: "Core Academy lesson on Vere runtime architecture covering the u3 noun system, king/serf processes, the Mars/Urth split, event log management, jet dashboard, and the Arvo/Vere interface." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 6. Vere I: u3 and the Serf + +*This lesson covers the Arvo/Vere interface, including Arvo's `+peek` and `+poke` arms. We also examine the Mars/Urth split, the event log, snapshots, and the jet dashboard.* + +_We need some C background to fluently read the runtime source. If you are not familiar with C, here are some good resources:_ + +- [Beginner C resource list](https://stackoverflow.com/questions/562303/the-definitive-c-book-guide-and-list) +- [Brian Kernighan & Dennis Ritchie, _The C Programming Language_](https://en.wikipedia.org/wiki/The_C_Programming_Language) +- [Ben Klemens, _21st Century C: Tips from the New School_](https://www.amazon.com/dp/1449327141) +- [Lawrence Angrave, _System Programming_](https://github.com/angrave/SystemProgramming/wiki) + + +## Runtime Structure and Responsibilities {#runtime-structure-and-responsibilities} + +If the vision of Urbit is to implement `[2 [0 3] 0 2]` as a frozen lifecycle function, then it needs some scaffolding on any real system. Real computers have memories, chipset architectures, operating system conventions, and other affordances (and limitations). Conventionally, an operating system takes explicit care of such niceties, which is one reason why calling Urbit an “OS” has been controversial. The runtime interpreter is designed to take Nock as a specification and produce a practical computer from it. 
+ +Today, there are two primary Nock executable runtimes: Vere and Sword (née Ares). (Jaque, the JVM experiment, and King Haskell have fallen by the wayside.) + +- Vere is written in C and is the standard runtime for Arvo. +- Sword is written in Rust and aims to solve some deep theoretical challenges to producing efficient Nock on a contemporary chipset. Sword is under development by [Zorp](https://zorp.io), formerly with contributions from [Tlon Corporation](https://tlon.io) and the [Urbit Foundation](https://urbit.org). + +We will take Vere as the normative runtime for Core Academy. + +As we mentioned last time in the boot sequence lesson, the runtime spawns the king (`king.c`) and indirectly the serf (`serf.c`) processes. These both run for the lifetime of the Urbit process. + +There are two competing frames for how to structure the Urbit process: king/serf and Mars/Urth. + +King v. serf separates the Nock and Arvo material from the I/O and event log material. It has the advantage that (per the whitepaper), “The serf only ever talks to the king, while the king talks with both the serf and Unix.” + +The king process is in charge of: + +- IPC +- Event log +- Unix effects including I/O +- Stateless Nock interpreter + +The serf process is the Nock runtime and is responsible for: + +- Nock virtual machine (tracking current state of Arvo as a noun and `+poke`ing it with nouns) +- Bytecode interpretation +- Jet dashboard +- Snapshotting +- Noun allocation for Arvo + +The Mars/Urth split reframes the worker process so that it includes the event log with the current serf responsibility (“Mars”), thus enabling online event log management and truncation. + +### The Structure of Vere’s Source {#the-structure-of-veres-source} + +Vere is provided in the [`urbit/vere`](https://github.com/urbit/vere) repo. It is built from the `pkg/` directory and contains the following top-level folders: + +``` +. 
+├── c3 +├── ent +├── noun +├── ur +├── urcrypt +└── vere +``` + +- `/c3` contains the types and definitions to enable [the `c3` logical system](/reference/runtime/c). + + `c3` is the set of C conventions which Vere enforces. These include well-specified integer types, tooling for loobeans (instead of booleans), and motes (`#define`s for short Urbit words). “The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax.” There are no Urbit-specific requirements for C3, which could otherwise just be a general-purpose C discipline. + + Like aura bitwidth markers, `c3` documents programmer intent but does not generally enforce it. Most of the parts of `c3` are simply lapidary terms for C99 types. + + - Scan the files in `/c3`. + +- `/ent` provides entropy for the runtime. Entropy is derived from [`/dev/urandom`](https://en.wikipedia.org/wiki//dev/random), which is a special file that provides pseudorandom numbers derived from system noise. `/dev/urandom` produces [machine randomness as close to true randomness as possible](https://www.2uo.de/myths-about-urandom/#before-linux-48), using sources such as network latency and keystroke latency to seed the cryptographically secure pseudo-random number generator (CSPRNG). + +- `/noun` is the gorilla, containing `u3` (the noun library) and the jets. We'll go into it in detail with the system architecture in a moment in Section `u3`. + +- `/ur` is, like `/ent`, a single-purpose library, in this case for bitstreams and serialization. + +- `/urcrypt` is a C library to standardize cryptographic calls across a number of libraries. + + This library is a dependency for both Vere and Ares, and is in the process of being moved into a standalone repo. + +- `/vere` contains the runtime architecture itself, the king and the serf and related tooling, as independent from `u3`. 
+ +| file | purpose | +| -- | -- | +| `auto.c` | I/O drivers | +| `benchmarks.c` | performance tests | +| `dawn.c` | key validation for bootstrapping | +| `disk.c` | database reads and writes for event log | +| `foil.c` | file synching | +| `king.c` | main runtime loop | +| `lord.c` | manage IPC between king and serf | +| `main.c` | setup and entrypoint for runtime execution | +| `mars.c` | Mars event log replay (see Mars/Urth split above) | +| `newt.c` | noun blob messages | +| `pier.c` | manage pier (files on host OS disk) | +| `save.c` | save events to pier | +| `serf.c` | the serf itself | +| `time.c` | Unix/Urbit time operations | +| `vere.h` | shared Vere-specific `struct`s | +| `ward.c` | lifecycle management for structures | + +#### `u3` + +##### Nouns + +A noun is either an atom or a cell. However, we have to decide what this implementation looks like in a language like C, that prefers arrays and pointers. `u3` is the noun library, which features Urbit-specific memory operations, tracing and profiling tools, and so forth. + +A `u3_noun` is a 32-bit `c3_w` = `uint32_t`. The first bits indicate what kind of value the noun is and thus how to approach it: + +| Bit 31 | Bit 30 | Meaning | +| -- | -- | -- | +| `1` | `1` | Indirect cell (`pom`) | +| `1` | `0` | Indirect atom (`pug`) | +| `0` | ·— | Direct atom (`cat`) | + +An indirect noun is a `dog`. For indirect nouns, bits 29–0 are a word pointer into the loom. In addition, `0xffff.ffff` is `u3_none`, which is “not a noun”. + +A common pattern is to extract values from a noun into C-typed values, carry out the manipulation, and then wrap them back into the noun. Furthermore, the value from an arbitrary atom may in fact be a bignum, and so [GMP](https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library) is used to manage these values. + +- Examine `/noun/jets/a/add.c`, in particular `u3qa_add`. + +One of the painful parts of working with `u3` is the reference counting system. 
[Reference counting](https://en.wikipedia.org/wiki/Reference_counting) is an expedient to handle tracking the number of pointers to an object in memory so that the memory can be freed at the appropriate time. Since C doesn't provide reference counting support in the language, we must manually track these and free the value only when the refcount goes to zero. The relevant functions are `u3k` to gain a refcount and `u3z` to lose one. + +There are also two different protocols for reference counting, used by different parts of the system: + +- `transfer` semantics relinquishes a refcount of any sent values. Most functions behave this way, which means that you don't have to think about de-allocating values if they've been sent elsewhere. +- `retain` semantics hold onto the refcount even if the value is sent elsewhere. The functions which use `retain` semantics tend to inspect or query nouns rather than make or modify nouns. + +> The `u3` convention is that, unless otherwise specified, **all functions have transfer semantics** - with the exception of the prefixes: `u3r`, `u3x`, `u3z`, `u3q` and `u3w`. Also, within jet directories `a` through `f` (but not `g`), internal functions retain (for historical reasons). + +- Compare `u3ka_add` and `u3qa_add`. + +`u3` is designed to make some guarantees for the programmer. It's not Urbit itself, but it's designed to be an implementation platform for Urbit. Thus: + +- Every event is logged internally before it enters `u3`. +- A permanent state noun maintains a single reference. +- Any event can be aborted without damaging the permanent state (“solid state”). +- We snapshot the permanent state and can prune logs. + +We will discuss the specifics of the memory model next week in `ca06` when we discuss the loom and the road model. + +- [“Land of Nouns”](/reference/runtime/nouns); note particularly the section `u3: reference protocols`, labeled ``THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION.`` Read that if nothing else. 
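
To make the tag bits from the noun table in this section concrete, here is a minimal C sketch that classifies a 32-bit noun word. The names `noun_t`, `classify`, and `dog_offset` are made up for this illustration and are not Vere's own macros; only the bit layout and the `0xffff.ffff` sentinel are taken from the notes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit u3_noun word. */
typedef uint32_t noun_t;

/* "Not a noun" sentinel, as given in the notes (0xffff.ffff). */
#define NOUN_NONE 0xffffffffu

typedef enum {
  KIND_DIRECT_ATOM,    /* cat: bit 31 is 0            */
  KIND_INDIRECT_ATOM,  /* pug: bits 31,30 are 1,0     */
  KIND_INDIRECT_CELL,  /* pom: bits 31,30 are 1,1     */
  KIND_NONE            /* the sentinel, not a noun    */
} noun_kind;

/* Decide how to interpret a noun word from its top two bits. */
static noun_kind classify(noun_t n) {
  if (n == NOUN_NONE)    return KIND_NONE;
  if ((n >> 31) == 0)    return KIND_DIRECT_ATOM;
  if ((n >> 30) & 1u)    return KIND_INDIRECT_CELL;
  return KIND_INDIRECT_ATOM;
}

/* For an indirect noun (a dog), bits 29-0 are a word offset into the loom. */
static uint32_t dog_offset(noun_t n) {
  return n & 0x3fffffffu;
}
```

For example, `classify(42)` reports a direct atom, while the word `0xc0000010` is an indirect cell whose `dog_offset` is `0x10`. The sentinel has to be checked first because its top two bits would otherwise make it look like an indirect cell.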
+ +##### Library + +The contents of `/noun` constitute the `u3` noun library. Functions are organized by file and prefix into certain namespaces by operation. Because `u3` is a library, we can't cleanly separate it into serf/king components, although certain modules do have close identification with one or the other. + +| prefix | purpose | `.h` | `.c` | +| -- | -- | -- | -- | +| `u3a_` | allocation | [`allocate.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/allocate.h) | [`allocate.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/allocate.c) | +| `u3e_` | persistence | [`events.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/events.h) | [`events.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/events.c) | +| `u3h_` | hashtables | [`hashtable.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/hashtable.h) | [`hashtable.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/hashtable.c) | +| `u3i_` | noun construction | [`imprison.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/imprison.h) | [`imprison.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/imprison.c) | +| `u3j_` | jet control | [`jets.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/jets.h) | [`jets.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/jets.c) | +| `u3l_` | logging | [`log.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/log.h) | [`log.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/log.c) | +| `u3m_` | system management | [`manage.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/manage.h) | [`manage.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/manage.c) | +| `u3n_` | nock computation | [`nock.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/nock.h) | [`nock.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/nock.c) | +| `u3o_` | command-line options | [`options.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/options.h) | 
[`options.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/options.c) | +| `u3r_` | noun access (error returns) | [`retrieve.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/retrieve.h) | [`retrieve.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/retrieve.c) | +| `u3s_` | noun serialization | [`serial.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/serial.h) | [`serial.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/serial.c) | +| `u3t_` | profiling | [`trace.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/trace.h) | [`trace.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/trace.c) | +| `u3u_` | urth (memory management) | [`urth.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/urth.h) | [`urth.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/urth.c) | +| `u3v_` | arvo | [`vortex.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/vortex.h) | [`vortex.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/vortex.c) | +| `u3x_` | noun access (error crashes) | [`xtract.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/xtract.h) | [`xtract.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/xtract.c) | +| `u3z_` | memoization | [`zave.h`](https://github.com/urbit/vere/blob/develop/pkg/noun/zave.h) | [`zave.c`](https://github.com/urbit/vere/blob/develop/pkg/noun/zave.c) | +| `u3k[a-g]` | jets (transfer, C args) | `jets/k.h` | `jets/[a-g]/*.c` | +| `u3q[a-g]` | jets (retain, C args) | `jets/q.h` | `jets/[a-g]/*.c` | +| `u3w[a-g]` | jets (retain, nock core) | `jets/w.h` | `jets/[a-g]/*.c` | + +- `u3a` defines memory allocation functions. These are used throughout, but we'll discuss it a bit more when we talk about the king. You will quickly run into reference counting features, like `u3k` (`u3a_gain()`) to gain a refcount and `u3z` (`u3a_lose()`) to lose one. +- `u3e` manages the loom. +- `u3h` provides fast custom hashing for the runtime. 
+- `u3i` puts a value (expected to be a `c3` type) into a noun. (Look at this one now.) +- `u3l` supports logging. +- `u3m` manages the system: boots `u3`, makes a pier, handles crashes, etc. +- `u3n` implements the Nock bytecode interpreter. +- `u3o` parses the manifold command-line options of Urbit and writes them into globals. +- `u3r` extracts a value from a noun, with a `u3_weak` on failure. (Look at this one now.) +- `u3s` implements noun serialization (`+jam` and `+cue`). +- `u3t` provides tracing for crashes. +- `u3u` offers memory management tooling (deduplication and memory mapping). +- `u3v` supports Arvo interaction. +- `u3x` extracts a value from a noun, with a crash on failure. +- `u3z` supports `~+` siglus rune memoization. + +If you work much in Vere, you will get used to seeing these. There are basically two broad categories of functions: single-use functions (like starting a pier, `u3m_pier`) and utility functions (like writing a value to a noun, `u3i_word`). + +- Return to `/noun/jets/a/add.c` and look at `u3wa_add` and `u3ka_add`. + + +- [“C Runtime System”](/reference/runtime/runtime) +- [“Land of Nouns”](/reference/runtime/nouns) + + +## The Serf {#the-serf} + +The serf process is the Nock runtime and is responsible for: + +- Nock virtual machine (tracking current state of Arvo as a noun and `+poke`ing it with nouns) +- Bytecode interpretation +- Jet dashboard + +If you examine `/vere/serf.c`, you can get a feel for how it is organized. See e.g. `u3_serf_work` and callees. + +### Arvo Noun Management {#arvo-noun-management} + +- `/noun/vortex.c`, e.g. `u3v_peek`, `u3v_wish`, and `u3v_poke_sure`. + +### Nock Bytecode Interpreter (`u3n`) {#nock-bytecode-interpreter-u3n} + +- `/noun/nock.c`, e.g. `u3n_nock_on`, `u3n_slam_on` (calling convention for gates). + +The end result of the Hoon compilation process is Nock code as a noun. This noun is evaluated by the runtime, but it is not actually directly run as such. 
Instead, the runtime builds an efficient bytecode stream and executes that to complete the calculation. + +The Nock bytecode for any expression can be obtained using the `%xray` raw hint. + +```hoon +> ~> %xray =+(2 [- +(-)]) +{[litb 2] snol head swap head bump ault halt} +[2 3] + +> ~> %xray =+(2 [(add - -) +(-)]) +{[litb 2] snol [fask 4095] [kicb 1] snoc head swap [fabk 6] swap [fabk 6] auto musm [kicb 0] swap head bump ault halt} +[4 3] +``` + +The Nock bytecode is defined in the `OPCODES` macro in `/noun/nock.c` and evaluated by `_n_burn` in that same file. The `OPCODES` `#define` uses the [X macro](https://en.wikipedia.org/wiki/X_Macro), which is a bit of C deep lore. + +As a consequence of the architecture of Vere today, we see a lot of expensive call overhead. For instance, when you wrap an `%xray` hint around a core, you don't get the core itself; instead you get the formula that invokes the code. + +```hoon +> ~> %xray (met 3 (jam .)) +{[fask 1023] [kicb 3] snol head swap tail [lilb 3] swap tail [fask 1023] [kicb 2] snol head swap tail musm [kicb 1] auto musm [ticb 0] halt} +984.339 +``` + +Since many things are computed in virtual Nock, `+mock`, we have bail/trace/bounded computation at the price of slow virtualization. + +One objective of Sword (née Ares), subject knowledge analysis, is to improve on Nock bytecode generation. This is being implemented into Vere as well. + +### Jet Dashboard (`u3j`) {#jet-dashboard-u3j} + +As we summarized when first introducing jets in `ca00`, the runtime manages jets, including re-running them when playing back the event log history. + +> The jet dashboard is the system in the runtime that registers, validates, and runs jets: specific pieces of Nock code reimplemented in C for performance. + +The jet dashboard maintains three jet state systems: + +1. `cold` state results from the logical execution history of the pier and consists of nouns. `cold` jet state registers jets as they are found. 
`cold` state ignores restarts. +2. `hot` state is the global jet dashboard and describes the actual set of jets loaded into the pier for the current running process. Calls to `hot` state result from Nock Nine invocations of a core and an axis. `hot` state is thus tied to process restart. +3. `warm` lists dependencies between `cold` and `hot` state. `warm` state can be cleared at any time and is cleared on restart. + +The jet dashboard (`u3j`, `/noun/jets.c`) will not be explored in detail in Core Academy, but we do want to look at a couple of actual jets. + +- [“API Overview by Prefix: `u3j` jets”](/reference/runtime/api#u3j-jets) +- [~timluc-miptev, “Jets in the Urbit Runtime”](https://github.com/timlucmiptev/docs-runtime/blob/master/jets1_current.md) + +### Jets {#jets} + +- Examine `/noun/jets/b/lent.c`, `/noun/jets/b/turn.c`, `/noun/jets/c/turn.c`, `/noun/jets/e/rs.c`, `/noun/jets/e/slaw.c`. + +Many Urbit contributors may find jet composition to be their first serious encounter with the runtime. On the bright side, jetting is a fairly constrained and well-understood space. However, it has a complex interface for unpacking calls and nouns, including reference counting requirements. + +- `u3w` functions are the main entry point (as identified in `/noun/tree.c`). These unpack and sanity-check the sample, then call either `u3q` or `u3k` variants of the jet. The unpacking axes are hard-coded in `/noun/xtract.h`. +- By convention, `u3q` and `u3w` functions have `retain` semantics: the caller keeps ownership of the arguments. +- `u3k` functions have `transfer` semantics, so they are responsible for freeing their arguments with `u3z` after the computation completes. +- `u3_none` (`0x7fff.ffff`) is NOT the same as `u3_nul`. A jet that returns `u3_none` punts the value back to the Hoon/Nock version. 
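The ownership conventions above are easier to see in miniature. The following is a toy model in plain C, not Vere code: `toy_noun`, `toy_gain()`, `toy_lose()`, `toy_add_keep()`, `toy_add_consume()`, and `toy_demo()` are hypothetical names invented for illustration, with `toy_gain()`/`toy_lose()` standing in only loosely for the `u3k()`/`u3z()` reference counters. It sketches one function that leaves its arguments with the caller and a wrapper that frees them after delegating, which is the shape to look for when reading `u3wa_add` and `u3ka_add`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a noun: a refcounted boxed integer.
 * This is NOT Vere's u3_noun; every name here is hypothetical. */
typedef struct toy_noun {
  unsigned      refs;  /* reference count */
  unsigned long val;   /* payload */
} toy_noun;

int toy_live = 0;      /* live allocations, used to detect leaks */

toy_noun* toy_new(unsigned long val) {
  toy_noun* n = malloc(sizeof *n);
  assert(n != NULL);
  n->refs = 1;
  n->val  = val;
  toy_live++;
  return n;
}

/* Gain a reference (loose analogue of u3k()). */
toy_noun* toy_gain(toy_noun* n) {
  n->refs++;
  return n;
}

/* Lose a reference, freeing at zero (loose analogue of u3z()). */
void toy_lose(toy_noun* n) {
  assert(n->refs > 0);
  if (--n->refs == 0) {
    free(n);
    toy_live--;
  }
}

/* The caller keeps ownership of a and b; only the result is new. */
toy_noun* toy_add_keep(toy_noun* a, toy_noun* b) {
  return toy_new(a->val + b->val);
}

/* The callee consumes one reference to each argument. */
toy_noun* toy_add_consume(toy_noun* a, toy_noun* b) {
  toy_noun* c = toy_add_keep(a, b);
  toy_lose(a);
  toy_lose(b);
  return c;
}

/* Exercise both conventions. Returns the number of leaked toy nouns
 * (0 if every reference was released), or -1 if a sum was wrong. */
int toy_demo(void) {
  toy_noun* a = toy_new(40);
  toy_noun* b = toy_new(2);

  /* a and b remain ours after this call. */
  toy_noun* c = toy_add_keep(a, b);

  /* To keep using a and b, hand the consuming variant extra references. */
  toy_noun* d = toy_add_consume(toy_gain(a), toy_gain(b));

  int ok = (c->val == 42) && (d->val == 42);

  toy_lose(a);
  toy_lose(b);
  toy_lose(c);
  toy_lose(d);
  return ok ? toy_live : -1;
}
```

Calling `toy_demo()` from a `main()` returns `0` when every reference has been released. Dropping either `toy_gain()` call would leave the caller holding a freed pointer, which is the kind of bug that mismatched reference conventions tend to produce in a jet.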
+ +- [“Writing Jets”](/reference/runtime/jetting) +- [~timluc-miptev, “Jets in the Urbit Runtime”](https://github.com/timlucmiptev/docs-runtime/blob/master/jets1_current.md) + +### Snapshotting {#snapshotting} + +We'll cover snapshotting in the next lesson, `ca06`. diff --git a/docs/core-academy/ca06.md b/docs/core-academy/ca06.md new file mode 100644 index 0000000..6eb1062 --- /dev/null +++ b/docs/core-academy/ca06.md @@ -0,0 +1,418 @@ +--- +description: "Core Academy lesson on Vere's memory management covering the loom allocator, noun memory layout, reference counting, structural sharing, the road system, metacircularity, paging, and snapshots." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 7. Vere II: The Loom + +_In this lesson we explain the memory allocator: the loom, noun memory layout, reference counting, structural sharing, roads, and metacircularity. We'll also see how paging and snapshots work, including page faults, memory protection, the guard page, and demand paging._ + +## The Loom <a href="#the-loom" id="the-loom"></a> + +Vere's main memory model is called the _loom_. (Presumably this is from the roads shuttling back and forth, and perhaps mixing a metaphor.) A contiguous block of memory, formerly 2GB but now set by a runtime flag (`--loom`), is allocated for the loom. This is the noun arena, and to work on it we need to use special `u3`-specific allocators (`u3a`). + +One standard contemporary memory model afforded by an operating system has a [heap](https://en.wikipedia.org/wiki/Memory_management#HEAP) for manual dynamic memory allocation (C `malloc()`) and a [stack](https://en.wikipedia.org/wiki/Stack-based_memory_allocation) for local variable data (last-in-first-out, C `alloca()` but also implicit). The heap grows from the bottom up and the stack from the top down. 
(The alternative model today is to use SunOS-style `mmap()` to allow virtual memory paging anywhere in memory.) + +``` + 0 brk ffff + | heap | stack | + |------------#################################+++++++++++++| + | | | + 0 sp ffff +``` + +* `brk` is the [`brk()` System V call](https://utcc.utoronto.ca/~cks/space/blog/unix/SbrkVersusMmap), which marked the limit of the heap arena. +* `sp` is the stack pointer. + +`u3` differs from this model in one particular: by permitting the heap and stack to point either way, we can efficiently nest pairs of stack and heap. We call such a pair (and their free memory arena) a _road_. The outermost road is the _surface road_, and inner roads are created in alternating directions when dependent calculations are embarked upon. + +When a new inner road is created, it switches direction from the outer road. This puts its heap up against the outer road's stack, and its stack up against the outer road's heap. But when the road is terminated, its stack is freed while its heap becomes part of the outer road's stack. + +A conventional heap-low-stack-high road is a north road: + +``` + 0 rut hat ffff + | | | | + |~~~~~~~~~~~~-------##########################+++++++$~~~~~| + | | | | + 0 cap mat ffff +``` + +while a reversed heap-high-stack-low road is a south road: + +``` + 0 mat cap ffff + | | | | + |~~~~~~~~~~~~$++++++##########################--------~~~~~| + | | | | + 0 hat rut ffff +``` + +* `cap` is the top of the stack. +* `mat` is the bottom of the stack (`ffff` in the example surface road). +* `rut` is the bottom of the heap arena (not `0` because of immutable storage). +* `hat` is the top of the heap arena. +* `~` is deep storage (immutable). +* `-` is durable storage (heap). +* `+` is temporary storage (stack). +* `$` is the allocation frame. +* `#` is free memory. + +The motivation for the road model is that you need not update refcounts in senior memory. This diminishes the downsides of reference counting. 
(The upside of refcounting is deterministic finalization: automatic memory management comes from the properties of the computation itself, not from external preemptive events. Refcounting gives eager determinism.) + +When we need to process an event or perform any kind of complicated calculation, we process it using an inner road. Because the roads alternate direction, any data from an inner heap that needs to be preserved must be copied back out. The advantages, though, are: + +1. The surface road is left read-only by `u3` and thus clean. Thus when snapshotting, clean pages are kept clean. +2. An inner calculation can be aborted without affecting the surface. +3. The surface is not fragmented because the inner results are copied in when necessary. + +Vere ends up in operation with nested roads during a computation: + +``` + 0 ffff + |------- ++++++++++++| surface road + |~~~~~~~$+++++++++ ---------~~~~~~~~~~~~| inner road 1 + |~~~~~~~~~~~~~~~~~--------- +++++$~~~~~~~~~~~~~~~~~~~~~| inner road 2 + | ##&## | free memory + +``` + +The Vere interpreter runs in a road, and you can check if you're on the surface road or in an inner road. (Most of the time you should just assume you're on an inner road.) This distinction matters for calls such as `c3_assert()`, which produces an exception with a stack trace on an inner road but kills the process on the surface (outermost) road. + +The current road is `u3R`, a global. Within Arvo, a new road is currently begun in the following cases: + +* every event +* every read from the namespace (Arvo scry gate into vane doesn't, userspace `.^` dotket does) +* every call to `+mink` (and `+mock`, etc.) + +Any work that will be done where you only want to keep one portion of it is a good candidate for a road. + +The loom is organized into pages, each 16 KB in size. There is a guard page `&` in the middle of free memory to make sure that the stack and heap do not overwrite each other. 
The guard page is adjusted in `u3e_ward()` when necessary: + +> When a fault is detected in the guard page, the guard page is recentered in the free space of the current road. if the guard page cannot be recentered, then memory exhaustion has occurred. + +There is a hope to add both a raw hint to suggest to the system to use a new road and a bump-allocation mode to permit turning off refcounting on inner roads when practical. (Cf. [#6805](https://github.com/urbit/urbit/issues/6805#issuecomment-1754208392).) + +#### `u3a` Allocator + +To work with `u3` memory, use the `u3a` memory allocation functions: + +* `u3a_malloc()` +* `u3a_free()` +* `u3a_realloc()` + +You should never call `malloc()` in the loom (but can, of course, in the Vere layer above `u3`). + +> Of course, we don't always know how large our atom will be. Therefore, the standard way of building large atoms is to allocate a block of raw space with `u3i_slab_init()`, then chop off the end with `u3i_slab_malt()` (which does the measuring itself) or `u3i_slab_mint()` in case you've measured it yourself. + +Keep in mind that atoms do not retain leading zeros. + +The reference counters introduced last time, `u3a_gain()`=`u3k()` and `u3a_lose()`=`u3z()`, are also part of `u3a`. However, other than these you typically use `u3a` indirectly through `u3i` and `u3r`/`u3x`. + +Some details of the allocator [are in flux right now](https://github.com/urbit/urbit/issues/6805#issuecomment-1754208392): “As an experiment, \[\~master-morzod has] rewritten the serf to a) stop allocating events, effects, and IPC messages on the home road, and b) keep the Arvo kernel on an inner road for as long as possible (i.e. 
until we need to save/pack/meld/\&c.).” + +* [“Land of Nouns”, section “`u3`: the road model”](../runtime/nouns.md#u3-the-road-model) +* [“API overview by prefix”, section “`u3a`: allocation functions”](../runtime/api.md#u3a-allocation-functions) + +## The King (Urth) <a href="#the-king-urth" id="the-king-urth"></a> + +The king process is in charge of: + +* IPC +* Event log +* Unix effects including I/O +* Stateless Nock interpreter (the “ghost ship” ivory pill material from `ca05`) + +(We discussed the serf/Mars in `ca05`.) + +### Event Log & Snapshotting <a href="#event-log-snapshotting" id="event-log-snapshotting"></a> + +The event log is the ordered list of all of the Arvo events (completed moves) resulting in the present state of Arvo. Events are handled at two levels: the event log, which is written consistently at each event, and the snapshot of the loom, which allows rapid recovery of the current state. + +> In practice, event logs become large and unwieldy over time. Periodically a snapshot of the permanent state is taken, so the entire event log needn't be replayed on reboot. You're still able to rebuild your state down to the last keystroke. This is due to the practice of persistence.\ +> Persistence, in the context of storing data in a computer system, means that data is stored in a non-volatile manner and that input must be recorded before the output result is performed. Thus, every event must be written to disk - or must be _persisted_ - before the event effects actually take place. + +The snapshot of the loom allows the last few events from the event log to be replayed to recover the present state. + +In fact, with current usage patterns (\~2023.10.12), there's a problem: + +> Every few minutes, the runtime applies a patch to its on-disk snapshot. This pauses the process, so the size is important. In particular, for many large ships (\~nibset-napwyn, \~wicdev-wisryt, \~natnex-ronret), this is around 600MB. 
If the maximum acceptable pause is 1 second, this requires a disk which can handle 600MB/s of throughput (or maybe twice that, because it writes the patchfile, then applies it?), which is extremely high. ([#6805](https://github.com/urbit/urbit/issues/6805)) + +#### `noun/error.h` + +Before we look at the particulars of the event log and snapshotting code, let's take a brief detour to see how assertion errors are handled. + +```c +# define u3_assert(x) \ + do { \ + if (!(x)) { \ + fflush(stderr); \ + fprintf(stderr, "\rAssertion '%s' " \ + "failed in %s:%d\r\n", \ + #x, __FILE__, __LINE__); \ + u3m_bail(c3__oops); \ + abort(); \ + } \ + } while(0) +``` + +An error here then triggers into `u3m_bail()`, the primary crash handler. + +* Review `u3m_bail` in `noun/manage.c`. + +#### Event Log + +What is an event on disk? Urbit maintains an [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) transactional database for key–value pairs, with a `META` table for metadata and an `EVENTS` table for event number–event data pairs, sequentially ordered. + +An event is a `u3_fact`, a `struct` including the timestamp and event `ovum`: + +```c +/* u3_fact: completed event +*/ + typedef struct _u3_fact { + c3_d eve_d; // event number + c3_l mug_l; // kernel mug after + u3_noun job; // (pair date ovum) + struct _u3_fact* nex_u; // next in queue + } u3_fact; +``` + +Event log replay thus refers to retrieving the sequence of events from the pier's database instance and playing back each sequential event. + +* Examine `u3_mars_play()` for details of how playback works. + * `_mars_play_batch()` + * `_mars_poke_play()` + * `u3v_poke_raw()` and we're back to a conventional Arvo poke as we saw in `ca05`. + +The introduction of epochs will enable finer-grained system recovery when necessary: + +> Historically, Vere has stored a single event log and snapshot. 
To facilitate replay across different binary versions more convenient and less error-prone, an improved design is the "epoch" system.\ +> In the epoch system, Vere breaks up the event log into "epoch"s, where an epoch represents a snapshot and some events after that snapshot.\ +> An epoch lives in its own folder, named after the first event in that epoch.\ +> In addition to storing a snapshot and a log of events, each epoch folder also stores a version file indicating which version of Vere originally ran these events -- this makes replay across different binary versions much easier, especially in the case of a jet mismatch in an old binary. + +#### Snapshots + +> Replay is how Vere computes the state of a ship's Arvo instance from the event log after a ship reboots. In order to avoid replaying the entire event log, Replay takes a snapshot of the current state of the ship approximately once every ten minutes. Then when a ship reboots, Replay loads the most recent snapshot and replays events from the event log up to the most recent event. + +`u3e_save()` saves the loom (snapshots), often called via `u3m_save()`. + +#### Demand Paging + +> Demand paging refers to the ability to load only needed pages of memory into RAM, leaving other pages on disk, to reduce memory use. + +Pages are marked as clean (`PROT_READ`) or dirty (`PROT_READ|PROT_WRITE`) or guard (`PROT_NONE`). The access pattern assumes all pages are accessed from the outside inwards (another advantage of the loom model). + +* [`urbit/vere` #402](https://github.com/urbit/vere/pull/402) (merged) +* [`urbit/vere` #410](https://github.com/urbit/vere/pull/410) (merged) + +### Unix I/O <a href="#unix-io" id="unix-io"></a> + +The king is responsible for the I/O operations of the communicating vanes: Ames, Behn, Clay, Dill, Eyre, Iris, Khan, Lick. (The other two vanes, Gall and Jael, are landlocked and only interact within Urbit.) 
+ +We will cover the I/O drivers in a later lesson `ca11` after we have covered the major vanes which need to interface with the host OS. + +### IPC <a href="#ipc" id="ipc"></a> + +The Urbit runtime has two categories of IPC: + +1. King/serf interprocess communication +2. Vane-driven interprocess communication + * `%khan`/`conn.c`-based sockets + * `%lick`-based communications + +In general, [POSIX IPC](https://www.geeksforgeeks.org/inter-process-communication-ipc/) is “a mechanism that allows processes to communicate with each other and synchronize their actions.” This can be done by sharing memory directly between the processes or by passing messages. Vere does a little of both: the loom is the shared memory arena, and sometimes messages are used. + +For instance, in Vere's pier management (mainly `vere/pier.c`), the [lord](https://github.com/urbit/vere/blob/ea3eeee0d5efc198c279f2c916b73fc8df283af6/pkg/vere/lord.c#L313) coordinates the king and the serf through messages. Like Arvo, the king and the serf thus need to have the right API shape to connect to each other. The lord coordinates using `writ` to pass a value from the king to the serf, and `plea` to pass a value from the serf to the king. (These are rather like Arvo passes and gifts, but can be initiated from either side rather than just the pass/give pattern.) 
Whimsically, this is defined in Hoon inside a C comment in `vere/lord.c`: + +```hoon +|% +:: +writ: from king to serf +:: ++$ writ + $% $: %live + $% [%cram eve=@] + [%exit cod=@] + [%save eve=@] + [%meld ~] + [%pack ~] + == == + [%peek mil=@ sam=*] + [%play eve=@ lit=(list ?((pair @da ovum) *))] + [%work mil=@ job=(pair @da ovum)] + == +:: +plea: from serf to king +:: ++$ plea + $% [%live ~] + [%ripe [pro=%1 hon=@ nok=@] eve=@ mug=@] + [%slog pri=@ tank] + [%flog cord] + $: %peek + $% [%done dat=(unit (cask))] + [%bail dud=goof] + == == + $: %play + $% [%done mug=@] + [%bail eve=@ mug=@ dud=goof] + == == + $: %work + $% [%done eve=@ mug=@ fec=(list ovum)] + [%swap eve=@ mug=@ job=(pair @da ovum) fec=(list ovum)] + [%bail lud=(list goof)] + == == + == +-- +``` + +The procedure for IPC needs to establish communications, which is a follow-on from the king starting the serf. (Vere is single-threaded but runs two processes.) + +`$writ` from king to serf: + +* `%live` is a lifecycle request to the serf (one of `%cram`, `%exit`, `%save`, `%meld`, or `%pack`). +* `%peek` is a request for data from Arvo in the serf. +* `%play` is a request to the serf to play an event (in an event playback). +* `%work` is a request to the serf to carry out a computation in Arvo. + +`$plea` from serf to king: + +* `%live` tells if the serf is alive. +* `%ripe` tracks the serf startup state. +* `%slog` is an output request. +* `%flog` is a debug output request. +* `%peek` is a response to the king with a scry result. +* `%play` is a response to an event playback. +* `%work` is a response to an injected `ovum` event. + +Other parts of IPC depend on the atom-framing implementations in `vere/newt.c`, another key part of king–serf IPC. `newt.c` produces noun blobs that have a five-byte header and a variable-length payload. The header has a one-byte version tag, typically `0x0`, followed by a four-byte little-endian message byte count. The payload is the `+jam`med noun. 
This is used by `urbit eval` for `stdin` computed against the ivory pill ghost ship, for instance: + +```sh +$ echo "(add 1 41)" | urbit eval +loom: mapped 2048MB +lite: arvo formula 2a2274c9 +lite: core 4bb376f0 +lite: final state 4bb376f0 +eval (run): +42 +``` + +`eval` supports several options for processing Hoon nouns as input to or output from `conn.c`: + +* `-j`, `--jam`: output result as a jammed noun +* `-c`, `--cue`: read input as a jammed noun +* `-n`, `--newt`: write output / read input as a newt-encoded jammed noun, when paired with `-j` or `-c` respectively +* `-k`: treat the input as the jammed noun input of a `%fyrd` request to `conn.c`; if the result is a `goof`, pretty-print it to `stderr` instead of returning it + +In `vere/newt.c`, see particularly: + +* `u3_newt_send()` transmits a jammed noun (using `u3s_jam_xeno()`, for instance) to a task buffer for `libuv`. (Recall that `libuv` is the main event loop driver for the king process.) +* `u3_newt_read()` pulls out the jammed noun from the buffer. + +Khan and Lick both use `newt.c`. + +* Demonstrate invoking a statement at the CLI with the `urbit` executable. + +#### Example: `|meld` Trace + +* Let's walk through the lifecycle of a command-line initiated `./zod/.run meld`. + * `vere/main.c` + * `noun/urth.c` +* Compare the lifecycle of `|meld`. + * `/gen/hood/meld` + * `/lib/helm/kiln` + * `/lib/hood/kiln` + * `/sys/clay` + * `vere/lord.c` + * `vere/serf.c` + +We will discuss Khan and Lick in detail during the next lesson, but here's a quick recap of their functionality. + +#### Khan + +> Khan is the "control plane" and thread-runner vane. Its main purpose is to allow external applications to run [threads](../../urbit-os/base/threads/README.md) via a Unix Socket and receive the result. + +A [socket](https://en.wikipedia.org/wiki/Unix_domain_socket) is an endpoint for data communications. 
In the runtime, it is implemented by [`conn.c`](https://github.com/urbit/vere/blob/develop/pkg/vere/io/conn.c), the runtime counterpart to `%khan`. + +* [`%khan` Overview](../../urbit-os/kernel/khan/README.md) + +#### Lick + +> Lick manages IPC ports, and the communication between Urbit applications and POSIX applications via these ports. Other vanes and applications ask Lick to open an IPC port, notify it when something is connected or disconnected, and transfer data between itself and the Unix application. + +* [`%lick` Overview](../../urbit-os/kernel/lick/README.md) + +## Debugging the Runtime <a href="#debugging-the-runtime" id="debugging-the-runtime"></a> + +To conclude today's material, we would like to briefly demonstrate several debugging principles with Vere/Ares. + +### `printf` <a href="#printf" id="printf"></a> + +`printf`-style output should be done using `fprintf()` to `stderr`. Use both `\n` and `\r` to achieve line feed (move cursor down one line) and carriage return (move it to the left). You can also use `u3l_log`, which does not require `\r`, but it should not be used in cases where the I/O drivers have not yet been initialized or can no longer be relied upon, e.g. crashing or shutdown. + +### `gdb` <a href="#gdb" id="gdb"></a> + +> For C, make heavy use of `gdb`. `lldb` is far worse than `gdb` for debugging Urbit, so it's worth developing on a Linux box even if that means `ssh`ing into a server. (\~wicdev-wisryt) + +`gdb` works best when you build with debugging symbols. + +```sh +bazel build :urbit --compilation_mode=dbg + +- OR - + +bazel build :urbit --copt=-DU3_CPU_DEBUG + +- OR - + +bazel build :urbit --copt=-DU3_MEM_DEBUG +``` + +When using GDB, before attaching to the process you should set the following: + +```gdb +set follow-fork-mode child +handle SIGSEGV nostop noprint +``` + +If you are debugging jets or the serf, then you want to attach to the serf at `urbit-worker`. 
+ +```sh +gdb --args ./bazel-bin/pkg/vere/urbit --args zod + +- OR - + +gdb attach <PID> +``` + +```gdb +break jet_file:jet_name +``` + +### Valgrind <a href="#valgrind" id="valgrind"></a> + +[Valgrind](https://valgrind.org/) is a memory profiling tool for diagnosing memory usage and memory leaks. As a memory tracking tool, Valgrind uses several times more memory than the native application requires. + +The possible expedient to make it viable is: + +* Decrease the loom size with `--loom` (e.g. `--loom 29` or something similarly constrained). + +[`urbit/urbit` #5161](https://github.com/urbit/urbit/issues/5161) has some discussion about using Valgrind. (The memory leak in question was ultimately resolved by [#5614](https://github.com/urbit/urbit/pull/5614), which presents an instructive insight.) + +### Changing the Serf <a href="#changing-the-serf" id="changing-the-serf"></a> + +The process call to invoke the serf is hard-coded in the king. To run Sword (née Ares) as a serf, for instance, you need to change the `protocol` entry in `vere/lord.c:u3_lord_init()` to the Sword executable as a full-path string literal. + +## Exercises <a href="#exercises" id="exercises"></a> + +* Turn on `LORD_TRACE_CUE` and boot a new fake ship. +* Examine the new `/ted/runtime-version` thread which is in the latest release (412 K). How does it work? Trace the types and values back through to see how the values are recovered from the runtime. diff --git a/docs/core-academy/ca07.md b/docs/core-academy/ca07.md new file mode 100644 index 0000000..0c5644f --- /dev/null +++ b/docs/core-academy/ca07.md @@ -0,0 +1,567 @@ +--- +description: "Core Academy lesson on Arvo vanes covering the Behn timer vane, Dill terminal driver vane, Khan thread runner vane, Lick IPC vane." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 8. 
Vanes I: Behn, Dill, Khan, Lick + +*In this lesson we'll look at Arvo's timer vane, Behn, and Dill, the terminal driver vane. This includes discussion of the Dojo terminal, and the userspace/kernelspace interfaces `%helm` and `%hood`.* + +A **vane** is an Arvo kernel module that performs essential system operations. The vanes are: + +- Ames, the peer-to-peer networking vane. +- Behn, the timer vane. +- Clay, the filesystem, revision-control and build system vane. +- Dill, the terminal-driver vane. +- Eyre, the HTTP server vane. +- Gall, the application vane. +- Iris, the HTTP client vane. +- Jael, the security vane. +- Khan, the control vane. +- Lick, the interprocess communication (IPC) vane. + +> As described above, we use Arvo proper to route and control the flow of `move`s. However, Arvo proper is rarely directly responsible for processing the event data that directly causes the desired outcome of a `move`. This event data is contained within a `card`. Instead, Arvo proper passes the `card` off to one of its vanes, which each present an interface to clients for a particular well-defined, stable, and general-purpose piece of functionality. + + +## Vane Interface {#vane-interface} + +Arvo is a message dispatcher, which doesn't really know about the vanes except via their existence in a `van=(map term vane)` in Arvo's `$soul`. + +Formally, a vane must be a “vane-shaped noun”—an interface presenting the arms: + +```hoon +|% +:: +call: handle a +task request +++ call +:: +load: migrate an old state to a new vane version +++ load +:: +scry: view vane state at a particular /path +++ scry +:: +stay: extract state before reload +++ stay +:: +take: handle $response sign +++ take +-- +``` + +- `+load` and `+stay` are necessary to update the vane. +- `+call` is used to pass a request in (“advance to target”). +- `+scry` exposes the read-only scry namespace of the vane. +- `+take` receives a response from another vane (“retreat along call stack”). 
+ +(Now the formerly-elided distinctions between `sign`, `gift`, `task`, and `note` start to matter.) + + + +- A `note` is sent by a vane to the Arvo kernel's `+call` arm. +- Arvo dispatches a `task` to a vane's `+call` arm. +- The vane performs the work. +- If a result needs to be passed back, it is emitted as a `gift` along the `duct` back to Arvo's `+take` arm. +- Arvo dispatches a `sign` to the original caller's `+take` arm. + +The actual mechanics of this are that the moves are placed into the appropriate `duct`, which is a `(list wire)`, simply an ordered collection of moves representing the causal history. + +`/sys/arvo` tracks what little it knows about vanes at a few points, e.g.: + +```hoon +:: van: vanes while we desire it (in larval stage) +van=(map term (trap vase)) +:: +++ grow + |= way=term + ?+ way way + %a %ames + %b %behn + %c %clay + %d %dill + %e %eyre + %g %gall + %i %iris + %j %jael + %k %khan + %l %lick + == +:: ++$ vane [=vase =worm] +``` + +- `vase` is of course a generic vase, but specifically it expects a noun with the correct `$type`. +- `worm` is the worm cache as discussed in [*Arvo II: The Boot Sequence*](ca04.md). + +Arvo interacts with vanes in vase mode; for instance, a scry takes place via a call to the `+scry` arm via a `+slap` against the `%limb` named `%scry`: `(~(slap wa sac) rig [%limb %scry])`. As usual, working in vase mode permits dynamic updates to the source. + +Vanes have as their subject: + +- `/sys/hoon` for language definitions. +- `/sys/arvo` for message dispatch. +- `/sys/lull` for a shared interface definition. +- `/sys/zuse` for various stdlib utilities. + +In particular, `/sys/lull` acts as a header so that vanes can “see” each other's interface. + +### Updates {#updates} + +As with other parts of the system, vanes are rebuilt if the inner core on which they rely has been updated or if the vane itself has changed. + +An update to a vane is triggered by `+mod:what:pith` in the `+le` event-loop engine. 
(Recall from `ca05` that `+what` is involved in a system upgrade.) While there are some unfamiliar types here, note particularly the `%=` centis clause building each vane. + +```hoon +++ mod + |= [del=news all=?] + ^+ ..pith + =^ job=oped fat.mod.sol (~(adorn adapt fat.mod.sol) del all) + =? lul.mod.sol ?=(^ lul.job) + (smit:va "lull" pit /sys/lull/hoon u.lul.job) + =? zus.mod.sol ?=(^ zus.job) + (smit:va "zuse" lul.mod.sol /sys/zuse/hoon u.zus.job) + %- %+ need:wyrd kel.ver.zen + :~ lull/;;(@ud q:(slap lul.mod.sol limb/%lull)) + zuse/;;(@ud q:(slap zus.mod.sol limb/%zuse)) + == + %= ..pith + van.mod + %+ roll van.job + |= [[nam=term txt=cord] van=_van.mod.sol] + ^+ van + =/ nex (create:va our zus.mod.sol nam /sys/vane/[nam]/hoon txt) + =/ nav (~(get by van) nam) + =? nex ?=(^ nav) (update:va vase.u.nav nex) + (~(put by van) nam (settle:va nex)) + == +``` + +The recompilation against `%zuse` takes place in `+adorn:adapt:part`. ([Arvo Pärt](https://www.youtube.com/watch?v=jNxbT0MESTY)) + +```hoon +:: kernel modules +:: +:: %zuse is the subject of the vanes; force all if we have a new %zuse +:: +=. all |(all ?=(^ zus)) +=| nav=(map term cord) +=? nav all + %- ~(gas by nav) + %+ turn + ~(tap by dir:(~(dip of fat) /sys/vane)) + |=([name=@ta _fat] [`@tas`name (sole (need fil))]) +``` + + +## Behn {#behn} + +Behn is a timer/wake-up call system. Since it's a simple vane, let's approach it obliquely, by looking at a generator that calls it. + +- Open `/base/gen/timers/hoon` and examine the code. + +```hoon +.^((list [date=@da =duct]) %bx (en-beam [our %$ [%da now]] /debug/timers)) +``` + +- [“Behn Overview”](/reference/arvo/dill/behn) + +### `/sys/lull` Definition {#syslull-definition} + +The interface to Behn is defined in `/sys/lull`: + +```hoon +:: :::: +:::: ++behn :: (1b) timekeeping + :: :::: +++ behn ^? 
+ |% + +$ gift :: out result <-$ + $% [%doze p=(unit @da)] :: next alarm + [%wake error=(unit tang)] :: wakeup or failed + [%meta p=vase] + [%heck syn=sign-arvo] :: response to %huck + == + +$ task :: in request ->$ + $~ [%vega ~] :: + $% $>(%born vane-task) :: new unix process + [%rest p=@da] :: cancel alarm + [%drip p=vase] :: give in next event + [%huck syn=sign-arvo] :: give back + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + [%wait p=@da] :: set alarm + [%wake ~] :: timer activate + == + -- ::behn +``` + +(A `%vega` task informs the vane that the kernel has been upgraded.) + +### Structure {#structure} + +```hoon ++$ behn-state + $: %2 + timers=(tree [key=@da val=(qeu duct)]) + unix-duct=duct + next-wake=(unit @da) + drips=drip-manager + == +``` + +- How does Behn think of timers? + +`/sys/behn` presents three primary cores: + +1. A type definition core. +2. A helper core. +3. The primary vane interface. + +Behn only has two kinds of moves: `%wait` `task`s and `%wake` `gift`s. How are these processed and what do they result in? (See the `$timer-map` structure too.) + +- What is a drip? How is it used? + +> Say an app (the Target) is subscribed to updates from Clay (the Client). If Clay `%give`s updates to the app directly and the app crashes, this may cause Clay to crash as well. If instead Clay `%pass`es Behn a `%drip` `task` wrapping the update `gift`, Behn will set a timer for `now` that, when fired, will cause the update `gift` to be given. If it causes a crash then it will have been in response to the `%drip` move, thereby isolating Clay from the crash. Thus `%drip` acts as a sort of buffer against cascading sequences of crashes. + +### The Nested Core Pattern {#the-nested-core-pattern} + +(~rabsef-bicrym calls this the “`+abet` engine” and while it's not a popular term inside of the core development team, I like the pithiness of it. 
“Engine” as a term of art is frequently used in the kernel to refer to `+le` two-letter doors so this usage is not completely inconsistent, just more specialized in application.) + +The basic concept of the nested core pattern is to have an outer core which builds a list of cards and state changes, then produces the queued changes all at once. (I always think about this as being one of those wind-up cars that you crank and then set down to whir away.) + +Behn's nested core pattern is pretty simple: it has an alias to `this`, one `+emit` arm to prepend a move to a list of moves, and an `+abet` arm to yield the `[moves state]`. The `+per-event` core is used to script the neighboring `+scry` and `+call` arms for the vane without leaking state invariants. Behn's instantiation of the `+abet` pattern uses the helper outer core as a centralized state machine. + +These are some common `+abet` pattern arms. These are not all unique, and many cores will omit all or most of these. + +- `+abed`—initialize. Set up the state of the inner core. +- `+yoke`—initialize. Start from a particular value. +- `+abet`—finalize. Exit from an inner core to an outer core, taking changes. Commonly, take a modified value and overwrite it into the final state with a `+put:by`. +- `+abut`—finalize. Alternative exit from `+abet` for a deletion. +- `+move`—send a move. Generalization for `+pass`/`+give`. +- `+pass`—request an action. Prepend a `%pass` move to the current list of moves. +- `+give`—return a result. Prepend a standard `%give` to the current list of moves. +- `+emit`—submit a card. Prepend a card to the current list of cards. +- `+emil`—submit cards. Prepend a list of cards to the current list of cards. + +If some state needs to be maintained, this can be built in a door, but Behn's particular example is even more basic. 
In files with associated doors or with multiple nested core instances, it is common to prepend a two-letter identifier to disambiguate which outer core is being scripted at any given time, such as `+mo-abet` or `+ap-emil`. + +- [The Engine Pattern](../../hoon/engine-pattern.md) + +### Vere I/O Driver: `vere/io/behn.c` {#vere-io-driver-vereiobehnc} + +Arvo acquires its timer updates from Unix via `vere/io/behn.c`. This file presents its primary interface at `u3_behn_io_init()` to initialize a timer. This simply retrieves the current Unix time for a starting point and sets up the interface. + +Each communicating vane is linked to various I/O drivers in `vere/auto.c`. (These do not correspond one-to-one with vanes.) These are registered into `car_u`, a global `_u3_auto` used for I/O driver invocations and callbacks. + +For Behn, the timer is set in `_behn_ef_doze()` using `uv_timer_start()`. The corresponding wakeup is emitted in `_behn_time_cb()` when the `libuv` main event loop fires the timer. + +Finally, we can examine how the injection comes back into Arvo in `_behn_time_cb()`. An `ovum` is produced by `u3_ovum_init()`, manually injected using `u3_auto_plan()`, and subscribed to with `u3_auto_peer()`. `/sys/behn` then processes this wakeup event as a `%wake` via its `+call` arm. + + +## Dill {#dill} + +Dill is Urbit's terminal driver. + +```hoon +|pass [%d %text "hello world"] +``` + +Dill as a vane is mostly responsible for actually constructing terminal sessions and coordinating input and output. Thus most of the terminal stack actually lives in userspace (instrumented by Gall) rather than in `/sys/dill`. + +What do we mean when we talk about a terminal? Originally, of course, computers were directly programmed by moving wires between vacuum tubes or chips; later, this evolved to the ability to read and output cards. Computer terminals with CRT-based character displays began to be used in the late 1950s and gradually became more common. 
In fact, the original plasma display screens were used with PLATO in the 1970s–1990s. + + + +When we refer to the terminal today, we typically mean a modern [terminal emulator](https://en.wikipedia.org/wiki/Terminal_emulator), which presents a terminal-like text user interface (TUI) for software to treat as if it were an actual character display. Terminal emulators need to track information like dimensions ($x$, $y$), content layout, active sessions or connexions, and cursor position. They provide affordances like a color space, escape codes, layout libraries, and scrollable sessions. + +Dill is responsible for interfacing with keystrokes and with the terminal emulator session. Since Urbit can be run in a daemon mode, it's not necessary for Dill to actually have a terminal session for Urbit to run. + +Due to terminal emulator limitations, Dill sessions are only properly supported today in the `%webterm` app. + +- [“Developer Call: Urbit’s improved Terminal Stack”](https://www.youtube.com/watch?v=E-6E-l1SxFw) +- [“Dill Overview”](/reference/arvo/dill/dill) + +### `/sys/lull` Definition {#syslull-definition} + +The `/sys/lull` interface specification for Dill is more complicated than that of Behn. Unlike Behn, a number of supporting types are necessary to produce the basic pair of `gift`/`task` for Dill. + +```hoon +:: :::: +:::: ++dill :: (1d) console + :: :::: +++ dill ^? + |% + +$ gift :: out result <-$ + $% [%blit p=(list blit)] :: terminal output + [%logo ~] :: logout + [%meld ~] :: unify memory + [%pack ~] :: compact memory + [%trim p=@ud] :: trim kernel state + [%logs =told] :: system output + == :: + +$ task :: in request ->$ + $~ [%vega ~] :: + $% [%boot lit=? 
p=*] :: weird %dill boot + [%crop p=@ud] :: trim kernel state + [%flog p=flog] :: wrapped error + [%heft ~] :: memory report + $>(%init vane-task) :: after gall ready + [%logs p=(unit ~)] :: watch system output + [%meld ~] :: unify memory + [%pack ~] :: compact memory + [%seat =desk] :: install desk + [%shot ses=@tas task=session-task] :: task for session + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + [%verb ~] :: verbose mode + [%knob tag=term level=?(%hush %soft %loud)] :: deprecated removeme + session-task :: for default session + told :: system output + == :: + :: :: + +$ session-task :: session request + $% [%belt p=belt] :: terminal input + [%blew p=blew] :: terminal config + [%flee ~] :: unwatch session + [%hail ~] :: terminal refresh + [%open p=dude:gall q=(list gill:gall)] :: setup session + [%shut ~] :: close session + [%view ~] :: watch session blits + == :: + :: :: + +$ told :: system output + $% [%crud p=@tas q=tang] :: error + [%talk p=(list tank)] :: tanks (in order) + [%text p=tape] :: tape + == :: + :: + :::: :: (1d2) + :: + +$ blew [p=@ud q=@ud] :: columns rows + +$ belt :: client input + $? 
bolt :: simple input + [%mod mod=?(%ctl %met %hyp) key=bolt] :: w/ modifier + [%txt p=(list @c)] :: utf32 text + == :: + +$ bolt :: simple input + $@ @c :: simple keystroke + $% [%aro p=?(%d %l %r %u)] :: arrow key + [%bac ~] :: true backspace + [%del ~] :: true delete + [%hit x=@ud y=@ud] :: mouse click + [%ret ~] :: return + == :: + +$ blit :: client output + $% [%bel ~] :: make a noise + [%clr ~] :: clear the screen + [%hop p=$@(@ud [x=@ud y=@ud])] :: set cursor col/pos + [%klr p=stub] :: put styled + [%mor p=(list blit)] :: multiple blits + [%nel ~] :: newline + [%put p=(list @c)] :: put text at cursor + [%sag p=path q=*] :: save to jamfile + [%sav p=path q=@] :: save to file + [%url p=@t] :: activate url + [%wyp ~] :: wipe cursor line + == :: + +$ dill-belt :: arvo input + $% belt :: client input + [%cru p=@tas q=(list tank)] :: errmsg (deprecated) + [%hey ~] :: refresh + [%rez p=@ud q=@ud] :: resize, cols, rows + [%yow p=gill:gall] :: connect to app + == :: + +$ dill-blit :: arvo output + $% blit :: client output + [%qit ~] :: close console + == :: + +$ flog :: sent to %dill + $% [%crop p=@ud] :: trim kernel state + $>(%crud told) :: + [%heft ~] :: + [%meld ~] :: unify memory + [%pack ~] :: compact memory + $>(%text told) :: + [%verb ~] :: verbose mode + == :: + :: :: + +$ poke :: dill to userspace + $: ses=@tas :: target session + dill-belt :: input + == :: + -- ::dill +``` + +The main concepts to keep in mind: + +- Dill receives `%belt` `task`s and sends `%blit` `gift`s. +- `%belt` `task`s result from keystrokes, terminal resizing, +- `%blit` `gift`s result from output events: putting a character, clearing the screen, placing the cursor. + +### Structure {#structure} + +Dill's primary state is its `$axle` with a logging level: + +```hoon +|% :: console protocol ++$ axle :: + $: %7 :: + hey=(unit duct) :: default duct + dug=(map @tas axon) :: conversations + eye=(jug @tas duct) :: outside observers + ear=(set duct) :: syslog listeners + lit=? 
:: boot in lite mode + egg=_| :: see +take, removeme + == :: ++$ axon :: dill session + $: ram=term :: console program + tem=(unit (list dill-belt)) :: pending, reverse + wid=_80 :: terminal width + == :: ++$ log-level ?(%hush %soft %loud) :: none, line, full +-- +``` + +As with `/sys/behn`, the primary cores of Dill include: + +1. Type definitions (two cores). +2. Helper core (`+as` per-cause engine). +3. The primary vane interface. + +A good way to familiarize yourself with Dill operations is to follow the full lifecycle of input and output in the next section. + +Dill is the first vane in the boot sequence, and is used to boot Jael. (Compare `%aqua`, which does not need to start Dill and can initialize Jael directly.) + +### Vere I/O Driver: `vere/io/term.c`, `ptty.c` {#vere-io-driver-vereiotermc-pttyc} + +The main entrypoint for the terminal is `u3_term_io_init()`, which simply sets up the interface and callbacks. + +As we noted before, each communicating vane is linked various I/O drivers in `vere/auto.c`. Here the global `car_u` has `term.c` connected for invocations and callbacks. + +**`vere/vere.h`**: + +```c +// u3_term_start_spinner(): prepare spinner state. RETAIN. +void u3_term_start_spinner(u3_noun say, c3_o del_o); + +// u3_term_stop_spinner(): reset spinner state and restore input line. +void u3_term_stop_spinner(void); + +// u3_term_get_blew(): return window size [columns rows]. +u3_noun u3_term_get_blew(c3_l tid_l); + +// u3_term_ef_winc(): window change. +void u3_term_ef_winc(void); + +// u3_term_ef_ctlc(): send ^C. +void u3_term_ef_ctlc(void); + +// u3_term_io_init(): initialize terminal I/O. +u3_auto* u3_term_io_init(u3_pier* pir_u); + +// u3_term_io_hija(): hijack console for cooked print. +FILE* u3_term_io_hija(void); + +// u3_term_it_log(): writes a log message +void u3_term_io_log(c3_c* line); + +// u3_term_io_loja(): release console from cooked print. 
+void u3_term_io_loja(int x, FILE* f); + +// u3_term_log_init(): initialize terminal for logging +void u3_term_log_init(void); + +// u3_term_log_exit(): clean up terminal. +void u3_term_log_exit(void); + +// u3_ptty_init(): initialize platform-specific tty. +u3_utty* u3_ptty_init(uv_loop_t* lup_u, const c3_c** err_c); +``` + +For instance, a keystroke is processed in the following way: + +- `uv_read_start()` is the `libuv` event loop injector. +- `_term_read_cb()` is the character keystroke callback. +- `_term_suck()` processes input. +- `_term_io_suck_char()` decides if it's an `xterm` terminal emulator issue or something for Arvo to know about. +- `_term_io_spit()` inputs the buffer and belt. +- `_term_io_belt()` actually sends a value along the belt (as an `ovum`). + +Now in Arvo, the keystroke is routed to Dill: + +- `+call` takes the `task` from Arvo. +- `+call:as` receives the input and dispatches on `%belt`. +- `+send:as` sends the action to the proper session. +- `+deal:as` signals to pass the keystroke to Gall. +- `+pass:as` executes the pass to Gall. + +Output can happen in three ways: + +1. Some output traverses a path from (say) Gall outbound. These are conventionally known as `%slog`s. + + - Somehow a noun is marked for output using a `%slog` hint: + - `~&` sigpam does this directly. + - `~>` siggar can do this with a `%slog` hint and a priority value. + - `+slog` wraps this as a function. + + Back in the runtime: + + - Once we have a `%slog` hint for the runtime, it can be emitted from the Nock processor via Nock Eleven. `noun/nock.c:_n_bint()` dispatches this via `SLOG` and thence `do_slog` in the bytecode processor. + - `noun/trace.c:u3t_slog()` prints a value directly through the `u3C.slog_f` print handler, which is `_cw_serf_send_slog()`. + - `vere/main.c::_cw_serf_send_slog()` sends the hint output to the serf. + - `vere/main.c:_cw_serf_send()` is a plea handler to send pleas to the daemon. 
+ - `vere/newt.c:u3_newt_send()` sends a `+jam`med noun of the output (`u3s_jam_xeno()`) as a buffer to a stream. In this case, the value ends up at the `libuv` buffer using `uv_write()`. + + In this case, where the output lands is a side effect of the terminal handler's layout rather than something Dill explicitly knows about. As a consequence, if you are building a TUI application and you don't want to have misaligned output, you need to build it directly using `%blit`s and suppress `%slog`s within your app. (See `tui-toys` for an example in `%snek`.) + +2. Other output goes via Dill because the terminal vane explicitly needs to know about its position. These are `%blit`s. + + - A program (like `%snek`) can manually emit a `%blit` like `%klr` (styled text) or `%put` (plain text) at the cursor. + - [`/lib/etui`](https://github.com/urbit/urbit/blob/wip/tui-toys/pkg/demo/lib/etui.hoon) offers some interesting demonstrations in this vein; see the `+zo` engine. + - `vere/io/term.c:_term_io_kick()` applies effects sent to `term.c`, including blits. It was registered as the effect handler when the I/O drivers were registered. + - `vere/io/term.c:_term_ef_blit()` switches on the type of blit (notice it can also track a cursor position). + - `vere/io/term.c:_term_it_show_tour()` emits UTF-32 to the cursor location. + - Finally, `vere/io/term.c:_term_it_show_line()` prints at the actual cursor position. + +3. The runtime terminal manager automatically handles aspects of layout such as maintaining the input line at the bottom of the screen. See `vere/io/term.c:_term_it_restore_line()` for details. (CSI stands for Control Sequence Introducer.) A different terminal handler (like `%webterm`) may handle these decisions differently. + +Other output may follow yet a different path; for instance, `u3l_log()` directly prints using `vsnprintf()` to `stderr`. Some output goes via the `libuv` event loop, such as `u3_ptty_init()`. + +- Examine `u3_term_log_init()` in `vere/io/term.c`. 
+- Examine `_term_it_show_line()` in the same file. + +### Dojo, `%hood`, `%helm`, `%drum` {#dojo-hood-helm-drum} + +Dojo is Urbit's primary CLI interface, and while it is too complicated to delve deeply into here, the major parts to consider include: + +1. Hoon parser. Real-time parsing of input, which evaluates Hoon code for syntactical correctness. (This is the reason that typing at the Dojo prompt is frequently slower than typing at other CLIs.) +2. Specialized syntax. Besides Hoon input, generators and pokes can be set up and invoked directly from the Dojo prompt. `%say` and `%ask` generators are head-tagged pairs with gates following that return `sole-result`s. + 1. `+generator` prefixes cause Dojo to look in `/gen` for a particular generator file. + 2. `|generator` is a Hood generator, on which more in a moment. + 3. `+desk!generator` invokes a generator on a particular desk. + 4. `:agent|generator` takes the output from a generator (at `/gen/agent/generator`) and feeds it as a noun to the agent's `+on-poke` arm. + 5. `-thread` invokes a thread, with a similar desk-specific prefix as above. +3. Hood. Most of the interesting Urbit instrumentation is provided to Dojo by the Hood/Helm agent pipeline. + 1. `%hood` is the overarching system app, to which Dojo redirects generator invocations prefixed with `|` bar such as `|pass` and `|install`. (Thus, `|pass` is in fact shorthand for `:hood|pass`.) + 2. `%helm` provides the interface for kernel and system functionality, such as `|verb`, `|moon`, etc. Hood calls into Helm. + 3. `%drum` manages the active CLI apps (`|dojo/link`, `Ctrl+X`). + 4. `%kiln` instruments filesystem operations using Clay. + +The overall call web of these is surprisingly tangled; as an example, let's trace `|dojo/link`, which tells Dojo to register a CLI interface to a Gall agent. + +- `/gen/hood/link` +- `/app/hood` +- `/lib/drum`→`+poke` w/ `%drum-link` +- `/lib/drum`→`+poke-link` +- `/lib/drum`→`+se-link` +- `eel` (what is this?) 
+ + +## Exercises {#exercises} + +- Trace the entire lifecycle of `|pass [%d %text "Hello Mars!"]`. Include a function-by-function annotation and commentary. +- Write a basic vane, `/sys/vane/`. `~&` on receiving a `task`. diff --git a/docs/core-academy/ca08.md b/docs/core-academy/ca08.md new file mode 100644 index 0000000..fb61f26 --- /dev/null +++ b/docs/core-academy/ca08.md @@ -0,0 +1,626 @@ +--- +description: "Core Academy lesson on Ames covering the encrypted peer-to-peer protocol, its packet structure, message flows, cryptographic operations, integration with Urbit ID, and the Fine remote scry protocol." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 9. Vanes II: Ames + +*This lesson covers Ames: Urbit's networking protocol. We'll see how it relates to Urbit ID and Fine, the remote scry protocol.* + +Urbit knows about three kinds of networking: Ames and Fine over Ames and HTTP over Eyre. (That is, the network protocol over the implementing vane.) Ames is the name of both the network protocol and the implementing vane. Ames is particularly for ship-to-ship communication, while Fine acts as a dispatcher for efficient data requests (such as desk updates). In this lesson, we will focus on Ames first and then foray into Fine, since Fine is in more flux. + +Ames is a good example of a vane that is conceptually straightforward but mechanically complicated. It handles networking, but has to track peer state, message flows, individual packets, network weather, etc. as gracefully as possible. + + +## Network Protocol: Ames {#network-protocol-ames} + +> Ames is an encrypted peer-to-peer network running as an overlay over UDP. Ames does not have separate addressing and identity layers (like IP and DNS). An Ames address is an identity, mapped to a phonemic string to create a memorable pseudonym, and bound to a public key for encrypted communication. 
(Whitepaper) + +From a kernel perspective, the point of Ames is to extend `move` semantics across more than one Arvo instance. Local vanes (such as Gall) pass a `%plea` request to Ames, which sends the message to the peer Ames over the wire. That peer's Ames then dispatches the message to the destination vane on the peer. One advantage of Ames is that it wraps all of the peer negotiation and message delivery details such that the calling vane need not be aware of these. Among Ames’ guarantees: + +1. Messages within a flow are processed in order. +2. Messages will be delivered only once to a destination vane. (“Ames can guarantee exactly-once delivery because both ships are transactional (so if they give an ack, we know for sure they have received it permanently and won't forget about it.” [~wicdev-wisryt](https://groups.google.com/a/urbit.org/g/dev/c/y_gaSpn9mxM/m/zNy1l9ufAgAJ)) + +Networking in some ways is like a duct. It requires you to keep track of the forward and reverse causal history and content. However, there are two kinds of data transmissions: commands and facts. + +> There is a categorical difference between a bus, which transports commands, and a network, which transports packets. You can drop a packet but not a command; a packet is a fact and a command is an order. Facts are inherently idempotent; learning fact _X_ twice is the same as learning it once. You can drop a packet, because you can ignore a fact. Orders are inherently sequential; if you get two commands to do thing _X_, you do thing _X_ twice. (Whitepaper) + +The Ames vane is responsible for sending and receiving messages of arbitrary length. This means that it needs to know how to build and reconstruct component packets of a message, how to route, and how to encrypt and decrypt. Ames does handle some aspects of encryption and decryption but not all. For networking, Ames gets its public keys (and breach notifications) from Jael, which in turn gets them from an Azimuth userspace agent. 
The actual cryptographic operations may live in `/sys/zuse` but are applied by Ames as appropriate. + +- [“Ames Overview”](../../urbit-os/kernel/ames) +- [Curtis Yarvin `~sorreg-namtyv`, Philip Monk `~wicdev-wisryt`, Anton Dyudin, and Raymond Pasco, “Urbit: A Solid-State Interpreter” (“Whitepaper”)](http://media.urbit.org/whitepaper.pdf), sections 9–10 +- [“Ames Security Audit and the Future of the Protocol”](https://urbit.org/blog/security-audit) + +### Packet Protocol {#packet-protocol} + +> Ames receives packets as Arvo events and emits packets as Arvo effects. The runtime is responsible for transferring the bytes in an Ames packet across a physical network to another ship. (Ames Tutorial) + +Ames packets have a 32-bit header followed by a variable-length body. + +**Header** + +| Bits | Representative Value | Meaning | +| --- | --- | --- | +| 31–29 | `000` | reserved bits | +| 28 | `1` | Ames or Fine? | +| 27–25 | `000` | Ames protocol version | +| 24–23 | `11` | sender address size | +| 22–21 | `11` | receiver address size | +| 20–1 | `1000.0001.1101.0010.1111` | checksum | +| 0 | `1` | is this relayed? | + +The 2-bit address size refers to the address space rank (gathering galaxies and stars together as routers). + +```hoon +++ ship-meta + |= =ship + ^- [size=@ =rank] + =/ size=@ (met 3 ship) + ?: (lte size 2) [2 %0b0] + ?: (lte size 4) [4 %0b1] + ?: (lte size 8) [8 %0b10] + [16 %0b11] +``` + +A `relay` means that the packet is not at its destination here and should be passed forward. (This is handled by `+on-hear-forward` in `/sys/ames`.) + +> If a relay responds to a scry request from its cache without asking the host, the relay should include an origin containing the last known IP and port of the host. … \[The] protocol should be resilient against the origin pointing at an unreachable IP and port. 
+ +**Body** + +| Number of Bits | Representative Value | Meaning | +| --- | --- | --- | +| 4 bits | `0000` | sender life (mod 16) | +| 4 bits | `0000` | receiver life (mod 16) | +| variable | `0110.0111.0110.1011` | sender address | +| variable | `1111.1100.0111.0011.0000.0101.0000.0000` | receiver address | +| 48 bits | — | `origin` (if relayed) | +| 128 bits | — | `SIV` synthetic initialization vector for AES-256 | +| 16 bits | — | ciphertext size | +| variable | — | ciphertext | + +Address size is determined by the header. + +Here if the relay bit is set then 32 bits of the `origin` are the last known IPv4 address and 16 bits are the port. + +> The ciphertext is formed by `+jam`ming a `$shut-packet` and then encrypting using [`+en:sivc:aes:crypto`](../../hoon/cryptography.md#en). + +The ciphertext results from `+jam`ming the message noun into an atom then breaking the result into 1 KB or smaller payloads. Packets are numbered so that they can be ordered upon receipt. These message fragments are then assembled into a single large atom and `+cue`d back into the noun. + +Urbit messages result in raw nouns. Since Nock-derived languages are homoiconic, we could treat this noun as code directly, but instead we treat it as a cask (pair of mark and noun). We don't transmit vases over the network, but require the recipient to build the code locally. + +> Ames messages are typed; the type itself is not sent, just a label (like a MIME type) that the recipient must map to a local source path. Validation failure causes a silent packet drop, because its normal cause is a recipient that has not yet received a new protocol update; we want the sender to back off. Ames also silently drops packets for encryption failure; error reports are just an attack channel. + +#### UDP Packet Format + +> There's a lot you can do with a stateful UDP server, especially one whose semantics are reasonably formal. 
(CGY) + +At the host system level, the runtime communicates using the [User Datagram Protocol](https://en.wikipedia.org/wiki/User_Datagram_Protocol) (UDP) specification. UDP messages are “transaction oriented, and delivery and duplicate protection are not guaranteed.” (To compensate for this, Ames employs a unique system of acks and nacks, covered below.) Each UDP message has a brief header including destination, source, length, and checksum. It’s rather a “minimum viable” packet system. + +> A UDP datagram consists of a datagram _header_ followed by a _data_ section (the payload data for the application). The UDP datagram header consists of 4 fields, each of which is 2 bytes (16 bits). +> UDP is faster but less reliable than TCP, another common transport protocol. In a TCP communication, the two computers begin by establishing a connection via an automated process called a ‘handshake.’ Only once this handshake has been completed will one computer actually transfer data packets to the other. ([Wikipedia](https://en.wikipedia.org/wiki/User_Datagram_Protocol)) + +Urbit compensates for this lower reliability by sending until receiving an appropriate ack or nack (negative acknowledgment) in reply. + +> UDP is commonly used in time-sensitive communications where occasionally dropping packets is better than waiting. Voice and video traffic are sent using this protocol because they are both time-sensitive and designed to handle some level of loss. For example VOIP (voice over IP), which is used by many internet-based telephone services, operates over UDP. This is because a staticky phone conversation is preferable to one that is crystal clear but heavily delayed. 
([Wikipedia](https://en.wikipedia.org/wiki/User_Datagram_Protocol)) + +- [RFC 768](https://tools.ietf.org/html/rfc768) (UDP specification) + +### Acks and Nacks {#acks-and-nacks} + +```hoon +:: $ack: positive ack, nack packet, or nack trace +:: ++$ ack + $% [%ok ~] + [%nack ~] + [%naxplanation =error] + == +``` + +If every message is a transaction (or event), then what is Ames acknowledging (ack) or negatively acknowledging (nack)? “A successful transaction has no result; a failed transaction is a negative ack and can contain an error dump.” + +- An _ack_ means that a piece of information has been received successfully. +- A _nack_ means that a piece of information has been received but failed to process for some reason. + +> Ames has an unusual system of acks and nacks (“negative acknowledgments”, but not like TCP’s packets of the same name; Ames nacks mean the packet was received but the message resulted in an error). In brief, each Ames packet of a message should get either an ack or a nack. In the current system, the nack may include an error message (e.g., an error code or a stack trace). ([~wicdev-wisryt](https://groups.google.com/a/urbit.org/g/dev/c/y_gaSpn9mxM/m/njlRhYZHBwAJ)) + +Each Ames packet will merit either an ack or a nack, in other words. A nack may optionally include an error trace with it (`[tag=@tas =tang]`). Ames will adaptively continue to send messages until the appropriate acks or nacks have been received. + +(A TCP nack means that the numbered packet was never received.) + +Ames will send messages and acks until a replying ack is received. “Ames guarantees that a message will only be delivered once to the destination vane.” Thus nacks allow us to also guarantee notification that a request was completed or failed. + +> 1. Always ack a dupe; never ack an ack. It's okay to ack a nack as long as you never nack a nack. 
([Urbit Precepts B.1](https://urbit.org/blog/precepts)) + +If a remote ship sends a nack in response to a `%plea`, Ames waits until it receives a follow-up naxplanation and then delivers both to the local source vane. The flow blocks on needing to receive the naxplanation. + +> When a new socket is opened, the client can resend (at-least-once delivery) or fail to resend (at-most-once). The programmer has to understand that the socket is not really a bus, and make sure the POST is actually an idempotent fact rather than an imperative command. (The idempotence problem is often punted to the human layer: “Please click only once to make your purchase.”) (Whitepaper) + +Because Ames and Urbit assume several nines of uptime, sessions between ships are treated as persistent. + +> The basic argument for including end-to-end acks (and by extension, nacks) is that they’re necessary for everything except those things which we don’t care whether the message was received at all. Thus, for Ames to give the guarantee that “if you give me a payload I will get it to the other side exactly once” isn’t useful in itself, because no application cares about that. They either (1) don’t care whether it gets there or (2) care whether the request itself was “completed”, in an application-defined sense. ([Phillip Monk, `~wicdev-wisryt`](https://groups.google.com/a/urbit.org/g/dev/c/y_gaSpn9mxM/m/njlRhYZHBwAJ)) + +Keep in mind Postel’s law, also known as the robustness principle: “Be conservative in what you send, and liberal in what you accept.” + +- [Saltzer, Reed, and Clark, “End-to-End Arguments in System Design”](http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf) on nacks + +### Cryptography {#cryptography} + +Almost every Ames packet is encrypted using [AES-256](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard). (The exception is comet self-attestation packets.) 
+ +Urbit's cryptographic suite for jets has been organized to present a uniform interface (currently in `urcrypt/`, migrating to its own repo). This eases the development of alternative runtimes since the C functions can be utilized as an FFI (foreign function interface) with uniform call signatures and behavior. + +The details of the Azimuth PKI are discussed in `ca13`, _quod vide_. + +- [“Ames: Cryptography”](../../urbit-os/kernel/ames/cryptography.md) + +### Routing {#routing} + +> The runtime tells Ames which physical address a packet came from, represented as an opaque atom. Ames can emit a packet effect to one of those opaque atoms or to the Urbit address of a galaxy (root node), which the runtime is responsible for translating to a physical address. (See `$lane`.) + +```hoon +:: $address: opaque atomic transport address to or from unix ++$ address @uxaddress +:: $lane: ship transport address; either opaque $address or galaxy ++$ lane (each @pC address) +``` + +The `@uxaddress` value is an “opaque” address, in reality an IPv4 address for the runtime's use. + +Since galaxy addresses are provided to the runtime on boot (from an RPC call to a roller or Ethereum node, see `vere/dawn.c:_dawn_eth_rpc()`), a route is always findable for any active point. Galaxy ports are hardcoded to be at 31337 or 13337 plus the galaxy numeric offset. + +- What does `@pC` mean? + +When we say that galaxies handle routing in Ames today (but stars will play a role later), this is the part of the system to which we refer. + +### `/sys/lull` Definition {#syslull-definition} + +The `/sys/lull` interface definition for Ames is quite long and should be reviewed in its entirety. A structural summary: + +```hoon +++ ames ^? 
+ |% + :: $task: job for ames + +$ task + $+ ames-task + $% [%hear =lane =blob] + [%dear =ship =lane] + [%heed =ship] + [%jilt =ship] + [%cork =ship] + [%tame =ship] + [%kroc bones=(list [ship bone])] + $>(%plea vane-task) + [%deep =deep] + :: + [%keen spar] + [%yawn spar] + [%wham spar] + :: + $>(%born vane-task) + $>(%init vane-task) + [%prod ships=(list ship)] + [%sift ships=(list ship)] + [%snub form=?(%allow %deny) ships=(list ship)] + [%spew veb=(list verb)] + [%cong msg=@ud mem=@ud] + [%stir arg=@t] + $>(%trim vane-task) + $>(%vega vane-task) + == + :: $gift: effect from ames + + +$ gift + $% [%boon payload=*] + [%clog =ship] + [%done error=(unit error)] + [%lost ~] + [%send =lane =blob] + :: + [%tune spar roar=(unit roar)] + :: + [%turf turfs=(list turf)] + == +``` + +#### Tasks + +``` +:: $task: job for ames +:: +:: Messaging Tasks +:: +:: %hear: packet from unix +:: %dear: lane from unix +:: %heed: track peer's responsiveness; gives %clog if slow +:: %jilt: stop tracking peer's responsiveness +:: %cork: request to delete message flow +:: %tame: request to delete route for ship +:: %kroc: request to delete specific message flows, from their bones +:: %plea: request to send message +:: %deep: deferred calls to %ames, from itself +:: +:: System and Lifecycle Tasks +:: +:: %born: process restart notification +:: %init: vane boot +:: %prod: re-send a packet per flow, to all peers if .ships is ~ +:: %sift: limit verbosity to .ships +:: %snub: set packet blocklist to .ships +:: %spew: set verbosity toggles +:: %cong: adjust congestion control parameters +:: %stir: recover from timer desync and assorted debug commands +:: %trim: release memory +:: %vega: kernel reload notification +``` + +Ames has a rather bohemian set of messaging names. Among others: + +- `%hear` a packet +- `%heed` or `%jilt` a peer +- `%plea` to send a message (common from vanes) + +Most other tasks are not used by userspace but by internal Ames state management. 
These are complemented by types like these: + +- `$hoot` a request packet payload +- `$yowl` a serialized response packet payload + +#### Notes + +``` +:: Messaging Gifts +:: +:: %boon: response message from remote ship +:: %clog: notify vane that %boon's to peer are backing up locally +:: %done: notify vane that peer (n)acked our message +:: %lost: notify vane that we crashed on %boon +:: %send: packet to unix +:: %tune: peek result +:: %turf: domain report, relayed from jael +``` + +Every vane can receive a `%plea` note from Ames (except Behn, Dill, Iris, Khan, Lick). This is a redirection mechanism used to forward messages that a peer's vane passed to its own Ames en route to your Ames and thence to your vane. + +#### State + +``` +:: $ames-state: state for entire vane ++$ ames-state + $+ ames-state + $: peers=(map ship ship-state) + =unix=duct + =life + =rift + crypto-core=acru:ames + =bug + snub=[form=?(%allow %deny) ships=(set ship)] + cong=[msg=_5 mem=_100.000] + :: + $= dead + $: flow=[%flow (unit dead-timer)] + cork=[%cork (unit dead-timer)] + == == +``` + +- `$peers` holds the state of connections to other ships, where `$ship-state` is either `%alien` or `%known`. + - `%alien` means we have no PKI data and we must queue moves until we learn how to contact that ship. The `$alien-agenda` stores messages, packets, and remote scry `keen`s. + - `%known` means that we do have the peer state, on which more later. +- `$unix-duct` is a duct of moves to be sent to the host OS. +- `$life` is our own `life`, or how many times we have rekeyed. +- `$crypto-core` is a handle to the cryptographic tools core. +- `$bug` describes the debug level (`|ames/verb`). +- `$snub` tracks a blocklist for incoming packets (`|ames/snub`). +- `$cong` tracks whether a flow should be considered clogged. +- `$dead` sets how long dead flows last and if they need to be restarted. + +Ames maintains a duct (queue) of ordered messages. 
These are passed to and received from the runtime, and represent Arvo events. Each message is encrypted at the source and decrypted at the destination using a symmetric key that the two ships derive from their public keys. A message may be a `%plea` (sent to another ship); in response, Ames can receive zero or more `%boon`s. The ack–nack system is explained above; note that nacks are in response to event crashes. + +In `/sys/vane/ames`, there is a layer of versioning cruft to permit upgrades of the types (e.g. `$ames-state-5`). + +#### Peer State + +```hoon +:: $peer-state: state for a peer with known life and keys +:: +:: route: transport-layer destination for packets to peer +:: qos: quality of service; connection status to peer +:: ossuary: bone<->duct mapper +:: snd: per-bone message pumps to send messages as fragments +:: rcv: per-bone message sinks to assemble messages from fragments +:: nax: unprocessed nacks (negative acknowledgments) +:: Each value is ~ when we've received the ack packet but not a +:: nack-trace, or an error when we've received a nack-trace but +:: not the ack packet. +:: +:: When we hear a nack packet or an explanation, if there's no +:: entry in .nax, we make a new entry. Otherwise, if this new +:: information completes the packet+nack-trace, we remove the +:: entry and emit a nack to the local vane that asked us to send +:: the message. +:: heeds: listeners for %clog notifications +:: closing: bones closed on the sender side +:: corked: bones closed on both sender and receiver +:: ++$ peer-state + $+ peer-state + $: $: =symmetric-key + =life + =rift + =public-key + sponsor=ship + == + route=(unit [direct=? 
=lane]) + =qos + =ossuary + snd=(map bone message-pump-state) + rcv=(map bone message-sink-state) + nax=(set [=bone =message-num]) + heeds=(set duct) + closing=(set bone) + corked=(set bone) + keens=(map path keen-state) + == +``` +### Structure {#structure} + +Ames’ formal interface is included more than once (like Arvo) as the “external vane interface” and the “adult ames”, for instance. + +```hoon +++ call :: handle request stack +++ take :: handle response $sign +++ stay :: extract state before reload +++ load :: load in old state after reload +++ scry :: dereference namespace +``` + +There is a collection of `ames-helper` cores as well to handle many specific cases for unpacking and routing messages. Ames uses a more sophisticated nested core pattern than Behn did. To that end, it presents five `+abet` cores: + +- `ev` event handling core +- `mi` message receiver core +- `mu` message sender core +- `pe` per-peer processing core +- `pu` packet pump + +There is a substantial amount of legacy Ames state upgrade debris in the file as well. + +#### Scries + +As typical, scries expose internal vane state. Ames has a richer inner life than some other vanes, so you can check on peer state and snubs and message flow details. + +```hoon +.^((map ship ?(%alien %known)) %ax /=//=/peers) + +.^(ship-state:ames %ax /=//=/peers/~zod) + +!< message-pump-state:ames .^(vase %ax /=//=/snd-bones/~zod/0) +``` + +Most Ames scries aren't particularly useful to us directly unless we want to do direct network negotiation. Ames is used frequently by Gall but, from the agent's perspective, incidentally. + +### Messages & Flows {#messages-flows} + +We've looked at the packet protocol before; now let's look at Ames' mechanics of message management. + +```hoon ++$ fragment @uwfragment ++$ fragment-num @udfragmentnum ++$ message-blob @udmessageblob ++$ message-num @udmessagenum +``` + +Messages are separated into 1 KB (or smaller) fragments and sequentially numbered. 
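As a rough illustration of the fragmenting step, the expression below cuts a jammed noun into blocks of 2^13 bits (1 KiB) with the standard library's `+rip` and counts them with `+met`. This is a sketch only, not the code in `/sys/vane/ames`; the sample noun and the face names are made up for the example.

```hoon
::  Sketch only: fragment a jammed noun at bloq size 13 (2^13 bits = 1 KiB).
::  The sample noun and the faces are hypothetical.
::
=/  blob=@  (jam [%example (reap 2.000 'abcdefgh')])
=/  frags=(list @)  (rip 13 blob)
:*  num-fragments=(met 13 blob)  ::  1 KiB blocks needed to hold the blob
    listed=(lent frags)          ::  length of the list produced by +rip
    total-bytes=(met 3 blob)     ::  size of the whole blob in bytes
==
```

The two counts should agree for any non-zero blob, since `+met` measures the same block length that `+rip` splits on.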
+ +- Examine `+split-message` to see how messages are broken up into pieces. (There's a neat optimization therein.) + +Messages are of course sent and received in fragments. The messages from a `lane` accrue for a particular `bone` using `+mi`, the message receiver core (internal alias `sink`). + +- Examine `+hear` and `+assemble-fragments`. +- Start the debug server (`|start %dbug`) and navigate to `/~debug`. Select `ames` to see message flows. + +The message pump manages unsent messages, dispatching them to the packet pump when next in the queue. + +> When we pop a message off .unsent-messages, we push as many fragments as we can into |packet-pump, which sends every packet it eats. Packets rejected by |packet-pump are placed in .unsent-fragments. +> When we hear a packet ack, we send it to |packet-pump to be removed from its queue of unacked packets. +> When we hear a message ack (positive or negative), we treat that as though all fragments have been acked. + +There are a ton of other edge cases and consistency/sanity checks on messaging, one of the reasons that Ames is relatively complicated. + +> At the end of a task, |message-pump sends a %halt task to |packet-pump, which can trigger a timer to be set or cleared based on congestion control calculations. When the timer fires, it will generally cause a packet to be re-sent. +> Message sequence numbers start at 1 so that the first message will be greater than .last-acked.message-sink-state on the receiver. 
+ +```hoon ++$ message-pump-state + $+ message-pump-state + $: current=_`message-num`1 + next=_`message-num`1 + unsent-messages=(qeu message-blob) + unsent-fragments=(list static-fragment) + queued-message-acks=(map message-num ack) + =packet-pump-state + == +:: ++$ static-fragment + $: =message-num + num-fragments=fragment-num + =fragment-num + =fragment + == +:: ++$ partial-rcv-message + $: num-fragments=fragment-num + num-received=fragment-num + fragments=(map fragment-num fragment) + == +``` + +> A vane can pass Ames a `%heed` `task` to request Ames track a peer's responsiveness. If our `%boon`s to it start backing up locally, Ames will `give` a `%clog` back to the requesting vane containing the unresponsive peer's Urbit address. + +```hoon ++$ qos + $~ [%unborn *@da] + [?(%live %dead %unborn) last-contact=@da] +``` + +To cork a flow (or “cork a bone”) is to close the flow. A dangling bone refers to an incorrect bone (a message flow was closed on one side before all message fragments were received, for instance). + +A message flow organizes a sequence of message fragments together. Within a flow, data order is guaranteed; however, due to network traffic, separate flows may arrive out of order relative to one another. + +```hoon ++$ bone @udbone +:: ++$ ossuary + $: =next=bone + by-duct=(map duct bone) + by-bone=(map bone duct) + == +``` + +A `$bone` is a duct handle, a way of identifying a particular message flow over the network. + +Each `bone` increments by 4 since each flow includes a least-significant bit indicating if we send or receive pleas and a second-least-significant bit indicating if we are a diagnostic flow (naxplanation) or not. + +```hoon +> *bone +0 + +> .^([snd=(set bone) rcv=(set bone)] %ax /=//=/bones/~nes) +[snd={0} rcv={}] +``` + +The `$ossuary` holds the bone↔duct bijection and the `next-bone` to map to a duct. (Thus the increment-by-four noted above.) 
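To make the two low bits concrete, here is a small sketch that pulls a bone apart using only standard library arms (`+div`, `+end`, `+cut`). The bone value and the face names are arbitrary choices for the example, and nothing here is taken from `/sys/vane/ames` itself.

```hoon
::  Sketch only: decompose a bone into the parts described above.
::  The value 5 and the face names are hypothetical.
::
=/  bone=@ud  5
:*  flow=(div bone 4)            ::  flow counter once both flag bits are dropped
    low-bit=(end 0 bone)         ::  least-significant bit: plea direction
    nax-bit=(cut 0 [1 1] bone)   ::  second bit: set on a naxplanation flow
==
```

Because the two flag bits occupy the bottom of the atom, consecutive flows differ by 4, which is why `next-bone` in the `$ossuary` advances in steps of four.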
+ +```hoon +:: $pump-metrics: congestion control state for a |packet-pump +:: +:: This is an Ames adaptation of TCP's Reno congestion control +:: algorithm. The information signals and their responses are +:: identical to those of the "NewReno" variant of Reno; the +:: implementation differs because Ames acknowledgments differ from +:: TCP's, because this code uses functional data structures, and +:: because TCP's sequence numbers reset when a peer becomes +:: unresponsive, whereas Ames sequence numbers only change when a +:: ship breaches. +:: +:: A deviation from Reno is +fast-resend-after-ack, which re-sends +:: timed-out packets when a peer starts responding again after a +:: period of unresponsiveness. +:: +:: If .skips reaches 3, we perform a fast retransmit and fast +:: recovery. This corresponds to Reno's handling of "three duplicate +:: acks". +:: +:: rto: retransmission timeout +:: rtt: roundtrip time estimate, low-passed using EWMA +:: rttvar: mean deviation of .rtt, also low-passed with EWMA +:: ssthresh: slow-start threshold +:: cwnd: congestion window; max unacked packets +:: ++$ pump-metrics + $: rto=_~s1 + rtt=_~s1 + rttvar=_~s1 + ssthresh=_10.000 + cwnd=_1 + counter=@ud + == +``` + +### Vere I/O Driver: `vere/io/ames.c` {#vere-io-driver-vereioamesc} + +As elsewhere, the `libuv` event loop processor with callback functions responds to Ames-specific initiating events, in this case, the receipt of a UDP packet. The C side of Ames handles constructing and dispatching the UDP packets that underlie Ames communications, but perhaps surprisingly `ames.c` is actually less complicated and interesting than `ames.hoon`. (There's some serialization handling too.) + +- `_ames_czar_cb()` for galaxy address resolution +- `_ames_send_cb()` for UDP transmission +- `_ames_recv_cb()` for UDP reception + +## Network Protocol: Fine {#network-protocol-fine} + +A scry is a read-only request into the scry namespace. 
Historically, only local scries were supported, and these were instrumented synchronously using `.^` dotket. With the addition of remote scry, a new use case and use pattern emerged: asynchronous reads over the network. + +> A ship that wants to read from a remote part of the namespace will have to pass a `%keen` task to its Ames, which then cooperates with Vere to produce the desired data. In some future event when the result is available, Ames gives it back as a `%tune` gift. From the requester's perspective, this is the entire default lifecycle of a remote scry request. + +```hoon +:: Remote Scry Tasks +:: +:: %keen: peek: [ship /vane/care/case/spur] +:: %yawn: cancel request from arvo +:: %wham: cancels all scry request from any vane +:: +``` + +Fine maintains its own state, but apart from having its own types, its operation is not so different from Ames that we need to delve into it here. + +```hoon ++$ keen-state + $+ keen-state + $: wan=((mop @ud want) lte) :: request packets, sent + nex=(list want) :: request packets, unsent + hav=(list have) :: response packets, backward + num-fragments=@ud + num-received=@ud + next-wake=(unit @da) + listeners=(set duct) + metrics=pump-metrics + == ++$ want + $: fra=@ud + =hoot + packet-state + == ++$ have + $: fra=@ud + meow + == +:: ++$ meow :: response fragment + $: sig=@ux :: signature + num=@ud :: number of fragments + dat=@ux :: contents + == +:: ++$ peep :: fragment request + $: =path + num=@ud + == +:: ++$ wail :: tagged request fragment + $% [%0 peep] :: unsigned + == +:: ++$ roar :: response message + (tale:pki:jael (pair path (unit (cask)))) +:: ++$ purr :: response packet payload + $: peep + meow + == +``` + +(Having worked with remote scries some in userspace, I recommend tombstoning old endpoints when they are done being used.) + +### Runtime Scry Dispatch {#runtime-scry-dispatch} + +Remote scries are handled by the runtime rather than generating an Arvo event. 
+ +In `vere/io/ames.c`, a scry hashtable `sac_p` is created. `ames_hear()` decides whether to inject the packet into Arvo (Ames protocol) or handle in Vere (Fine protocol). + +- `_fine_hear_request()` to receive a request +- `_fine_hear_response()` to receive the response +- `_fine_get_cache()` +- `_fine_put_cache()` + +There are also provisions for Fine scry path length etc. therein. + +- [“Guides: Remote Scry”](../userspace/remote-scry.md) +- [~rovnys-ricfer, “Remote Scry Protocol Proposal”](https://gist.github.com/belisarius222/d9a9c164817d3e8bbda3c45f7d2000b9) diff --git a/docs/core-academy/ca09.md b/docs/core-academy/ca09.md new file mode 100644 index 0000000..09728e8 --- /dev/null +++ b/docs/core-academy/ca09.md @@ -0,0 +1,502 @@ +--- +description: "Core Academy lesson on HTTP vanes covering the Eyre vane's HTTP server functionality, the Iris vane's HTTP client operations, web request/response patterns and authentication mechanisms." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 10. Vanes III: Eyre, Iris + +*This lesson covers Arvo's HTTP server vane (Eyre) and its HTTP client vane, Iris. We discuss how Arvo interacts with clients over HTTP.* + +## Servers {#servers} + +One of Urbit's primary use cases is to act as a “personal server”. To examine this statement, we need to consider what a server does. Etymologically, a server serves a service. Generally speaking, it is the locus of a computation and coordination process. A server program is a system daemon—and since Gall agents are essentially daemons in many respects, Urbit's execution model fulfills this niche nicely. + +Some servers are physical or logical devices which talk to other devices as clients. Internet webpage and application servers typically follow this model. Other servers are software processes that run on the same hardware or local network as the client process, e.g. 
mail servers, print servers, and file servers. + + + +The two major operational models for servers are the _request–response_ model and the _publish–subscribe_ (pub-sub) model. The request–response pattern corresponds to pokes and gifts in Arvo terms, while the pub-sub pattern is supplied by subscriptions and updates. + +- A _client_ originates and submits requests, and receives responses. +- A _server_ accepts requests and replies with responses. + +A client and a server need to agree on a communications protocol. There are many of these, but the basis for the World Wide Web is the HyperText Transfer Protocol (HTTP). + +### Serving a Web Page {#serving-a-web-page} + +Two of the simplest actions one can take with a basic web server are to serve a web page to any clients and to respond to interactions with that web page. Some interactions take place purely in the client session (form entry in the browser before submission), but then are propagated to the server. + +#### Requests + +HTTP requests are like Gall agent pokes: they are messages to trigger some action on the server. A method is specified (like `GET`, `PUT`, or `POST`) and the associated service-specific data follow. + +A request consists of a block of request headers, a block of general headers, and a block of representation headers. These may be followed by the body. + + + +- `GET` means a read-only request for information (like an Urbit scry but without the bound namespace). +- `PUT` requests a state creation or update. +- `POST` asks for the server to process data. (Both `PUT` and `POST` are analogous to Urbit pokes.) + +#### Responses + +A [server response to a web page](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages) looks like this: + + + +The [response code](https://en.wikipedia.org/wiki/HTTP_response_code) is normatively `200 OK` for a successful page access, but `404 Not Found` and other errors and special messages also occur frequently. 
(It would be very interesting if Urbit would implement `402 Payment Required`.) + +The actual mechanics of both kinds of communication are wrapped by `/lib/server` and Eyre. Generally speaking, an agent will receive HTTP requests in `+on-poke`, and commonly includes a `+handle-http` arm to deal with `inbound-request:eyre` values. `/lib/server` has request header parsers and response handlers which make it easy to respond appropriately (e.g. `(send:server ~ [%login-redirect './apps/my-agent'])`). + + +## Eyre {#eyre} + +Eyre is an HTTP server, which receives HTTP messages from Unix and produces HTTP messages in reply. Your agent can register endpoints which a browser or other tool can interact with. Eyre can be instrumented to work with threads and generators. + +HTTP requests include a method tag. While other methods exist, we are primarily interested in `POST`, `PUT`, and `GET` requests. Since we don't want to deal with client-side code yet, we're going to use `curl` to send requests here. + +- `POST` is only used with Eyre to obtain a cookie. + +```sh +$ curl -i localhost:8080/~/login -X POST -d "password=lidlut-tabwed-pillex-ridrup" +HTTP/1.1 204 ok +Date: Tue, 19 Jul 2022 16:28:05 GMT +Connection: keep-alive +Server: urbit/vere-1.9 +set-cookie: urbauth-~nec=0v3.pis4a.sfdhv.f1p6i.lttba.gp93q; Path=/; Max-Age=604800 +``` + +This cookie should be included in subsequent requests. + +- `PUT` requests are used to send actions to Eyre: pokes, subscriptions, acks, unsubscribe requests, and channel deletions. + +- `GET` requests are used to connect to a channel and receive any pending events. (Remember how Urbit prefers a dataflow computing model?) + +### `/sys/lull` Definition {#syslull-definition} + +```hoon +:: :::: +:::: ++eyre :: (1e) http-server + :: :::: +++ eyre ^? 
+ |% + +$ gift + $% $>(?(%boon %done) gift:ames) :: Ames responses + [%set-config =http-config] :: configure external HTTP server + [%sessions ses=(set @t)] :: valid auth cookies + [%response =http-event:http] :: response to event from Earth + [%bound accepted=? =binding] :: response to %connect or %serve + [%grow =path] :: notification on cache entry change + == + :: + +$ task + $~ [%vega ~] + $% $>(%init vane-task) :: initialize ourself with an identity + $>(%born vane-task) :: new unix process + $>(%plea vane-task) :: network request + $>(%trim vane-task) :: trim state (memory pressure) + $>(%vega vane-task) :: report upgrade + :: + [%live insecure=@ud secure=(unit @ud)] :: live HTTPS ports + [%rule =http-rule] :: update HTTP configuration + [%eauth-host host=(unit @t)] :: set base URL for eauth + [%request secure=? =address =request:http] :: handle inbound + [%request-local secure=? =address =request:http] :: handle backdoor + [%cancel-request ~] :: cancel previous request + [%connect =binding app=term] :: connects a binding to an app + [%serve =binding =generator] :: connect binding to generator + [%disconnect =binding] :: disconnect binding to generator + :: + [%code-changed ~] :: web login code changed + [%approve-origin =origin] :: accept CORS requests from origin + [%reject-origin =origin] :: reject CORS requests from origin + [%spew veb=@] :: set verbosity + [%set-response url=@t entry=(unit cache-entry)] :: cache mapping + == +-- ::eyre +``` + +Eyre is responsible for a few subsystems that facilitate userspace applications (unlike, say, Behn or Ames, most of what Eyre does is to support userspace). + +- Authentication +- Channels +- Threads and generators +- HTTP request handling +- Scry interface + +Several of these are handled by the `$action` dispatch system, invoked when a binding matches a known path. + +> A `$binding` is a system unique mapping for a path to match. 
A `$binding` must be system unique because we don't want two handlers for a path; what happens if there are two different actions for `[~ /]`? + +```hoon +:: +binding: A rule to match a path. +:: ++$ binding + $: site=(unit @t) :: site: the site to match (~ for your.urbit.org) + path=(list @t) :: path: matches this prefix path + == +:: +action: the action to take when a binding matches an incoming request +:: ++$ action + $% [%gen =generator] :: dispatch to a generator + [%app app=term] :: dispatch to an application + [%authentication ~] :: internal authentication page + [%eauth ~] :: cross-ship authentication handling + [%logout ~] :: internal logout page + [%channel ~] :: gall channel system + [%scry ~] :: gall scry endpoint + [%name ~] :: respond w/ @p requester is authenticated as + [%host ~] :: respond w/ @p of the ship serving the response + [%four-oh-four ~] :: respond with the default file not found page + == +``` + +`/sys/vane/eyre` is relatively more straightforward than (say) Ames. There is only one interface (search for `~% %http-server`). + +#### Authentication + +Client sessions typically require a login. (This is not true for materials served to the clearweb, e.g. via `%blog`.) A cookie is generated for each session in response to a login using `+code`. + +```hoon +:: +authentication-state: state used in the login system +:: ++$ authentication-state + $: sessions=(map @uv session) :: map of cookies to session information + visitors=(map @uv visitor) :: visitors: in-progress incoming eauth flows + visiting=(map ship logbook) :: visiting: outgoing eauth state per ship + endpoint=[user=(unit @t) auth=(unit @t) =time] :: endpoint: hardcoded + :: local eauth endpoint + :: for %syn and %ack + == +``` + +(Visits are part of the EAuth system, _q.v._) + +Authentication is handled by `+authentication`. + +- See `+authentication` in `/sys/vane/eyre`. Locate where the session cookie is created and logged. 
+ +Authentication is enforced by `+request-is-logged-in` and `+request-is-authenticated`. + +- See `+request-is-logged-in` in `/sys/vane/eyre`. + +#### Channels + +> Channels are the main method where a webpage communicates with Gall apps. Subscriptions and pokes are issues with `PUT` requests on a path, while `GET` requests on that same path open a persistent `EventSource` channel. +> The `EventSource` API is a sequence number based API that browser provide which allow the server to push individual events to the browser over a connection held open. In case of reconnection, the browser will send a `'Last-Event-Id'` header to the server; the server then resends all events since then. + +An [`EventSource`](https://developer.mozilla.org/en-US/docs/Web/API/EventSource) interface is a way to track server-sent events for a client session. The JS on the browser/client-side receives `text/event-stream` formatted events. So a channel is a given connection to a browser including the `EventSource` connection. 
+ +```hoon +:: channel: connection to the browser +:: ++$ channel + $: mode=?(%json %jam) + =identity + state=(each timer duct) + next-id=@ud :: next-id: next sequence number to use + last-ack=@da :: last-ack: time of last client ack + events=(qeu [id=@ud request-id=@ud =channel-event]) :: unacked events + unacked=(map @ud @ud) :: unacked event counts by request-id + subscriptions=(map @ud [ship=@p app=term =path duc=duct]) :: gall subs + heartbeat=(unit timer) :: sse heartbeat timer + == +:: +session: server side data about a session +:: ++$ session + $: =identity :: authentication level & id of this session + expiry-time=@da :: when this session expires + channels=(set @t) :: channels opened by this session + == +:: channel-state: state used in the channel system +:: ++$ channel-state + $: session=(map @t channel) :: mapping b/w an arbitrary key to a channel + duct-to-key=(map duct @t) :: mapping from ducts to session key + == +:: channel-event: unacknowledged channel event, vaseless sign +:: ++$ channel-event + $% $>(%poke-ack sign:agent:gall) + $>(%watch-ack sign:agent:gall) + $>(%kick sign:agent:gall) + [%fact =desk =mark =noun] + == +``` + +Conventional channels communicate in JSON. Values passed into Urbit can be sent through a mark file to be transformed into a `%noun` or other type automatically. On the way out, a similar transformation can take the values back into MIME types. + +- Trace `%poke` and `%poke-json` in `/sys/vane/eyre`. +- Examine [`/lib/schooner`](https://github.com/urbit/yard/blob/main/desk/lib/schooner.hoon) and the `/mar` files in [the `%yard` desk](https://github.com/urbit/yard). How does it handle JSON transformations? What about binary types like `audio/mpeg` (MP3)? + +Noun channels make it possible for external applications to speak Urbit nouns. This means that you can communicate with an Urbit ship in a way other than using a JSON payload. The `content-type` is marked as `application/x-urb-jam`. 
Nouns are `+jam`med when sent into Eyre. + +- Locate where `x-urb-jam` is processed in `/sys/vane/eyre` and in `/mar/noun`. + +Thanks to the mark system and `+find-channel-mode`, it is straightforward on Urbit's side to implement noun channels. However, on the other side you need something that speaks nouns, such as [`noun.py`](https://github.com/urbit/tools/blob/master/pkg/pynoun/noun.py). + +- [“Guide: Noun channels”](../../urbit-os/kernel/eyre/noun-channels.md) +- [~nordus-mocwyl, “bird-brained guide to noun channels”](https://bird.howm.art/guide/noun-channels) + +#### HTTP request handling + +All else set aside, the real purpose of Eyre is to act as the HTTP server for an Urbit ship. Eyre maintains a server configuration. There is an `$inbound-request` type to receive an HTTP request, but the main HTTP types are actually in another arm, `+http`. + +Raw HTTP request handling happens like this: + +- Eyre subscribes to an app at `/http-response/[eyre-id]`. +- Eyre pokes the app with `%handle-http-request` and the ID. +- The app produces `%fact`s of `?(%http-response-header %http-response-data %http-response-cancel)`. + +```hoon +:: +http-config: full http-server configuration +:: ++$ http-config + $: secure=(unit [key=wain cert=wain]) :: PEM-encoded RSA private key and + :: cert or cert chain + proxy=_| :: reverse TCP proxy HTTP(s) + log=? :: keep HTTP(s) access logs + redirect=? :: send 301 redirects to upgrade HTTP to HTTPS + == +:: +http-rule: update configuration +:: ++$ http-rule + $% [%cert cert=(unit [key=wain cert=wain])] :: set/clear cert and keypair + [%turf action=?(%put %del) =turf] :: add/remove established dns binding + == +:: +address: client IP address +:: ++$ address + $% [%ipv4 @if] + [%ipv6 @is] + :: [%ames @p] + == +:: +inbound-request: +http-request and metadata +:: ++$ inbound-request + $: authenticated=? :: has a valid session cookie + secure=? 
:: whether this request was encrypted (https) + =address :: the source address of this request + =request:http :: the http-request itself + == +``` + +- Examine `+http`. Find the `request` and `response` handlers. In particular, see `$simple-payload`. +- Examine `/lib/server`, which contains wrapper arms for mere mortals. + +In some ways, although this is the meat-and-potatoes of Eyre, it's all rather straightforward. + +To actually get a value into userspace, Eyre sends the request to Gall: + +- `+request-to-app` to dispatch an `%app` `$action` to Gall. +- `+deal-as` to `%pass` to Gall. + +#### Threads and generators + +We can treat a (local) ship as a “serverless” function call for a client. + +A generator is a standalone computation based on arguments. Eyre supports generators explicitly: + +```hoon ++$ generator + $: =desk :: desk on current ship that contains the generator + path=(list @t) :: path on :desk to the generator's hoon file + args=* :: arguments passed to the gate + == +``` + +and runs them in `+request`, branch `%gen`. + +- How does the generator run? Note the `$roof` and the `+mock` call. + +Having bound a generator `/gen/eyre-gen` + +```hoon +|= [[now=@da eny=@uvJ bec=beak] ~ ~] +|= [authenticated=? =request:http] +^- simple-payload:http +=/ msg=@t + ?~ body.request + (scot %da now) + (cat 3 (cat 3 (scot %da now) 10) q.u.body.request) +=/ data=octs + (as-octs:mimes:html msg) +=/ =response-header:http + [200 ['Content-Type' 'text/plain']~] +[response-header `data] +``` + +to an endpoint `/mygen`, + +```hoon +|pass [%e [%serve `/mygen %base /gen/eyre-gen/hoon ~]] +``` + +a client may initiate a generator call by sending a `POST` request (the method `curl` uses by default when `--data` is given) thus: + +```sh +curl -i http://localhost:8080/mygen --data 'blah blah blah' +``` + +- Examine the [“Eyre Guide: Generators”](../../urbit-os/kernel/eyre/guide.md#generators) example. + +A thread is a transient standalone computation similar in some regards to a generator. 
Spider provides thread support using an Eyre binding. + +```hoon +++ handle-http-request + ~/ %handle-http-request + |= [eyre-id=@ta =inbound-request:eyre] + ^- (quip card _state) + ?> authenticated.inbound-request + =/ url (parse-request-line:server url.request.inbound-request) + ?> ?=([%spider @t @t @t @t ~] site.url) + =* desk i.t.site.url + =* input-mark i.t.t.site.url + =* thread i.t.t.t.site.url + =* output-mark i.t.t.t.t.site.url + =/ =tid (new-thread-id thread) + =. serving.state (~(put by serving.state) tid [`eyre-id output-mark desk]) + =/ tube (convert-tube %json input-mark desk bowl) + ?> ?=(^ body.request.inbound-request) + =/ body=json (need (de-json:html q.u.body.request.inbound-request)) + =/ input=vase (slop !>(~) (tube !>(body))) + =/ boc bec + =/ =start-args:spider [~ `tid boc(q desk, r da+now.bowl) thread input] + (handle-start-thread start-args) +``` + +- Examine the [“HTTP API”](../../urbit-os/base/threads/http-api.md) example. + +#### EAuth + +Eyre's EAuth system is “a mechanism by which HTTP clients may authenticate themselves as a specific urbit on HTTP endpoints served by any other urbit.” In other words, you can provide a comet-like identity to an arbitrary client. + +- How robust to collision is random EAuth assignment? ($\frac{1}{2^{128}-2^{64}} \approx 3 \times 10^{-39}$, or about one in 340 undecillion) +- If you are interested in investigating EAuth in detail, see `$visitor`, `$logbook`, `$eauth-plea`, `$eauth-boon` as well as the source description at [~palfun-foslup, “mirage (eauth)”](https://gist.github.com/Fang-/41ed84b2a6dd96ca67c6a5dbda1fd35d) and the app [~paldev, %chat-stream](https://pal.dev/lobby/). + +### SSL {#ssl} + +If you are working locally, you typically just have HTTP set up instead of secure HTTPS. SSL is a transport layer protocol formerly used for client–server encrypted channels, but now HTTPS actually uses TLS. + +The `%acme` agent configures a certificate if you have a domain set up to use with Urbit. 
+ +- Read the `+install` arm in `/app/acme`. + +#### Scry interface + +Eyre exposes some information about bindings and connections, such as the sessions and cookies: + +```hoon +.^(authentication-state:eyre %e /=authentication-state=) + +.^((list [binding:eyre duct action:eyre]) %e /=bindings=) +``` + +### Vere I/O Driver: `vere/io/http.c` {#vere-io-driver-vereiohttpc} + +The runtime counterpart to Eyre is `vere/io/http.c`, which is the HTTP server. + +Vere's `http.c` uses `libh2o` as its HTTP server: + +> H2O is a new generation HTTP server that provides quicker response to users with less CPU utilization when compared to older generation of web servers. Designed from ground-up, the server takes full advantage of HTTP/2 features including prioritized content serving and server push + +Take especial note of the following functions: + +- `u3_http_io_init()` to start the HTTP server manager. +- `_http_serv_listen_cb()`, the callback for receiving a value +- `_http_serv_accept()` +- `_http_seq_accept()` to process a new HTTP request +- `_http_hgen_send()` to send an HTTP response + + +## Iris {#iris} + +Iris is an HTTP client. It is not currently widely used since Urbit ships do not often serve as HTTP clients (rather as peers). + +### `/sys/lull` Definition {#syslull-definition} + +```hoon +:: %iris http-client interface +:: +++ iris ^? 
+ |% + :: +gift: effects the client can emit + :: + +$ gift + $% [%request id=@ud request=request:http] :: outbound http-request + [%cancel-request id=@ud] :: tell earth to cancel a + :: previous %request + [%http-response =client-response] :: response to the caller + == + :: + +$ task + $~ [%vega ~] + $% $>(%born vane-task) :: system started up; reset open connections + $>(%trim vane-task) :: trim state (in response to memory pressure) + $>(%vega vane-task) :: report upgrade + [%request =request:http =outbound-config] :: fetches a remote resource + [%cancel-request ~] :: cancels a previous fetch + [%receive id=@ud =http-event:http] :: receives http data from outside + == + :: +client-response: one or more client responses given to the caller + :: + +$ client-response + $% $: %progress :: periodic update along %fetch source duct + =response-header:http :: full transaction header + bytes-read=@ud :: bytes fetched so far + expected-size=(unit @ud) :: size if response had content-length + incremental=(unit octs) :: data received since last update + == + [%finished =response-header:http full-file=(unit mime-data)] :: final + :: response + [%cancel ~] :: canceled by runtime system + == + :: mime-data: externally received but unvalidated mimed data + :: + +$ mime-data [type=@t data=octs] + :: +outbound-config: configuration for outbound http requests + :: + +$ outbound-config + $: redirects=_5 :: number of times to follow 300 before error + retries=_3 :: number of retries before failing + == + -- +``` + +### Structure {#structure} + +`/sys/vane/iris` is quite short and legible. Most of the vane is either tracking connection state as a client or sending updates for data transmission progress. + +- Examine the [“Iris Guide: Example”](../../urbit-os/kernel/iris/example.md) thread. +- Use this as a springboard for examining how the response header and body are constructed in Urbit. 
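For orientation, this is roughly what asking Iris for a resource looks like from a Gall agent, using the `%request` task and `$outbound-config` from the `/sys/lull` definition above. It is a sketch under assumptions: the wire `/fetch` and the URL are placeholders, and the surrounding agent is omitted.

```hoon
::  Sketch only: build a card that asks Iris to GET a URL.
::  The wire and the URL are hypothetical placeholders.
::
=/  =request:http  [%'GET' 'http://example.com/' ~ ~]
=/  =card:agent:gall
  [%pass /fetch %arvo %i %request request *outbound-config:iris]
card
```

The bunt `*outbound-config:iris` takes the defaults shown in the type (5 redirects, 3 retries). The reply should come back to the agent's `+on-arvo` arm on the same wire as an `%http-response` gift carrying a `$client-response`.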
+ +Iris has relatively little information to expose at any given time, and has an extremely minimal scry interface: + +- `%i %x %whey`, show memory usage. + +### Vere I/O Driver: `vere/io/cttp.c` {#vere-io-driver-vereiocttpc} + +The runtime counterpart to Iris is `vere/io/cttp.c`, which is the HTTP client. + +- Read `u3_cttp_io_init()`, which initializes the client manager state. + +Like Eyre, Vere's Iris uses [`libh2o`](https://powerdns.org/libh2o/) as its HTTP server/client library. + +- Read `_cttp_creq_on_body()`, the callback upon receiving a response body +- `_cttp_creq_respond()` +- `_cttp_http_client_receive()` + + +## Exercise {#exercise} + +- Produce an app which allows a clearweb login. This can be done using EAuth but it would be interesting to implement standard username/password login as well. diff --git a/docs/core-academy/ca10.md b/docs/core-academy/ca10.md new file mode 100644 index 0000000..4022d23 --- /dev/null +++ b/docs/core-academy/ca10.md @@ -0,0 +1,1107 @@ +--- +description: "Description of Clay, Arvo's revision-controlled filesystem vane. Also covers the build system, marks, scry cares, %kiln." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 11. Vanes IV: Clay + +*This lesson covers Clay, Arvo's revision-controlled filesystem vane. We also look at the build system, marks, scry cares, software upgrades, and the `%kiln` agent.* + +Clay is Urbit's version-controlled, referentially-transparent, globally-addressable filesystem. All data in Clay are typed and most are convertible to other types. + +- *Version controlled*: Clay natively supports Git-like history, branching, merging, and checkpointing. A particular continuity (“desk”) is a series of numbered commits. 
+ +- *Referentially transparent*: for Clay, _referential transparency_ means “a request must always yield the same result for all time.” + +- *Globally addressable*: the standard resource identifier format includes the ship and time, which means that a fully-specified Clay path is similar to a [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier). + +- *Persistent*: Clay inherits from Arvo that all events are persisted to disk. + +- *Typed*: Clay attaches identification tags to any data and has ready to hand a set of conversion routines appropriate to the data type. These ID tags are called “marks,” and they act like MIME types. + +To adequately address Clay we need to consider its quite varied functionality. To that end, we will not make an exposition for Clay per `/sys/lull` but instead break it up topically. + +```hoon +:: clay (4c), revision control +:: +:: The way to understand Clay is to take it section-by-section: +:: +:: - Data structures. You *must* start here; make sure you understand +:: the entire contents of +raft. +:: +:: - Individual reads. +aver is the entry point, follow it through +:: +read-at-tako to understand each kind of read. +:: +:: - Subscriptions. +wake is the center of this mechanism; nothing +:: else responds to subscriptions. +wake has no arguments, which means +:: every subscription response happens when something in Clay's *state* +:: has changed. No edge-triggered responses. +:: +:: - Receiving foreign data. For individual requests, this is +:: +take-foreign-answer. For sync requests (%many, which is %sing %v +:: for a foreign desk), this is +foreign-update. +:: +:: - Ford. +ford builds hoon files and gives files their types. +:: Read +build-file for the first, and +read-file is the second. +:: +:: - Writing to a desk. Every write to a desk goes through +park, read +:: it thoroughly. +:: +:: - Merges. 
Control flow starts at +start-merge, then +merge, but +:: everything is scaffolding for +merge-by-germ, which is the ideal of +:: a merge function: it takes two commits and a merge strategy and +:: produces a new commit. +:: +:: - Tombstoning. This is in +tomb. +``` + +We will distribute the types as appropriate and organize these conceptually into our Core Academy approach: + +1. Files and desks +2. Subscriptions and desk distribution (including OTAs) +3. Marks, tubes, and `+ford` +4. Merges and desk writes +5. Move handler +6. The scry interface +7. Solid-state subscriptions + +A warning: Clay is very old, and represents some of the darkest [jungle](https://www.youtube.com/watch?v=o1tj2zJ2Wvg) of Ye Olde Urbyt. The names and connexions are often obscure but we will bushwhack a trail through `/sys/vane/clay` together. + +- [“Clay: Architecture”](../../urbit-os/kernel/clay/architecture.md) + + +## Files and Desks {#files-and-desks} + +### Files {#files} + +> What is a file system? It’s a way to pretend that a hard drive isn’t really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes. (Joel Spolsky, [“The Law of Leaky Abstractions”](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/)) + +What is a file? Per the context set by the quote above, it is a string of bytes at the level we wish to consider it. For all purposes in Urbit as a single-level store, we ignore file fragmentation and we try to ignore [endianness](https://en.wikipedia.org/wiki/Endianness) except in certain specific cases. The “hierarchical system of folders-within-folders” has little bearing on actual storage in Clay, as the identifying `path` is simply a tag. That tag is hashed and put into a lookup table, and when you request a resource Clay checks its `map` and produces the file as a noun for you. 
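One hedged way to see that "folders" are just path prefixes is a `%cy` scry in Dojo, which produces an `$arch`: an optional file hash (`fil`) plus a map of child names (`dir`). This is a sketch only. It assumes a ship whose current desk has the stock `/gen/cat/hoon` generator, and the output is omitted because it depends on the ship.

```hoon
::  ask Clay what sits at /gen on the current desk;
::  for a "directory" we expect .fil to be empty and .dir to hold child names
::
.^(arch %cy /===/gen)
::  a path that names a file should yield a non-null .fil (its hash)
::
.^(arch %cy /===/gen/cat/hoon)
```

Both results come from the same path-keyed namespace; no separate directory object is shown in the types this lesson covers.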
+ +Using a `%cx` scry, we can examine a particular resource on the `%base` desk as either a byte string or an ASCII text string: + +```hoon +`@ux`.^(@ %cx /===/gen/cat/hoon) +.^(cord %cx /===/gen/cat/hoon) +``` + +A small fib in the above statement is that the path includes more than just an arbitrary “file path”. In particular, every file `path` really includes a `beak` at its head (as text). (There's some waffling here between values and `knot`s in various parts of the system, since a `path` is formally `(list knot)`.) + +```hoon ++$ beak [p=ship q=desk r=case] :: path prefix +:: ++$ beam [[p=ship q=desk r=case] s=path] :: global name +:: + ++$ aeon @ud :: version number + +``` + +#### `beak` and `ship` + +The top-level determiner for Clay is the `beak`: `[p=ship q=desk r=case]`. The ship is straightforward, merely the actual ship on which a resource resides. Clay is a globally-addressable filesystem so we can refer to a hypothetical resource on any ship. (This does not mean that the value actually exists, of course.) + +```hoon ++$ ship @p +``` + +#### `desk` + +The second element of the `beak` is the `desk`. Clay organizes the world into `desk`s, which are also the most logical unit for app distribution. Essentially a desk is an organized collection of files with a common history. + +```hoon ++$ desk @tas +``` + +The structure of a desk matters for several parts of the Urbit system. Some parts are requirements: notably `/mar` which contains details on how to load the file resources as nouns, and its dependencies in `/sur` (structure files) and `/lib` (library files). 
By convention, the following are also present: + +- `/app` (agent files, recognized by Gall) +- `/gen` (generators, recognized by Dojo) +- `/sys` (recognized by Arvo generally but only really active on `%base`) +- `/ted` (thread files, recognized by `%spider` and `%khan`) + +(Keep in mind that inclusion of a file several times, like a mark file, need not lead to OS bloat since noun deduplication can store multiple references to a single identical resource.) + +Some desks include `/tests` for unit tests. A few other agents like `%docs` use their own conventions as well (`/doc`). + +#### `case` + +```hoon ++$ case + $% [%da p=@da] :: %da date + [%tas p=@tas] :: %tas label + [%ud p=@ud] :: %ud sequence + [%uv p=@uv] :: %uv hash + == +``` + +Most commonly, the `case` is `now`, the timestamp which would refer to the file in its current state. Internally, Clay stores everything as a sequential value `aeon` in the `%ud` format and converts to equivalent `case`s when necessary (see e.g. `+aeon-to-tako`). + +Any part of the `beak` can typically be replaced with `=` in a statement to get a default value. (Where in the parsers is this handled?) + +```hoon +> /=== +[~.~zod ~.base ~.~2022.7.17..23.50.01..3305 ~] + +> /=landscape= +[~.~zod %landscape ~.~2022.7.17..23.50.05..9d0d ~] + +> /~nec== +[~.~nec ~.base ~.~2022.7.17..23.50.13..dd5d ~] +``` + +### Desks and Commits {#desks-and-commits} + +Why bother with the `case`? Enter the _commit_, which refers to a particular revision. A desk is a collection of commits as a particular continuity. (You can think of a desk as being like a Git branch.) Each commit, or `$yaki`, is a node in a historical state graph. It includes its parents and its namespace, as well as links to any associated data to change. (While Clay can support file diffs, right now it does not have that feature turned on: it simply stores entire files; see `%info` below.) 
+ +```hoon ++$ yaki :: commit + $: p=(list tako) :: parents + q=(map path lobe) :: namespace + r=tako :: self-reference + t=@da :: date + == :: +:: ++$ tako @uvI :: yaki ref ++$ lobe @uvI :: blob ref +``` + +In a while, we will look at how commits change desk state via merging, but we can punt on that question for now. + +Knowing what commits are, we are finally prepared to examine desk state, the `$dome`: + +```hoon ++$ dome + $: let=aeon :: top id + hit=(map aeon tako) :: versions by id + lab=(map @tas aeon) :: labels + tom=(map tako norm) :: tomb policies + nor=norm :: default policy + mim=(map path mime) :: mime cache + fod=flue :: ford cache + wic=(map weft yoki) :: commit-in-waiting + liv=zest :: running agents + ren=rein :: force agents on/off + == :: +``` + +- `let` is the version number at the latest point in the desk's history. +- `hit` is the collection of version numbers pointing to commit hashes to arrive at the current state. +- `lab` is the set of labels pointing to `aeon`s (for `case`). +- `tom` is the set of tombstone policies. +- `nor` is the default tombstone policy. +- `mim` is the MIME cache. Resources are converted to and from Unix frequently without changes so by caching we can sometimes short-circuit this. +- `fod` is `+ford`'s build cache. +- `wic` is the collection of commits pending to be applied (as in a system upgrade). +- `liv` is the list of agents currently active on this desk, derived from `desk.bill` and Gall. +- `ren` is the set of agents to force on or off. + +### Notes & Gifts {#notes-gifts} + +There aren't many notes or gifts directly associated with file management since you often locally scry out individual files. (This is as opposed to desk management, which has ample moves associated therewith.) + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? 
+ |% + +$ gift :: out result <-$ + $% [%writ p=riot] :: response + [%wris p=[%da p=@da] q=(set (pair care path))] :: many changes + == :: + +$ task :: in request ->$ + $% [%warp wer=ship rif=riff] :: internal file req + [%werp who=ship wer=ship rif=riff-any] :: external file req + == :: + -- +``` + +Basically, a `%warp` request (to either a local or a foreign ship) may result in a `%writ` gift in response. (This does require that the appropriate permissions be set.) + +- [“`%warp` - Remote”](../../urbit-os/kernel/clay/examples.md#foreign-ships) + +Desk management has the following associated notes. (There are no gifts since per-desk information isn't exposed in the API this way.) + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? + |% + +$ task :: in request ->$ + $% [%drop des=desk] :: cancel pending merge + [%info des=desk dit=nori] :: internal edit + [%into des=desk all=? fis=mode] :: external edit + $: %merg :: merge desks + des=desk :: target + her=@p dem=desk cas=case :: source + how=germ :: method + == :: + $: %fuse :: merge many + des=desk :: target desk + bas=beak :: base desk + con=(list [beak germ]) :: merges + == :: + [%park des=desk yok=yoki ran=rang] :: synchronous commit + [%pork ~] :: resume commit + [%prep lat=(map lobe page)] :: prime clay store + [%rein des=desk ren=rein] :: extra apps + [%tire p=(unit ~)] :: app state subscribe + [%tomb =clue] :: tombstone specific + [%zeal lit=(list [=desk =zest])] :: batch zest + [%zest des=desk liv=zest] :: live + == :: + -- +``` + +### Scries {#scries} + +The following scries specifically apply to files (single resources), commits, and desk-wide operations: + +- [`%d` - List desks.](/reference/arvo/clay/scry#d---list-desks) +- [`%p` - Return file permissions.](/reference/arvo/clay/scry#p---file-permissions) +- [`%r` - Return file as vase.](/reference/arvo/clay/scry#r---file-as-vase) +- [`%t` - List files.](/reference/arvo/clay/scry#t---list-files) +- [`%u` - Check if resource 
exists.](/reference/arvo/clay/scry#u---check-exists) +- [`%v` - Retrieve desk state.](/reference/arvo/clay/scry#v---desk-state) +- [`%w` - Show revision number of a given `case`.](/reference/arvo/clay/scry#w---revision-number) +- [`%x` - Read file.](/reference/arvo/clay/scry#x---read-file) +- [`%y` - Read `arch` (file hash or directory contents).](/reference/arvo/clay/scry#y---read-arch) +- [`%z` - Show content hash.](/reference/arvo/clay/scry#z---content-hash) + +Most of the time you will use `%x` or `%y` from userspace, but in the kernel you may need more sophisticated information. + + +## Subscriptions and Desk Distribution {#subscriptions-and-desk-distribution} + +A desk can subscribe to another remote desk as its upstream, meaning that any changes on the remote are automatically propagated to subscribers. Apps are typically distributed this way (the alternative being an installation from source). + +A subscription means that the upstream sponsor maintains a list of requested downstream sponsees in its `qyx.dojo` state for the appropriate desk. You can access this through a `%cx` `/cult` scry: + +```hoon +.^((set [@p rave:clay]) %cx /=//=/cult/base) +``` + +The `%next` and `%mult` requests typically result from your own Gall agent wanting to know if the desk is updated (e.g. via a `|commit`). The `%sing` requests result from subscriptions and reads. + +> `+wake` is the center of this mechanism; nothing else responds to subscriptions. `+wake` has no arguments, which means every subscription response happens when something in Clay's *state* has changed. No edge-triggered responses. + +Desk distribution is largely the same for userspace app updates and system OTAs, except that userspace apps never require a system upgrade and do not have a separate `%kids` desk. + +### OTAs {#otas} + +The most important desk subscription is your `%base` desk to your sponsor's `%kids` desk. This is how Urbit OS updates (over-the-air updates or OTAs) are propagated. 
The lifecycle of an OTA: + +1. The sponsor syncs her `%base` desk to her `%kids` desk. (This takes place via a `%merg` task.) See [`MAINTAINERS.md`](https://github.com/urbit/urbit/blob/develop/MAINTAINERS.md) for one procedure. + - Why have a separate `%kids` desk? +2. The sponsor's `%kids` desk notifies all of its subscribers/sponsees (in `qyx.dojo`). (This notification comes over Ames to Clay.) +3. The sponsee receives the files via remote scry. +4. Once these have arrived, then the remote update is applied via a merge. (See `+apply-foreign-update`, in particular the definition of `nut` and `hut`.) See the discussion of merges below. +5. If a system update (to Hoon, Arvo, or the vanes) is involved, then handle the OTA as discussed in [*Arvo II: The Boot Sequence*](ca04.md). This may involve simply recompiling the vanes and migrating the state, or it may require updating everything all the way back to `/sys/hoon` including a worklist. (In this latter case, see `+sys-update`, `+park` (including the `kelvin` check), and how the `%pork` `%slip` is managed.) +6. This should unblock any desks that are blocked on the system kelvin version. See `+goad` and `+wick`. +7. If this ship has any sponsees, propagate the OTA to the `%kids` desk and thence forward to the daughter points. +8. If the state of the desk is requested, calculate and produce values like the `%cz` hash. (See `/gen/vats` and `/sur/hood`'s `+report-prep` and `+report-vat`, and `/sys/clay`'s `+content-hash`, for instance.) + +## Merges and desk writes {#merges-and-desk-writes} + +A commit describes a set of changes to be made to a desk to result in a new desk state. Since desks are fundamentally collections of path-addressed resources, this is functionally similar to a regular Git-style version control system. + +Thus we need to be able to produce commits (`$yaki`s) and apply them. This process is called a merge. A merge means that we have to decide how to reconcile two claims about reality into one. 
This results in several possible merge types in Clay. Most of the time you'll use `%init` implicitly (via `|new-desk`, for instance) or `%only-that` when you're trying to fix a desk mismatch. + +These are ultimately concerned with reconciliation strategies involving commit types: + +```hoon ++$ miso :: file delta + $% [%del ~] :: delete + [%ins p=cage] :: insert + [%dif p=cage] :: mutate from diff + [%mut p=cage] :: mutate from raw + == :: ++$ soba (list [p=path q=miso]) :: delta +:: ++$ misu :: computed delta + $% [%del ~] :: delete + [%ins p=cage] :: insert + [%dif p=lobe q=cage] :: mutate from diff + == :: ++$ suba (list [p=path q=misu]) :: delta +:: ++$ nori :: repository action + $% [%& p=soba] :: delta + [%| p=@tas q=(unit aeon)] :: label + == :: ++$ nuri :: repository action + $% [%& p=suba] :: delta + [%| p=@tas] :: label + == :: +:: ++$ mizu [p=@u q=(map @ud tako) r=rang] :: new state ++$ moar [p=@ud q=@ud] :: normal change range ++$ moat [from=case to=case =path] :: change range +``` + +### Operations {#operations} + +#### Reads + +`+aver` scaffolds read requests (`%sing`s); see also `+read-at-tako` (which is why we needed to see commit logic before we could really examine `+read-x`). + +#### Changes + +A change to a desk can originate from at least three sources: + +1. Unix, via a mounted desk. +2. Userspace editing. +3. Update from a remote desk. + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? + |% + +$ gift :: out result <-$ + $% [%mere p=(each (set path) (pair term tang))] :: merge result + [%writ p=riot] :: response + [%wris p=[%da p=@da] q=(set (pair care path))] :: many changes + == :: + +$ task :: in request ->$ + $% [%drop des=desk] :: cancel pending merge + [%info des=desk dit=nori] :: internal edit + [%into des=desk all=? 
fis=mode] :: external edit + $: %merg :: merge desks + des=desk :: target + her=@p dem=desk cas=case :: source + how=germ :: method + == :: + $: %fuse :: merge many + des=desk :: target desk + bas=beak :: base desk + con=(list [beak germ]) :: merges + == :: + == :: + -- ::clay +``` + +We consider these first as single-file updates (commits), then take a look at the merge process. + +To modify a file, we must produce an `%info` write task. This requires a desk label and a `$nori` or repository action. In the case of writing a new file, the `$nori` looks like this: + +```hoon +[%& ~[[/blade/runner/txt %ins %txt !>(~['Batty' 'Pris' 'Zhora' 'Leon'])]]] +``` + +passed into Clay like this: + +```hoon +|pass [%c %info %base [%& ~[[/blade/runner/txt %ins %txt !>(~['Batty' 'Pris' 'Zhora' 'Leon'])]]]] +``` + +(A text file in Urbit is a `(list cord)` not a `cord`, per the `%txt` mark.) + + + +- Trace how the `%info` task is dispatched into Clay: `+call`→`%info`→the worklist→`+abet`. + +Merging desks is a more sophisticated operation, since it involves reconciling both current state and the parent commits. There are many ways to reconcile two versions of a desk: + +```hoon ++$ germ :: merge style + $? %init :: new desk + %fine :: fast forward + %meet :: orthogonal files + %mate :: orthogonal changes + %meld :: force merge + %only-this :: ours with parents + %only-that :: hers with parents + %take-this :: ours unless absent + %take-that :: hers unless absent + %meet-this :: ours if conflict + %meet-that :: hers if conflict + == :: +``` + +> Control flow starts at `+start-merge`, then `+merge`, but everything is scaffolding for `+merge-by-germ`, which is the ideal of a merge function: it takes two commits and a merge strategy and produces a new commit. + +The mechanics of the merge are handled by `+merge-by-germ`. 
For instance, an `%only-this` merge follows this rule: + +> If this is an `%only-this` merge, we check to see if `ali`'s and `bob`'s commits are the same, in which case we're done. Otherwise, we create a new commit with `bob`'s data plus `ali` and `bob` as parents. + +```hoon +++ merge-by-germ + |= [=ali=yaki bob-yaki=(unit yaki)] + ^- (unit merge-result) + ?+ germ + %only-this + ?: =(r.ali-yaki r.bob-yaki) + ~ + :* ~ + conflicts=~ + new=&+[[r.bob-yaki r.ali-yaki ~] (to-yuki q.bob-yaki)] + lat=~ + == + -- +``` + +- [“Merge Desks”](../../urbit-os/kernel/clay/tasks.md#merge-desks) +- [“`%merg`”](../../urbit-os/kernel/clay/examples.md#merg) + +### Types {#types} + +```hoon ++$ cone (map [ship desk] dome) :: domes ++$ crew (set ship) :: permissions group ++$ dict [src=path rul=real] :: effective permission ++$ domo :: project state + $: let=@ud :: top id + hit=(map @ud tako) :: changes by id + lab=(map @tas @ud) :: labels + == :: ++$ germ :: merge style + $? %init :: new desk + %fine :: fast forward + %meet :: orthogonal files + %mate :: orthogonal changes + %meld :: force merge + %only-this :: ours with parents + %only-that :: hers with parents + %take-this :: ours unless absent + %take-that :: hers unless absent + %meet-this :: ours if conflict + %meet-that :: hers if conflict + == :: ++$ mode (list [path (unit mime)]) :: external files ++$ mood [=care =case =path] :: request in desk ++$ mool [=case paths=(set (pair care path))] :: requests in desk ++$ norm (axal ?) :: tombstone policy ++$ open $-(path vase) :: get prelude ++$ page ^page :: export for compat ++$ rang :: repository + $+ rang + $: hut=(map tako yaki) :: changes + lat=(map lobe page) :: data + == :: ++$ rant :: response to request + $: p=[p=care q=case r=desk] :: clade release book + q=path :: spur + r=cage :: data + == :: ++$ rave :: general request + $% [%sing =mood] :: single request + [%next =mood] :: await next version + [%mult =mool] :: next version of any + [%many track=? 
=moat] :: track range + == :: ++$ real :: resolved permissions + $: mod=?(%black %white) :: + who=(pair (set ship) (map @ta crew)) :: + == :: ++$ regs (map path rule) :: rules for paths ++$ rein (map dude:gall ?) :: extra apps ++$ riff [p=desk q=(unit rave)] :: request+desist ++$ riff-any :: + $% [%1 =riff] :: + == :: ++$ rite :: new permissions + $% [%r red=(unit rule)] :: for read + [%w wit=(unit rule)] :: for write + [%rw red=(unit rule) wit=(unit rule)] :: for read and write + == :: ++$ riot (unit rant) :: response+complete ++$ rule [mod=?(%black %white) who=(set whom)] :: node permission ++$ rump [p=care q=case r=@tas s=path] :: relative path ++$ saba [p=ship q=@tas r=moar s=dome] :: patch+merge ++$ toro [p=@ta q=nori] :: general change +++ unce :: change part + |* a=mold :: + $% [%& p=@ud] :: skip[copy] + [%| p=(list a) q=(list a)] :: p -> q[chunk] + == :: +++ urge |*(a=mold (list (unce a))) :: list change ++$ waft :: kelvin range + $^ [[%1 ~] p=(set weft)] :: + weft :: ++$ whom (each ship @ta) :: ship or named crew ++$ zest $~(%dead ?(%dead %live %held)) :: how live +:: :: +``` + +## Building Code: `+ford` & Marks {#building-code-ford-marks} + +Clay is responsible for assembling and building code. Building code differs from compiling code in that Clay's `+ford` arm must collect associated cores and code (referenced via `/` fas runes) and produce the appropriate Hoon source for `+ride` and friends to process into executable Nock. (The former standalone `%ford` vane was merged into `%clay` via [Ford Fusion](https://urbit.org/blog/ford-fusion) in 2020.) + +Since Clay receives updates as source from remote desks, Clay is the _de facto_ prime mover for internal state upgrades. + +- `/sys/hoon` is stateless, so when it is updated it takes place first and just passes the worklist into the new world. +- `/sys/arvo` does maintain state, so the current state must be extracted and passed into the newly built program. +- `/sys/zuse` is stateless. 
+- Vanes are stateful and, like Arvo, may have a larval phase if necessary. (Notably Gall has one.) +- Userspace apps can then be updated by Gall using their `+on-save` and `+on-load` arms. + +Ford produces several kinds of results, but these may be grouped into file-related types and mark-related types: + +```hoon ++$ pour :: ford build w/content + $% [%file =path] + [%nave =mark] + [%dais =mark] + [%cast =mars] + [%tube =mars] + :: leafs + :: + [%vale =path =lobe] + [%arch =path =(map path lobe)] + == ++$ soak :: ford result + $% [%cage =cage] + [%vase =vase] + [%arch dir=(map @ta vase)] + [%dais =dais] + [%tube =tube] + == +``` + +### File Builds {#file-builds} + +To see an example of how `+ford` works, trace the `%a` care: + +- `+scry` → `%a` +- `+read-a` +- `+tako-ford` → `+tako-to-yaki` +- `+build-file` → `+build-dependency` (note the `+slap`) → `+read-file` → `+run-dependency` → `+parse-pile` +- Also see `+build-fit` for paths, noting how it handles `-`/`/`. + +Files are built by `+ford` in vase mode. Arvo (Gall, etc.) can then drop them back into static mode once it has the core. + +> Since building a file is a pure function, Clay memoizes the results of all builds, including builds of marks, mark conversions, and hoon source files. These memoization results are stored along with the desk and are used by later revisions of that desk. + +Ford supplies several `/` fas “runes” to build code. (Formally these are not part of Hoon and are more akin to C's `#include` statements.) These are processed in `+parse-pile` → `+pile-rule`. 
+ +- `/?` faswut, pin kelvin version (currently ignored) +- `/-` import `/sur` files +- `/+` import `/lib` files +- `/=` arbitrary path to file +- `/~` arbitrary path to directory +- `/%` build and import mark core +- `/$` import mark conversion gate +- `/*` import file via specific mark + +**Build cares, Part I** + +- [`%a` - Build Hoon code.](/reference/arvo/clay/scry#a---build-hoon) + +### Marks {#marks} + +One of the roles of Clay in managing desks is to validate desk content. That is, does every resource in a desk have a definite way to convert to a noun representation (or to another representation, but at minimum to `%noun`)? + +> If Clay has been asked to perform a commit, it needs to validate all the files in this desk and notify all subscribers to live queries of this desk's data. Gall, for example, maintains live queries on builds of its live agents. Validation uses the Ford build system. + +Where a conventional DVCS like Git has special rules for handling text versus binary blob elements, Clay encourages the use of marks to identify a file's data type and its conversion routines. A mark is much like a [MIME type](https://en.wikipedia.org/wiki/Media_type), which specifies an intended data format (in a manner similar to a file extension). + +> A mark is "like an executable MIME type." It's best defined as a symbolic mapping from a filesystem to a schema engine. + +(You should get used to divorcing the conceptual content of data, what we could call its _form_ in the Platonic sense or the _noun_ in the Martian sense, from its _representation_ or _instantiation_. For instance, one writes a JSON file a certain way in text, but when parsing it one needs to think about it at a higher level of abstraction.) + +A mark is a validated data structure, including rules for transformation between representations. In this regard, it is like a more rigorous file type. 
We frequently use marks in Gall agents to verify classes of operations (such as actions or updates) or to convert incoming data (such as via the JSON mark). + +> Consider a file at `/web/foo/json`. In order to validate this file, Clay must load the mark definition core and use its validation routine to ensure the untyped value of `/web/foo/json` is in fact valid JSON. To obtain this core, Clay must build the file at `/mar/json/hoon` from source and then process the resulting raw mark core using some mild metaprogramming to get a standard interface core for dealing with marks, called a `$dais`, whose type is defined in Zuse. +> Since building a source file only makes sense if the file has been validated as a `%hoon` file, but mark definitions themselves must be built from source, there's a logical dependency cycle -- who validates the validators? To break this cycle, Clay hard-codes the validation of `%hoon` files. \[(See `+read-x` in `/sys/clay`.)] This allows mark definitions to be built from source, and in fact any file can depend on any other file of any mark as long as there are no cycles. As of Ford Fusion, Ford performs a cycle check to ensure acyclicity. + +At a high level, files are validated using `+read-file`, which uses marks via `+validate-page`. So let's dive into marks. + +The simplest way to use a mark is to supply Dojo with the names for source and target along with a value: + +```hoon +&json &mime [/application/json (as-octs:mimes:html '"hey"')] +``` + +(In one sense, a mark is simply a label which nominally corresponds to a `/mar` file, but it is possible to have `cage`s that don't ever touch the filesystem.) + +Marks expose several arms for converting between value representations and for revision control: + +- `+grab` cores convert _to_ our `mark` _from_ other `mark`s. +- `+grow` cores convert _from_ our `mark` _to_ another `mark`. +- `+grad` specifies functions for revision control like creating diffs, patching files and so on. 
Many marks, rather than writing all those functions, simply delegate those tasks to the `%noun` `mark`. + +To convert from mark `%alfa` to mark `%bravo`, Clay tries the following operations, in order: + +- direct grow from `%alfa` +- direct grab from `%bravo` +- indirect jump from `%alfa` through `%charlie` +- indirect grab from `%bravo` through `%charlie` + +You can see this logic instantiated in `+build-cast`. + +- Construct a multi-step conversion between two marks that cannot grab/grow into each other (likely via `%noun`). + +Note that marks don't have to perfectly round-trip: if you convert a `wain` to `json` and back to `wain`, you won't necessarily get the same text. + +- Examine the mark file `/mar/tape/hoon`. +- Examine the mark file `/mar/xml/hoon`. + +In practice, marks typically either outsource to other marks and `/sur` type validation, or involve JSON parsing or construction. Only rarely do more complicated marks need to be built. + +Marks can be built (using the right cares) to be either static or dynamic. + +Static `mark` conversion gates only convert from one type directly to another. These have type `$-(from to)`. + +```hoon +> =txt-to-mime .^($-(wain mime) %cf /===/txt/mime) + +> (txt-to-mime ~['foo']) +[p=/text/plain q=[p=3 q=7.303.014]] +``` + +- See `+read-f` and `+build-cast`. + +Static `mark` cores (`$nave`s) are more flexible than `%f` gates because they also supply the `+grad` arm to apply diffs. + +```hoon +:: $nave: typed mark core +:: +++ nave + |$ [typ dif] + $_ + ^? + |% + ++ diff |~([old=typ new=typ] *dif) + ++ form *mark + ++ join |~([a=dif b=dif] *(unit (unit dif))) + ++ mash + |~ [a=[ship desk dif] b=[ship desk dif]] + *(unit dif) + ++ pact |~([typ dif] *typ) + ++ vale |~(noun *typ) + -- +``` + +- See `+read-e` and `+build-nave`. + +Dynamic `mark` conversion gates, or `$tube`s, operate on `vase`s instead. 
+ +```hoon +:: $tube: mark conversion gate +:: ++$ tube $-(vase vase) +``` + +```hoon +> =txt-mime-tube .^(tube:clay %cc /===/txt/mime) + +> !< mime (txt-mime-tube !>(~['foo'])) +[p=/text/plain q=[p=3 q=7.303.014]] +``` + +- See `+read-c` and `+build-tube`. + +Finally, dynamic `mark` cores (`$dais`) are the most powerful of all: they are doors operating in vase mode on files. + +```hoon +:: $dais: processed mark core +:: ++$ dais + $_ ^| + |_ sam=vase + ++ diff |~(new=_sam *vase) + ++ form *mark + ++ join |~([a=vase b=vase] *(unit (unit vase))) + ++ mash + |~ [a=[ship desk diff=vase] b=[ship desk diff=vase]] + *(unit vase) + ++ pact |~(diff=vase sam) + ++ vale |~(noun sam) + -- +``` + +- See `+read-b` and `+build-dais`. + +- [Curtis Yarvin ~sorreg-namtyv, “Towards a New Clay”](https://urbit.org/blog/toward-a-new-clay/) (very old content but interesting) +- [“Using Marks”](../../urbit-os/kernel/clay/marks/using-marks.md) + +**Build cares, Part II** + +- [`%b` - Produce dynamic mark core.](/reference/arvo/clay/scry#b---dyn-mark-core) +- [`%c` - Produce dynamic mark conversion gate.](/reference/arvo/clay/scry#c---dyn-mark-convert) +- [`%e` - Produce static mark core.](/reference/arvo/clay/scry#e---static-mark-core) +- [`%f` - Produce static mark conversion gate.](/reference/arvo/clay/scry#f---stat-mark-convert) + + +## Clay as a Vane {#clay-as-a-vane} + +Now we're ready to have a gander at the formal vane state. + +```hoon +:: Formal vane state. +:: +:: -- `rom` is our domestic state. +:: -- `hoy` is a collection of foreign ships where we know something about +:: their clay. +:: -- `ran` is the object store. +:: -- `mon` is a collection of mount points (mount point name to urbit +:: location). +:: -- `hez` is the unix duct that %ergo's should be sent to. +:: -- `cez` is a collection of named permission groups. 
+:: -- `pud` is an update that's waiting on a kernel upgrade +:: ++$ raft :: filesystem + $: rom=room :: domestic + hoy=(map ship rung) :: foreign + ran=rang :: hashes + fad=flow :: ford cache + mon=(map term beam) :: mount points + hez=(unit duct) :: sync duct + cez=(map @ta crew) :: permission groups + tyr=(set duct) :: app subs + tur=rock:tire :: last tire + pud=(unit [=desk =yoki]) :: pending update + sad=(map ship @da) :: scry known broken + bug=[veb=@ mas=@] :: verbosity + == :: +``` + +- `$room` is the domestic desk state. + +```hoon +:: +:: Domestic ship. +:: +:: `hun` is the duct to dill, and `dos` is a collection of our desks. +:: ++$ room :: fs per ship + $: hun=duct :: terminal duct + dos=(map desk dojo) :: native desk + == :: +:: +:: Domestic desk state. +:: +:: Includes subscriber list, dome (desk content), possible commit state (for +:: local changes), possible merge state (for incoming merges), and permissions. +:: ++$ dojo + $: qyx=cult :: subscribers + dom=dome :: desk state + per=regs :: read perms per path + pew=regs :: write perms per path + fiz=melt :: state for mega merges + == +``` + +### Move handler {#move-handler} + +There are several engine cores embedded in `/sys/clay`: + +- `+de` desk engine to modify the desk (commits, merges, etc.) and metadata about the desk +- `+ze` utility engine to manipulate desk state itself +- `+lu` userspace agent management engine +- `+me` merge management core + +The formal Arvo interface is located at `section 4cA, filesystem vane`. It is rather complex compared to the smaller vanes, and deserves a scan through the main arms. + +The complete set of moves for Clay are: + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? 
+ |% + +$ gift :: out result <-$ + $% [%boon payload=*] :: ames response + [%croz rus=(map desk [r=regs w=regs])] :: rules for group + [%cruz cez=(map @ta crew)] :: permission groups + [%dirk p=@tas] :: mark mount dirty + [%ergo p=@tas q=mode] :: version update + [%hill p=(list @tas)] :: mount points + [%done error=(unit error:ames)] :: ames message (n)ack + [%mere p=(each (set path) (pair term tang))] :: merge result + [%ogre p=@tas] :: delete mount point + [%rule red=dict wit=dict] :: node r+w permissions + [%tire p=(each rock:tire wave:tire)] :: app state + [%writ p=riot] :: response + [%wris p=[%da p=@da] q=(set (pair care path))] :: many changes + == :: + +$ task :: in request ->$ + $~ [%vega ~] :: + $% [%boat ~] :: pier rebooted + [%cred nom=@ta cew=crew] :: set permission group + [%crew ~] :: permission groups + [%crow nom=@ta] :: group usage + [%drop des=desk] :: cancel pending merge + [%info des=desk dit=nori] :: internal edit + $>(%init vane-task) :: report install + [%into des=desk all=? 
fis=mode] :: external edit + $: %merg :: merge desks + des=desk :: target + her=@p dem=desk cas=case :: source + how=germ :: method + == :: + $: %fuse :: merge many + des=desk :: target desk + bas=beak :: base desk + con=(list [beak germ]) :: merges + == :: + [%mont pot=term bem=beam] :: mount to unix + [%dirk pot=term] :: mark mount dirty + [%ogre pot=$@(term beam)] :: delete mount point + [%park des=desk yok=yoki ran=rang] :: synchronous commit + [%perm des=desk pax=path rit=rite] :: change permissions + [%pork ~] :: resume commit + [%prep lat=(map lobe page)] :: prime clay store + [%rein des=desk ren=rein] :: extra apps + [%stir arg=*] :: debug + [%tire p=(unit ~)] :: app state subscribe + [%tomb =clue] :: tombstone specific + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + [%warp wer=ship rif=riff] :: internal file req + [%werp who=ship wer=ship rif=riff-any] :: external file req + [%wick ~] :: try upgrade + [%zeal lit=(list [=desk =zest])] :: batch zest + [%zest des=desk liv=zest] :: live + $>(%plea vane-task) :: ames request + == :: +``` + +### The scry interface {#the-scry-interface} + +#### Scries + +Clay has more cares than any other vane because it needs to store and build Hoon code, as well as handle resource transformation using marks. We have already incidentally run into many of these scries, but for the sake of summary: + +```hoon ++$ care :: clay submode + $? 
%a %b %c %d %e %f :: + %p %q %r %s %t %u :: + %v %w %x %y %z :: + == :: +``` + +**Resource cares** + +- [`%d` - List desks.](/reference/arvo/clay/scry#d---list-desks) +- [`%p` - Return file permissions.](/reference/arvo/clay/scry#p---file-permissions) +- [`%r` - Return file as vase.](/reference/arvo/clay/scry#r---file-as-vase) +- [`%t` - List files.](/reference/arvo/clay/scry#t---list-files) +- [`%u` - Check if resource exists.](/reference/arvo/clay/scry#u---check-exists) +- [`%v` - Retrieve desk state.](/reference/arvo/clay/scry#v---desk-state) +- [`%w` - Show revision number of a given `case`.](/reference/arvo/clay/scry#w---revision-number) +- [`%x` - Read file.](/reference/arvo/clay/scry#x---read-file) +- [`%y` - Read `arch` (file hash or directory contents).](/reference/arvo/clay/scry#y---read-arch) +- [`%z` - Show content hash.](/reference/arvo/clay/scry#z---content-hash) + +**Build cares** + +- [`%a` - Build Hoon code.](/reference/arvo/clay/scry#a---build-hoon) +- [`%b` - Produce dynamic mark core.](/reference/arvo/clay/scry#b---dyn-mark-core) +- [`%c` - Produce dynamic mark conversion gate.](/reference/arvo/clay/scry#c---dyn-mark-convert) +- [`%e` - Produce static mark core.](/reference/arvo/clay/scry#e---static-mark-core) +- [`%f` - Produce static mark conversion gate.](/reference/arvo/clay/scry#f---stat-mark-convert) + +**System care** + +- [`%s` - Perform miscellaneous internal scries.](/reference/arvo/clay/scry#s---misc-scries) + +At this point, `%s` is the only care that we haven't looked at yet. (Clay, unlike some other vanes, has a full complement of convenience scries.) 
+ +```hoon +> =/ =dome:clay .^(dome:clay %cv %) + =/ =tako:clay (~(got by hit.dome) let.dome) + .^(tako:clay %cs %/hash/(scot %uv tako)) +0v16.er7uq.oke4u.cru7u.nglu9.q3su7.6ub1o.bh4qk.r5uav.ut12d.5rdl5 +``` + + +## `|mount` & `unix.c` {#mount-unixc} + +Urbit maintains its own single-level store, including Clay, via the runtime, but supports synchronizing Clay's view of the filesystem with the underlying host OS. To mount a drive in this sense means to make a Unix-visible copy in the pier; the more recently timestamped of two files is considered the canonical instance. + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? + |% + +$ gift :: out result <-$ + $% [%dirk p=@tas] :: mark mount dirty + [%hill p=(list @tas)] :: mount points + [%ogre p=@tas] :: delete mount point + == :: + +$ task :: in request ->$ + $% [%boat ~] :: pier rebooted + [%mont pot=term bem=beam] :: mount to unix + [%dirk pot=term] :: mark mount dirty + [%ogre pot=$@(term beam)] :: delete mount point + == :: +``` + +Mount point information is stored in the `$raft` at `mon=(map term beam)`. The actual procedure for mounting a drive is in `+mount`; note particularly the call out to `+ergo`, which brokers file synchronization to Unix via the associated `hez` duct. The `%ergo` gift emitted to Unix is handled in `vere/io/unix.c`. + +- Scan through `vere/io/unix.c`. +- How are on-Urbit edits handled in the case of conflict with a base file? Explore this scenario. + + +## Solid-state subscriptions {#solid-state-subscriptions} + +Agents frequently need to synchronize all or some of their state via communication. To do this, they can either communicate their entire state when it changes, or they can send deltas indicating how to update the state to a particular point. (There could be checks on this, such as reporting a checksum or hash.) Chat agents, for instance, send single messages rather than the total history of the chat channel to that point. 
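The delta approach described above can be sketched in a few lines of Hoon. This is an illustration only: `$msg`, `$state`, `$delta`, and `+apply-delta` are hypothetical names invented here, not taken from `/lib/sss` or any real chat agent.

```hoon
::  Illustrative only: every name in this core is hypothetical.
::
|%
+$  msg    [who=@p wat=@t]
+$  state  (list msg)
+$  delta
  $%  [%add =msg]                     ::  one new message
      [%clear ~]                      ::  drop all history
  ==
::  +apply-delta: update a subscriber's copy from a single delta
::
++  apply-delta
  |=  [sta=state del=delta]
  ^-  state
  ?-  -.del
    %add    [msg.del sta]             ::  newest message first
    %clear  ~
  ==
--
```

A publisher sending `[%add msg]` transmits one message instead of the whole `state`, but every subscriber has to carry its own copy of something like `+apply-delta`, which is the duplication that solid-state subscriptions aim to remove.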
+ +> The more efficient solution is … to only send out instructions on how to *update* the state, but then any subscribed Agent B has to manually interpret these, update its own state, and risk getting some detail wrong. Even if this is done correctly, reimplementing this common pattern in many agents is obviously both wasting wetware and cluttering codebases. +> \[Solid-state subscriptions are] how we … implement the second solution in kernelspace, reducing code overhead, network load and memory usage at the same time. + +SSS will likely continue to evolve. + +- [`/lib/sss`](https://github.com/urbit/urbit/blob/develop/pkg/base-dev/lib/sss.hoon) +- [`/app/simple`](https://github.com/wicrum-wicrun/sss/blob/master/urbit/app/simple.hoon) + + +## Permissions {#permissions} + +Clay supports file permissions at the level of paths (and daughter paths). See `+perm` in `/sys/clay` for an example of setting permissions. This system does not appear to be used much at the current time. It does impact requests of remote file resources, which is not yet a common use pattern beyond simply publishing code. + +```hoon +:: :::: +:::: ++clay :: (1c) versioning + :: :::: +++ clay ^? 
+ |% + +$ gift :: out result <-$ + $% [%croz rus=(map desk [r=regs w=regs])] :: rules for group + [%cruz cez=(map @ta crew)] :: permission groups + [%rule red=dict wit=dict] :: node r+w permissions + == :: + +$ task :: in request ->$ + $% [%cred nom=@ta cew=crew] :: set permission group + [%crew ~] :: permission groups + [%crow nom=@ta] :: group usage + [%perm des=desk pax=path rit=rite] :: change permissions + == :: + -- +:: ++$ crew (set ship) :: permissions group ++$ regs (map path rule) :: rules for paths ++$ rule [mod=?(%black %white) who=(set whom)] :: node permission ++$ whom (each ship @ta) :: ship or named crew ++$ rite :: new permissions + $% [%r red=(unit rule)] :: for read + [%w wit=(unit rule)] :: for write + [%rw red=(unit rule) wit=(unit rule)] :: for read and write + == + +``` + +- Trace how `|public` works. + +Some related material for `%treaty` and the docket file system will be covered in `ca12`. + +- [“Permissions”](../../urbit-os/kernel/clay/tasks.md#permissions) + + +## Kiln {#kiln} + +`%kiln` is the system affordance for interacting with Clay and Gall from userspace without composing direct tasks. It's a library inside of `%hood` and a set of associated generators. + +A `%hood` generator (located in `/gen/hood`) that wants to interact with `%kiln` needs to send a poke indicating which predefined `%kiln` action should be taken, e.g., + +```hoon +;< bind:m ~ (poke-our:strandio %hood %kiln-mount !>([pax desk])) +(pure:m !>(~)) +``` + +You can see the set of `%kiln` actions in the `+poke` arm of `/lib/kiln`. Most of these have an associated generator in `/gen/hood`. + +- Trace the `%kiln` action for `|revive`. +- Trace the `%kiln` action for `|ota`. +- Trace the `%kiln` action for `|mount`. +- See [~midden-fabler, `mount-all-desks.hoon`](https://gist.github.com/midden-fabler/5a6bd08d5e9b78113080b941b5d4e04b) for an example of using `%kiln` in another generator. 
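As a rough sketch of the generator side, a `%hood` generator can be written as a `%say` generator whose result is a marked `%kiln` action that Dojo then pokes into `%hood`. The file name `mount-desk` and its argument shape are assumptions made for illustration; the `[path term]` payload mirrors the `[pax desk]` pair used in the snippet above.

```hoon
::  Hypothetical /gen/hood/mount-desk.hoon, run as: |mount-desk %base
::  Sketch only; the generator name and argument shape are assumptions.
::
:-  %say
|=  $:  [now=@da eny=@uvJ bec=beak]
        [=desk ~]
        ~
    ==
::  produce a %kiln-mount cask: [path-to-mount mount-point-name]
::
:-  %kiln-mount
[/(scot %p p.bec)/[desk]/(scot %da now) desk]
```

Compare this against the real `/gen/hood/mount` before relying on it; the shipped generator accepts more flexible arguments.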
+ + +## Tombstoning {#tombstoning} + +> Tombstoning is the deletion of data for old desk revisions. Clay has a single `%tomb` `task`, but its [`clue`](../../urbit-os/kernel/clay/data-types.md#clue) has a number of different possible actions: + +```hoon ++$ clue :: murder weapon + $% [%lobe =lobe] :: specific lobe + [%all ~] :: all safe targets + [%pick ~] :: collect garbage + [%norm =ship =desk =norm] :: set default norm + [%worn =ship =desk =tako =norm] :: set commit norm + [%seek =ship =desk =cash] :: fetch source blobs + == +:: ++$ norm (axal ?) +``` + +A tombstoned value can no longer be successfully returned from a scry. In this case, `[~ ~]` is a response meaning that you can never know the value. + +The tombstone policy (`$norm`) affects a recursive directory structure. + +- Does the cache retain a tombstoned value? + +- [`%tomb` - Tombstoning](../../urbit-os/kernel/clay/tasks.md#tomb---tombstoning) + + +## Story {#story} + +Story is a set of generators to produce Clay commit messages. The actual messages are stored in a file in Clay, effectively using Clay as a database. The generators are instrumented through `%hood`/`%helm` so they can pass notes to Arvo. + +```hoon +> |new-desk %tale + +> |mount %tale + +> |cp /===/mar/story/hoon /=tale=/mar/story/hoon ++ /~zod/tale/2/mar/story/hoon + +> |cp /===/sur/story/hoon /=tale=/sur/story/hoon ++ /~zod/tale/3/sur/story/hoon + +> |cp /===/lib/story/hoon /=tale=/lib/story/hoon ++ /~zod/tale/4/lib/story/hoon + +> |story-init, =desk %tale ++ /~zod/tale/5/story + +> +story-read, =desk %tale + +> |story-write 'Short message' 'Long descriptive message', =desk %tale +: /~zod/tale/6/story + +> +story-read, =desk %tale +commit: 0vn.l7i50.emt3e.79vbv.tjuv6.ftaqk.pos61.iqa5q.j0jq4.7mn92.vjssn +Short message + +Long descriptive message +``` + +Story is supported in `%base`, but you'll need to make the mark available on the target desk as done here. 
+ + +## The Future of Clay {#the-future-of-clay} + +Clay does some things very well, but at the current scale of Urbit it hasn't really been stress-tested to its performance limits. (There are some limits on the number of tokens that can be loaded from a single file, for instance.) + +There are really two directions we can go with Clay: strip it back down towards source control and distribution, or scale it up into a full noun management system. + +The first approach is rooted in an argument that Clay shouldn't do everything, but should instead push off aspects of file management and data storage to Gall. The details of this have not been laid out explicitly in any document I'm aware of, but they have been discussed in core architecture meetings. + +The other possibility is that Gall and Clay merge into a hypothetical vane called Hume, which then manages agents and agent data in the same space as files and source. + + +## Exercise {#exercise} + +- Produce a `%hood` generator that triggers `%kiln` to produce a file containing the line count of a supplied text file. This file should have the same name but a `.wc` suffix (which will require a mark). A `%txt` mark results in `(list cord)`. +- Walk through producing an OTA for a fake ~zod sponsor and a ~marzod sponsee. See [`MAINTAINERS.md`](https://github.com/urbit/urbit/blob/develop/MAINTAINERS.md) for details of that process (under “Release Next Release Candidate”); you will obviously need to change ship identities. + +> There are always horrid exceptions, even in common use -- like extensionless Makefiles. The trivial solution is that if `%clay` finds a file mysterious, it won't track it. +- What happens to a `Makefile` today (that is, a file without a suffix)? 
diff --git a/docs/core-academy/ca11.md b/docs/core-academy/ca11.md new file mode 100644 index 0000000..d3a52ec --- /dev/null +++ b/docs/core-academy/ca11.md @@ -0,0 +1,657 @@ +--- +description: "Covers Gall, Arvo's userspace application framework vane, the agent type, running agents, and userspace software updates." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 12. Vanes V: Gall and Userspace + +*This lesson covers Gall: the `$agent` type, running agents, and userspace software updates.* + +## Gall {#gall} + +Currently, the end user zone of Urbit, denoted “userspace”, is supplied primarily by the `/sys/vane/gall` vane. Much as Arvo acts as a dispatcher and state manager for a functional OS in terms of transactions between vanes, Gall acts as a dispatcher and state manager for long-running daemons. These daemons are referred to as “agents”, and are the main way that users actually use Urbit. (Other parts of userspace include generators and threads.) + +Gall is responsible for managing userspace applications and their state, including subscribers. To fully understand agent userspace, we want to cover a few related topics: + +1. Agents +2. `/sys/vane/gall` +3. Treaty/Docket publication +4. Agent wrappers +5. Historic (dynamic) Gall + +We will cover threads in the lesson on Khan and Lick. We covered generators previously in the lesson on Dill and Dojo. + + +## Agents {#agents} + +> An agent is a piece of software that is primarily focused on maintaining and distributing a piece of state with a defined structure. It exposes an interface that lets programs read, subscribe to, and manipulate the state. Every event happens in an atomic transaction, so the state is never inconsistent. 
Since the state is permanent, when the agent is upgraded with a change to the structure of the state, the developer provides a migration function from the old state type to the new state type. + +What is an agent in practice? In contemporary static Gall, an agent is a core that hews to the definition: + +```hoon +:: :::: +:::: ++gall :: (1g) extensions + :: :::: +++ gall ^? + |% + :: + :: +agent: app core + :: + ++ agent + =< form + |% + +$ step (quip card form) + +$ card (wind note gift) + +$ note + $% [%agent [=ship name=term] =task] + [%arvo note-arvo] + [%pyre =tang] + :: + [%grow =spur =page] + [%tomb =case =spur] + [%cull =case =spur] + == + +$ task + $% [%watch =path] + [%watch-as =mark =path] + [%leave ~] + [%poke =cage] + [%poke-as =mark =cage] + == + +$ gift + $% [%fact paths=(list path) =cage] + [%kick paths=(list path) ship=(unit ship)] + [%watch-ack p=(unit tang)] + [%poke-ack p=(unit tang)] + == + +$ sign + $% [%poke-ack p=(unit tang)] + [%watch-ack p=(unit tang)] + [%fact =cage] + [%kick ~] + == + ++ form + $_ ^| + |_ bowl + ++ on-init + *(quip card _^|(..on-init)) + :: + ++ on-save + *vase + :: + ++ on-load + |~ old-state=vase + *(quip card _^|(..on-init)) + :: + ++ on-poke + |~ [mark vase] + *(quip card _^|(..on-init)) + :: + ++ on-watch + |~ path + *(quip card _^|(..on-init)) + :: + ++ on-leave + |~ path + *(quip card _^|(..on-init)) + :: + ++ on-peek + |~ path + *(unit (unit cage)) + :: + ++ on-agent + |~ [wire sign] + *(quip card _^|(..on-init)) + :: + ++ on-arvo + |~ [wire sign-arvo] + *(quip card _^|(..on-init)) + :: + ++ on-fail + |~ [term tang] + *(quip card _^|(..on-init)) + -- + -- + -- ::gall + +``` + +A Gall agent must have ten arms. (There's a fascinating bit of self-reference in the state definition going on, and to be honest I'm a little surprised that it works, but it is an iron core.) The definitions here are for `|~` barsig arms in a `^|` ketbar core. + +- [`|~` barsig](../../hoon/rune/bar.md#barsig) yields an iron gate. 
+- [`^|` ketbar](../../hoon/rune/ket.md#ketbar) yields an iron core from a gold core. + +So we must at last really grapple with the [core variance model](../hoon-school/R-metals.md) in Urbit. This is notoriously hard to understand because we don't have great metaphors or analogues for type variance in real life. Right now, what we need to understand is that an iron/contravariant core is _opaque_: that is, we use this to define an interface in which the argument can be less specific than the interface and the result can be more specific. Contravariance is useful for flexibility in input values (`sample`s). + +> An `%iron` core `i` has a write-only sample (payload head, `+6.i`) and an opaque context (payload tail, `+7.i`). A core `j` which nests within it must be a `%gold` or `%iron` core, such that `+6.i` nests within `+6.j`. Hence, **contravariant**. + +The archetypal Gall agents in `/sys/lull` are composed using iron gates since they will be used as examples for building actual agent cores. Likewise, `+rs` and its sister cores in `/sys/hoon` are built using iron doors with specified rounding behavior, so when you actually use the core (like `+add:rs`) the core you are using has been built as an example. + +- How are the iron gate runes actually implemented in the Hoon type system? (See `+deem:nest:ut` and `+peel:ut`.) +- Try to implement an agent missing an arm, like `+on-fail`. +- Bonus question: what half-implemented rune produces an `%iron` core? + +We construct an agent explicitly in an `/app` file by applying `%- agent:gall` to a correctly-shaped core. + +- What does each arm produce? 
+ +```hoon +++ on-init (quip card _agent) +++ on-save (vase) +++ on-load (quip card _agent) +++ on-poke (quip card _agent) +++ on-watch (quip card _agent) +++ on-leave (quip card _agent) +++ on-peek (unit (unit cage)) +++ on-agent (quip card _agent) +++ on-arvo (quip card _agent) +++ on-fail (quip card _agent) +``` + +Finally, we can take a gander at what that ubiquitous `+quip` is: + +```hoon +++ quip + |$ [item state] + [(list item) state] +``` + +(It's just a wrapper for `[(list item) state]`.) + +Basically, every arm except `+on-save` and `+on-peek` must produce a list of effects and the (possibly updated) agent state. + +- What does each arm expect? + +```hoon +++ on-init :: not a gate, only an arm +++ on-save :: not a gate, only an arm +++ on-load |= =vase +++ on-poke |= =cage +++ on-watch |= =path +++ on-leave |= =path +++ on-peek |= =path +++ on-agent |= [=wire =sign:agent:gall] +++ on-arvo |= [=wire =sign-arvo] +++ on-fail |= [=term =tang] +``` + +We'll need to differentiate the Gall `$sign`s and the Arvo `$sign`s in a moment. + +When `+ford:clay` reads in a Gall agent file from `/app`, it automatically composes cores together using `=>` tisgar. (This leads to a slightly disconcerting situation in which the cores are simply present serially in a file.) + +- Compare the definition of `$agent:shoe` with `$agent:gall`. How does this correctly extend the Gall agent definition for the type system? + + +## Vane {#vane} + +While Gall facilitates very complex userspace apps, the vane itself is rather modest, weighing in at less than half the size of Clay or Ames. Gall knows how to route events to the handler arms in a standard agent core, and it instruments upgrades and subscriptions. + +However, we have to consider Gall at two levels: the vane level, which manages top-level state like the set of running agents and queued moves, and the agent level, which manages agents as doors. + +Gall is a landlocked vane. It has no runtime counterpart. 
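To make "a standard agent core" concrete, here is a minimal sketch of an agent that does nothing: each of the ten arms Gall routes to is delegated to a stock handler. It assumes `/lib/default-agent` is present on the desk (the library is not discussed in this lesson), and it uses the common `^-  agent:gall` cast rather than `%-`.

```hoon
::  Minimal do-nothing agent; assumes /lib/default-agent is on the desk.
::
/+  default-agent
^-  agent:gall
|_  =bowl:gall
+*  this  .
    def   ~(. (default-agent this %|) bowl)
::
++  on-init   on-init:def
++  on-save   on-save:def
++  on-load   on-load:def
++  on-poke   on-poke:def
++  on-watch  on-watch:def
++  on-leave  on-leave:def
++  on-peek   on-peek:def
++  on-agent  on-agent:def
++  on-arvo   on-arvo:def
++  on-fail   on-fail:def
--
```

The door's sample is the `$bowl`, which Gall refreshes before each call at the agent level; everything the vane-level code does around that call is covered in the sections that follow.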
+ +### `/sys/lull` Definition {#syslull-definition} + +```hoon +:: :::: +:::: ++gall :: (1g) extensions + :: :::: +++ gall ^? + |% + +$ boar (map [=wire =ship =term] nonce=@) :: and their nonces + +$ dude term :: server identity + +$ gill (pair ship term) :: general contact + +$ load (list [=dude =beak =agent]) :: loadout + +$ scar :: opaque duct + $: p=@ud :: bone sequence + q=(map duct bone) :: by duct + r=(map bone duct) :: by bone + == :: + +$ suss (trel dude @tas @da) :: config report + +$ well (pair desk term) :: + +$ deal + $% [%raw-poke =mark =noun] + task:agent + == + +$ unto + $% [%raw-fact =mark =noun] + sign:agent + == + -- ::gall +``` + +Most of the important types have been separated and are called out below. + +(Variance again: [`^?` ketwut](../../hoon/rune/ket.md#ketwut) is for a lead/bivariant core.) + +### Vane State {#vane-state} + +```hoon ++$ state + $: system-duct=duct + outstanding=(map [wire duct] (qeu remote-request)) + contacts=(set ship) + yokes=(map term yoke) + blocked=(map term (qeu blocked-move)) + =bug + == +``` + +- `system-duct` is the set of outbound moves to other vanes (like Ames for subscriptions) or remote agent contacts. +- `outstanding` is the outstanding request queue. +- `contacts` is the set of other ships with which we are in communication. +- `yokes` is the set of running agents. +- `blocked` is the set of moves to agents that haven't been started yet. +- `bug` is the debug print configuration. 
+ +### Vane Moves {#vane-moves} + +```hoon +|% ++$ gift :: outgoing result + $% [%boon payload=*] :: ames response + [%done error=(unit error:ames)] :: ames message (n)ack + [%unto p=unto] :: + == :: ++$ task :: incoming request + $~ [%vega ~] :: + $% [%deal p=sock q=term r=deal] :: full transmission + [%sear =ship] :: clear pending queues + [%jolt =desk =dude] :: (re)start agent + [%idle =dude] :: suspend agent + [%load =load] :: load agent + [%nuke =dude] :: delete agent + [%doff dude=(unit dude) ship=(unit ship)] :: kill subscriptions + [%rake dude=(unit dude) all=?] :: reclaim old subs + $>(%init vane-task) :: set owner + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + $>(%plea vane-task) :: network request + [%spew veb=(list verb)] :: set verbosity + [%sift dudes=(list dude)] :: per agent + == :: +-- +``` + +### Agent State {#agent-state} + +```hoon ++$ bitt (map duct (pair ship path)) :: incoming subs ++$ boat (map [=wire =ship =term] [acked=? =path]) :: outgoing subs ++$ bowl :: standard app state + $: $: our=ship :: host + src=ship :: guest + dap=term :: agent + == :: + $: wex=boat :: outgoing subs + sup=bitt :: incoming subs + $= sky :: scry bindings + %+ map path :: + ((mop @ud (pair @da (each page @uvI))) lte) :: + == :: + $: act=@ud :: change number + eny=@uvJ :: entropy + now=@da :: current time + byk=beak :: load source + == == :: +``` + +Every agent needs two parts of its state: the `$bowl`, which is the information outside of the agent that Gall needs to communicate for the vane, and the + +### Agent Moves {#agent-moves} + +```hoon +|% ++$ card (wind note gift) ++$ note + $% [%agent [=ship name=term] =task] + [%arvo note-arvo] + [%pyre =tang] + :: + [%grow =spur =page] + [%tomb =case =spur] + [%cull =case =spur] + == ++$ task + $% [%watch =path] + [%watch-as =mark =path] + [%leave ~] + [%poke =cage] + [%poke-as =mark =cage] + == ++$ gift + $% [%fact paths=(list path) =cage] + [%kick paths=(list path) ship=(unit ship)] + 
[%watch-ack p=(unit tang)] + [%poke-ack p=(unit tang)] + == ++$ sign + $% [%poke-ack p=(unit tang)] + [%watch-ack p=(unit tang)] + [%fact =cage] + [%kick ~] + == +-- +``` + +A Gall `$card` differs from an Arvo card: + +```hoon +:: Arvo ++$ card (cask) :: tagged, untyped event +++ cask |$ [a] (pair mark a) :: marked data builder +:: +:: Gall ++$ card (wind note gift) +++ wind + |$ :: a: forward + :: b: reverse + :: + [a b] + $% :: %pass: advance + :: %slip: lateral + :: %give: retreat + :: + [%pass p=path q=a] + [%slip p=a] + [%give p=b] + == ++$ note + $% [%agent [=ship name=term] =task] + [%arvo note-arvo] + [%pyre =tang] + :: + [%grow =spur =page] + [%tomb =case =spur] + [%cull =case =spur] + == ++$ gift + $% [%fact paths=(list path) =cage] + [%kick paths=(list path) ship=(unit ship)] + [%watch-ack p=(unit tang)] + [%poke-ack p=(unit tang)] + == +``` + +Gall does not permit a `%slip`, so a card is either: + +- `[%pass path note]` +- `[%give gift]` + + +### Structure {#structure} + +The two main engine cores within `/sys/vane/gall` are: + +- `+mo` Arvo move handler +- `+ap` agent-level core + +The `+abet` pattern used in Gall prefixes each arm with the containing door abbreviation so you can remain more easily oriented within `/sys/vane/gall`. + +#### `+mo` Arvo move handler + +Many `+mo` calls resolve into `+ap` calls. It mainly sets things up around particular per-agent calls. + +#### `+ap` agent-level core + +To run an agent, we have to know the state of the agent, which includes its state and relevant bowl information: + +```hoon +:: $yoke: agent runner state +:: ++$ yoke + $% [%nuke sky=(map spur @ud)] + $: %live + control-duct=duct + run-nonce=@t + sub-nonce=_1 + =stats + =bitt + =boat + =boar + code=* + agent=(each agent vase) + =beak + marks=(map duct mark) + sky=(map spur path-state) + ken=(jug spar:ames wire) + == == +``` + +- `control-duct` is the duct of +- `run-nonce` is a unique nonce for each build. +- `sub-nonce` is global `%watch` nonce. 
+- `stats` +- `bitt` is the set of incoming subscriptions (for the `bowl`). +- `boat` is the set of outgoing subscriptions (for the `bowl`). +- `code` is the most recently loaded code as a noun. +- `agent` is the agent core, possibly as a vase. +- `beak` is the compilation source. +- `marks` is the map of mark conversion requests. +- `sky` is the map of scry bindings. +- `ken` is the map of sets of open `%keen` remote scry requests. + +A typical call from `+mo` to `+ap` will be predicated on `+ap` setting up a Gall agent with its state and processing the incoming move through the appropriate arm. + +For instance, this is the lifecycle of a scry call to Gall: + +- A scry handler (`roof`) produces a call to Gall's `+scry` arm. + - `+mo` as a door needs a duct and a set of moves. + - `+mo-peek` sets up a call to `+ap-peek` along the given path with the appropriate care. + - `+ap-abed` sets up the agent noun for evaluation. + - `+ap-yoke` loads the actual agent state; an agent is a door with a state and bowl sample. + - `+ap-peek` parses the scry path appropriately. + - `+ap-mule-peek` evaluates the code using `[9 2 0 1]` and `+mock` (see `ca01` for a refresher). + - `+ap-agent-core` sets up the agent core with its current bowl and state; this includes a `+on-peek` arm since we know the shape of the `$agent` core. + - `+ap-construct-bowl` produces the agent-ready bowl from Gall-level information. + +The lifecycle of a poke looks like this: + +- A move is injected targeting Gall's `+call` arm as a `%deal` (indicating that the move goes to an agent). + - `+call` dispatches to `+mo-handle-use` for an agent. + - `+mo-handle-local` is for running local agents. + - `+mo-apply` and `+mo-apply-sure` prepare to call `+ap`. + - `+ap-abed` sets up the agent noun for evaluation. + - `+ap-apply` dispatches several kinds of operations, including pokes. + - `+ap-poke` queues a `%poke-ack` (since it's first among moves) and calls `+ap-ingest`. + - `+ap-ingest` calls the agent arm. 
+ - `+ap-handle-result` and `+ap-handle-peers` take care of watches etc. + - `+ap-agent-core` sets up the agent core with its current bowl and state; this includes a `+on-poke` arm since we know the shape of the `$agent` core. + - `+ap-abet` yields the list of cards to resolve back to `+mo`, but also the `$yoke`, which is the new agent state for Gall's `state`. + - `+mo-abet` finalizes. + +What about an agent modification like `|nuke`? Let's see the lifecycle of that call. + +- `/gen/hood/nuke` → `%kiln-nuke` + - `/lib/hood/kiln` → `+poke-nuke` + - `[%pass /nuke %arvo %g [%nuke dude]]` + - `/sys/vane/gall`: + - `+call` + - `+mo` + - `+mo-nuke` + - `+ap` + - `+ap-abed` + - `+ap-nuke` is where the real work is done. Review it. + - `+ap-ingest` + - `+ap-abet` + - `+mo-abet` + +Clay actually governs which agents can run on a given desk. How does `|install` instigate this? + +- See `+goad` in `/sys/vane/clay`. + +There is additionally some plumbing around Gall receiving responses in `+take` for the vane (`/sys`) versus for an agent (`/use`). + +We'll need to differentiate the Gall `$sign`s and the Arvo `$sign`s in a moment. + +### Scry interface {#scry-interface} + +Gall brokers two kinds of scries: vane scries and agent scries. + +> In order to hit the vane-level endpoints, the beginning of the `spur` (e.g. the `path` after the `beak`) _must_ be a `%$` empty element. 
For example: + +```hoon +.^(desk %gd /=acme=/$) +.^((set [=dude:gall live=?]) %ge /=base=/$) +.^((list path) %gt /=acme=//foo) +``` + +- [`%d`: get desk of app](../../urbit-os/kernel/gall/scry.md#d-get-desk-of-app) +- [`%e`: running apps](../../urbit-os/kernel/gall/scry.md#e-running-apps) +- [`%f`: nonces of apps](../../urbit-os/kernel/gall/scry.md#f-nonces-of-apps) +- [`%n`: get nonce of subscription](../../urbit-os/kernel/gall/scry.md#n-get-nonce-of-subscription) +- [`%t`: remote scry subpaths](../../urbit-os/kernel/gall/scry.md#t-remote-scry-subpaths) +- [`%u`: check if installed](../../urbit-os/kernel/gall/scry.md#u-check-if-installed) +- [`%w`: latest revision of path](../../urbit-os/kernel/gall/scry.md#w-latest-revision-of-path) +- [`%x`: remote scry file](../../urbit-os/kernel/gall/scry.md#x-remote-scry-file) +- [`%z:` hash of value at path](../../urbit-os/kernel/gall/scry.md#z-hash-of-value-at-path) + +An agent scry has the form `/=agent=/path/to/scry` and may accept any care. `%x` cares must include a terminal mark in the path, however. + +Gall also dispatches scries to agents' `+on-peek` arms. This takes place via `+mo-peek`→`+ap-peek`→`+on-peek`→`+ap-mule-peek`. + +See `+scry` for details of both. + +- [“Gall Scry Reference”](../../urbit-os/kernel/gall/scry.md) + + +## Treaty and Docket {#treaty-and-docket} + +There are two ways to distribute nouns over Ames today: + +1. Mark a desk `|public` and use Clay to directly synchronize desks. +2. Use the `/app/treaty` agent from `%landscape` to discover and install agents. + +Landscape (formerly Grid) is formally a Tlon product. It primarily consists of two agents (and associated marks, libraries, etc.): + +- `/app/treaty` handles publishing and advertising application desks. +- `/app/docket` handles retrieving, validating, and installing application desks. + +Together they query a remote `%treaty` instance to `+install` a particular desk. + +- Examine `$alliance` in `/sur/treaty`. 
+- Examine `+publish:so` and `+watch:tr` in `/app/treaty`. +- Examine `+install:ch` in `/app/docket`. + +`/app/treaty` in particular has pretty tight construction and I commend its style to you. + +- [“Software Distribution · Userspace”](../userspace/dist/software-distribution.md) + +### Updates {#updates} + +> When Gall receives a newly rebuilt agent from Clay, it calls the gate produced by the `+on-load` arm of the new agent with the state extracted from the old agent. If there is a crash in any `+on-load` calls or in the handling of any effects they emit (which can include more agent activations), then the whole event crashes, canceling the commit. This effectively gives any agent the ability to abort a commit by crashing. + +- Gall's `+call` arm receives a `%load` move with a noun of a core built by `+ford:clay`. + - `+mo-core` is a handle to `+mo` because no `+abet` is needed. + - `+mo-load` installs agents pretty mechanically, by simply `+skim`ming over the `%live` agents. + - `+mo-receive-core` checks whether the agent is running. + - If it is, then `+ap` is invoked to update the agent: + - `+ap-abed` + - `+ap-reinstall` + - `+on-save:ap-agent-core` + - `+ap-install` is the install wrapper. + - `+ap-upgrade-state` + - `+on-init:ap-agent-core` + - `+on-load:ap-agent-core` + - `+ap-abet` + - `+mo-clear-queue` flushes the blocked tasks pending for a new agent. + - If it isn't, then we have to create it in `+ap`: + - `+ap-abed` + - `+ap-upgrade-state` + - `+on-init:ap-agent-core` + - `+on-load:ap-agent-core` + - `+mo-idle` puts the agent to sleep if it's in the kill list (to be retired due to `desk/bill`). + - `+mo-abet` finalizes moves and state changes. + +Note that this is independent of Treaty and Docket once the remote desk has been installed. + + +## Agent Wrappers as Core Modifiers {#agent-wrappers-as-core-modifiers} + +An agent wrapper (like `dbug`) is a tool to wrap additional handlers around an agent core. 
These can wrap the internal Gall agent with new functionality by catching pokes and other standard moves, then re-dispatching to the normal arms if no special behavior is needed. + +```hoon +> :my-agent +dbug [%state 'value'] +``` + +- Examine `/lib/dbug`. + - In particular, how does the `+on-poke` wrapper arm work? + - How does the `+dbug` generator work? + +> Using agent transformers to extend agents is a very nice conceptual pattern. But in practice, there are three pretty big problems with it: +> 1. You need to edit the agent code yourself. +> 2. Stateful transformers can break the agent. +> 3. The agent's world will also get transformed. + +- [“Debugging Wrapper”](../userspace/examples/dbug.md) +- [~wicrum-wicrun, “Gall Middleware” (Assembly 2022)](https://www.quartus.co/blog/gall-middleware-wicrum-wicrun-assembly-2022) + + +## Dynamic Gall {#dynamic-gall} + +An earlier incarnation of Gall, dynamic Gall, specified its arms in terms of the names of the move coming back or the mark of the poke coming in. (This is what I learned on, way back when.) + +For instance, this agent was an earlier version of the egg timer app: + +```hoon +:: |start %egg +:: :egg ~s5 +|% ++$ effect (pair bone syscall) ++$ syscall [%wait path @da] +-- +|_ [bowl:gall ~] +++ poke-noun + |= t=@dr + ^+ [*(list effect) +>.$] + :_ +>.$ :_ ~ + [ost %wait /egg-timer (add now t)] +++ wake + |= [=wire error=(unit tang)] + ^+ [*(list effect) +>.$] + ~& "Timer went off!" + [~ +>.$] +-- +``` + + +## Exercises {#exercises} + +- Your assignment is to produce a minimalist Gall-like agent handler: a userspace framework for producing “toy” agent-like applications. Let's call them “scamps”. + + The scamp's state is defined in a `$state` block at the top of its file, e.g.: + + ```hoon + +$ state + $: scores=(list @) + hi-score=@ + == + ``` + + Scamps do not support state upgrades, so no version tag is provided. 
+ + A scamp requires the following arms for the developer: + + ```hoon + |% + ++ on-init + ++ on-poke + ++ on-peek + -- + ``` + + You should be able to poke and peek into a scamp. It has no subscription model. + + A scamp specification file is NOT implicitly chained with a running `=>` tisgal. Compose explicitly. + +As a final aside, I believe that building an `%aqua`/`%pyro` testbed along the lines of Gall should also be feasible for you at this point. diff --git a/docs/core-academy/ca12.md b/docs/core-academy/ca12.md new file mode 100644 index 0000000..6932784 --- /dev/null +++ b/docs/core-academy/ca12.md @@ -0,0 +1,454 @@ +--- +description: "This lesson covers I/O with Khan, Arvo's thread runner vane, and Lick, Arvo's IPC vane." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 13. Vanes VI: Khan, Lick + +*This lesson covers I/O with Khan, the thread-runner vane, and Lick, the IPC vane.* + +Khan and Lick are both interprocess communications vanes with slightly different philosophies. We will also discuss `conn.c`. In brief: + +1. `%khan` is a high-level thread interface, useful to both Arvo and external clients. +2. `%lick` is a low-level noun interface for domain sockets, treating Arvo as a server and earth software as a client. +3. `conn.c` provides full administrative control over Arvo and Vere. + + +## Khan & Threads {#khan-threads} + +Khan allows threads to be triggered from outside of Urbit. To start off with, what exactly is a thread? + +> A thread is a monadic function that takes arguments and produces a result. It may perform input and output while running, so it is not a pure function. +> A thread's strength is that it can easily perform complex IO operations. It uses what's often called the IO monad (plus the exception monad) to provide a natural framework for IO. 
+> A thread's weakness is that it's impermanent and may fail unexpectedly. In most of its intermediate states, it expects only a small number of events (usually one), so if it receives anything it didn't expect, it fails. When code is upgraded, it's impossible to upgrade a running thread, so it fails. + +Threads can be run using Gall's `%spider` agent or Khan. + +### `%spider`: Threads Before Khan {#spider-threads-before-khan} + +Arvo is an event handler for OS-level moves for vanes. Gall is an event handler for agent-level moves. Spider is an agent for transient thread-level operations. + +#### Thread Definition + +Ultimately, a `$thread` is a gate which accepts a `vase` and returns the `form` of a `strand` that produces a `vase`. In other words, the `$thread` doesn't (just) produce a result, it produces a strand that takes input and produces output from which a result can be extracted. This allows threads to chain friable computations together until a `%thread-done` is produced. + +```hoon ++$ thread $-(vase _*form:(strand ,vase)) +``` + +- `+form` is the mold of the strand. It weaves together the notions of input and output thus: + +```hoon +++ form (strand-form-raw a) +++ strand-form-raw + |* a=mold + $-(strand-input (strand-output-raw a)) ++$ strand-input [=bowl in=(unit input)] +++ strand-output-raw + |* a=mold + $~ [~ %done *a] + $: cards=(list card) + $= next + $% [%wait ~] + [%skip ~] + [%cont self=(strand-form-raw a)] + [%fail err=(pair term tang)] + [%done value=a] + == + == +``` + +`+strand` is more complicated. It's an “asynchronous transaction mold”, which is basically a union of four different monads. It's a wet gate producing a core from a mold. + +1. Reader for input. +2. Writer for cards. +3. Continuation for callbacks. +4. Exception. + +This gate also produces a number of critical handlers, such as: + +- `+form` is the main type of a strand computation. +- `+pure` is an identity computation, useful for binding. 
+- `+bind` is a combination of two computations. +- `+eval` maintains the monadic nature of the computation. + +A simple thread (like `/ted/code`) is simply a wrapper for some check like a scry: + +```hoon +/- spider +/+ strandio +=, strand=strand:spider +^- thread:spider +|= arg=vase +=/ m (strand ,vase) +^- form:m +;< =bowl:spider bind:m get-bowl:strandio +;< code=@p bind:m (scry:strandio @p /j/code/(scot %p our.bowl)) +(pure:m !>(code)) +``` + +- Read this thread with new eyes about the types involved. + +[`;<` micgal](../../hoon/rune/mic.md#micgal) serves to permit a sequence of computations in which each one depends on the output of the previous one. + +```hoon +;< mold bind expr1 expr2 +``` + +which desugars to: + +```hoon +%+ (bind mold) + expr1 +|= mold +expr2 +``` + +> `;<` can be used to glue a pipeline together to run an asynchronous function or event. This can be helpful when deferring parts of a computation based on external data. + +The `main-loop` pattern provides a way of providing a list of functions to try a value against, and seems like an interesting way of handling an arbitrary number of `%facts`. + +Threads can trigger daughter threads. `+handle-start-thread` does this by modifying Spider's thread `yarn`, but you need to see Spider now. + +- [“Fetch JSON”](../../urbit-os/base/threads/examples/get-json.md) +- [“Child Thread”](../../urbit-os/base/threads/examples/child-thread.md) +- [“Main Loop”](../../urbit-os/base/threads/examples/main-loop.md) +- [“Poke Agent”](../../urbit-os/base/threads/examples/poke-agent.md) +- [“Scry”](../../urbit-os/base/threads/examples/scry.md) +- [“Take Fact”](../../urbit-os/base/threads/examples/take-fact.md) + +#### `/app/spider` + +`/app/spider` tracks threads at the highest level using a “spider core” and a state manager. 
+ +```hoon ++$ card card:agent:gall ++$ thread $-(vase shed:khan) ++$ tid @tatid ++$ input [=tid =cage] ++$ yarn (list tid) ++$ thread-form _*eval-form:eval:(strand ,vase) ++$ trying ?(%build %none) +:: ++$ state + $: starting=(map yarn [=trying =vase]) + running=(axal thread-form) + tid=(map tid yarn) + serving=(map tid [(unit @ta) =mark =desk]) + scrying=(jug tid [=wire =ship =path]) + == +``` + +Fundamentally, each thread is an invocation of a list of thread IDs and their startup state, threads currently in progress, and some handlers for remote scries, etc. + +- `starting` is the collection of threads pending successful execution. +- `running` contains currently-running threads identified by path. +- `tid` is a map for tracking child threads. +- `serving` has to do with the [HTTP API](../../urbit-os/base/threads/http-api.md) for threads, allowing you to use the Urbit ship like a function-as-a-service server. +- `scrying` is a map of sets of remote scries. + +Threads use a set of internal mark conventions (notably `%thread-done` and `%thread-fail`). + +```hoon +++ strand-output-raw + |* a=mold + $~ [~ %done *a] + $: cards=(list card) + $= next + $% [%wait ~] + [%skip ~] + [%cont self=(strand-form-raw a)] + [%fail err=(pair term tang)] + [%done value=a] + == + == +``` + +- `cards` is the set of cards to dispatch immediately. +- `%thread-wait` means to not move on but to stay awaiting a callback. +- `%thread-skip` is a drop because this should be handled elsewhere. +- `%thread-cont` means to continue the computation from a new callback. +- `%thread-fail` aborts a computation and doesn't send effects. +- `%thread-done` finishes a computation and sends effects. + +The `%spider-helper` core has all the logic to handle HTTP, start and conclude threads, build code, handle input, etc. For instance: + +```hoon +++ thread-done + |= [=yarn =vase silent=?] 
+ ^- (quip card ^state) + :: %- (slog leaf+"strand {<yarn>} finished" (sell vase) ~) + =/ =tid (yarn-to-tid yarn) + =/ done-cards=(list card) + :~ [%give %fact ~[/thread-result/[tid]] %thread-done vase] + [%give %kick ~[/thread-result/[tid]] ~] + == + =^ http-cards state + (thread-http-response tid vase) + =^ scry-card state (cancel-scry tid silent) + =^ cards state (thread-clean yarn) + [:(weld done-cards cards http-cards scry-card) state] +``` + +It's not a proper `+abet` core. + +Spider supports a few auxiliary scries to monitor thread state, such as the set of currently running threads: + +```hoon +.^((list path) %gx /=spider=/tree/noun) +``` + +You can only subscribe to Spider for thread results. + +- Look at `/lib/strand`. What surprises you? + +- [“Spider API”](../../urbit-os/base/threads/api.md) + +### A New Interface {#a-new-interface} + +> Khan is the "control plane" and thread-runner vane. Its main purpose is to allow external applications to run [threads](../../urbit-os/base/threads) via a Unix Socket and receive the result. + +Khan was conceived as a way to control Urbit ships from the exterior using threads. The concept evolved a fair bit from proposal to implementation. In practice, Khan is essentially an interface wrapper for Spider-based threads, which produces a somewhat strange (but not unprecedented) situation in which a vane relies on a piece of userspace infrastructure to function correctly. + +Khan can be internally invoked (using a `cage`) or externally invoked (using a `page`). + +#### `/sys/lull` Definition + +```hoon +:: :::: +:::: ++khan :: (1i) threads + :: :::: +++ khan ^? 
+ |% + +$ gift :: out result <-$ + $% [%arow p=(avow cage)] :: in-arvo result + [%avow p=(avow page)] :: external result + == :: + +$ task :: in request ->$ + $~ [%vega ~] :: + $% $>(%born vane-task) :: new unix process + [%done ~] :: socket closed + [%fard p=(fyrd cage)] :: in-arvo thread + [%fyrd p=(fyrd cast)] :: external thread + [%lard =bear =shed] :: inline thread + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + == :: + :: :: + ++ avow |$ [a] (each a goof) :: $fyrd result + +$ bear $@(desk beak) :: partial $beak + +$ cast (pair mark page) :: output mark + input + ++ fyrd |$ [a] [=bear name=term args=a] :: thread run request + :: :: + +$ shed _*form:(strand:rand ,vase) :: compute vase + -- ::khan +``` + +- [“Khan”](../../urbit-os/kernel/khan) + +While `%khan` hasn't been thoroughly documented yet (we expect some minor API changes, such as the more recent addition of [inline thread invocation](https://github.com/urbit/urbit/pull/5981)), there are examples of its use in ~midsum-salrux's [Tendiebot price bot](https://github.com/midsum-salrux/tendiebot/blob/master/desk/tendiebot/app/tendiebot.hoon) and [Faux Urbit–Discord bridge](https://github.com/midsum-salrux/faux). + +The basic conceit of Khan is that it instruments three ways to run a thread: + +- `%fard` runs a thread from within Arvo directly. +- `%fyrd` runs a thread from outside Arvo (a connexion with the runtime). +- `%lard` runs an inline thread (rather than from `/ted`). + +A `%fard` has the form: + +```hoon +:* %pass + /path-name :: path + %arvo %k %fard :: Arvo vane and %khan mode + %namespace :: desk? + %thread-name :: /ted/thread-name.hoon + %noun :: mark (always %noun for now) + !> :* :: thread arguments: + bowl :: bowl (entropy etc.) 
+ other-info :: other arguments for thread + == +== +``` + +A `%lard` has the form: + +```hoon +=strandio -build-file %/lib/strandio/hoon +=sh |= message=@t + =/ m (strand:rand ,vase) + ;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>('hi')) + ;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>(message)) + (pure:m !>('product')) +|pass [%k %lard %base (sh 'the message')] +``` + +Since `/sys/vane/khan` is a vane, you receive its gifts in `+on-arvo`. + +- `[%arow p=(avow cage)]` is received in userspace. Note that it is a `cage`, or a pair of mark and vase. +- `[%avow p=(avow page)]` can only be received by an external process. It is a `page`, or a pair of mark and (unvased) data. + +Compare Spider and Khan: + +```hoon +:_ this +:~ [%pass /thread/[ta-now] %agent [our.bowl %spider] %watch /thread-result/[tid]] + [%pass /thread/[ta-now] %agent [our.bowl %spider] %poke %spider-start !>([~ `tid byk.bowl %foo !>(~)])] +== +:: +:_ this +:~ [%pass /thread[ta-now] %arvo %k %fard q.byk.bowl %foo %noun !>([bowl ~])] +== +``` + +As a vane, `/sys/vane/khan` is almost as simple as a vane can be: it simply `+call`s tasks and `+take`s gifts from Spider to dispatch back to its caller. + +Khan currently supports no scries. + +- [“Khan: API Reference”](../../urbit-os/kernel/khan/tasks.md) +- [“Developer Call: The Future of `%khan`”](https://www.youtube.com/watch?v=cdSFvFNFqpI) + +Speculatively, I believe that producing an improved CLI predicated on thread execution is feasible today on Urbit. Imagine a context which can dispatch moves either batched or singly, and queue return cards for processing. + +In fact, although the vane evolved from its initial conception, Khan was originally proposed under the theory that pre-written threads would be the easiest way to bundle, distribute, and manage scripts for hosting and maintenance. 
+ + +## Lick {#lick} + +Although also dealing with interprocess communication, Lick was designed for a very different scenario than Khan: to allow external processes, in particular hardware drivers, to intercommunicate with Urbit. (This breached the Earth/Mars divide.) Thus `/sys/vane/lick` focuses on instrumenting a low-level noun interface over domain sockets. + +> Lick manages IPC ports, and the communication between Urbit applications and POSIX applications via these ports. Other vanes and applications ask Lick to open an IPC port, notify it when something is connected or disconnected, and transfer data between itself and the Unix application. + +Lick works by opening a Unix socket for a particular process, which allows serialized IPC communications. These involve a jammed noun so the receiving process needs to know how to communicate in nouns. + +> The IPC ports Lick creates are Unix domain sockets (`AF_UNIX` address family) of the `SOCK_STREAM` type. + +The connexions are made via filepaths in `.urb/dev` of the pier. + +The wire format of each message is: + +``` +V.BBBB.JJJJ.JJJJ... +``` + +- `V` version +- `B` jam size in bytes (little endian) +- `J` jammed noun (little endian) + +> The process on the host OS must therefore strip the first 5 bytes, [`+cue`](../../hoon/stdlib/2p.md#cue) the jamfile, check the mark and (most likely) convert the noun into a native data structure. + +### `/sys/lull` Definition {#syslull-definition} + +```hoon +:: :::: +:::: ++lick :: (1j) IPC + :: :::: +++ lick ^? 
+ |% + +$ gift :: out result <-$ + $% [%spin =name] :: open an IPC port + [%shut =name] :: close an IPC port + [%spit =name =mark =noun] :: spit a noun to the IPC port + [%soak =name =mark =noun] :: soak a noun from the IPC port + == + +$ task :: in request ->$ + $~ [%vega ~] :: + $% $>(%born vane-task) :: new unix process + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + [%spin =name] :: open an IPC port + [%shut =name] :: close an IPC port + [%spit =name =mark =noun] :: spit a noun to the IPC port + [%soak =name =mark =noun] :: soak a noun from the IPC port + == + :: + +$ name path + -- ::lick +``` + +To evaluate what `/sys/vane/lick` is doing, we need to look at Unix's IPC model briefly. IPC (“interprocess communication”) covers any mechanism by which two processes running under the same operating system can communicate with each other. Lick focuses on [Unix domain sockets](https://en.wikipedia.org/wiki/Unix_domain_socket), which are just [communication endpoints](https://man7.org/linux/man-pages/man7/unix.7.html). +For instance, a valid use of `%lick` would use cards that look like this: + +```hoon +++ init [[%pass / %arvo %l %spin /control]~ this] +:: +++ on-arvo + |= [=wire =sign-arvo] + ?+ sign-arvo (on-arvo:def wire sign-arvo) + [%lick %soak *] + ?+ mark.sign-arvo [~ this] + :: + %connect + ~& > "connect" + :_ this [%pass /spit %arvo %l %spit /control %init area.state]~ + == == +:: +++ send-state + |= =state + ^- card:agent:gall + [%pass /spit %arvo %l %spit /control %state [slick:state face.state food.state live.state]] +``` + +- [~mopfel-winrux, `%slick`](https://github.com/mopfel-winrux/slick) + +The vane definition of `/sys/vane/lick` is even simpler than `/sys/vane/khan`: it has no `+abet` core and primarily communicates to the `unix-duct` in its state. The `owner` is a `duct` to handle the return `%soak`. 
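
The message framing described above for Lick's sockets (one version byte, a four-byte little-endian length, then the jammed noun) can be sketched from the host side. This is a minimal illustration in Python, not code from Vere or from any Urbit library: the names `frame`, `unframe`, and `HEADER` are hypothetical, the default version value of `0` is an assumption, and decoding the payload with `+cue` is left to a noun library.

```python
import struct

# Assumed layout: 1 version byte followed by a 4-byte little-endian length.
HEADER = struct.Struct("<BI")


def frame(jammed, version=0):
    """Prefix an already-jammed noun (bytes) with the 5-byte header."""
    return HEADER.pack(version, len(jammed)) + jammed


def unframe(buf):
    """Split one complete message into (version, jammed payload)."""
    if len(buf) < HEADER.size:
        raise ValueError("short header")
    version, size = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated payload")
    return version, payload
```

Because the socket is of the `SOCK_STREAM` type, message boundaries are not preserved; a real client would read exactly five header bytes and then exactly `size` payload bytes before handing the payload to a `+cue` implementation.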
+ +Lick takes several scries: + +- [`%a` - Read ports](../../urbit-os/kernel/lick/scry.md#a---read-ports) +- [`%d` - Port owner](../../urbit-os/kernel/lick/scry.md#d---port-owner) +- [`%u` - Port existance](../../urbit-os/kernel/lick/scry.md#u---port-existance) + +Gall needs to wrap `%soak` and `%spit` to route properly. See e.g. `+ap-generic-take`. This lets multiple agents share sockets with the same name, and each agent can have its own folder. + +### `vere/io/lick.c` {#vereiolickc} + +The hardware counterpart of `/sys/lick` is contained in `vere/io/lick.c` aside from its callback registration. As with other parts of the runtime event loop and callback system, the primary connexion is made using `libuv`, in this case via an instance of a [`uv_pipe_t`](https://docs.libuv.org/en/v1.x/pipe.html) descriptor. + +- `_lick_ef_spit()` + - `_lick_send_noun()` + - `u3_newt_send()` +- `_lick_sock_cb()`, callback for connection from Earth. +- `_lick_moor_poke`, result of `%soak` from external process. + + +## `conn.c` {#connc} + +[`conn.c`](https://github.com/urbit/vere/blob/develop/pkg/vere/io/conn.c) is a driver in Vere. It is a part of the "King" (a.k.a. "Urth") process. It exposes a [Unix domain socket](https://en.wikipedia.org/wiki/Unix_domain_socket) at `/path/to/pier/.urb/conn.sock` for sending/receiving data from external processes. The point of `conn.c` is to provide administrative control over Arvo and Vere: read ephemeral or persistent state, enqueue events, and send arbitrary commands (pack, meld, mass, &c). + +(`conn.c` is only loosely related to `/sys/khan`. Its main connection is special-casing some inputs for Khan.) + +`conn.c` accepts a newt-encoded `+jam`med noun of the shape `[request-id command arguments]`, where: + +- `request-id` is a client-supplied atomic identifier with type `@`. It exists entirely for the benefit of the client, allowing responses to be matched to requests. (The poor Earthling's wire.) 
+- `command` is one of: + - `%peek`, namespace scry request into Arvo. + - `%peel`, emulated namespace scry request into Vere. + - `%ovum`, injection of a raw kernel move. + - `%fyrd`, direct shortcut to Khan command. + - `%urth`, subcommand to runtime to `%pack` or `%meld`. + +See particularly: + +- `_conn_moor_poke()` for the main message dispatcher. +- `_conn_peek_cb()` for the peek handler. + - `_conn_send_noun()` +- `_conn_read_peel()` for the `%peel` handler. + +- [“`conn.c` Usage Guide”](https://github.com/urbit/tools/wiki/conn.c-Usage-Guide) +- [Click](https://github.com/urbit/tools/tree/master/pkg/click) + + +## Exercises {#exercises} + +- Run these valid commands on a fakeship from the outside (following examples in the `conn.c` usage guide). + - Pack, meld, OTA, install, code, vats diff --git a/docs/core-academy/ca13.md b/docs/core-academy/ca13.md new file mode 100644 index 0000000..b5c6b5d --- /dev/null +++ b/docs/core-academy/ca13.md @@ -0,0 +1,525 @@ +--- +description: "A look at Urbit's networking key infrastructure. Covers the Jael vane, the %eth-watcher agent, and Azimuth's Layer 2 rollup solution." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# 14. Vanes VII: Jael, Azimuth + +_This final lesson looks at Urbit's networking key infrastructure, including the Jael vane, `%eth-watcher` agent, and Azimuth's Layer 2 rollup._ + +## Azimuth <a href="#azimuth" id="azimuth"></a> + +Azimuth is the “Urbit ID” component of the system. It allows ships to securely communicate over Ames as peers, either using galaxy-based routing (today) or a direct route (if known from previous contact). + +> Each ship in a two-way conversation computes the shared symmetric key for that conversation by computing the product of their own private key and the public key of the other party. 
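
To see why both ships arrive at the same symmetric key without ever transmitting it, here is a toy Diffie-Hellman exchange in Python. It only illustrates the idea in the passage above and should not be read as Ames's actual cryptosuite: it uses modular exponentiation over a prime field as a stand-in for the real key operations, and the names `P`, `G`, `public_key`, and `shared_key`, as well as the private values, are hypothetical.

```python
# Toy parameters for illustration only: not secure, and not Urbit's.
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # public base known to everyone


def public_key(private):
    """Derive the value a ship would publish from its private key."""
    return pow(G, private, P)


def shared_key(my_private, their_public):
    """Combine my private key with the other party's public key."""
    return pow(their_public, my_private, P)


# Two ships with made-up private keys.
zod_private, nec_private = 123456789, 987654321
zod_public = public_key(zod_private)
nec_public = public_key(nec_private)

# Each side computes the same secret from different inputs.
assert shared_key(zod_private, nec_public) == shared_key(nec_private, zod_public)
```

The role Azimuth plays in this picture is to make the public half trustworthy: each ship looks up the other's current public key (by life) rather than accepting whatever key a peer presents.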
+ +Azimuth is a [“public key infrastructure”](https://en.wikipedia.org/wiki/Public_key_infrastructure), meaning that it binds public keys to entity identities, with each entity holding its own private key. For each point, Azimuth needs to know three things: + +1. Identity (point) +2. Life (key revision number) +3. Rift (continuity number) + +> These values are utilized by [Ames](../../urbit-os/kernel/ames/) and [Jael](../../urbit-os/kernel/jael/) to ensure that communication between ships is always done with the most recent set of networking keys, and that networking state is appropriately reset when a breach has occurred. + +* [“Life and Rift”](../../urbit-id/life-and-rift.md) + +Today, Azimuth is instantiated as an NFT, technically an [ERC-721](https://ethereum.org/en/developers/docs/standards/tokens/erc-721/) token. The original Azimuth was “Layer 1” on Ethereum, meaning that the transactions and Azimuth state are directly verified by the Ethereum Virtual Machine. More recently, a “Layer 2” was added, which permits batched transactions very cheaply at the cost of eliding the EVM check (and external visibility). + +Azimuth is strictly under the governance of the Galactic Senate and so the Ethereum state could be overridden. “Sovereign is he who decides the exception” (Carl Schmitt), and this provides an explicit safety valve against a chain state or contract exploit against Ethereum. (This would be controversial, I imagine, and it could break Azimuth-on-Ethereum, but there are circumstances under which it would make sense.) + +> The Senate has no ability to touch this data directly. This is in direct contrast to all existing centralized services, where your account is always at risk of being taken away from you. What the Senate can change is the “business logic” that decides how you can interact with the data in Azimuth. These are mechanisms such as what powers various proxies have, how stars/planets are released over time, and how sponsorship works. 
+ +We will not discuss details of Layer 2 points other than to note that from Jael's perspective they behave the same as L1 points. + +### Identity <a href="#identity" id="identity"></a> + +You are doubtless familiar with the fivefold hierarchical nature of Urbit ID: + +| **Type** | **Span** | **Number** | +| -------- | --------------------------------------------------------------------- | ---------------- | +| Galaxy | `0x0`–`0xff` | `256` | +| Star | `0x100`–`0xffff` | `65.280` | +| Planet | `0x1.0000`–`0xffff.ffff` | `4.294.901.760` | +| Moon | `0x1.0000.0000`–`0xffff.ffff.ffff.ffff` | $2^{64}-2^{32}$ | +| Comet | `0x1.0000.0000.0000.0000`–`0xffff.ffff.ffff.ffff.ffff.ffff.ffff.ffff` | $2^{128}-2^{64}$ | + +(The counts are not generally even powers of two because of the carveout for the superior points.) + +At various points in the past, these points had different names: + +| **Type** | **Feudal Type I** (fort) | **Feudal Type II** (fort) | **Maritime Type** (ship) | +| -------- | ----------------------------- | ------------------------- | ------------------------ | +| Galaxy | duke | czar | carrier | +| Star | earl | king | cruiser | +| Planet | lord | duke | destroyer | +| Moon | pawn | earl | yacht | +| Comet | wolf | pawn | submarine | + +(\~zod was originally pope in the feudal terminology.) + +Within Arvo, these are still baked into the core rank types, so you need occasionally to know the second set of feudal types: + +```hoon ++$ rank ?(%czar %king %duke %earl %pawn) :: ship width class +++ clan :: ship to rank + |= who=ship + ^- rank + =/ wid (met 3 who) + ?: (lte wid 1) %czar + ?: =(2 wid) %king + ?: (lte wid 4) %duke + ?: (lte wid 8) %earl + ?> (lte wid 16) %pawn +``` + +```hoon +> (sein:title our now our) +~zod + +> (clan:title our) +%czar +``` + +The `+title` core in `/sys/zuse` provides identity-related functions. Saliently: + +* `+sein` for the actual sponsor of a particular point. (We distinguish because of emancipation.) 
+* `+clan` for the rank. +* `+name` for the responsible point (that point unless a moon). +* `+saxo` for the on-Azimuth sponsorship chain. + +```hoon +> (saxo:title our now ~middev-middev) +[i=~middev-middev t=[i=~nodsel t=~[~sel]]] +``` + +#### Galaxies + +Galaxies have a public key visible in Azimuth PKI. Their IP address is loaded on boot from a galaxy table supplied by Urbit (as organization). Then each ship can build the sponsorship chain it needs by construction. + +#### Stars, Planets + +Other points’ public keys in Azimuth PKI are obtained from Ethereum and stored in Jael, along with its sponsor, life, and rift. + +#### Moons + +Moons do not have an independent Azimuth presence, but rely on their parent ship's Jael to store their life and rift values. + +#### Comets + +Comets do not have a registration in Azimuth. They cannot cycle their networking keys, and they cannot breach, so their life and rift are always `0`. + +* [“Comet self-attestation”](../../urbit-os/kernel/ames/cryptography.md#comets) + +### Ecliptic <a href="#ecliptic" id="ecliptic"></a> + +There are two contract addresses for Azimuth: + +* [`Azimuth.eth`](../../urbit-id/azimuth-eth.md) is the contract data store containing the actual state of Azimuth on Ethereum. +* [`Ecliptic.eth`](../../urbit-id/ecliptic.md) contains the operational logic (transfers, unlocking, etc.). It is separate from `Azimuth.eth` so that the contracts can be upgraded without losing chain state. + +Ecliptic contracts are written in Solidity, the EVM contract programming language. + +> * **Azimuth**: contains all on-chain state for azimuth. Most notably, ownership and public keys. Can't be modified directly, you must use the Ecliptic. +> * **Ecliptic**: is used as an interface for interacting with your points on-chain. Allows you to configure keys, transfer ownership, etc. +> * **Polls**: registers votes by the Galactic Senate on proposals. These can be either static documents or Ecliptic upgrades. 
+> * **Linear Star Release**: facilitates the release of blocks of stars to their owners over a period of time. +> * **Conditional Star Release**: facilitates the release of blocks of stars to their owners based on milestones. +> * **Claims**: allows point owners to make claims about (for example) their identity, and associate that with their point. +> * **Censures**: simple reputation management, allowing galaxies and stars to flag points for negative reputation. +> * **Delegated Sending**: enables network-effect like distributing of planets. +> * **Planet Sale**: gives an example of a way in which stars could sell planets on-chain. + +* Demonstrate using the `READ` functions with [Etherscan](https://etherscan.io/address/ecliptic.eth#readContract). +* Read [`PlanetSale.sol`](https://github.com/urbit/azimuth/blob/master/contracts/PlanetSale.sol). + +> There is no limit for galaxies. Instead, for most galaxies, all stars have already been spawned and placed into one of the lockup contracts: [Linear Star Release↗](https://etherscan.io/address/0x86cd9cd0992f04231751e3761de45cecea5d1801) and [Conditional Star Release↗](https://etherscan.io/address/0x8c241098c3d3498fe1261421633fd57986d74aea).\ +> Beginning in 2019, stars may spawn at most 1024 planets. This limit doubles every subsequent year until the maximum is reached. + +* [`Ecliptic.eth` contract reader](https://etherscan.io/address/0x86cd9cd0992f04231751e3761de45cecea5d1801#readContract) +* [“Azimuth's First Contract Upgrade”](https://urbit.org/blog/first-contract) +* [`urbit/azimuth`](https://github.com/urbit/azimuth) + +### Bridge <a href="#bridge" id="bridge"></a> + +Bridge is the primary interface to execute Ecliptic contracts. 
+ +* [“Using Bridge”](../../user-manual/id/using-bridge.md) + +### Urbit HD Wallet <a href="#urbit-hd-wallet" id="urbit-hd-wallet"></a> + +A hierarchical deterministic (HD) wallet is a system of related key pairs with distinct delegated powers, à la [BIP-32](https://en.bitcoin.it/wiki/BIP_0032). A master key pair is used to deterministically generate subsidiary key pairs with specific capabilities on the chain. + +As a point owner, you receive some means of claiming the point initially—a claim code or email or simply a transfer to an Ethereum address that you must accept. From that, you can either hold the point via the ownership address on Ethereum or via the master ticket (with seed `@q`). + +(A `@q` is like a `@p` in that it is syllabic, but has no limitation on value or range unlike that implied by `@p`.) + +The holder of the main ownership address and key pair can designate proxies who can only run certain Ecliptic transactions. + + + +The key pairs are: + +* Ownership +* Transfer Proxy (set on a temporary basis to make transferring a point a two-step process less prone to error) +* Spawn Proxy (only galaxies and stars; can spawn new child identities) +* Management Proxy (planets, stars, and galaxies; can set life and rift, and manage sponsorship; receives boot keyfile) +* Voting Proxy (only galaxies) + +The `/app/claz` command-line tool lets you perform specific sophisticated actions on L1 points from a fake ship. This provides an unmediated interface to Ecliptic. + +* Read through `/app/claz`, `/ted/claz/prep-command`, and in particular `/lib/claz`. + +Like `sudo`, this is riskier but direct. 
+ +* [“Urbit HD Wallet”](../../urbit-id/hd-wallet.md) +* [“HD Wallet (Master Ticket)”](../../user-manual/id/hd-wallet.md) +* [“HD Crypto Wallets: What Are They?” (Gemini)](https://www.gemini.com/cryptopedia/hd-crypto-wallets-hierachichal-deterministic) +* [“Operator's Manual: Proxies”](../../user-manual/id/proxies.md) +* [“Advanced Azimuth Tools”](../../urbit-id/advanced-azimuth-tools.md) + +### Breaching/Factory Resets <a href="#breachingfactory-resets" id="breachingfactory-resets"></a> + +What is a breach? + +> Ships on the Ames network sometimes need to reset their continuity. A factory reset (hereafter just called a _reset_) is when an individual ship announces to the network: "I forgot who I am, let's start over from scratch." That is, it clears its own event log and sends an announcement to the network, asking all ships that have communicated with it to reset its networking information in their state. This makes it as though the ship was just started for the first time again, since everyone on the network has forgotten about it. + +We call them “factory resets” for the public, but internally it's all `%breach`. + +* See `/sys/vane/ames:++on-publ-breach`. + +Since a breach happens to a ship, it cannot know about its own breach. + +* [“Guide to Factory Resets”](../../user-manual/id/guide-to-resets.md) + +## Jael <a href="#jael" id="jael"></a> + +Jael stores Azimuth information. `/sys/vane/jael` is intimately related to Azimuth. Azimuth represents what we may call the exoteric view of ownership, while Jael is the esoteric view. Jael’s primary role is ship networking management and cryptography, but it also supports promises. + +`/sys/vane/jael` segregates state into two categories: absolute and relative. Absolute state refers to what is known about the Azimuth PKI, ship ownership, private keys, etc. Since not every ship is live on the network (such as a fakezod), there is also a notion of relative state, referring to what is known about the current ship only. 
+ +Jael's code has Azimuth hardcoded into it, but could be made to talk to a different agent pretty easily, as long as it communicates key change & sponsorship events in the same way. + +### `/sys/lull` Definition <a href="#syslull-definition" id="syslull-definition"></a> + +Jael defines a lot of different types to support what different parts of the system need to see. (For instance, `/app/eth-watcher` needs to communicate with `/sys/vane/jael`.) + +#### Tasks and Gifts + +```hoon ++$ gift :: out result <-$ + $% [%done error=(unit error:ames)] :: ames message (n)ack + [%boon payload=*] :: ames response + :: + [%private-keys =life vein=(map life ring)] :: private keys + [%public-keys =public-keys-result] :: ethereum changes + :: + [%turf turf=(list turf)] :: domains + == :: ++$ task :: in request ->$ + $~ [%vega ~] :: + $% [%dawn dawn-event] :: boot from keys + [%fake =ship] :: fake boot + :: + [%listen whos=(set ship) =source] :: set ethereum source + [%meet =ship =life =pass] :: met after breach + [%nuke whos=(set ship)] :: cancel tracker from + [%ruin ships=(set ship)] :: pretend breach + :: + [%private-keys ~] :: sub to privates + [%public-keys ships=(set ship)] :: sub to publics + [%rekey =life =ring] :: update private keys + [%resend ~] :: resend private key + :: + [%turf ~] :: view domains + :: + [%moon =ship =udiff:point] :: register moon keys + [%step ~] :: reset web login code + :: + $>(%trim vane-task) :: trim state + $>(%vega vane-task) :: report upgrade + $>(%plea vane-task) :: ames request + == :: +``` + +* `%private-keys` are used by Ames. +* `%public-keys` are used in a few places where messages need to be decrypted, e.g. Clay and `/app/ping`. +* `%turf` relates to domains for real ships. + +#### Organization + +`/sys/vane/jael` is not a particularly complicated vane. 
+ +`lex` is the durable state of Jael: + +```hoon +=| lex=state-2 ++$ state-2 + $: %2 + pki=state-pki-2 :: + etn=state-eth-node :: eth connection state + == :: +``` + +`$state-pki-2` is the known PKI state: + +```hoon ++$ state-pki-2 :: urbit metadata + $: $= own :: vault (vein) + $: yen=(set duct) :: trackers + sig=(unit oath) :: for a moon + tuf=(list turf) :: domains + fak=_| :: fake keys + lyf=life :: version + step=@ud :: login code step + jaw=(map life ring) :: private keys + == :: + $= zim :: public + $: yen=(jug duct ship) :: trackers + ney=(jug ship duct) :: reverse trackers + nel=(set duct) :: trackers of all + dns=dnses :: on-chain dns state + pos=(map ship point) :: on-chain ship state + == :: + == +``` + +`$state-eth-node` is the state of a connection to an Ethereum node. + +```hoon ++$ source (each ship term) ++$ source-id @udsourceid +:: +:: +state-eth-node: state of a connection to an ethereum node +:: ++$ state-eth-node :: node config + meta + $: top-source-id=source-id + sources=(map source-id source) + sources-reverse=(map source source-id) + default-source=source-id + ship-sources=(map ship source-id) + ship-sources-reverse=(jug source-id ship) + == :: +``` + +There are a number of ancillary cores that track the public-key/private-key state. + +For the most part, you won't need to directly use Jael when working with Gall agents unless you are carrying out Azimuth operations. + +There is a standard vane interface core which dispatches to an engine. + +The main body of `/sys/vane/jael` has several engines: + +* `+ez` Ethereum wallet algebra. +* `+of` main engine for top-level semantics. +* `+su` subjective engine for all derived state, subscriptions, and actions. + +The `+feel` core handles tracking and updating public keys. + +* Trace an Azimuth update from `%fact` via `+new-event` to `+feel`'s operations. 
+ +Parenthetically, `/sys/vane/jael` uses a unique pattern for invoking `+abet`: + +```hoon +%- curd =< abet +(private-keys:~(feel su hen now pki etn) life.tac ring.tac) +``` + +#### Scries + +Jael takes no cares, so all information is exposed via the path. Oddly, rather than a desk name, Jael expects an operation type. (This is a very old scry pattern and should be cleaned up.) + +Reveal your web login code: + +```hoon +.^(@p %j /=code=/(scot %p our)) +``` + +Query who the ship's sponsor is: + +```hoon +.^(@p %j /=sein=/~sampel-palnet) +``` + +Query the sponsorship chain: + +```hoon +.^((list @p) %j /=saxo=/~sampel-palnet) +``` + +Get current state of subscriptions to public key updates; this won't have much to say if you are on a fakeship: + +```hoon +.^([yen=(jug duct ship) ney=(jug ship duct) nel=(set duct)] %j /=subscriptions=/1) +``` + +* Examine the [`%step` exercise](../../urbit-os/kernel/jael/examples.md#step) in the Jael docs. + +### Azimuth Data Flow <a href="#azimuth-data-flow" id="azimuth-data-flow"></a> + +Jael does not talk to Ethereum/Azimuth directly. It derives its knowledge of Azimuth as a PKI from some particular source. In the default case, this is `/app/eth-watcher` and friends. + +> By default, Jael's primary source of information is the Gall agent `%azimuth`, which (using `%eth-watcher`) runs a thread that polls an Ethereum node for transactions in the Azimuth contract. It can also get updates from other ships on the network. (This is always the case for moons, where it subscribes to the moon's parent's Jael for updates.) + +`/app/azimuth` is the agent responsible for subscribing to Azimuth transactions from `/app/eth-watcher`. (It unrolls L2 transactions as well.) It exposes this data via scries and subscription endpoints. + +`/app/eth-watcher` is actually a generic Ethereum state listener. It works for Azimuth because what Jael needs is conveyed via `/app/azimuth`. 
+ +A few relevant types, eliding the point algebra: + +```hoon +:: :::: +:::: ++jael :: (1h) security + :: :::: +++ jael ^? + |% + +$ public-keys-result + $% [%full points=(map ship point)] + [%diff who=ship =diff:point] + [%breach who=ship] + == + :: + ++ block + =< block + |% + +$ hash @uxblockhash + +$ number @udblocknumber + +$ id [=hash =number] + +$ block [=id =parent=hash] + -- + :: + ++ point + =< point + |% + +$ point + $: =rift + =life + keys=(map life [crypto-suite=@ud =pass]) + sponsor=(unit @p) + == + -- + -- :: jael +``` + +* [“Azimuth Data Flow: `%eth-watcher`”](../../urbit-id/flow.md#eth-watcher) +* [“Azimuth Data Flow: `naive.hoon`”](../../urbit-id/flow.md#naive) + +#### `/sys/lull` Shared Types + +```hoon +:: :::: +:::: ++ethereum-types :: eth surs for jael + :: :::: +++ ethereum-types + |% + ++ address @ux :: ethereum address, 20 bytes. + +$ event-id [block=@ud log=@ud] :: event location + ++ events (set event-id) + -- +:: :::: +:::: ++azimuth-types :: az surs for jael + :: :::: +++ azimuth-types + =, ethereum-types + |% + ++ point + $: :: ownership + $= own + $: owner=address + management-proxy=address + voting-proxy=address + transfer-proxy=address + == + :: + :: networking + $= net + %- unit + $: =life + =pass + continuity-number=@ud + sponsor=[has=? who=@p] + escape=(unit @p) + == + :: + :: spawning + $= kid + %- unit + $: spawn-proxy=address + spawned=(set @p) + == + == + :: + +$ dnses [pri=@t sec=@t ter=@t] + :: + ++ diff-azimuth + $% [%point who=@p dif=diff-point] + [%dns dnses] + == + :: + ++ diff-point + $% [%full new=point] :: + [%owner new=address] :: OwnerChanged + [%activated who=@p] :: Activated + [%spawned who=@p] :: Spawned + [%keys =life =pass] :: ChangedKeys + [%continuity new=@ud] :: BrokeContinuity + [%sponsor new=[has=? 
who=@p]] :: EscapeAcc/LostSpons + [%escape new=(unit @p)] :: EscapeReq/Can + [%management-proxy new=address] :: ChangedManagementPro + [%voting-proxy new=address] :: ChangedVotingProxy + [%spawn-proxy new=address] :: ChangedSpawnProxy + [%transfer-proxy new=address] :: ChangedTransferProxy + == + -- + :: :: + :::: ++pki:jael :: (1h2) certificates + :: :::: + ++ pki ^? + |% + +$ hand @uvH :: 128-bit hash + +$ mind [who=ship lyf=life] :: key identifier + +$ name (pair @ta @t) :: ascii / unicode + +$ oath @ :: signature + ++ tale :: urbit-signed * + |$ [typ] :: payload mold + $: dat=typ :: data + syg=(map ship (pair life oath)) :: signatures + == :: + -- :: pki +``` + +* What is a change on Azimuth? (See `+diff-point`.) + +### Bootstrapping <a href="#bootstrapping" id="bootstrapping"></a> + +Jael rises early in the booting of a new ship, since we bake identity into Arvo very soon. This results from a `%dawn` task passed with a `$dawn-event`: + +```hoon ++$ dawn-event + $: =seed + spon=(list [=ship point:azimuth-types]) + czar=(map ship [=rift =life =pass]) + turf=(list turf) + bloq=@ud + node=(unit purl:eyre) + == +``` + +Jael is a landlocked vane, but it does receive a `%dawn` event constructed by `vere/dawn.c:u3_dawn_vent()`. + +* Review that function and `_dawn_post_json()` for the actual information retrieval. + +## Example <a href="#example" id="example"></a> + +* Boot a comet using a star as its sponsor if you have a running star. (Maybe try with the next option.) +* Boot a fake comet. (See `vere/main.c` and override.) +* Build a fake sponsorship chain. Can you change the sponsorship chain to another point (emancipate) on a fakenet? Why or why not? |