Diffstat (limited to 'docs/runtime')
-rw-r--r-- | docs/runtime/README.md | 31
-rw-r--r-- | docs/runtime/api.md | 648
-rw-r--r-- | docs/runtime/c.md | 93
-rw-r--r-- | docs/runtime/conn.md | 434
-rw-r--r-- | docs/runtime/cryptography.md | 34
-rw-r--r-- | docs/runtime/jetting.md | 1414
-rw-r--r-- | docs/runtime/nouns.md | 311
-rw-r--r-- | docs/runtime/u3.md | 34
8 files changed, 2999 insertions, 0 deletions
diff --git a/docs/runtime/README.md b/docs/runtime/README.md new file mode 100644 index 0000000..c9612ac --- /dev/null +++ b/docs/runtime/README.md @@ -0,0 +1,31 @@ +--- +description: "Documentation for Urbit's runtime and Nock interpreter (Vere), including guides for U3, jetting, C programming in Urbit, and cryptography implementation." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# Runtime + +These guides cover Urbit's runtime and Nock interpreter, which is called Vere. + +* [U3 Overview](u3.md) - An overview of the noun-wrangling part of the runtime, U3. +* [Conn.c Guide](conn.md) - Using `conn.c` to interact with a running ship from the outside. +* [How to Write a Jet](jetting.md) - A jetting guide for new Urbit developers. +* [C3: C in Urbit](c.md) - Under u3 is the simple c3 layer, which is just how we write C in Urbit. +* [U3: Land of Nouns](nouns.md) - The division between c3 and u3 is that you could theoretically imagine using c3 as just a generic C environment. Anything to do with nouns is in u3. +* [U3: API Overview](api.md) - A walkthrough of each of the u3 modules. +* [Cryptography](cryptography.md) - References on the cryptography libraries utilized by jets. + +## Additional Resources <a href="#additional-resources" id="additional-resources"></a> + +* [User Reference](../../user-manual/running/vere.md) - Reference for the utilities and options the runtime takes from the terminal. +* [The Vere Repo](https://github.com/urbit/vere) - GitHub repository for the Urbit runtime. diff --git a/docs/runtime/api.md b/docs/runtime/api.md new file mode 100644 index 0000000..43c0817 --- /dev/null +++ b/docs/runtime/api.md @@ -0,0 +1,648 @@ +--- +description: "API overview by prefix for u3 system modules. Function reference for noun manipulation, jets, allocation, Nock computation." 
+layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# API overview by prefix + +Let's run through the `u3` modules one by one. All public functions are commented, but the comments may be cryptic. + +## u3m: main control {#u3m-main-control} + +To start `u3`, run + +```c +/* u3m_boot(): start the u3 system. +*/ + void + u3m_boot(c3_o nuu_o, c3_o bug_o, c3_c* dir_c); +``` + +`nuu_o` is `c3y` (yes, `0`) if you're creating a new pier, `c3n` (no, `1`) if you're loading an existing one. `bug_o` is `c3y` if you want to test the garbage-collector, `c3n` otherwise. `dir_c` is the directory for the pier files. + +`u3m_boot()` expects an `urbit.pill` file to load the kernel from. It will try first `$dir/.urb.urbit.pill`, then `U3_LIB`. + +Any significant computation with nouns, certainly anything Turing complete, should be run (a) virtualized and (b) in an inner road. These are slightly different things, but at the highest level we bundle them together for your convenience, in `u3m_soft()`: + +```c +/* u3m_soft(): system soft wrapper. unifies unix and nock errors. +** +** Produces [%$ result] or [%error (list tank)]. +*/ + u3_noun + u3m_soft(c3_w sec_w, u3_funk fun_f, u3_noun arg); +``` + +`sec_w` is the number of seconds to time out the computation. `fun_f` is a C function accepting `arg`. + +The result of `u3m_soft()` is a cell whose head is an atom. If the head is `%$` - ie, `0` - the tail is the result of `fun_f(arg)`. Otherwise, the head is a `term` (an atom which is an LSB first string), and the tail is a `(list tank)` (a list of `tank` printables - see `+tank` in `hoon.hoon`). Error terms should be the same as the exception terms above. + +If you're confident that your computation won't fail, you can use `u3m_soft_sure()`, `u3m_soft_slam()`, or `u3m_soft_nock()` for C functions, Hoon function calls, and Nock invocations. 
Caution - this returns just the result, and asserts globally. + +All the `u3m_soft` functions above work **only on the surface**. Within the surface, virtualize with `u3m_soft_run()`. Note that this takes a `fly` (a namespace gate), thus activating the `11` super-operator in the nock virtualizer, `+mock`. When actually using the `fly`, call `u3m_soft_esc()`. Don't do either unless you know what you're doing! + +For descending into a subroad **without** Nock virtualization, use `u3m_hate()` and `u3m_love` respectively. Hating enters a subroad; loving leaves it, copying out a product noun. + +Other miscellaneous tools in `u3m`: `u3m_file()` loads a Unix file as a Nock atom; `u3m_water()` measures the boundaries of the loom in current use (ie, watermarks); and a variety of prettyprinting routines, none perfect, are available, mainly for debugging printfs: `u3m_pretty()`, `u3m_p()`, `u3m_tape()` and `u3m_wall()`. + +It's sometimes nice to run a mark-and-sweep garbage collector, `u3m_grab()`, which collects the world from a list of roots, and asserts if it finds any leaks or incorrect refcounts. This tool is for debugging and long-term maintenance only; refcounts should never err. + +## u3j: jets {#u3j-jets} + +The jet system, `u3j`, is what makes `u3` and `nock` in any sense a useful computing environment. Except perhaps `u3a` (there is really no such thing as a trivial allocator, though `u3a` is dumber than most) - `u3j` is the most interesting code in `u3`. + +Let's consider the minor miracle of driver-to-battery binding which lets `u3j` work - and decrement not be `O(n)` - without violating the precisely defined semantics of pure Nock, **ever**. + +It's easy to assume that jets represent an architectural coupling between Hoon language semantics and Nock interpreter internals. Indeed such a coupling would be wholly wrongtious and un-Urbit. But the jet system is not Hoon-specific. It is specific to nock runtime systems that use a design pattern we call a `core`. 
+ +### u3j: core structure {#u3j-core-structure} + +A core is no more than a cell `[code data]`, in which a `code` is either a Nock formula or a cell of `code`s, and `data` is anything. In a proper core, the subject each formula expects is the core itself. + +Except for the arbitrary decision to make a core `[code data]`, (or as we sometimes say, `[battery payload]`), instead of `[data code]`, any high-level language transforming itself to Nock would use this design. + +So jets are in fact fully general. Broadly speaking, the jet system works by matching a C **driver** to a battery. When the battery is invoked with Nock operator `9`, it must be found in associative memory and linked to its driver. Then we link the formula axis of the operation (`a` in `[9 a b]`) to a specific function in the driver. + +To validate this jet binding, we need to know two things. One, we need to know the C function actually is a perfect semantic match for the Nock formula. This can be developed with driver test flags, which work, and locked down with a secure formula hash in the driver, which we haven't bothered with just yet. (You could also try to develop a formal method for verifying that C functions and Nock formulas are equivalent, but this is a research problem for the future.) + +Two, we need to validate that the payload is appropriate for the battery. We should note that jets are a Nock feature and have no reference to Hoon. A driver which relies on the Hoon type system to only pair it with valid payloads is a broken driver, and breaks the Nock compliance of the system as a whole. So don't. + +Now, a casual observer might look at `[battery payload]` and expect the simplest case of it to be `[formula subject]`. That is: to execute a simple core whose battery is a single formula, we compute + +``` +nock(+.a -.a) +``` + +Then, naturally, when we go from Hoon or a high-level language containing functions down to Nock, `[function arguments]` turns into `[formula subject]`. 
This seems like an obvious design, and we mention it only because it is **completely wrong**. + +Rather, to execute a one-armed core like the above, we run + +``` +nock(a -.a) +``` + +and the normal structure of a `gate`, which is simply Urbitese for "function," is: + +``` +[formula [sample context]] +``` + +where `sample` is Urbitese for "arguments" - and `context`, any Lisper will at once recognize, is Urbitese for "environment." + +To `slam` or call the gate, we simply replace the default sample with the caller's data, then nock the formula on the entire gate. + +What's in the context? Unlike in most dynamic languages, it is not some secret system-level bag of tricks. Almost always it is another core. This onion continues until, at the bottom, there is an atomic constant, conventionally the kernel version number. + +Thus a (highly desirable) `static` core is one of the form + +``` +[battery constant] +[battery static-core] +``` + +ie, a solid stack of nested libraries without any dynamic data. The typical gate will thus be, for example, + +``` +[formula [sample [battery battery battery constant]]] +``` + +but we would be most foolish to restrict the jet mechanism to cores of this particular structure. We cannot constrain a payload to be `[sample static-core]`, or even `[sample core]`. Any such constraint would not be rich enough to handle Hoon, let alone other languages. + +### u3j: jet state {#u3j-jet-state} + +There are two fundamental rules of computer science: (1) every system is best understood through its state; (2) less state is better than more state. Sadly, a pier has three different jet state systems: `cold`, `warm` and `hot`. It needs all of them. + +Hot state is associated with this particular Unix process. The persistent pier is portable not just between process and process, but machine and machine or OS and OS. 
The set of jets loaded into a pier may itself change (in theory, though not in the present implementation) during the lifetime of the process. Hot state is a pure C data structure. + +Cold state is associated with the logical execution history of the pier. It consists entirely of nouns and ignores restarts. + +Warm state contains all dependencies between cold and hot state. It consists of C structures allocated on the loom. + +Warm state is purely a function of cold and hot states, and we can wipe and regenerate it at any time. On any restart where the hot state might have changed, we clear the warm state with `u3j_ream()`. + +There is only one hot state, the global jet dashboard `u3j_Dash` or `u3D` for short. In the present implementation, u3D is a static structure not modified at runtime, except for numbering itself on process initialization. This structure - which embeds function pointers to all the jets - is defined in `j/tree.c`. The data structures: + +```c + /* u3j_harm: driver arm. + */ + typedef struct _u3j_harm { + c3_c* fcs_c; // `.axe` or name + u3_noun (*fun_f)(u3_noun); // compute or 0 / semitransfer + c3_o ice; // perfect (don't test) + c3_o tot; // total (never punts) + c3_o liv; // live (enabled) + } u3j_harm; + + /* u3j_core: C core driver. + */ + typedef struct _u3j_core { + c3_c* cos_c; // control string + struct _u3j_harm* arm_u; // blank-terminated static list + struct _u3j_core* dev_u; // blank-terminated static list + struct _u3j_core* par_u; // dynamic parent pointer + c3_l jax_l; // dynamic jet index + } u3j_core; + + /* u3e_dash, u3_Dash, u3D: jet dashboard singleton + */ + typedef struct _u3e_dash { + u3j_core* dev_u; // null-terminated static list + c3_l len_l; // ray_u filled length + c3_l all_l; // ray_u allocated length + u3j_core* ray_u; // dynamic driver array + } u3j_dash; +``` + +Warm and cold state is **per road**. In other words, as we nest roads, we also nest jet state. 
The jet state in the road is: + +```c + struct { // jet dashboard + u3p(u3h_root) har_p; // warm state + u3_noun das; // cold state + } jed; +``` + +In case you understand Hoon, `das` (cold state) is a `+dash`, and `har_p` (warm state) is a map from battery to `+calx`: + +```hoon + ++ bane ,@tas :: battery name + ++ bash ,@uvH :: label hash + ++ bosh ,@uvH :: local battery hash + ++ batt ,* :: battery + ++ calf :: + $: jax=,@ud :: hot core index + hap=(map ,@ud ,@ud) :: axis/hot arm index + lab=path :: label as path + jit=* :: arbitrary data + == :: + ++ calx (trel calf (pair bash cope) club) :: cached by battery + ++ clog (pair cope (map batt club)) :: identity record + ++ club (pair corp (map term nock)) :: battery pattern + ++ cope (trel bane axis (each bash noun)) :: core pattern + ++ core ,* :: core + ++ corp (each core batt) :: parent or static + ++ dash (map bash clog) :: jet system +``` + +The driver index `jax` in a `+calx` is an index into `ray_u` in the dashboard - ie, a pointer into hot state. This is why the warm state has to be reset when we reload the pier in a new process. + +Why is jet state nested? Nock of course is a functional system, so as we compute we don't explicitly create state. Jet state is an exception to this principle (which works only because it can't be semantically detected from Nock/Hoon) - but it can't violate the fundamental rules of the allocation system. + +For instance, when we're on an inner road, we can't allocate on an outer road, or point from an outer road to an inner. So if we learn something - like a mapping from battery to jet - in the inner road, we have to keep it in the inner road. + +Mitigating this problem, when we leave an inner road (with `u3m_love()`), we call `u3j_reap()` to promote jet information in the dying road. Reaping promotes anything we've learned about any battery that either (a) already existed in the outer road, or (b) is being saved to the outer road. 
+ +### u3j: jet binding {#u3j-jet-binding} + +Jet binding starts with a `%fast` hint. (In Hoon, this is produced by the runes `~%`, for the general case, or `~/` for simple functions.) To bind a jet, execute a formula of the form: + +``` +[10 [%fast clue-formula] core-formula] +``` + +`core-formula` assembles the core to be jet-propelled. `clue-formula` produces the hint information, or `+clue` (defined below), which we want to annotate it with. + +A clue is a triple of name, parent, and hooks: + +```hoon +++ clue (trel chum nock (list (pair term nock))) +``` + +The name, or `+chum`, has a bunch of historical structure which we don't need (cleaning these things up is tricky), but just gets flattened into a term. + +The parent axis is a nock formula, but always reduces to a simple axis, which is the address of this core's **parent**. Consider again an ordinary gate + +``` +[formula [sample context]] +``` + +Typically the `context` is itself a library core, which itself has a jet binding. If so, the parent axis of this gate is `7`. + +If the parent is already bound - and the parent **must** be already bound, in this road or a road containing it - we can hook this core bottom-up into a tree hierarchy. Normally the child core is produced by an arm of the parent core, so this is not a problem - we wouldn't have the child if we hadn't already made the parent. + +The clue also contains a list of **hooks**, named nock formulas on the core. Usually these are arms, but they need not be. The point is that we often want to call a core from C, in a situation where we have no type or other source information. A common case of this is a complex system in which we're mixing functions which are jet-propelled with functions that aren't. + +In any case, all the information in the `%fast` hint goes to `u3j_mine()`, which registers the battery in cold state (`das` in `jed` in `u3R`), then warm state (`har_p` in `jed`). 
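To make the parent axis concrete, here is a minimal sketch of Nock-style axis addressing over a toy noun type. The `toy_noun` struct and the `toy_slot()` and `toy_gate_demo()` functions are hypothetical names invented for this illustration; they are not part of `u3`, and they do not use the real noun representation. The sketch only shows why the context of `[formula [sample context]]` sits at axis `7`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* toy_noun: an atom (small unsigned value) or a cell of two nouns.
** Hypothetical illustration type, not the u3 noun representation.
*/
typedef struct toy_noun {
  int                    is_cell;
  uint32_t               atom;   /* meaningful when is_cell == 0 */
  const struct toy_noun* head;   /* meaningful when is_cell == 1 */
  const struct toy_noun* tail;
} toy_noun;

/* toy_slot(): fragment of `non` at `axis`; NULL if the axis is 0
** or the walk runs into an atom.
*/
const toy_noun*
toy_slot(uint32_t axis, const toy_noun* non)
{
  int bit = 31;

  assert( NULL != non );
  if ( 0 == axis ) {
    return NULL;
  }
  while ( 0 == (axis >> bit) ) {
    bit--;                               /* find the highest set bit */
  }
  for ( bit--; bit >= 0; bit-- ) {       /* remaining bits: 0 = head, 1 = tail */
    if ( !non->is_cell ) {
      return NULL;
    }
    non = ((axis >> bit) & 1) ? non->tail : non->head;
  }
  return non;
}

/* toy_gate_demo(): build [formula [sample context]] out of atoms labelled
** with their own axes, and return the atom found at `axis`
** (UINT32_MAX if the axis is missing or addresses a cell).
*/
uint32_t
toy_gate_demo(uint32_t axis)
{
  toy_noun formula = { 0, 2, NULL, NULL };
  toy_noun sample  = { 0, 6, NULL, NULL };
  toy_noun context = { 0, 7, NULL, NULL };   /* stands in for the parent core */
  toy_noun payload = { 1, 0, &sample, &context };
  toy_noun gate    = { 1, 0, &formula, &payload };
  const toy_noun* got = toy_slot(axis, &gate);

  return ( got && !got->is_cell ) ? got->atom : UINT32_MAX;
}
```

In this toy, `toy_gate_demo(7)` reaches the context, `toy_gate_demo(6)` the sample, and `toy_gate_demo(2)` the formula; axis `3` addresses the whole payload cell, so the demo reports `UINT32_MAX` for it.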
+ +It's essential to understand that the `%fast` hint has to be, well, fast - because we apply it whenever we build a core. For instance, if the core is a Hoon gate - a function - we will call `u3j_mine` every time the function is called. + +### u3j: the cold jet dashboard {#u3j-the-cold-jet-dashboard} + +For even more fun, the jet tree is not actually a tree of batteries. It's a tree of battery **labels**, where a label is an [axis term] path from the root of the tree. (At the root, if the core pattern is always followed properly, is a core whose payload is an atomic constant, conventionally the Hoon version.) + +Under each of these labels, it's normal to have an arbitrary number of different Nock batteries (not just multiple copies of the same noun, a situation we **do** strive to avoid). For instance, one might be compiled with debugging hints, one not. + +We might even have changed the semantics of the battery without changing the label - so long as those semantics don't invalidate any attached driver. + +For instance, it's normal to have two equivalent Nock batteries at the same time in one pier: one battery compiled with debugging hints, one not. + +Rather, the jet tree is a semantic hierarchy. The root of the hierarchy is a constant, by convention the Hoon kernel version because any normal jet-propelled core has, at the bottom of its onion of libraries, the standard kernel. Thus if the core is + +``` +[foo-battery [bar-battery [moo-battery 164]]] +``` + +we can reverse the nesting to construct a hierarchical core path. The static core + +``` +164/moo/bar/foo +``` + +extends the static core `164/moo/bar` by wrapping the `foo` battery (ie, in Hoon, `|%`) around it. With the core above, you can compute `foo` stuff, `bar` stuff, and `moo` stuff. Rocket science, not. + +Not all cores are static, of course - they may contain live data, like the sample in a gate (ie, argument to a function). 
Once again, it's important to remember that we track jet bindings not by the core, which may not be static, but by the battery, which is always static. + +(And if you're wondering how we can use a deep noun like a Nock formula or battery as a key in a key-value table, remember `mug_w`, the lazily computed short hash, in all boxed nouns.) + +In any case, `das`, the dashboard, is a map from `bash` to jet location record (`+clog`). A `clog` in turn contains two kinds of information: the `+cope`, or per-location noun; and a map of batteries to a per-battery `+club`. + +The `cope` is a triple of `+bane` (battery name, right now just a `term`); `+axis`, the axis, within **this** core, of the parent; and `(each bash noun)`, which is either `[0 bash]` if the parent is another core, or `[1 noun]`, for the constant noun (like `164`) if there is no parent core. + +A `bash` is just the noun hash (`+sham`) of a `cope`, which uniquely expresses the battery's hierarchical location without depending on the actual formulas. + +The `club` contains a `+corp`, which we use to actually validate the core. Obviously jet execution has to be perfectly compatible with Nock. We search on the battery, but getting the battery right is not enough - a typical battery is dependent on its context. For example, your jet-propelled library function is very likely to call `+dec` or other advanced kernel technology. If you've replaced the kernel in your context with something else, we need to detect this and not run the jet. + +There are two cases for a jet-propelled core - either the entire core is a static constant, or it isn't. Hence the definition of `corp`: + +```hoon +++ corp (each core batt) :: parent or static +``` + +Ie, a `corp` is `[0 core]` or `[1 batt]`. If it's static - meaning that the jet only works with one specific core, ie, the parent axis of each location in the hierarchy is `3` - we can validate with a single comparison. Otherwise, we have to recurse upward by checking the parent. 
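The upward recursion described above can be sketched with a toy model. Everything in it is an assumption made for illustration: `toy_noun`, `toy_site` (a stand-in for one location in the hierarchy, with its parent axis), `toy_slot()`, `toy_sing()`, `toy_core_fits()` and `toy_fits_demo()` are hypothetical names and do not reflect the actual `u3j` structures or code. Batteries are represented by plain atoms to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* toy_noun: atom or cell. Hypothetical, not the u3 noun representation. */
typedef struct toy_noun {
  int                    is_cell;
  uint32_t               atom;
  const struct toy_noun* head;
  const struct toy_noun* tail;
} toy_noun;

/* toy_site: hypothetical stand-in for one location in the jet hierarchy. */
typedef struct toy_site {
  const struct toy_site* parent;    /* NULL when the parent is the root constant */
  uint32_t               par_axis;  /* axis of the parent within this core */
  const toy_noun*        battery;   /* battery registered at this location */
  const toy_noun*        root;      /* expected constant, used when parent == NULL */
} toy_site;

/* toy_slot(): fragment at `axis`, or NULL. */
static const toy_noun*
toy_slot(uint32_t axis, const toy_noun* non)
{
  int bit = 31;

  if ( 0 == axis ) return NULL;
  while ( 0 == (axis >> bit) ) bit--;
  for ( bit--; bit >= 0; bit-- ) {
    if ( !non->is_cell ) return NULL;
    non = ((axis >> bit) & 1) ? non->tail : non->head;
  }
  return non;
}

/* toy_sing(): structural equality of two nouns. */
static int
toy_sing(const toy_noun* a, const toy_noun* b)
{
  if ( a == b ) return 1;
  if ( a->is_cell != b->is_cell ) return 0;
  if ( !a->is_cell ) return a->atom == b->atom;
  return toy_sing(a->head, b->head) && toy_sing(a->tail, b->tail);
}

/* toy_core_fits(): 1 if `cor` has the battery registered at `sit` and every
** parent up the chain matches too, ending at the expected root constant.
*/
int
toy_core_fits(const toy_site* sit, const toy_noun* cor)
{
  const toy_noun* par;

  assert( NULL != sit && NULL != cor );
  if ( !cor->is_cell || !toy_sing(cor->head, sit->battery) ) return 0;
  par = toy_slot(sit->par_axis, cor);
  if ( NULL == par ) return 0;
  if ( NULL == sit->parent ) return toy_sing(par, sit->root);
  return toy_core_fits(sit->parent, par);        /* recurse upward */
}

/* toy_fits_demo(): check a gate [bat_g [sample [bat_k version]]] against a
** hierarchy that expects the kernel constant 164 below the gate.
*/
int
toy_fits_demo(uint32_t version, uint32_t sample_value)
{
  toy_noun bat_k   = { 0, 100, NULL, NULL };     /* atoms stand in for batteries */
  toy_noun bat_g   = { 0, 200, NULL, NULL };
  toy_noun want    = { 0, 164, NULL, NULL };
  toy_site site_k  = { NULL, 3, &bat_k, &want };
  toy_site site_g  = { &site_k, 7, &bat_g, NULL };
  toy_noun ver     = { 0, version, NULL, NULL };
  toy_noun kernel  = { 1, 0, &bat_k, &ver };
  toy_noun sam     = { 0, sample_value, NULL, NULL };
  toy_noun payload = { 1, 0, &sam, &kernel };
  toy_noun gate    = { 1, 0, &bat_g, &payload };

  return toy_core_fits(&site_g, &gate);
}
```

The sample is never inspected, so `toy_fits_demo(164, 0)` and `toy_fits_demo(164, 42)` both succeed, while `toy_fits_demo(163, 0)` fails: the battery is right, but the context under it is not the one the hierarchy expects, which is the situation in which a jet must not run.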
+ +Note that there is at present no way to force a jet to depend on static **data**. + +### u3j: the warm jet dashboard {#u3j-the-warm-jet-dashboard} + +We don't use the cold state to match jets as we call them. We use the cold state to register jets as we find them, and also to rebuild the warm state after the hot state is reset. + +What we actually use at runtime is the warm state, `jed->har_p`, which is a `u3h` (built-in hashtable), allocated on the loom, from battery to `+calx`. + +A `calx` is a triple of a `+calf`, a `[bash cope]` cell, and a `club`. The latter two are both straight from cold state. + +The `calf` contains warm data dependent on hot state. It's a quadruple of `jax`, the hot driver index (in `ray_u` in `u3j_dash`); `hap`, a table from arm axis (ie, the axis of each formula within the battery) to driver arm index (into `arm_u` in `u3j_core`); `lab`, the complete label path; and `jit`, any other dynamic data that may speed up execution. + +We construct `hap`, when we create the calx, by iterating through the arms registered in the `u3j_core`. Note the way a `u3j_harm` declares itself, with the string `fcs_c` which can contain either an axis or a name. Most jetted cores are of course gates, which have one formula at one axis within the core: `fcs_c` is `".2"`. + +But we do often have fast cores with more complex arm structure, and it would be sad to have to manage their axes by hand. To use an `fcs_c` with a named arm, it's sufficient to make sure the name is bound to a formula `[0 axis]` in the hook table. + +`jit`, as its name suggests, is a stub where any sort of optimization data computed on battery registration might go. To use it, fill in the `_cj_jit()` function. + +### u3j: the hot dashboard {#u3j-the-hot-dashboard} + +Now it should be easy to see how we actually invoke jets. Every time we run a nock `9` instruction (pretty often, obviously), we have a core and an axis. We pass these to `u3j_kick()`, which will try to execute them. 
+ +Because nouns with a reference count of 1 are precious, `u3j_kick()` has a tricky reference control definition. It reserves the right to return `u3_none` in the case where there is no driver, or the driver does not apply for this case; in this case, it retains argument `cor`. If it succeeds, though, it transfers `cor`. + +`u3j_kick()` searches for the battery (always the head of the core, of course) in the hot dashboard. If the battery is registered, it searches for the axis in `hap` in the `calx`. If it exists, the core matches a driver and the driver jets this arm. If not, we return `u3_none`. + +Otherwise, we call `fun_f` in our `u3j_harm`. This obeys the same protocol as `u3j_kick()`; it can refuse to function by returning `u3_none`, or consume the noun. + +Besides the actual function pointer `fun_f`, we have some flags in the `u3j_harm` which tell us how to call the arm function. + +If `ice` is yes (`&`, `0`), the jet is known to be perfect and we can just trust the product of `fun_f`. Otherwise, we need to run **both** the Nock arm and `fun_f`, and compare their results. + +(Note that while executing the C side of this test, we have to set `ice` to yes; on the Nock side, we have to set `liv` to no. Otherwise, many non-exponential functions become exponential. When auto-testing jets in this way, the principle is that the test is on the outermost layer of recursion.) + +(Note also that anyone who multi-threads this execution environment has a slight locking problem with these flags if arm testing is multi-threaded.) + +If `tot` is yes, (`&`, `0`), the arm function is **total** and has to return properly (though it can still return **u3_none**). Otherwise, it is **partial** and can `u3_cm_bail()` out with c3__punt. This feature has a cost: the jet runs in a subroad. + +Finally, if `liv` is no (`|`, 1), the jet is off and doesn't run. + +It should be easy to see how the tree of cores gets declared - precisely, in `j/dash.c`. 
We declare the hierarchy as a tree of `u3j_core` structures, each of which comes with a static list of arms `arm_u` and sub-cores `dev_u`. + +In `u3j_boot()`, we traverse the hierarchy, fill in parent pointers `par_u`, and enumerate all `u3j_core` structures into a single flat array `u3j_dash.ray_u`. Our hot state then appears ready for action. + +### u3j: jet functions {#u3j-jet-functions} + +At present, all drivers are compiled statically into `u3`. This is not a long-term permanent solution or anything. However, it will always be the case with a certain amount of core functionality. + +For instance, there are some jet functions that we need to call as part of loading the Arvo kernel - like `+cue` to unpack a noun from an atom. And obviously it makes sense, when jets are significant enough to compile into `u3`, to export their symbols in headers and the linker. + +There are three interface prefixes for standard jet functions: `u3k`, `u3q`, and `u3w`. All jets have `u3w` interfaces; most have `u3q`; some have `u3k`. Of course the actual logic is shared. + +`u3w` interfaces use the same protocol as `fun_f` above: the caller passes the entire core, which is retained if the function returns `u3_none`, transferred otherwise. Why? Again, use counts of 1 are special and precious for performance hackers. + +`u3q` interfaces break the core into C arguments, **retain** noun arguments, and **transfer** noun returns. `u3k` interfaces are the same, except with more use of `u3_none` and other simple C variations on the Hoon original, but **transfer** both arguments and returns. Generally, `u3k` are most convenient for new code. + +Following `u3k/q/w` is `[a-f]`, corresponding to the 6 logical tiers of the kernel, or `g` for user-level jets. Another letter is added for functions within subcores. The filename, under `j/`, follows the tier and the function name. + +For instance, `+add` is `u3wa_add(cor)`, `u3qa_add(a, b)`, or `u3ka_add(a, b)`, in `j/a/add.c`. 
`+get` in `+by` is `u3wdb_get(cor)`, `u3kdb_get(a, b)`, etc, in `j/d/by_get.c`. + +For historical reasons, all internal jet code in `j/[a-f]` **retains** noun arguments, and **transfers** noun results. Please do not do this in new `g` jets! The new standard protocol is to transfer both arguments and results. + +## u3a: allocation functions {#u3a-allocation-functions} + +`u3a` allocates on the current road (u3R). Its internal structures are uninteresting and typical of a naive allocator. + +The two most-used `u3a` functions are `u3a_gain()` to add a reference count, and `u3a_lose()` to release one (and free the noun, if the use count is zero). For convenience, `u3a_gain()` returns its argument. The pair are generally abbreviated with the macros `u3k()` and `u3z()` respectively. + +Normally we create nouns through `u3i` functions, and don't call the `u3a` allocators directly. But if you do: + +One, there are **two** sets of allocators: the word-aligned allocators and the fully-aligned (ie, malloc compatible) allocators. For instance, on a typical OS X setup, malloc produces 16-byte aligned results - needed for some SSE instructions. + +These allocators are **not compatible**. For 32-bit alignment as used in nouns, call + +```c + /* u3a_walloc(): allocate storage measured in words. + */ + void* + u3a_walloc(c3_w len_w); + + /* u3a_wfree(): free storage. + */ + void + u3a_wfree(void* lag_v); + + /* u3a_wealloc(): word realloc. + */ + void* + u3a_wealloc(void* lag_v, c3_w len_w); +``` + +For full alignment, call: + +```c + /* u3a_malloc(): aligned storage measured in bytes. + */ + void* + u3a_malloc(size_t len_i); + + /* u3a_realloc(): aligned realloc in bytes. + */ + void* + u3a_realloc(void* lag_v, size_t len_i); + + /* u3a_realloc2(): gmp-shaped realloc. + */ + void* + u3a_realloc2(void* lag_v, size_t old_i, size_t new_i); + + /* u3a_free(): free for aligned malloc. + */ + void + u3a_free(void* tox_v); + + /* u3a_free2(): gmp-shaped free. 
+ */ + void + u3a_free2(void* tox_v, size_t siz_i); +``` + +There is also a set of special-purpose allocators for building atoms. When building atoms, please remember that it's incorrect to have a high 0 word - the word length in the atom structure must be strictly correct. + +Of course, we don't always know how large our atom will be. Therefore, the standard way of building large atoms is to allocate a block of raw space with `u3a_slab()`, then chop off the end with `u3a_malt()` (which does the measuring itself) or `u3a_mint()` in case you've measured it yourself. + +Once again, **do not call `malloc()`** (or C++ `new`) within any code that may be run within a jet. This will cause rare sporadic corruption when we interrupt execution within a `malloc()`. We'd just override the symbol, but `libuv` uses `malloc()` across threads within its own synchronization primitives - for this to work with `u3a_malloc()`, we'd have to introduce our own locks on the surface-level road (which might be a viable solution). + +## u3n: nock execution {#u3n-nock-execution} + +The `u3n` routines execute Nock itself. On the inside, they have a surprising resemblance to the spec proper (the only interesting detail is how we handle tail-call elimination) and are, as one would expect, quite slow. (There is no such thing as a fast tree interpreter.) + +There is only one Nock, but there are lots of ways to call it. (Remember that all `u3n` functions **transfer** C arguments and returns.) + +The simplest interpreter, `u3n_nock_on(u3_noun bus, u3_noun fol)`, invokes Nock on `bus` (the subject) and `fol` (the formula). (Why is it `[subject formula]`, not `[formula subject]`? The same reason `0` is true and `1` is false.) + +A close relative is `u3n_slam_on(u3_noun gat, u3_noun sam)`, which slams a **gate** (`gat`) on a sample (`sam`). (In a programming language with a more conventional vocabulary, `u3n_slam_on()` would call a function on an argument.) 
We could write it most simply as: + +```c + u3_noun + u3n_slam_on(u3_noun gat, u3_noun sam) + { + u3_noun pro = u3n_nock_on + (u3nc(u3k(u3h(gat)), + u3nc(sam, u3k(u3t(u3t(gat))))), + u3k(u3h(gat))); + u3z(gat); + return pro; + } +``` + +Simpler is `u3n_kick_on(u3_noun gat)`, which slams a gate (or, more generally, a **trap** - because sample structure is not even needed here) without changing its sample: + +```c + u3_noun + u3n_kick_on(u3_noun gat) + { + return u3n_nock_on(gat, u3k(u3h(gat))); + } +``` + +The `_on` functions in `u3n` are all defined as pure Nock. But actually, even though we say we don't extend Nock, we do. But we don't. But we do. + +Note that `u3` has a well-developed error handling system - `u3m_bail()` to throw an exception, `u3m_soft_*` to catch one. But Nock has no exception model at all. That's okay - all it means is that if an `_on` function bails, the exception is an exception in the caller. + +However, `u3`'s exception handling happens to match a convenient virtual super-Nock in `hoon.hoon`, the infamous `+mock`. Of course, Nock is slow, and `mock` is Nock in Nock, so it is (logically) super-slow. Then again, so is decrement. + +With the power of `u3`, we nest arbitrary layers of `mock` without any particular performance cost. Moreover, we simply treat Nock proper as a special case of `mock`. (More precisely, the internal VM loop is `+mink` and the error compiler is `+mook`. But we call the whole sandbox system `mock`.) + +The nice thing about `mock` functions is that (by executing within `u3m_soft_run()`, which as you may recall uses a nested road) they provide both exceptions and the namespace operator - `.^` in Hoon, which becomes operator `11` in `mock`. + +`11` requires a namespace function, or `fly`, which produces a `+unit` - `~` (`0`) for no binding, or `[0 value]`. The sample to a `fly` is a `+path`, just a list of text `span`. + +`mock` functions produce a `+toon`. 
Fully elaborated: + +```hoon + ++ noun ,* :: any noun + ++ path (list ,@ta) :: namespace path + ++ span ,@ta :: text-atom (ASCII) + ++ toon $% [%0 p=noun] :: success + [%1 p=(list path)] :: blocking paths + [%2 p=(list tank)] :: stack trace + == :: + ++ tank :: printable + $% [%leaf p=tape] :: flat text + $: %palm :: backstep list + p=[p=tape q=tape r=tape s=tape] :: mid cap open close + q=(list tank) :: contents + == :: + $: %rose :: straight list + p=[p=tape q=tape r=tape] :: mid open close + q=(list tank) :: contents + == :: + == +``` + +(Note that `tank` is overdesigned and due for replacement.) + +What does a `toon` mean? Either your computation succeeded (`[0 noun]`), or could not finish because it blocked on one or more global paths (`[1 (list path)]`), or it exited with a stack trace (`[2 (list tank)]`). + +Note that of all the `u3` exceptions, only `%exit` is produced deterministically by the Nock definition. Therefore, only `%exit` produces a `2` result. Any other argument to `u3m_bail()` will unwind the virtualization stack all the way to the top - or to be more exact, to `u3m_soft_top()`. + +In any case, the simplest `mock` functions are `u3n_nock_un()` and `u3n_slam_un()`. These provide exception control without any namespace change, as you can see by the code: + +```c + /* u3n_nock_un(): produce .*(bus fol), as ++toon. + */ + u3_noun + u3n_nock_un(u3_noun bus, u3_noun fol) + { + u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0); // |=(a=* .^(a)) + + return u3n_nock_in(fly, bus, fol); + } + + /* u3n_slam_un(): produce (gat sam), as ++toon. + */ + u3_noun + u3n_slam_un(u3_noun gat, u3_noun sam) + { + u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0); // |=(a=* .^(a)) + + return u3n_slam_in(fly, gat, sam); + } +``` + +The `fly` is added as the first argument to `u3n_nock_in()` and `u3n_slam_in()`. Of course, logically, `fly` executes in the caller's exception layer. (Maintaining this illusion is slightly nontrivial.) 
Finally, `u3n_nock_an()` is a sandbox with a null namespace. + +## u3e: persistence {#u3e-persistence} + +The only `u3e` function you should need to call is `u3e_save()`, which saves the loom. As it can be restored on any platform, please make sure you don't have any state in the loom that is bound to your process or architecture - except for exceptions like the warm jet state, which is actively purged on reboot. + +## u3r: reading nouns (weak) {#u3r-reading-nouns-weak} + +As befits accessors, which don't make anything, `u3r` noun reading functions always retain their arguments and their returns. They never bail; rather, when they don't work, they return a `u3_weak` result. + +Most of these functions are straightforward and do only what their comments say. A few are interesting enough to discuss. + +`u3r_at()` is the familiar tree fragment function, `/` from the Nock spec. For taking complex nouns apart, `u3r_mean()` is a relatively funky way of deconstructing nouns with a varargs list of `axis`, `u3_noun *`. For cells, triples, etc, decompose with `u3r_cell()`, `u3r_trel()`, etc. For the tagged equivalents, use `u3r_pq()` and friends. + +`u3r_sing(u3_noun a, u3_noun b)` (true if `a` and `b` are a **single** noun) is interesting because it uses mugs to help it out. Clearly, different nouns may have the same mug, but the same nouns cannot have a different mug. It's important to understand the performance characteristics of `u3r_sing()`: the worst possible case is a comparison of duplicate nouns, which have the same value but were created separately. In this case, the whole tree is traversed. + +`u3r_sung()` is a deeply funky and frightening version of `u3r_sing()` that unifies pointers to the duplicate nouns it finds, freeing the second copy. Obviously, do not use `u3r_sung()` when you have live, but not reference counted, noun references from C - if they match a noun with a refcount of 1 that gets freed, bad things happen.
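
The mug shortcut described for `u3r_sing()` can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `toy_noun`, `toy_atom()`, `toy_cell()`, `toy_mug()` and `toy_sing()` are made-up names, the struct is not the real `u3` noun layout, and the hash is not the real mug algorithm. The sketch only assumes what the text says: a mug is a nonzero 31-bit hash cached lazily in the noun, equal nouns always have equal mugs, and so two differing cached mugs prove inequality without walking the tree.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy noun; NOT the real u3 representation. */
typedef struct toy_noun {
  int              is_cell;  /* 1 = cell, 0 = atom               */
  uint32_t         mug;      /* cached hash; 0 = not yet computed */
  uint64_t         val;      /* atom value (atoms only)           */
  struct toy_noun* hed;      /* head (cells only)                 */
  struct toy_noun* tal;      /* tail (cells only)                 */
} toy_noun;

/* toy_atom(): allocate an atom (no error handling, for brevity). */
toy_noun* toy_atom(uint64_t val)
{
  toy_noun* n = calloc(1, sizeof(*n));
  n->val = val;
  return n;
}

/* toy_cell(): allocate a cell from a head and a tail. */
toy_noun* toy_cell(toy_noun* hed, toy_noun* tal)
{
  toy_noun* n = calloc(1, sizeof(*n));
  n->is_cell = 1;
  n->hed = hed;
  n->tal = tal;
  return n;
}

/* toy_mug(): 31-bit nonzero hash, cached lazily in the noun.
** The mixing function is an arbitrary stand-in.
*/
uint32_t toy_mug(toy_noun* n)
{
  uint32_t h;

  if ( n->mug ) {
    return n->mug;
  }
  if ( !n->is_cell ) {
    h = (uint32_t)(n->val * 2654435761u) ^ (uint32_t)(n->val >> 32);
  }
  else {
    h = (toy_mug(n->hed) * 31u) + toy_mug(n->tal);
  }
  h &= 0x7fffffff;
  if ( 0 == h ) {
    h = 1;                    /* keep 0 free to mean "no cache" */
  }
  n->mug = h;
  return h;
}

/* toy_sing(): structural equality with a cached-mug fast path. */
int toy_sing(toy_noun* a, toy_noun* b)
{
  if ( a == b ) {
    return 1;                 /* same pointer, same noun */
  }
  if ( a->is_cell != b->is_cell ) {
    return 0;
  }
  if ( a->mug && b->mug && (a->mug != b->mug) ) {
    return 0;                 /* different mugs, different nouns */
  }
  if ( !a->is_cell ) {
    return a->val == b->val;
  }
  /* Equal or missing mugs prove nothing: walk both trees. */
  return toy_sing(a->hed, b->hed) && toy_sing(a->tal, b->tal);
}
```

Note how the worst case falls out of the structure: two separately built copies of the same noun never share a pointer and never have differing mugs, so `toy_sing()` has to recurse through every node before it can answer yes.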
+ +It's important to remember that `u3r_mug()`, which produces a 31-bit, nonzero insecure hash, uses the `mug_w` slot in any boxed noun as a lazy cache. There are a number of variants of `u3r_mug()` that can get you out of building unneeded nouns. + +## u3x: reading nouns (bail) {#u3x-reading-nouns-bail} + +`u3x` functions are like `u3r` functions, but instead of returning `u3_none` when (for instance) we try to take the head of an atom, they bail with `%exit`. In other words, they do what the same operation would do in Nock. + +## u3h: hash tables. {#u3h-hash-tables} + +We can of course use the Hoon `map` structure as an associative array. This is a balanced treap and reasonably fast. However, it's considerably inferior to a custom structure like a HAMT (hash array-mapped trie). We use `u3_post` to allocate HAMT structures on the loom. + +(Our HAMT implements the classic Bagwell algorithm, which depends on the `gcc` standard directive `__builtin_popcount()`. On a CPU which doesn't support popcount or an equivalent instruction, some other design would probably be preferable.) + +There's no particular rocket science in the API. `u3h_new()` creates a hashtable; `u3h_free()` destroys one; `u3h_put()` inserts, `u3h_get()` retrieves. You can transform values in a hashtable with `u3h_walk()`. + +The only funky function is `u3h_gut()`, which unifies keys with `u3r_sung()`. As with all cases of `u3r_sung()`, this must be used with extreme caution. + +## u3z: memoization {#u3z-memoization} + +Connected to the `~+` rune in Hoon, via the Nock `%memo` hint, the memoization facility is a general-purpose cache. + +(It's also used for partial memoization - a feature that'll probably be removed, in which conservative worklist algorithms (which would otherwise be exponential) memoize everything in the subject **except** the worklist. This is used heavily in the Hoon compiler jets (j/f/*.c).
Unfortunately, it's probably not possible to make this work perfectly in that it can't be abused to violate Nock, so we'll probably remove it at a later date, instead making `+ut` keep its own monadic cache.) + +Each `u3z` function comes with a `c3_m` mote which disambiguates the function mapping key to value. For Nock itself, use 0. For extra speed, small tuples are split out in C; thus, find with + +```c + u3_weak u3z_find(c3_m, u3_noun); + u3_weak u3z_find_2(c3_m, u3_noun, u3_noun); + u3_weak u3z_find_3(c3_m, u3_noun, u3_noun, u3_noun); + u3_weak u3z_find_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun); +``` + +and save with + +```c + u3_noun u3z_save(c3_m, u3_noun, u3_noun); + u3_noun u3z_save_2(c3_m, u3_noun, u3_noun, u3_noun); + u3_noun u3z_save_3(c3_m, u3_noun, u3_noun, u3_noun, u3_noun); + u3_noun u3z_save_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun, u3_noun); +``` + +where the value is the last argument. To eliminate duplicate nouns, there is also + +```c + u3_noun + u3z_uniq(u3_noun); +``` + +`u3z` functions retain keys and transfer values. + +The `u3z` cache, built on `u3h` hashes, is part of the current road, and goes away when it goes away. (In future, we may wish to promote keys/values which outlive the road, as we do with jet state.) There is no cache reclamation at present, so be careful. + +## u3t: tracing and profiling. {#u3t-tracing-and-profiling} + +TBD. + +## u3v: the Arvo kernel {#u3v-the-arvo-kernel} + +An Arvo kernel - or at least, a core that complies with the Arvo interface - is part of the global `u3` state. What is an Arvo core? Slightly pseudocoded: + +```hoon + ++ arvo + |% + ++ come |/ {yen/@ ova/(list ovum) nyf/pone} :: 11 + ^- {(list ovum) _+>} + !! + ++ keep |/ {now/@da hap/path} :: 4 + ^- (unit ,@da) + !! + ++ load |/ {yen/@ ova/(list ovum) nyf/pane} :: 86 + ^- {(list ovum) _+>} + !! + ++ peek |/ {now/@da path} :: 87 + ^- (unit) + !! + ++ poke |/ {now/@da ovo/ovum} :: 42 + ^- {(list ovum) _+>} + !!
+ ++ wish |/ txt/@ta :: 20 + ^- * + !! + -- + ++ card {p/@tas q/*} :: typeless card + ++ ovum {p/wire q/card} :: Arvo event + ++ wire path :: event cause +``` + +This is the Arvo ABI in a very real sense. Arvo is a core with these six arms. To use these arms, we hardcode the axis of the formula (`11`, `4`, `86`, etc) into the C code that calls Arvo, because otherwise we'd need type metadata - which we can get, by calling Arvo. + +It's important to understand the Arvo event/action structure, or `+ovum`. An `ovum` is a `card`, which is any `[term noun]` cell, and a `+wire`, a `path` which indicates the location of the event. At the Unix level, the `wire` corresponds to a system module or context. For input events, this is the module that caused the event; for output actions, it's the module that performs the action. + +`+poke` sends Arvo an event `ovum`, producing a cell of action ova and a new Arvo core. + +`+peek` dereferences the Arvo namespace. It takes a date and a key, and produces `~` (`0`) or `[~ value]`. + +`+keep` asks Arvo the next time it wants to be woken up, for the given `wire`. (This input will probably be eliminated in favor of a single global timer.) + +`+wish` compiles a string of Hoon source. While just a convenience, it's a very convenient convenience. + +`+come` and `+load` are used by Arvo to reset itself (more precisely, to shift the Arvo state from an old kernel to a new one); there is no need to call them from C. + +Now that we understand the Arvo kernel interface, let's look at the `u3v` API. As usual, all the functions in `u3v` are commented, but unfortunately it's hard to describe this API as clean at present. The problem is that `u3v` remains design coupled to the old `vere` event handling code written for `u2`. But let's describe the functions you should be calling, assuming you're not writing the next event system. There are only two. 
+ +`u3v_wish(str_c)` wraps the `+wish` functionality in a cache (which is read-only unless you're on the surface road). + +`u3v_do()` uses `wish` to provide a convenient interface for calling Hoon kernel functions by name. Even more conveniently, we tend to call `u3v_do()` with these convenient aliases: + +```c + #define u3do(txt_c, arg) u3v_do(txt_c, arg) + #define u3dc(txt_c, a, b) u3v_do(txt_c, u3nc(a, b)) + #define u3dt(txt_c, a, b, c) u3v_do(txt_c, u3nt(a, b, c)) + #define u3dq(txt_c, a, b, c, d) u3v_do(txt_c, u3nq(a, b, c, d)) +``` + +## Code Mnemonics {#code-mnemonics} + +| Shorthand | Expansion | Meaning | +| --- | --- | --- | +| `u3A` | `(&(u3v_Home->arv_u))` | Arvo | +| `u3C` | `u3o_Config` | command line options | +| `u3D` | `u3j_Dash` | jet dashboard | +| `u3H` | `u3v_Home` | home road | +| `u3P` | `u3e_Pool` | snapshotting system | +| `u3R` | `u3a_Road` | current road | +| `u3T` | `u3t_Trace` | tracing profiler | diff --git a/docs/runtime/c.md b/docs/runtime/c.md new file mode 100644 index 0000000..dda6d22 --- /dev/null +++ b/docs/runtime/c.md @@ -0,0 +1,93 @@ +--- +description: "C conventions and style used in Urbit. Integer types, variable naming, loobeans. The c3 layer under u3." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# C in Urbit + +Under `u3` is the simple `c3` layer, which is just how we write C in Urbit. + +When writing C in u3, please of course follow the conventions of the code around you as regards indentation, etc. It's especially important that every function have a header comment, even if it says nothing interesting. + +But some of our idiosyncrasies go beyond convention. Yes, we've done awful things to C. Here's what we did and why we did it. + +## c3: integer types {#c3-integer-types} + +First, it's generally acknowledged that underspecified integer types are C's worst disaster.
C99 fixed this, but the `stdint` types are wordy and annoying. We've replaced them with: + +```c + /* Good integers. + */ + typedef uint64_t c3_d; // double-word + typedef int64_t c3_ds; // signed double-word + typedef uint32_t c3_w; // word + typedef int32_t c3_ws; // signed word + typedef uint16_t c3_s; // short + typedef int16_t c3_ss; // signed short + typedef uint8_t c3_y; // byte + typedef int8_t c3_ys; // signed byte + typedef uint8_t c3_b; // bit + + typedef uint8_t c3_t; // boolean + typedef uint8_t c3_o; // loobean + typedef uint8_t c3_g; // 5-bit atom for a 32-bit log. + typedef uint32_t c3_l; // little; 31-bit unsigned integer + typedef uint32_t c3_m; // mote; also c3_l; LSB first a-z 4-char string. + + /* Bad integers. + */ + typedef char c3_c; // does not match int8_t or uint8_t + typedef int c3_i; // int - really bad + typedef uintptr_t c3_p; // pointer-length uint - really really bad + typedef intptr_t c3_ps; // pointer-length int - really really bad +``` + +Some of these need explanation. A loobean is a Nock boolean - Nock, for mysterious reasons, uses 0 as true (always say "yes") and 1 as false (always say "no"). + +Nock and/or Hoon cannot tell the difference between a short atom and a long one, but at the `u3` level every atom under `2^31` is direct. The `c3_l` type is useful to annotate this. A `c3_m` is a **mote** - a string of up to 4 characters in a `c3_l`, least significant byte first. A `c3_g` should be a 5-bit atom. Of course, C cannot enforce these constraints, only document them. + +Use the "bad" - ie, poorly specified - integer types only when interfacing with external code that expects them. + +An enormous number of motes are defined in `i/c/motes.h`. There is no reason to delete motes that aren't being used, or even to modularize the definitions. Keep them alphabetical, though. + +## c3: variables and variable naming {#c3-variables-and-variable-naming} + +The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax. 
This is weird, but works really well, as long as what you're doing isn't hideously complicated. (Then it works badly, but we shouldn't need anything hideous in u3.) + +A TLV variable name is a random pronounceable three-letter string, sometimes with some vague relationship to its meaning, but usually not. Usually CVC (consonant-vowel-consonant) is a good choice. + +You should use TLVs much the way math people use Greek letters. The same concept should in general get the same name across different contexts. When you're working in a given area, you'll tend to remember the binding from TLV to concept by sheer power of associative memory. When you come back to it, it's not that hard to relearn. And of course, when in doubt, comment it. + +Variables take pseudo-Hungarian suffixes, matching in general the suffix of the integer type: + +```c +c3_w wor_w; // 32-bit word +``` + +Unlike in standard Hungarian, there is no change for pointer variables. C structure variables take a `_u` suffix. + +## c3: loobeans {#c3-loobeans} + +The code (from `defs.h`) tells the story: + +```c + # define c3y 0 + # define c3n 1 + + # define _(x) (c3y == (x)) + # define __(x) ((x) ? c3y : c3n) + # define c3a(x, y) __(_(x) && _(y)) + # define c3o(x, y) __(_(x) || _(y)) +``` + +In short, use `_()` to turn a loobean into a boolean, `__` to go the other way. Use `!` as usual, `c3y` for yes and `c3n` for no, `c3a` for and and `c3o` for or. diff --git a/docs/runtime/conn.md b/docs/runtime/conn.md new file mode 100644 index 0000000..af6af7e --- /dev/null +++ b/docs/runtime/conn.md @@ -0,0 +1,434 @@ +--- +description: "Guide for using conn.c driver. Unix domain socket for external ship interaction. Khan vane, urbit eval utility." 
+layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# Using conn.c + +## Background {#background} + +Prior to 2022, the only way to interact with a running ship from Earth was via HTTP requests sent to the `%lens` agent. In addition to `%lens`, there was a Python script helper named `herb` which would automatically format HTTP requests for `%lens` based on user inputs. However, there were several pain points for using `%lens` and `herb`. `%lens`, `herb`, and the difficulties of using them are documented more fully [here](https://github.com/urbit/urbit/issues/6418). + +Unrelated to the above, inconveniences around writing boilerplate code to interact with the `%spider` agent were hampering adoption of threads outside of the Arvo kernel. + +## Present {#present} + +Starting in 2022, tools for solving the above issues began to appear (though work on them began in 2021). In order, they are: +- `conn.c` +- The Khan vane +- The `urbit eval` utility +- `-eval` and `-khan-eval` +- The click thin client + +Together, these tools are the building blocks for performing any action on a running ship (poke, scry, or command) from Earth, and receiving back programmatically usable output. + +### `conn.c` {#connc} + +[`conn.c`](https://github.com/urbit/vere/blob/develop/pkg/vere/io/conn.c) is a driver in Vere. It is a part of the "King" (a.k.a. "Urth") process. It exposes a [Unix domain socket](https://en.wikipedia.org/wiki/Unix_domain_socket) at `/path/to/pier/.urb/conn.sock` for sending/receiving data from external processes. + +Input to `conn.c` must be a newt-encoded jammed noun that fits mold `[request-id command arguments]`, where: +- `request-id` is a client-supplied atomic identifier with type `@`. It exists entirely for the benefit of the client, allowing responses to be matched to requests.
+- `command` is one of: + - `%peek` + - `%peel` + - `%ovum` + - `%fyrd` + - `%urth` + +These commands cover all possible cases for the following 2x2 matrix: +| | poke | scry | +|----------|------------------|---------| +| **vere** | `%urth` | `%peel` | +| **arvo** | `%ovum`, `%fyrd` | `%peek` | + +For a valid command, the output from `conn.c` is a newt-encoded jammed noun with type `[request-id output]`, where: +- `request-id` matches the input `request-id` +- `output` depends on the `command` + +For an invalid command, the output from `conn.c` is a newt-encoded jammed noun with type `[0 %bail error-code error-string]`. However, `conn.c` makes no guarantees that: +- It will be able to sufficiently recover from the error to guarantee this output +- It will produce a meaningful error code and message + +#### `%ovum` + +The argument to an `%ovum` command is a raw kernel move which is injected directly into the Arvo event loop. This is a very powerful - and potentially dangerous - tool. For example, if a ship somehow got into a state where Clay was no longer working properly (meaning new files could not be compiled to fix the state of the kernel), the source code for a new, working Clay could be directly injected into the ship using an `%ovum`. + +The output of an `%ovum` command is: +- `[%news %done]` if the move completed successfully +- `[%news %drop]` if the move was dropped +- `[%bail goof]` if an error occurred + +#### `%fyrd` + +`%fyrd` is a direct shortcut to the Khan vane. The Khan vane coordinates and manages threads, and is described in further detail below. The arguments to a `%fyrd` command are (in order): +1. The name of the desk in which the thread lives (e.g. `%base`) or `beak` for the thread (e.g. `[%zod %base %10]`) +2. The name of the thread (e.g. `%hi`) +3. Mark to which the output should be cast (e.g. `%tape`) +4. Mark for how to interpret the input argument to thread (e.g. `%ship`) +5. Input argument to thread (e.g. 
`~zod`) + +The output of a `%fyrd` command is `[%avow (each page goof)]`, the value of `each` depending on whether the thread succeeded or not. + +#### `%urth` + +The argument to the `%urth` command is a subcommand for the action to perform. Currently, the only valid commands are `%pack` and `%meld`. + +`%urth` will return `%&` if given a valid command as input, otherwise it will return `[0 %bail 0xfffffff9 %urth-bad]`. No other output is emitted. + +#### `%peek` + +The `%peek` command is used to perform a namespace read request (a.k.a. scry) using Arvo's external peek interface ([arm +22 in arvo.hoon](https://github.com/urbit/urbit/blob/develop/pkg/arvo/sys/arvo.hoon#L1774)). The argument to `%peek` is the `nom` input to `+peek` in `arvo.hoon` (`lyc` is auto-filled as `[~ ~]`, i.e. "request from self"). That is to say that the argument to `%peek` must have type: +``` +$+ each path +$% [%once vis=view syd=desk tyl=spur] + [%beam vis=view bem=beam] +== +``` +Practically speaking, this means that the input will look like one of these three examples: +``` +[%& p=path] +[%| p=[%once vis=view syd=desk tyl=spur]] +[%| p=[%beam vis=view bem=beam]] +``` +Where: +- `path` is a `[view beam]`, with the `view` passed in as a `coin` +- `view` is the vane code for the scry, as well as an optional care, possibly appended to the vane (e.g. `%j`, `%gx`, etc.) +- `beam` is a `[beak spur]` +- `desk` is used to auto-generate a `beak`: `[our desk now]` +- `spur` is the scry endpoint for the agent or vane + +The output of a `%peek` command is `[%peek (unit (unit scry-output))]`, where `~` means that the scry endpoint is invalid, and `[~ ~]` means that the scry resolved to nothing. + +See [here](../app-school/10-scry.md) for more information on scrying. + +#### `%peel` + +`%peel` attempts to emulate a scry-like namespace, like the one used by Arvo and accessed by `%peek`. The argument to `%peel` should be a path. Valid paths result in a non-null `unit` containing the result of the scry. 
Invalid paths result in null (i.e. `~`). The valid paths and the data they return are: +``` +/help (unit (list path)) Supported %peel paths +/live (unit ~) Pier health check; succeeds if pier is running +/khan (unit ~) Khan health check; succeeds if Khan vane is running +/info (unit mass) Pier info as a mass +/v (unit @t) Returns version of the Vere binary as a cord +/who (unit @) Returns the Azimuth identity of the ship as an atom +``` + +Note that the pier info above is returned as a `mass` report, i.e. type `(pair cord (each * (list mass)))`. This is not the same as the `|mass` memory report. `/mass` is meant to be a valid `%peel` path which returns the `|mass` memory report, but it is currently unimplemented. + +### Khan {#khan} + +The Khan vane is a command / response interface for running threads. Khan was introduced to make running threads a kernel-level feature, as simple as poking an agent or setting a timer. Threads allow users to run arbitrarily complex code on their ships in the same way that bash allows them to do so on Linux. + +Khan's API exposes three thread requests: +- `%fard`: Kernel thread requests +- `%fyrd`: External thread requests +- `%lard`: "Inline" thread requests + +"Kernel" above doesn't mean that this interface is hidden or protected from userspace agents; thread requests by userspace agents should almost certainly use `%fard`. It just means that `%fard` thread requests are expected to originate from within the kernel or a userspace agent. Specifically: +- `%fard` commands take the thread input argument as a `cage` +- The data in the `vase` of the `cage` is a `unit` (as expected by `%spider`) +- The output is also a `cage` (see below for more information) + +`%fyrd` thread requests, on the other hand, perform some extra services that are useful when running threads from the dojo or via `conn.c`. 
Specifically: +- `%fyrd` commands take the thread input as a raw `noun` +- Khan performs mark conversion on both the input and output for `%fyrd` requests +- Khan automatically lifts the converted input into a `unit` + +"Inline" threads are a particularly specialized Khan thread request where the thread has already been compiled and is passed as a part of the input. + +Khan requests expect the following input: +- `%fard`: `p=[=bear name=term args=cage]` +- `%fyrd`: `p=[=bear name=term args=(pair mark page)]` +- `%lard`: `[=bear =shed]` + +Where: +- `bear` is a `desk` or a `beak`; if `bear` is a `desk`, then it will be converted to a `beak` using `our` and `now` as default values +- `shed` is a pre-computed chain of strands that produce a `vase` (the canonical thread) + +All three produce the same output if an error occurred while running the thread: `[vow %| goof]`, where `vow` is `%arow` for `%fard` and `%lard`, and `%avow` for `%fyrd`. + +If the thread succeeded, `%fard` and `%lard` produce `[%arow %& %noun vase]`. `%fyrd` produces `[%avow %& mark noun]`, where `mark` is the output mark and `noun` is the output as a raw noun after mark conversion. + +See [here](../../urbit-os/base/threads/basics/fundamentals.md) for more information about threads. + +### `urbit eval` {#urbit-eval} + +`eval` is a utility command in the Urbit binary. Originally, it was introduced to evaluate snippets of Hoon code using the binary to emulate Arvo from the associated ivory pill. This allowed it to run any Hoon code fragments that used kernel and STL functions (e.g. anything in `hoon.hoon`, `arvo.hoon`, `lull.hoon`, and `zuse.hoon`). Notably, this did not (and does not) evaluate any Hoon fragments that require pier state (e.g. scries, `our`, `now`, etc.). + +Example: +``` +$ echo '(add 2 2)' | ./urbit eval +loom: mapped 2048MB +lite: arvo formula 2a2274c9 +lite: core 4bb376f0 +lite: final state 4bb376f0 +eval (run): +4 +``` +The result (i.e. `4`) is printed to `stdout`.
If the command had failed to compile, the stack trace would have been printed to `stdout` instead. All other messages are printed to `stderr`. + +#### Options + +`eval` was extended with several options that make it useful for processing Hoon nouns as input to or output from `conn.c`: +- `-j`, `--jam`: output result as a jammed noun +- `-c`, `--cue`: read input as a jammed noun +- `-n`, `--newt`: write output / read input as a newt-encoded jammed noun, when paired with `-j` or `-c` respectively +- `-k`: treat the input as the jammed noun input of a `%fyrd` request to `conn.c`; if the result is a `goof`, pretty-print it to `stderr` instead of returning it + +### `-eval` and `-khan-eval` {#eval-and--khan-eval} + +Two threads that evaluate arbitrary Hoon were added to the suite of threads included with Arvo: [`ted/eval.hoon`](https://github.com/urbit/urbit/blob/develop/pkg/arvo/ted/eval.hoon) and [`ted/khan-eval.hoon`](https://github.com/urbit/urbit/blob/develop/pkg/arvo/ted/khan-eval.hoon). + +Both threads take the same input: Hoon code as a `cord` and an optional `(list path)`. The optional `(list path)` is a list of Clay file dependencies which need to be included for the Hoon to be evaluated (i.e. if the Hoon code includes libraries or types defined outside of the kernel). Each `path` can be a `beam` (i.e. `[beak spur]`) or just a `spur`, in which case the default `beak` (i.e. `[our %base now]`) will be prepended. + +`ted/eval.hoon` expects the input to be a Hoon expression. It's very similar to `urbit eval`, except that it has access to ship state: `now`, `our`, vane & agent state, etc. + +`ted/khan-eval.hoon` expects the input to be a thread. It attempts to compile the thread using the dependencies (if any) and then sends it to Khan as a `%lard` thread request. + +Both threads return regular thread output, i.e. a `vase`. 
+ +Examples: +- `-eval '(add 2 2)'` +- `-eval '(my-add 2 2)' [/lib/my-add/hoon ~]` + - Where `my-add` is defined in `lib/my-add.hoon` in `%base` +- `-eval '(my-add 2 2)' [/(scot %p our)/my-desk/(scot %da now)/lib/my-add/hoon ~]` + - Where `my-add` is defined in `lib/my-add.hoon` in `%my-desk` +- `-khan-eval '=/ m (strand ,vase) ;< ~ bind:m (poke [~zod %hood] %helm-hi !>(\'\')) (pure:m !>(\'success\'))'` + +### click {#click} + +[click](https://github.com/urbit/tools/tree/master/pkg/click) is a `bash` thin client which auto-formats `-eval` and `-khan-eval` thread calls via `%fyrd` requests to `conn.c` and coordinates chaining together the appropriate commands to execute those requests on a running ship. + +Using click, a call like: +``` +echo $'[0 %fyrd %base %khan-eval %noun %ted-eval \'=/ m (strand ,vase) ;< ~ bind:m (poke [~zod %hood] %helm-hi !>(\\\'\\\')) (pure:m !>(\\\'success\\\'))\']' | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +instead looks like: +``` +/path/to/click -k /path/to/zod $'=/ m (strand ,vase) ;< ~ bind:m (poke [~zod %hood] %helm-hi !>(\\\'\\\')) (pure:m !>(\\\'success\\\'))' +``` +or even more conveniently: +``` +/path/to/click -k -i threads/poke.hoon /path/to/zod +``` + +``` +Usage: + click [options] <path-to-pier> <hoon> [<dependencies> ...] + click [options] -i <path-to-file> <path-to-pier> [<dependencies> ...] + click [-o|-p] -e -i <path-to-file> <path-to-pier> + + Thin client for interacting with running Urbit ship via conn.c + + options: + -e Execute jammed Hoon + -h Show usage info + -i <path-to-file> Read input from file + -j Jam only + -k Execute command using "khan-eval" thread + -o <path-to-file> Output to file + -p Filter failure stack traces from result and pretty-print them to stderr + -x Jam to hex +``` + +## Using these tools {#using-these-tools} + +Below are examples of how to execute common commands on a running ship from Earth. 
+ +### `|mass` {#mass} + +Blocked by issues; not currently doable in a way that returns the results as data. + +### `|pack` {#pack} + +``` +echo "[0 %urth %pack]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +echo "[0 %ovum %d /test %pack ~]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< ~ bind:m (flog [%pack ~]) (pure:m !>(\\\'success\\\'))' +``` + +### `|meld` {#meld} + +``` +echo "[0 %urth %meld]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +echo "[0 %ovum %d /test %meld ~]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< ~ bind:m (flog [%meld ~]) (pure:m !>(\\\'success\\\'))' +``` + +### `|ota` {#ota} + +#### `|ota ~bus` + +``` +echo "[0 %ovum [%g /test [%deal [~zod ~zod] %hood %raw-poke %kiln-install %base ~bus %kids]]]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< ~ bind:m (poke [our %hood] %kiln-install !>([%base ~bus %kids])) (pure:m !>(\\\'success\\\'))' +``` + +#### `|ota %disable` + +``` +echo "[0 %ovum [%g /test [%deal [~zod ~zod] %hood %raw-poke %kiln-install %base ~zod %base]]]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< ~ bind:m (poke [our %hood] %kiln-install !>([%base our %base])) (pure:m !>(\\\'success\\\'))' +``` + +#### `|ota ~bus %desk` + +``` +echo "[0 %ovum [%g /test [%deal [~zod ~zod] %hood %raw-poke %kiln-install %base ~bus %desk]]]" |
+/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< ~ bind:m (poke [our %hood] %kiln-install !>([%base ~bus %desk])) (pure:m !>(\\\'success\\\'))' +``` + +### `|install` {#install} + +#### `|install ~sampel-palnet %desk` + +``` +echo "[0 %ovum [%g /test [%deal [~zod ~zod] %hood %raw-poke %kiln-install %desk ~sampel-palnet %desk]]]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< ~ bind:m (poke [our %hood] %kiln-install !>([%desk ~sampel-palnet %desk])) (pure:m !>(\\\'success\\\'))' +``` + +#### `|install ~sampel-palnet %desk, =local %my-desk` + +``` +echo "[0 %ovum [%g /test [%deal [~zod ~zod] %hood %raw-poke %kiln-install %my-desk ~sampel-palnet %desk]]]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< ~ bind:m (poke [our %hood] %kiln-install !>([%my-desk ~sampel-palnet %desk])) (pure:m !>(\\\'success\\\'))' +``` + +### `+code` {#code} + +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< code=@p bind:m (scry @p /j/code/(scot %p our)) (pure:m !>((crip (slag 1 (scow %p code)))))' +``` + +### `+vats` {#vats} + +#### `+vats %base %kids` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< now=@da bind:m get-time (pure:m !>((crip ~(ram re [%rose [~ ~ ~] (report-vats our now [%base %kids ~] %$ |)]))))' \ +'/sur/hood/hoon' +``` + +#### `+vats, =filt %exists` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< now=@da bind:m get-time (pure:m !>((crip ~(ram re [%rose [~ ~ ~] (report-vats our now 
~ %exists |)]))))' \ +'/sur/hood/hoon' +``` + +#### `+vats %base, =verb &, =filt %running` +``` +/path/to/click -kp /path/to/pier/zod \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< now=@da bind:m get-time (pure:m !>((crip ~(ram re [%rose [~ ~ ~] (report-vats our now [%base ~] %exists &)]))))' \ +'/sur/hood/hoon' +``` +### Additional Notes {#additional-notes} + +#### Alternative click calls + +Any example above that uses click has two additional options that have been omitted for brevity, since the actual code for the call would be identical in each example: + +1. Custom `-thread` in `%desk`: +``` +echo "[0 %fyrd %desk %thread %noun %noun ~]" | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -cn +``` + +2. Pass inline thread to click from file: +``` +/path/to/bin/click -k -i path/to/thread.hoon /path/to/pier/zod +``` + +#### Undocked ships + +click assumes that the ship at the given pier is docked (i.e. that `/path/to/pier/.run` exists). If for whatever reason the running ship is undocked, it's still possible to work around this assumption using the click-format helper script. For example, the call for `+vats` becomes: +``` +/path/to/click-format -k \ +$'=/ m (strand ,vase) ;< our=@p bind:m get-our ;< now=@da bind:m get-time (pure:m !>((crip ~(ram re [%rose [~ ~ ~] (report-vats our now)]))))' \ +'/sur/hood/hoon' | +/path/to/urbit eval -jn | +nc -U -W 1 /path/to/pier/zod/.urb/conn.sock | +/path/to/urbit eval -ckn +``` + +## Issues and Future Work {#issues-and-future-work} + +Currently, there are a number of minor issues and one major issue impacting interactions between Earth and Mars. 
+ +The minor issues are: +- `conn.c`'s simulated namespace for `%peel` + - Unprincipled namespace simulation for no reason other than consistency with Arvo scry +- `/mass` path for `conn.c` `%peel` not implemented +- No `mass` mark in Arvo, so attempting to scry for `|mass` with `%peek` crashes the ship + +The major issue is the lack of "thick" clients which are able to consume the newt-encoded jammed nouns emitted by `conn.c` as input. Though not officially codified yet, it makes sense for newt-encoded jammed nouns to be the [narrow waist](https://www.oilshell.org/blog/2022/02/diagrams.html) of Urbit, and [recent design decisions appear to be heading in this direction](https://github.com/urbit/urbit/pull/6396). Unfortunately, the narrow waist of `bash` is text, and it's not always easy or useful to convert nouns to text (particularly stack traces). + +There exist already two external noun libraries, in [Rust](https://github.com/urbit/noun) and Haskell (link to Haskell lib coming soon). Adding more, while not trivial, is not difficult. The proliferation of noun representation libraries in other languages would open many doors with regards to the support, hosting, and application opportunities available (the ever-fabled "Quake over Urbit"). diff --git a/docs/runtime/cryptography.md b/docs/runtime/cryptography.md new file mode 100644 index 0000000..3f49524 --- /dev/null +++ b/docs/runtime/cryptography.md @@ -0,0 +1,34 @@ +--- +description: "Cryptographic primitives used by Arvo. Ed25519 signatures, AES-SIV encryption. All jetted for performance." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# Cryptography + +All cryptographic primitives utilized by Arvo are [jetted](../../nock/jetting.md). 
This is done for performance-related reasons in other parts of the system, but for cryptography this is also extremely important because it allows us to utilize standard reference implementations for the primitives written in C. + +All jets related to encryption may be found in `pkg/urbit/jets/e/`. + +In this section we review what specific implementations are utilized. At the moment, only libraries directly related to Ames are documented here, though we note that there are jets for other cryptographic functions such as the [SHA Hash Family](../../hoon/stdlib/3d.md) as well. + +## Ed25519 <a href="#ed" id="ed"></a> + +Urbit implements [Ed25519](http://ed25519.cr.yp.to/) based on the SUPERCOP "ref10" implementation. Additionally there is key exchanging and scalar addition included to further aid building a PKI using Ed25519. All code is licensed under the permissive zlib license. + +All code is pure ANSI C without any dependencies, except for the random seed generation which uses standard OS cryptography APIs (CryptGenRandom on Windows, `/dev/urandom` on nix). + +## AES-SIV <a href="#aes" id="aes"></a> + +The library we utilize for AES-SIV is an [RFC5297](https://tools.ietf.org/html/rfc5297)-compliant C implementation of AES-SIV written by Daniel Franke on behalf of [Akamai Technologies](https://www.akamai.com). It is published under the [Apache License (v2.0)](https://www.apache.org/licenses/LICENSE-2.0). It uses OpenSSL for the underlying [AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) and [CMAC](https://en.wikipedia.org/wiki/One-key_MAC) implementations and follows a similar interface style. + +While the jets are found in `pkg/urbit/jets/e`, the statically-linked package is found at `pkg/urcrypt/`. diff --git a/docs/runtime/jetting.md b/docs/runtime/jetting.md new file mode 100644 index 0000000..163a856 --- /dev/null +++ b/docs/runtime/jetting.md @@ -0,0 +1,1414 @@ +--- +description: "Complete tutorial on writing jets in Urbit. 
C implementations of Hoon gates for performance. Developer environment setup." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# How to Write a Jet + +## Writing Jets + +Many operations are inefficient when implemented in Nock, and it is efficacious to treat Nock as a standard of behavior rather than the implementation. This tutorial aims to teach you how to read existing jet code; produce a jet matching a Hoon gate with a single argument; and produce a more complex jet involving multiple values and floating-point arithmetic. It will then discuss jetting more generically. + +Since jetting code requires modifying the binary runtime, we will work some in Hoon but much more in C. While you can build whatever you like as experimental or personal work, if you intend to submit your jetted code back to the main Urbit developer community then you should coordinate with the Urbit Foundation. + +**Additional Resources** + +* [\~timluc-miptev, “Jets in the Urbit Runtime”](https://github.com/timlucmiptev/docs-runtime/blob/master/jets1_current.md) (recommended to start here first) +* [“`u3`: Land of Nouns”](nouns.md) (recommended as supplement to this document) +* [“API overview by prefix”](api.md) (recommended as supplement after this document) + +### Developer Environment <a href="#developer-environment" id="developer-environment"></a> + +#### Basic Setup (_Mise en place_) <a href="#basic-setup-mise-en-place" id="basic-setup-mise-en-place"></a> + +All of Urbit's source code is available in the main Github repo. We will presumptively work in a folder called `~/jetting` which contains a copy of the full Urbit repo. Create a new branch within the repo named `example-jet`. + +```sh +$ cd +$ mkdir jetting +$ cd jetting +$ git clone https://github.com/urbit/vere.git +``` + +The Urbit runtime build stack is based on [Bazel](https://bazel.build/). 
This suffices unless you intend to include some other third-party library, which must be [linked statically](https://en.wikipedia.org/wiki/Static_library) due to how the Urbit binary is distributed. There is also a bias towards software implementations of processes which hew to a specified reference implementation, such as [SoftFloat](http://www.jhauser.us/arithmetic/SoftFloat.html) rather than hardware floating-point for [IEEE 754](https://ieeexplore.ieee.org/document/30711) floating-point mathematics. + +Since jet development requires booting ships many times as one iterates, a pill can make the Urbit-side development process much faster, and is actually required for kernel jets. + +Test your build process to produce a local executable binary of Vere: + +```sh +$ cd ~/jetting/vere +$ bazel build :urbit +``` + +This invokes Nix to build the Urbit binary. Take note of where that binary is located (typically in `/tmp` on your main file system) and create a new fakezod using a downloaded pill. (You should check the current binary version and use the appropriate pill instead of `v1.9`.) + +```sh +$ cd ~/jetting +$ wget https://bootstrap.urbit.org/urbit-v1.9.pill +$ <Nix build path>/bin/urbit -B urbit-v1.9.pill -F zod +``` + +> As you work through this guide, version numbers will likely be older than the contemporary release version due to the pace of release. We will update this guide if a version breaks the instructions. + +We will primarily work in the development ship (a fakeship or moon) on the files just mentioned, and in the `pkg/urbit` directory of the main Urbit repository, so we need a development process that allows us to quickly access each of these, move them into the appropriate location, and build necessary components. The basic development cycle will look like this: + +1. Compose correct Hoon code. +2. Hint the Hoon code. +3. Register the jets in the Vere C code. +4. Compose the jets. +5. Compile and troubleshoot. +6. Repeat as necessary. 
+ +* [“Contributor Guidelines”](https://github.com/urbit/vere/blob/master/CONTRIBUTING.md) + +#### Conveniences <a href="#conveniences" id="conveniences"></a> + +You should consider using a terminal utility like `tmux` or `screen` which allows you to work in several locations on your file system simultaneously: one for file system operations (copying files in and out of the `home` directory), one for running the development ship, and one for editing the files, or an IDE or text editor if preferred. + + + +Inside of your development ship, sync `%clay` to Unix, + +```hoon +> |mount % +``` + +Then copy the entire `%base` desk out so that you can work with it and copy it back in as necessary. + +```sh +$ cp -r zod/base . +``` + +In addition, making a backup copy of a fakeship will make it _much_ faster to reset if memory gets corrupted. (This is regrettably common when developing jets.) + +```sh +$ cp -r zod zod-backup +``` + +To reset, simply delete the `.urb/` directory and replace it: + +```sh +$ rm -rf zod/.urb +$ cp -r zod-backup/.urb zod +``` + +### Jet Walkthrough: `+add` <a href="#jet-walkthrough-add" id="jet-walkthrough-add"></a> + +Given a Hoon gate, how can a developer produce a matching C jet? Let us illustrate the process using a simple `|%` core. We assume the reader has achieved facility with both Hoon code and C code. This tutorial aims to communicate the practical process of producing a jet, and many [`u3` noun concepts](nouns.md) are only briefly discussed or alluded to. + +To this end, we begin by examining the Hoon `+add` gate, which accepts two values in its sample. + +The Hoon code for `+add` decrements one of these values and adds one to the other for each decrement until zero is reached. This is because all atoms in Hoon are unsigned integers and Nock has no simple addition operation. 
The source code for `+add` is located in `hoon.hoon`: + +```hoon +|% ++| %math +++ add + ~/ %add + :: unsigned addition + :: + :: a: augend + :: b: addend + |= [a=@ b=@] + :: sum + ^- @ + ?: =(0 a) b + $(a (dec a), b +(b)) +``` + +or in a more compact form (omitting the parent core and chapter label) + +```hoon +++ add + ~/ %add + |= [a=@ b=@] ^- @ + ?: =(0 a) b + $(a (dec a), b +(b)) +``` + +The jet hint `%add` allows Hoon to hint to the runtime that a jet _may_ exist. By convention, the jet hint name matches the gate label. Jets must be registered elsewhere in the runtime source code for the Vere binary to know where to connect the hint; we elide that discussion until we take a look at jet implementation below. We will expand on the jet registration runes [`~/` sigfas](../../hoon/rune/sig.md#sigfas) and [`~%` sigcen](../../hoon/rune/sig.md#sigcen) later. + +The following C code implements `+add` as a significantly faster operation including handling of >31-bit atoms. It may be found in `urbit/pkg/noun/jets/a/add.c`: + +```c +u3_noun +u3qa_add(u3_atom a, + u3_atom b) +{ + if ( _(u3a_is_cat(a)) && _(u3a_is_cat(b)) ) { + c3_w c = a + b; + + return u3i_words(1, &c); + } + else if ( 0 == a ) { + return u3k(b); + } + else { + mpz_t a_mp, b_mp; + + u3r_mp(a_mp, a); + u3r_mp(b_mp, b); + + mpz_add(a_mp, a_mp, b_mp); + mpz_clear(b_mp); + + return u3i_mp(a_mp); + } +} +u3_noun +u3wa_add(u3_noun cor) +{ + u3_noun a, b; + + if ( (c3n == u3r_mean(cor, u3x_sam_2, &a, u3x_sam_3, &b, 0)) || + (c3n == u3ud(a)) || + (c3n == u3ud(b) && a != 0) ) + { + return u3m_bail(c3__exit); + } else { + return u3qa_add(a, b); + } +} +``` + +The main entry point for a call into the function is `u3wa_add`. `u3w` functions are translator functions which accept the entire sample as a `u3_noun` (or Nock noun). `u3q` functions take custom combinations of nouns and atoms and generally correspond to unpacked samples. 
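
The wrapper/worker split between `u3w` and `u3q` functions can be tried outside of Vere with a toy noun type. What follows is a minimal sketch, not Vere code: `toy_noun`, `toy_at`, `toy_q_add`, and `toy_w_add` are hypothetical names invented for illustration. The only facts borrowed from `u3` are that a gate core is shaped `[battery [sample context]]`, so the sample sits at axis 6 and its head and tail sit at axes 12 and 13 (the values behind `u3x_sam_2` and `u3x_sam_3`).

```c
#include <stddef.h>
#include <stdint.h>

/* Toy noun (hypothetical, NOT the u3 representation): an atom or a cell. */
typedef struct toy_noun {
  int                    is_cell;  /* 1 for a cell, 0 for an atom   */
  uint32_t               atom;     /* value when is_cell == 0       */
  const struct toy_noun* head;     /* children when is_cell == 1    */
  const struct toy_noun* tail;
} toy_noun;

/* Axis lookup: 1 is the noun itself, 2n is the head of n, 2n+1 its tail.
** Returns NULL if the path runs through an atom.
*/
static const toy_noun*
toy_at(const toy_noun* non, uint32_t axis)
{
  if ( NULL == non || 0 == axis ) return NULL;
  if ( 1 == axis ) return non;

  const toy_noun* par = toy_at(non, axis / 2);
  if ( NULL == par || !par->is_cell ) return NULL;

  return (axis & 1) ? par->tail : par->head;
}

/* "q"-style worker: receives already-unpacked C arguments. */
static uint64_t
toy_q_add(uint32_t a, uint32_t b)
{
  return (uint64_t)a + (uint64_t)b;
}

/* "w"-style wrapper: unpack the sample, validate it, delegate.
** Returns 0 on a malformed sample, where a real jet would bail.
*/
static int
toy_w_add(const toy_noun* cor, uint64_t* out)
{
  const toy_noun* a = toy_at(cor, 12);  /* head of the sample */
  const toy_noun* b = toy_at(cor, 13);  /* tail of the sample */

  if ( NULL == a || NULL == b || a->is_cell || b->is_cell ) return 0;

  *out = toy_q_add(a->atom, b->atom);
  return 1;
}
```

Building a core `[battery [[3 4] context]]` out of `toy_noun` values and passing it to `toy_w_add` yields 7. The design point is the same as in the real jet: the wrapper owns all knowledge of the core's shape, so the worker can be called directly from other C code with plain arguments.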
+ +`u3wa_add` defines two nouns `a` and `b` which will hold the unpacked arguments from the sample. The sample elements are copied out by reference into `a` from sample address 2 (`u3x_sam_2`) and into `b` from sample address 3 (`u3x_sam_3`). A couple of consistency checks are made; if these fail, `u3m_bail` yields a runtime error. Otherwise, `u3qa_add` is invoked on the C-style arguments. + +`u3qa_add` has the task of adding two Urbit atoms. There is a catch, however! An atom may be a _direct_ atom (meaning the value as an unsigned integer fits into 31 bits) or an _indirect_ atom (anything higher than that). Direct atoms, called `cat`s, are indicated by the first bit being zero. + +``` +0ZZZ.ZZZZ.ZZZZ.ZZZZ.ZZZZ.ZZZZ.ZZZZ.ZZZZ +``` + +Any atom value which may be represented as $2^{31}-1 = 2.147.483.647$ or less is a direct atom. The `Z` bits simply contain the value. + +```hoon +> `@ub`2.147.483.647 +0b111.1111.1111.1111.1111.1111.1111.1111 +> `@ux`2.147.483.647 +0x7fff.ffff +``` + +However, any atom with a value _greater_ than this (including many cords, floating-point values, etc.) is an indirect atom (or `dog`) marked with a prefixed bit of one. + +``` +11YX.XXXX.XXXX.XXXX.XXXX.XXXX.XXXX.XXXX +``` + +where bit 31 indicates indirectness, bit 30 is always set, and bit 29 (`Y`) indicates if the value is an atom or a cell. An indirect atom contains a pointer into the loom from bits 0–28 (bits `X`). + +What does this mean for `u3qa_add`? It means that if the atoms are both direct atoms (`cat`s), the addition is straightforward and simply carried out in C. When converted back into an atom, a helper function `u3i_words` deals with the possibility of overflow and the concomitant transformation to a `dog`. + +```c +c3_w c = a + b; // c3_w is a 32-bit C word. + +return u3i_words(1, &c); +``` + +There's a second trivial case to handle one of the values being zero. 
(It is unclear to the author of this tutorial why only `a`, and not also `b`, is checked against zero; the speed change may be too trivial to matter.) + +Finally, the general case of adding the values at two loom addresses is dealt with. This requires general pointer-based arithmetic with GMP multi-precision integer operations. + +```c +mpz_t a_mp, b_mp; // mpz_t is a GMP multi-precision integer type + +u3r_mp(a_mp, a); // read the atoms out of the loom into the MP type +u3r_mp(b_mp, b); + +mpz_add(a_mp, a_mp, b_mp); // carry out MP-correct addition +mpz_clear(b_mp); // clear the now-unnecessary `b` value from memory + +return u3i_mp(a_mp); // write the value back into the loom and return it +``` + +The procedure to solve the problem in the C jet does not need to follow the same algorithm as the Hoon code. (In fact, it is preferred to use native C implementations where possible to avoid memory leaks in the `u3` noun system.) + +In general, jet code feels a bit heavy and formal. Jet code may call other jet code, however, so, much as with Hoon, layers of complexity can be appropriately encapsulated. Once you are used to the conventions of the u3 library, you will be in a good position to produce working and secure jet code. + +### Jet Composition: Integer `+factorial` <a href="#jet-composition-integer-factorial" id="jet-composition-integer-factorial"></a> + +Similar to how we encountered recursion way back in [Hoon School](../hoon-school/F-cores.md) to talk about gate mechanics, let us implement a C jet of the `+factorial` example code. We will call this library `trig` in a gesture to some subsequent functions you should implement as an exercise. 
Create a file `lib/trig.hoon` with the following contents: + +**`/lib/trig.hoon`** + +```hoon +~% %trig ..part ~ +|% +:: Factorial, $x!$ +:: +++ factorial + ~/ %factorial + |= x=@ud ^- @ud + =/ t=@ud 1 + |- ^- @ud + ?: =(x 0) t + ?: =(x 1) t + $(x (sub x 1), t (mul t x)) +-- +``` + +We will create a generator `gen/trig.hoon` which will help us quickly check the library's behavior. + +**`/gen/trig.hoon`** + +```hoon +/+ *trig +!: +:- %say +|= [[* eny=@uv *] [n=@ud ~] ~] +:: +~& (factorial n) +[%verb ~] +``` + +We will further define a few unit tests as checks on arm behavior in `tests/lib/trig.hoon`: + +**`/tests/lib/trig.hoon`** + +```hoon +/+ *test, *trig +:: +:::: + :: +|% +++ test-factorial ^- tang + ;: weld + %+ expect-eq + !> 1 + !> (factorial 0) + %+ expect-eq + !> 1 + !> (factorial 1) + %+ expect-eq + !> 120 + !> (factorial 5) + %+ expect-eq + !> 720 + !> (factorial 6) + == +-- +``` + +(Here we are eliding a key point about contemporary Urbit development: `/lib` code is considered userspace and thus ineligible for jet inclusion in the runtime. This is a matter of development policy rather than technical capability. We will zoom out to consider how to modify kernel code later.) + +Save the foregoing library code in `base/lib` and the generator code in `base/gen`; also, don't forget the unit tests! Whenever you work in your preferred editor, you should work on the `base` copies, then move them back into the fakezod and synchronize before execution. + +```sh +$ cp -r base zod +``` + +```sh +> |commit %base + +> -test %/tests/lib/trig ~ +built /tests/lib/trig/hoon +OK /lib/trig/test-factorial +``` + +#### Jet construction <a href="#jet-construction" id="jet-construction"></a> + +Now that you have a developer cycle in place, let's examine what's necessary to produce a jet. A jet is a C function which replicates the behavior of a Hoon (Nock) gate. 
Jets have to be able to manipulate Urbit quantities within the binary, which requires both the proper affordances within the Hoon code (the interpreter hints) and support for manipulating Urbit nouns (atoms and cells) within C. + +Make a development branch for the jet changes first: + +```sh +$ cd ~/jetting/vere +$ git branch example-jet +$ git checkout example-jet +``` + +Jet hints must provide a trail of symbols for the interpreter to know how to match the Hoon arms to the corresponding C code. Think of these as breadcrumbs. Here we have a two-deep scenario. Specifically, we mark the outermost arm with `~%` and an explicit reference to the Arvo core (the parent of `part`). We mark the inner arms with `~/` because their parent symbol can be determined from the context. The `@tas` token will tell the runtime (Vere) which C code matches the arm. All symbols in the nesting hierarchy must be included. + +```hoon +~% %trig ..part ~ +|% +++ factorial + ~/ %factorial + |= x=@ud ^- @ud + ... +-- +``` + +We also need to add appropriate handles for the C code. This consists of several steps: + +1. Register the jet symbols and function names in `tree.c`. +2. Declare function prototypes in headers `q.h` and `w.h`. +3. Produce functions for compilation and linking in the `pkg/noun/jets/e` directory. + +The first two steps are fairly mechanical and straightforward. + +**Register the jet symbols and function names.** A jet registration may be carried out at any point in `tree.c`. The registration consists of marking the core in the Hoon source and including the name in the C source. 
+ +```c +/* Jet registration of ++factorial arm under trig */ +static u3j_harm _140_hex__trig_factorial_a[] = {{".2", u3we_trig_factorial, c3y}, {}}; +/* Associated hash */ +static c3_c* _140_hex__trig_factorial_ha[] = { + "903dbafb8e59427eced0b35379ad617c2eb6083a235075e9cdd9dd80e732efa4", + 0 +}; + +static u3j_core _140_hex__trig_d[] = + { { "factorial", 7, _140_hex__trig_factorial_a, 0, _140_hex__trig_factorial_ha }, + {} + }; +static c3_c* _140_hex__trig_ha[] = { + "0bac9c3c43634bb86f6721bbcc444f69c83395f204ff69d3175f3821b1f679ba", + 0 +}; + +/* Core registration by token for trig */ +static u3j_core _140_hex_d[] = +{ /* ... pre-existing jet registrations ... */ + { "trig", 31, 0, _140_hex__trig_d, _140_hex__trig_ha }, + {} +}; +``` + +The numeric component of the title, `140`, indicates the Hoon Kelvin version. Library jets of this nature are registered as `hex` jets, meaning they live within the Arvo core. Other, more inner layers of `%zuse` and `%lull` utilize `pen` and other three-letter jet tokens. (These are loosely mnemonic from Greek antecedents.) The core is conventionally included here, then either a `d` suffix for the function association or a `ha` suffix for a jet hash. (Jet hashes are a way of “signing” code. They are not as of this writing actively used by the binary runtimes.) Arms are marked with `_a` and child cores with `_d`. The structs used are defined in `jets.h`. + +The particular flavor of C mandated by the Vere kernel is quite lapidary, particularly when shorthand functions (such as `u3z`) are employed. In this code, we see the following `u3` elements: + +1. `c3_c`, the platform C 8-bit `char` type +2. `c3y`, loobean true, `%.y` (similarly `c3n`, loobean false, `%.n`) +3. `u3j_core`, C representation of Hoon/Nock cores +4. `u3j_harm`, an actual C jet ("Hoon arm") + +The numbers `7` and `31` refer to relative core addresses. 
In most cases, unless you're building a particularly complicated jet or modifying `%zuse` or `%lull`, you can follow the pattern laid out here. `".2"` is a label for the axis in the core `[battery sample]`, so just the battery. The text labels for the `|%` core and the arm are included at their appropriate points. Finally, the jet function entry point `u3we_trig_factorial` is registered. + +For more information on `u3`, please check out the `u3` summary below or the official documentation at [“`u3`: Land of Nouns”](nouns.md). + +**Declare function prototypes in headers.** + +A `u3w` function is always the entry point for a jet. Every `u3w` function accepts a `u3_noun` (a Hoon/Nock noun), validates it, and invokes the `u3q` function that implements the actual logic. The `u3q` function needs to accept the same number of atoms as the defining arm (since these same values will be extricated by the `u3w` function and passed to it). + +In this case, we have cited `u3we_trig_factorial` in `tree.c` and now must declare both it and `u3qe_trig_factorial`: + +In `w.h`: + +```c +u3_noun u3we_trig_factorial(u3_noun); +``` + +In `q.h`: + +```c +u3_noun u3qe_trig_factorial(u3_atom); +``` + +**Produce functions for compilation and linking.** + +Given these function prototype declarations, all that remains is the actual definition of the function. Both functions will live in their own file; we find it the best convention to associate all arms of a core in a single file. In this case, create a file `pkg/noun/jets/e/trig.c` and define all of your `trig` jets therein. (Here we show `+factorial` only.) + +As with `+add`, we have to worry about direct and indirect atoms when carrying out arithmetic operations, prompting the use of GMP `mpz` operations. 
+ +```c +/* jets/e/trig.c +** +*/ +#include "all.h" +#include <stdio.h> // helpful for debugging, removable after development + +/* factorial of @ud integer +*/ + u3_noun + u3qe_trig_factorial(u3_atom a) /* @ud */ + { + fprintf(stderr, "u3qe_trig_factorial\n\r"); // DELETE THIS LINE LATER + if (( 0 == a ) || ( 1 == a )) { + return 1; + } + else if ( _(u3a_is_cat(a)) && (a <= 20) ) { + // 20! is the largest factorial that fits in a 64-bit chub + c3_d c = 1; + for ( c3_d i = 2; i <= a; i++ ) { + c *= i; + } + + return u3i_chubs(1, &c); + } + else { + mpz_t a_mp, b_mp; + + u3r_mp(a_mp, a); // read `a` into a GMP integer + mpz_init(b_mp); // b_mp must be initialized before use + mpz_sub_ui(b_mp, a_mp, 1); // b = a - 1 + u3_atom n = u3i_mp(b_mp); // write a - 1 back as an atom (clears b_mp) + u3_atom b = u3qe_trig_factorial(n); // recurse on the integer jet + u3z(n); // release the temporary argument + u3r_mp(b_mp, b); // read (a - 1)! back into a GMP integer + u3z(b); // release the intermediate result + + mpz_mul(a_mp, a_mp, b_mp); + mpz_clear(b_mp); + + return u3i_mp(a_mp); + } + } + + u3_noun + u3we_trig_factorial(u3_noun cor) + { + fprintf(stderr, "u3we_trig_factorial\n\r"); // DELETE THIS LINE LATER + u3_noun a; + + if ( c3n == u3r_mean(cor, u3x_sam, &a, 0) || + c3n == u3ud(a) ) + { + return u3m_bail(c3__exit); + } + else { + return u3qe_trig_factorial(a); + } + } +``` + +This code merits ample discussion. Without focusing on the particular types used, read through the logic and look for the skeleton of a standard simple factorial algorithm. + +`u3r` operations are used to extract Urbit-compatible types as C values. + +`u3i` operations wrap C values back into Urbit-compatible types. + +### `u3` Overview <a href="#u3-overview" id="u3-overview"></a> + +Before proceeding to compose a more complicated floating-point jet, we should step back and examine the zoo of `u3` functions that jets use to formally structure atom access and manipulation. + +**`u3` Functions** + +`u3` defines a number of functions for extracting data from Urbit types into C types for ready manipulation, then wrapping those same values back up for Urbit to handle. 
These fall into several categories: + +| Prefix | Mnemonic | Source File | Example of Function | +| ------ | ------------------------------------------- | ------------- | ------------------- | +| `u3a_` | Allocation | `allocate.c` | `u3a_malloc` | +| `u3e_` | Event (persistence) | `events.c` | `u3e_foul` | +| `u3h_` | Hash table | `hashtable.c` | `u3h_put` | +| `u3i_` | Imprisonment (noun construction) | `imprison.c` | | +| `u3j_` | Jet control | `jets.c` | `u3j_boot` | +| `u3k_` | Jets (transfer semantics, C arguments) | `[a-g]/*.c` | | +| `u3l_` | Logging | `log.c` | `u3l_log` | +| `u3m_` | System management | `manage.c` | `u3m_bail` | +| `u3n_` | Nock computation | `nock.c` | `u3nc` | +| `u3q_` | Jets (retain semantics, C arguments) | `[a-g]/*.c` | | +| `u3r_` | Retrieval; returns on error | `retrieve.c` | `u3r_word` | +| `u3t_` | Profiling and tracing | `trace.c` | `u3t` | +| `u3v_` | Arvo operations | `vortex.c` | `u3v_reclaim` | +| `u3w_` | Jets (retain semantics, Nock core argument) | `[a-g]/*.c` | | +| `u3x_` | Retrieval; crashes on error | `xtract.c` | `u3x_cell` | +| `u3z_` | Memoize | `zave.c` | `u3z_uniq` | + +**`u3` Nouns** + +The `u3` system allows you to extract Urbit nouns as atoms or cells. Atoms may come in one of two forms: either they fit in 31 bits or less of a 32-bit unsigned integer, or they require more space. In the former case, you will use the singular functions such as `u3r_word` and `u3i_word` to extract and store information. If the atom is larger than this, however, you need to treat it a bit more like a C array, using the plural functions `u3r_words` and `u3i_words`. (For native sizes larger than 32 bits, such as double-precision floating-point numbers, replace `word` with `chub` in these.) Confusing a 31-bit-or-less integer with a 32+-bit integer means confusing a value with a pointer! Bad things will happen! + +An audit of the jet source code shows that the most commonly used `u3` functions include: + +1. 
`u3a_free` frees memory allocated on the loom (Vere memory model). +2. `u3a_malloc` allocates memory on the loom (Vere memory model).\ + (Never use regular C `malloc` in `u3`.) +3. `u3i_bytes` writes an array of bytes into an atom. +4. `u3i_chub` is the 64-bit equivalent of `u3i_word`. +5. `u3i_chubs` is the 64-bit equivalent of `u3i_words`. +6. `u3i_word` writes a single 31-bit or smaller atom. +7. `u3i_words` writes an array of 31-bit or smaller atoms. +8. `u3m_bail` produces an error and crashes the process. +9. `u3m_p` prints a message and a `u3` noun. +10. `u3r_at` retrieves data values stored at locations in the sample. +11. `u3r_byte` retrieves a byte from within an atom. +12. `u3r_bytes` retrieves multiple bytes from within an atom. +13. `u3r_cell` factors a noun into a cell `[a b]`. +14. `u3r_chub` is the 64-bit equivalent of `u3r_word`. +15. `u3r_chubs` is the 64-bit equivalent of `u3r_words`. +16. `u3r_mean` deconstructs a noun by axis address. +17. `u3r_met` reports the total size of an atom. +18. `u3r_trel` factors a noun into a three-element cell `[a b c]`. +19. `u3r_word` retrieves a value from an atom as a C `uint32_t`. +20. `u3r_words` is the multi-element (array) retriever like `u3r_word`. + +**`u3` Samples** + +Defining jets which have a different sample size requires querying the correct nodes of the sample as a binary tree: + +``` +1. 1 argument → `u3x_sam` +2. 2 arguments → `u3x_sam_2`, `u3x_sam_3` +3. 3 arguments → `u3x_sam_2`, `u3x_sam_6`, `u3x_sam_7` +4. 4 arguments → `u3x_sam_2`, `u3x_sam_6`, `u3x_sam_14`, `u3x_sam_15` +5. 5 arguments → `u3x_sam_2`, `u3x_sam_6`, `u3x_sam_14`, `u3x_sam_30`, `u3x_sam_31` +6. 
6 arguments → `u3x_sam_2`, `u3x_sam_6`, `u3x_sam_14`, `u3x_sam_30`, `u3x_sam_62`, `u3x_sam_63` +``` + +A more complex argument structure requires grabbing other entries; e.g., + +```hoon +|= [u=@lms [ia=@ud ib=@ud] [ja=@ud jb=@ud]] +``` + +requires + +```c +u3x_sam_2, u3x_sam_12, u3x_sam_13, u3x_sam_14, u3x_sam_15 +``` + +**Exercise: Review Jet Code** + +* We commend to the reader the exercise of selecting particular Hoon-language library functions provided with the system, such as [`+cut`](https://github.com/urbit/urbit/blob/ceed4b78d068d7cb70350b3cd04e7525df1c7e2d/pkg/arvo/sys/hoon.hoon#L854), locating the corresponding jet code in: + + * [`tree.c`](https://github.com/urbit/urbit/blob/cd400dfa69059e211dc88f4ce5d53479b9da7542/pkg/urbit/jets/tree.c#L1575) + * [`w.h`](https://github.com/urbit/urbit/blob/cd400dfa69059e211dc88f4ce5d53479b9da7542/pkg/urbit/include/jets/w.h#L53) + * [`q.h`](https://github.com/urbit/urbit/blob/cd400dfa69059e211dc88f4ce5d53479b9da7542/pkg/urbit/include/jets/q.h#L51) + * [`cut.c`](https://github.com/urbit/urbit/blob/cd400dfa69059e211dc88f4ce5d53479b9da7542/pkg/urbit/jets/c/cut.c) + + and learning in detail how particular operations are realized in `u3` C. Note in particular that jets do not need to follow the same solution algorithm and logic as the Hoon code; they merely need to reliably produce the same result. + +### Jet Composition: Floating-Point `+factorial` <a href="#jet-composition-floating-point-factorial" id="jet-composition-floating-point-factorial"></a> + +Let us examine jet composition using a more complicated floating-point operation. The Urbit runtime uses [SoftFloat](http://www.jhauser.us/arithmetic/SoftFloat-3/doc/SoftFloat.html) to provide a reference software implementation of floating-point mathematics. This is slower than hardware FP but more portable. + +This library `lib/trig-rs.hoon` provides a few transcendental functions useful in many mathematical calculations. 
The `~%` "sigcen" rune registers the jets (with explicit arguments, necessary at the highest level of inclusion). The `~/` "sigfas" rune indicates which arms will be jetted. + +**`/lib/trig-rs.hoon`** + +```hoon +:: Transcendental functions library, compatible with @rs +:: +=/ tau .6.28318530717 +=/ pi .3.14159265358 +=/ e .2.718281828 +=/ rtol .1e-5 +~% %trig-rs ..part ~ +|% +:: Factorial, $x!$ +:: +++ factorial + ~/ %factorial + |= x=@rs ^- @rs + =/ t=@rs .1 + |- ^- @rs + ?: =(x .0) t + ?: =(x .1) t + $(x (sub:rs x .1), t (mul:rs t x)) +:: Absolute value, $|x|$ +:: +++ absolute + |= x=@rs ^- @rs + ?: (gth:rs x .0) + x + (sub:rs .0 x) +:: Exponential function, $\exp(x)$ +:: +++ exp + ~/ %exp + |= x=@rs ^- @rs + =/ rtol .1e-5 + =/ p .1 + =/ po .-1 + =/ i .1 + |- ^- @rs + ?: (lth:rs (absolute (sub:rs po p)) rtol) + p + $(i (add:rs i .1), p (add:rs p (div:rs (pow-n x i) (factorial i))), po p) +:: Integer power, $x^n$ +:: +++ pow-n + ~/ %pow-n + |= [x=@rs n=@rs] ^- @rs + ?: =(n .0) .1 + =/ p x + |- ^- @rs + ?: (lth:rs n .2) + p + ::~& [n p] + $(n (sub:rs n .1), p (mul:rs p x)) +-- +``` + +We will create a generator which will pull the arms and slam each gate such that we can assess the library's behavior. Later on we will create unit tests to validate the behavior of both the unjetted and jetted code. 
+ +**`/gen/trig-rs.hoon`** + +```hoon +/+ *trig-rs +!: +:- %say +|= [[* eny=@uv *] [x=@rs n=@rs ~] ~] +:: +~& (factorial n) +~& (absolute x) +~& (exp x) +~& (pow-n x n) +[%verb ~] +``` + +We will further define a few unit tests as checks on arm behavior: + +**`/tests/lib/trig-rs.hoon`** + +```hoon +/+ *test, *trig-rs +:: +:::: + :: +|% +++ test-factorial ^- tang + ;: weld + %+ expect-eq + !> .1 + !> (factorial .0) + %+ expect-eq + !> .1 + !> (factorial .1) + %+ expect-eq + !> .120 + !> (factorial .5) + %+ expect-eq + !> .720 + !> (factorial .6) + == +-- +``` + +#### Jet Composition <a href="#jet-composition" id="jet-composition"></a> + +As before, the jet hints must provide a breadcrumb trail of symbols for the interpreter to know how to match the Hoon arms to the corresponding C code. + +```hoon +~% %trig-rs ..part ~ +|% +++ factorial + ~/ %factorial + |= x=@rs ^- @rs + ... +++ exp + ~/ %exp + |= x=@rs ^- @rs + ... +++ pow-n + ~/ %pow-n + |= [x=@rs n=@rs] ^- @rs + ... +-- +``` + +1. Register the jet symbols and function names in `tree.c`. +2. Declare function prototypes in headers `q.h` and `w.h`. +3. Produce functions for compilation and linking in the `pkg/noun/jets/e` directory. + +**Register the jet symbols and function names.** + +A jet registration may be carried out at any point in `tree.c`. 
The registration consists of marking the core + +In **`pkg/noun/jets/tree.c`**: + +```c +/* Jet registration of ++factorial arm under trig-rs */ +static u3j_harm _140_hex__trigrs_factorial_a[] = {{".2", u3we_trigrs_factorial, c3y}, {}}; +/* Associated hash */ +static c3_c* _140_hex__trigrs_factorial_ha[] = { + "903dbafb8e59427eced0b35379ad617c2eb6083a235075e9cdd9dd80e732efa4", + 0 +}; + +static u3j_core _140_hex__trigrs_d[] = + { { "factorial", 7, _140_hex__trigrs_factorial_a, 0, _140_hex__trigrs_factorial_ha }, + {} + }; +static c3_c* _140_hex__trigrs_ha[] = { + "0bac9c3c43634bb86f6721bbcc444f69c83395f204ff69d3175f3821b1f679ba", + 0 +}; + +/* Core registration by token for trigrs */ +static u3j_core _140_hex_d[] = +{ /* ... pre-existing jet registrations ... */ + { "trig-rs", 31, 0, _140_hex__trigrs_d, _140_hex__trigrs_ha }, + {} +}; +``` + +**Declare function prototypes in headers.** + +We must declare `u3we_trigrs_factorial` and `u3qe_trigrs_factorial`: + +In `w.h`: + +```c +u3_noun u3we_trigrs_factorial(u3_noun); +``` + +In `q.h`: + +```c +u3_noun u3qe_trigrs_factorial(u3_atom); +``` + +**Produce functions for compilation and linking.** + +Given these function prototype declarations, all that remains is the actual definition of the function. Both functions will live in their own file; we find it the best convention to associate all arms of a core in a single file. In this case, create a file `pkg/noun/jets/e/trig-rs.c` and define all of your `trig-rs` jets therein. (Here we show `+factorial` only.) 
+ +**`pkg/noun/jets/e/trig-rs.c`** + +```c +/* jets/e/trig-rs.c +** +*/ +#include "all.h" +#include <softfloat.h> // necessary for working with software-defined floats +#include <stdio.h> // helpful for debugging, removable after development +#include <math.h> // provides library fabs() and ceil() + + union sing { + float32_t s; //struct containing v, uint32_t + c3_w c; //uint32_t + float b; //32-bit float, compiler-native, useful for debugging printfs + }; + +/* ancillary functions +*/ + bool isclose(float a, + float b) + { + float atol = 1e-6; + return ((float)fabs(a - b) <= atol); + } + +/* factorial of @rs single-precision floating-point value +*/ + u3_noun + u3qe_trigrs_factorial(u3_atom u) /* @rs */ + { + fprintf(stderr, "u3qe_trigrs_factorial\n\r"); // DELETE THIS LINE LATER + union sing a, b, c, e; + u3_atom bb; + a.c = u3r_word(0, u); // extricate value from atom as 32-bit word + + if (ceil(a.b) != a.b) { + // raise an error if the float has a nonzero fractional part + return u3m_bail(c3__exit); + } + + if (isclose(a.b, 0.0)) { + a.b = (float)1.0; + return u3i_words(1, &a.c); + } + else if (isclose(a.b, 1.0)) { + a.b = (float)1.0; + return u3i_words(1, &a.c); + } + else { + // naive recursive algorithm + b.b = a.b - 1.0; + bb = u3i_words(1, &b.c); + c.c = u3r_word(0, u3qe_trigrs_factorial(bb)); // recurse into this same jet + e.s = f32_mul(a.s, c.s); + u3m_p("result", u3i_words(1, &e.c)); // DELETE THIS LINE LATER + return u3i_words(1, &e.c); + } + } + + u3_noun + u3we_trigrs_factorial(u3_noun cor) + { + fprintf(stderr, "u3we_trigrs_factorial\n\r"); // DELETE THIS LINE LATER + u3_noun a; + + if ( c3n == u3r_mean(cor, u3x_sam, &a, 0) || + c3n == u3ud(a) ) + { + return u3m_bail(c3__exit); + } + else { + return u3qe_trigrs_factorial(a); + } + } +``` + +This code deviates from the integer implementation in two ways: because all `@rs` atoms are guaranteed to be 32 bits wide, we can assume that a `c3_w` can always contain them; and we are using software-defined floating-point operations with SoftFloat. 
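
The `union` trick used by the jet can be tried in isolation. The following is a minimal sketch, not Vere code: `sing_sketch`, `word_to_float` and `float_to_word` are hypothetical names, `uint32_t` stands in for `c3_w`, and the SoftFloat member is left out so the block compiles on its own. It assumes a platform where `float` is a 32-bit IEEE-754 single.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tutorial's `union sing`, without SoftFloat. */
union sing_sketch {
  uint32_t c;  /* raw 32-bit word, as u3r_word() would hand it back */
  float    b;  /* the same bits viewed as a native single-precision float */
};

/* Reinterpret a 32-bit word as a float; no numeric conversion takes place. */
float word_to_float(uint32_t w)
{
  union sing_sketch u;
  assert(sizeof(float) == sizeof(uint32_t));  /* assumption stated above */
  u.c = w;
  return u.b;
}

/* Reinterpret a float as its 32-bit word, ready to be packed into an atom. */
uint32_t float_to_word(float f)
{
  union sing_sketch u;
  assert(sizeof(float) == sizeof(uint32_t));
  u.b = f;
  return u.c;
}
```

For instance, `float_to_word(1.0f)` yields `0x3f800000`, which is the atom behind the Hoon literal `.1`. In the real jet, the word goes back into a noun with `u3i_words(1, &a.c)`.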
+ +We have made use of `u3r_word` to convert a 32-bit (really, 31-bit or smaller) Hoon atom (`@ud`) into a C `uint32_t` or `c3_w`. This unsigned integer may be interpreted as a floating-point value (similar to a cast to `@rs`) by the expedient of a C `union`, which allows multiple interpretations of the same bit pattern of data; in this case, as an unsigned integer, as a SoftFloat `struct`, and as a C single-precision `float`. + +`f32_mul` and its sisters (`f32_add`, `f64_mul`, `f128_div`, etc.) are floating-point operations defined in software ([Berkeley SoftFloat](http://www.jhauser.us/arithmetic/SoftFloat.html)). These are not as efficient as native hardware operations would be, but allow Urbit to guarantee cross-platform compatibility of operations and not rely on hardware-specific implementations. Currently all Urbit floating-point operations involving `@r` values use SoftFloat. + +#### Compiling and Using the Jet <a href="#compiling-and-using-the-jet" id="compiling-and-using-the-jet"></a> + +With this one jet for `+factorial` in place, compile the jet and take note of where Nix produces the binary. + +```sh +$ make +``` + +Copy the affected files back into the ship's pier: + +```sh +$ cp base/lib/trig-rs.hoon zod/base/lib +$ cp base/gen/trig-rs.hoon zod/base/gen +``` + +Restart your fakezod using the new Urbit binary and synchronize these to the `%home` desk: + +```hoon +> |commit %base +``` + +If all has gone well to this point, you are prepared to test the jet using the `%say` generator from earlier: + +```hoon +> +trig 5 +120 +``` + +Among the other output values, you should observe any `stderr` messages emitted by the jet functions each time they are called. 
+ +**`pkg/noun/jets/e/trig-rs.c`** + +```c +/* integer power of @rs single-precision floating-point value +*/ + u3_noun + u3qe_trigrs_pow_n(u3_atom x, /* @rs */ + u3_atom n) /* @rs */ + { + fprintf(stderr, "u3qe_trigrs_pow_n\n\r"); + union sing x_, n_, f_; + x_.c = u3r_word(0, x); // extricate value from atom as 32-bit word + n_.c = u3r_word(0, n); + + f_.b = (float)pow(x_.b, n_.b); // operate on the native float view of each union + + return u3i_words(1, &f_.c); + } + + u3_noun + u3we_trigrs_pow_n(u3_noun cor) + { + fprintf(stderr, "u3we_trigrs_pow_n\n\r"); + u3_noun a, b; + + if ( c3n == u3r_mean(cor, u3x_sam_2, &a, + u3x_sam_3, &b, 0) || + c3n == u3ud(a) || c3n == u3ud(b) ) + { + return u3m_bail(c3__exit); + } + else { + return u3qe_trigrs_pow_n(a, b); + } + } +``` + +The type union `sing` remains necessary to easily convert the floating-point result back into an unsigned integer atom. + +**Exercise: Implement the Other Jets** + +* We leave the implementation of the other jets to the reader as an exercise. (Please do not skip this: the exercise will both solidify your understanding and raise new important situational questions.) + + Again, the C jet code need not follow the same logic as the Hoon source code; in this case, we simply use the built-in `math.h` `pow` function. (We could—arguably should—have used SoftFloat-native implementations, but that is more involved than this tutorial intends.) + +### Jetting the Kernel <a href="#jetting-the-kernel" id="jetting-the-kernel"></a> + +Hoon jets are compiled into the Vere binary for distribution with the Urbit runtime. Per current development policy, this is the only way to actually share jets with other developers. + +Jets are registered with the runtime so that Vere knows to check whether a particular jet exists when it encounters a marked Hoon arm. + +* [`~/` sigfas](../../hoon/rune/sig.md#sigfas) registers a jet simply (using defaults). +* [`~%` sigcen](../../hoon/rune/sig.md#sigcen) registers a jet with all arguments specified. 
+ +Typically we use `~/` sigfas to register jets within a core under the umbrella of a `~%` sigcen registration. For instance, `+add` is registered under the Kelvin tag of `hoon.hoon`: + +```hoon +~% %k.140 ~ ~ :: +|% +++ hoon-version + +-- => +~% %one + ~ +|% +++ add + ~/ %add + |= [a=@ b=@] + ^- @ + ?: =(0 a) b + $(a (dec a), b +(b)) +``` + +As a generic example, let us consider three nested arms within cores. We intend to jet only `+ccc`, but we need to give Vere a way of tracking the jet registration for all containing cores. + +```hoon + ++ aaa + ~% %aaa ..is ~ + ... + ++ bbb + ~/ %bbb + ++ ccc + ~/ %ccc + |= dat=@ + ^- pont + =+ x=(end 3 w a) + =+ y=:(add (pow x 3) (mul a x) b) + =+ s=(rsh 3 32 dat) + :- x + ?: =(0x2 s) y + ?: =(0x3 s) y + ~| [`@ux`s `@ux`dat] + !! +``` + +We hint `+ccc` with `%ccc` and add a trail of hints up the enclosing tree of arms. `~/` sigfas takes only the `term` symbol used to label the hint because it knows the context, but `~%` sigcen needs two more fields: the parent jet and some core registration information (which is often `~` null). We here use the parent of `..is`, a system-supplied jet, as the parent jet. Since `+is` is an arm of the Arvo core, `..is` is a reference to the entire Arvo core. The whole Arvo core is hinted with the jet label `%hex`, which is used as the parent for all the top-level jet hints in %zuse. + +When hinting your own code, make sure to hint each nesting arm. Skipping any nesting core will result in the jet code not being run. + +You do not need to provide C implementations for everything you hint. In the above, we hint `%aaa`, `%bbb`, and `%ccc`—even if our intent is only to jet `+ccc`. + +#### Editing the C Source Code <a href="#editing-the-c-source-code" id="editing-the-c-source-code"></a> + +Having hinted our Hoon, we now need to write the matching C code. 
If we don't, there isn't a problem—hinting code merely tells the interpreter to look for a jet, but if a jet is not found, the Hoon still runs just fine. + +This whole process recapitulates what you've done above, but in a generic way. + +There are two distinct tasks to be done C-side: + +1. Write the jet. +2. Register the jet. + +For each jet you will write one `u3we()` function and one `u3qe()` function. + +**Edit the C Source Code to Add Registration** + +* Edit the header file `include/jets/w.h` to have a declaration for each of your `u3we()` functions. Every `u3we()` function looks the same, e.g. + + ```c + u3_noun u3we_xxx(u3_noun); + ``` +* Edit the header file `~/jetting/urbit/pkg/urbit/include/jets/q.h` to have a declaration for your `u3qe()` function. `u3qe()` functions can differ from each other, taking distinct numbers of `u3_nouns` and/or `u3_atoms`, e.g. + + ```c + u3_noun u3qe_yyy(u3_atom, u3_atom); + u3_noun u3qe_zzz(u3_noun, u3_noun, u3_atom, u3_atom); + ``` +* Create a new `.c` file to hold your jets; both the `u3we_()` and `u3qe_()` functions go in the same file, for instance `~/jetting/urbit/pkg/noun/jets/e/secp.c`. The new file should include at least the following three things: + * `#include "all.h"` + * the new `u3we()` function + * the new `u3qe()` function +* Edit `~/jetting/urbit/pkg/urbitjets/tree.c` to register the jet. + +In the Hoon code we hinted some leaf node functions (`%ccc` for `+ccc` in our example) and then hinted each parent node up to the root `%aaa`/`+aaa`). We need to replicate this structure in C. Here's example C code to jet our above example Hoon: + +```c + // 1: register a C func u3we_ccc() + static u3j_harm _143_hex_hobo_reco_d[] = + { + {".2", u3we_ccc, c3y}, + {} + }; + + // 2: that implements a jet for Hoon arm 'ccc' + static u3j_core _143_hex_hobo_bbb_d[] = + { + { "ccc", _143_hex_hobo_ccc_d }, + {} + }; + + // 3: ... 
that is inside a Hoon arm 'bbb' + static u3j_core _143_hex_hobo_hobo_d[] = + { + { "bbb", 0, _143_hex_hobo_bbb_d }, + {} + }; + + // 4: ... that is inside a Hoon arm 'aaa' + static u3j_core _143_hex_d[] = + { { "down", 0, _143_hex_down_d }, + { "lore", _143_hex_lore_a }, + { "loss", _143_hex_loss_a }, + { "lune", _143_hex_lune_a }, + { "coed", 0, _143_hex_coed_d }, + { "aes", 0, _143_hex_aes_d }, + { "hmac", 0, _143_hex_hmac_d }, + { "aaa", 0, _143_hex_hobo_d }, + {} + }; +``` + +There are 4 steps here. Let's look at each in turn. + +1. Section 1 names the C function that we want to invoke: `u3we_ccc()`. The precise manner in which it does this is by putting entries in an array of `u3j_harm`s. The first one specifies the jet; the second one is empty and serves as a termination to the array, similar to how a C string is null terminated with a zero. The jet registration supplies two fields `{".2", u3we_secp}`, but this does not initialize all of the fields of `u3j_harm`. Other fields can be specified. + + The first field, with value ".2" in this example, is "arm 2". `".2"` labels the axis of the arm in the core. With a `%fast` hint (`~/` sigfas ), we're hinting a gate, so the relevant arm formula is always just the entire battery at `+2`. + + The second field, with value `u3we_ccc` in this example, is a function pointer (to the C implementation of the jet). + + The third field (absent here) is a flag to turn on verification of C jet vs Hoon at run time. It can take value `c3n` (which means verify at run time) or `c3y` (which means don't verify). If not present, it is set to don't verify. + + There are additional flags; see \~/tlon/urbit/include/noun/jets.h +2. Section 2 associated the previous jet registration with the name `"ccc"`. This must be the same symbol used in the Hoon hint. We again have a “null terminated” (metaphorically) list, ending with `{}`. 
+ + Section 3 references structure built in step 2 and slots it under `bbb` (again, note that this is exactly the same symbol used in the hinting in Hoon). + + The line in section 2 + + ```c + { "ccc", _143_hex_hobo_ccc_a }, + ``` + + looks very similar to the line in section 3 + + ```c + { "bbb", 0, _143_hex_hobo_bbb_d }, + ``` + + But note that the line in section 2 fill in the first 2 fields in the struct, and the line in section 3 fills in the first three fields. Section 2 is registering an array of `u3j_harm`, i.e. is registering an actual C jet. +3. Section 3 specifies `0` for the array of `u3j_harm` and is instead specifying an array of `u3j_core`, i.e. it is registering nesting of another core which is not a leaf node. +4. Section 4 is much like section 3, but it's the root of this particular tree. Section 4 is also an example of how a given node in the jet registration tree may have multiple children. + +You should be able to register jets whether your nesting is 2 layers deep, 3 (like this example), or more. You should also be able to register multiple jets at the same nesting level (e.g. a function `u3we_ddd()` which is a sibling of `u3we_ccc()` inside data structure `_143_hex_hobo_reco_d[]` ). + +**Edit the C Source Code to Add the `u3we_()` Function** + +There are two C functions per jet, because separation of concerns is a good thing. + +The first C function—named `u3we_xxx()`—unpacks arguments from the Hoon code and gets them ready. + +The second C function -- named `u3qe_xxx()`—takes those arguments and actually performs the operations that parallel the Hoon code being jetted. + +Let's write the `u3we_xxx()` function first. This function accepts one argument, of type `u3_noun`. This is the same type as a Hoon noun (`*`). This one argument is the payload. The payload is a tree, obviously. 
+ +The payload consists of (on the right branch) the context (you'd think of “global variables and available methods”, if analogies to other programming languages were allowed!) and on the left branch the sample (the arguments to this particular function call). + +Your `u3we_xxx()` function does one thing: it unpacks the sample from `cor`, sanity checks its elements, and passes them to the `u3qe_xxx()` function. + +To unpack the sample, we use the function `u3r_mean()`, thusly: + +```c + u3_noun arg_a, arg_b, arg_c ... ; + + u3r_mean(cor, + axis_a, &arg_a, + axis_b, &arg_b, + axis_c, &arg_c, + ... + 0); +``` + +If we want to assign the data located at axis 3 of `cor` to `arg_a`, we'd set `axis_a = 3`. + +`u3r_mean()` takes [varargs](https://en.wikipedia.org/wiki/Variadic_function), so we can pass in as many axis/return-argument pairs as we wish, terminated with a `0`. You saw above how to pull the sample arguments out of the right-descending trees (because a linked list is a degenerate case of a tree). + +If the Hoon that you're jetting looks like this + +```hoon +++ make-k +~/ %make-k +=, mimes:html +|= [aaa=@ bbb=@ ccc=@] +``` + +In the C code you'd fetch them out of the payload with the following call. The sample is the right-branching tuple `[aaa [bbb ccc]]`, so its elements sit at axes 2, 6, and 7 of the sample. + +```c +u3r_mean(cor, + u3x_sam_2, &arg_aaa, + u3x_sam_6, &arg_bbb, + u3x_sam_7, &arg_ccc, + 0); +``` + +If you're confident, go ahead and write code. If you want to inspect your arguments to see what's going on, you can pretty print the sample. + +You could in theory inspect/pretty-print the noun by calling + +```c +u3m_p("description", cor); // DO NOT DO THIS !!! +``` + +… but you don't want to do this, because, recall, `cor` contains the _entire_ context. + +Do instead, perhaps, + +```c +c3_o ret; +u3_noun sample; + +ret = u3r_mean(cor, u3x_sam_1, &sample, 0); // read the whole sample out of the core +fprintf(stderr, "ret = %i\n\r", ret); // we want ret = 0 = yes + +u3m_p("sample", sample); // pretty print the entire sample +``` + +After our C function pulls out the arguments, it needs to typecheck them. 
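
Before moving on to type checks, it may help to see where the `u3x_sam_*` numbers come from. The sketch below is a standalone illustration of axis arithmetic and not part of the `u3` API: `axis_peg`, `tuple_axis` and `sample_axis` are hypothetical helper names. It assumes the usual gate layout, with the sample at axis 6 of the core and a sample tuple that nests to the right.

```c
#include <assert.h>
#include <stdint.h>

/* Compose axes: the axis, counted from the root, of axis `b` inside the
** subtree found at axis `a`.  Hypothetical helper, not a u3 function.
*/
uint32_t axis_peg(uint32_t a, uint32_t b)
{
  uint32_t bits = 0;  /* number of path bits below b's leading 1 */
  uint32_t t;

  assert(a >= 1 && b >= 1);  /* axis 0 does not exist */
  for (t = b; t > 1; t >>= 1) {
    bits++;
  }
  return (a << bits) | (b & ((1u << bits) - 1u));
}

/* Axis of element `i` (0-based) inside a right-nested n-tuple. */
uint32_t tuple_axis(uint32_t n, uint32_t i)
{
  assert(n >= 1 && i < n);
  if (i == n - 1) {
    return (1u << n) - 1u;        /* last element: all right turns */
  }
  return (1u << (i + 2)) - 2u;    /* i right turns, then one left turn */
}

/* Axis of sample element `i`, counted from the root of the core. */
uint32_t sample_axis(uint32_t n, uint32_t i)
{
  return axis_peg(6, tuple_axis(n, i));  /* the sample lives at axis 6 */
}
```

For a three-argument gate this gives 12, 26 and 27, which are the values behind `u3x_sam_2`, `u3x_sam_6` and `u3x_sam_7`. For a two-argument gate it gives 12 and 13, matching the `u3x_sam_2` and `u3x_sam_3` pair used by the `pow-n` jet.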
+ +If `arg_a` is supposed to be an atom, trust but verify: + +```c +u3ud(arg_a); // checks for atomicity; alias for u3a_is_atom() +``` + +If it's supposed to be a cell: + +```c +u3du(arg_a); // checks for cell-ness +``` + +There are other tests you might need to use: + +```c +u3a_is_cat() // check whether the noun is a direct atom (31 bits or less) +u3a_is_dog() // check whether the noun is an indirect noun (an atom of 32+ bits, or a cell) + +u3a_is_pug() // check whether noun is indirect atom +u3a_is_pom() // check whether noun is indirect cell +``` + +All of these tests return Hoon loobeans (`yes` 0/`no` 1 vs. `TRUE`/`FALSE`), so check return values against `c3n` / `c3y`. If any of `u3r_mean()`, `u3ud()`, etc. returns `c3n`, you have an error and should return + +```c +return u3m_bail(c3__exit); +``` + +Otherwise, pass the arguments into your inner jet function and return the results of that. + +**Edit the C Source Code to Add the `u3qe_()` Function** + +**Unpacking Nouns** + +The `u3qe_xxx()` function is the real jet—the C code that replaces the Hoon. + +First, you may need to massage your inputs a bit to get them into types that you can use. + +You have received a bunch of `u3_noun`s or `u3_atom`s, but you presumably want to do things in a native C/non-Hoon manner: computing with raw integers, etc. + +A `u3_noun` will want to be further disassembled into atoms. + +A `u3_atom` represents a simple number, but the implementation may or may not be simple. If the value held in the atom is 31 bits or less, it's stored directly in the atom. If the value is 32 bits or more, the atom holds a pointer into the loom where the actual value is stored. (See [Nouns](nouns.md).) + +You don't want to get bogged down in the details of this—you just want to get data out of your atoms. 
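
For the curious, the direct/indirect distinction behind `u3a_is_cat()` and friends comes down to the top two bits of the 32-bit noun word. The following toy model is a sketch, not the real macros: the `toy_*` names and the `TOY_YES`/`TOY_NO` constants are hypothetical. It assumes the layout given in the Nouns documentation, where bit 31 marks an indirect noun and bit 30 distinguishes a cell from an atom.

```c
#include <assert.h>
#include <stdint.h>

/* Loobeans: yes is 0 and no is 1, the reverse of C truth values. */
#define TOY_YES 0
#define TOY_NO  1

/* Direct atom: bit 31 is clear, so the word is its own value. */
int toy_is_cat(uint32_t som)
{
  return (0 == (som >> 31)) ? TOY_YES : TOY_NO;
}

/* Indirect noun: bit 31 is set, so the low bits point into the loom. */
int toy_is_dog(uint32_t som)
{
  return (1 == (som >> 31)) ? TOY_YES : TOY_NO;
}

/* Indirect atom: bit 31 set, bit 30 clear. */
int toy_is_pug(uint32_t som)
{
  return (2 == (som >> 30)) ? TOY_YES : TOY_NO;
}

/* Indirect cell: bits 31 and 30 both set. */
int toy_is_pom(uint32_t som)
{
  return (3 == (som >> 30)) ? TOY_YES : TOY_NO;
}
```

Because yes is `0`, a bare `if ( toy_is_cat(som) )` takes the branch exactly when the answer is no. That is why the jet code above always compares explicitly, as in `c3n == u3ud(a)`.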
+ +If you know that the data fits in 32 bits or less, you can use + +```c +u3r_word(c3_w a_w, u3_atom b); +``` + +If it is longer than 32 bits, use + +```c +u3r_words(c3_w a_w, c3_w b_w, c3_w* c_w, u3_atom d); +``` + +or + +```c + u3r_bytes(c3_w a_w, c3_w b_w, c3_y* c_y, u3_atom d); +``` + +If you need to get the size, use + +```c +u3r_met(3, a); +``` + +Cells have their own set of characteristic functions for accessing interior nouns: `u3r_cell`, `u3r_trel`, `u3r_qual`, `u3h`, `u3t`, and the like. + +The actual meat of the function is up to you. What is the function supposed to do for Hoon? + +**Packing Nouns** + +Now we move on to return semantics. + +First, you can transfer raw values into nouns using + +```c +u3_noun u3i_words(c3_w a_w, const c3_w* b_w); +``` + +and you can build cells out of nouns using + +```c +u3nc(); // pair +u3nt(); // triple +u3nq(); // quad +``` + +There are two facets here: + +1. **Data format.** If the Hoon is expected to return a single atom (e.g. if the Hoon looks like this:) + + ```hoon + ++ make-k + ~/ %make-k + |= [has=@uvI prv=@] :: <---- input arguments + ^- @ :: <---- return value is a single value of type '@' (atom) + ... + ``` + + then your C code—at least when you're stubbing it out—can do something like + + ```c + return(123); + ``` + + Or, if you want to create an atom more formally, you can build it like this + + ```c + // this variable is on the stack and will disappear + unsigned char nonce32[32]; + + // this allocates an indirect (> 31 bits) atom in the loom, + // does appropriate reference count, and returns the 32 bit handle to the atom + u3_noun nonce = u3i_words(8, (const c3_w*) nonce32); + + // this returns the 32 bit handle to the atom + return(nonce); + ``` + + If, on the other hand, your Hoon looks like + + ```hoon + ++ ecdsa-raw-sign + ~/ %ecdsa-raw-sign + |= [has=@uvI prv=@] :: <---- input arguments + ^- [v=@ r=@ s=@] :: <---- return value is a cell + ... 
+ ``` + + ending your C code with + + ```c + return(123); + ``` + + is wrong and will result in a runtime error because you are returning a single atom, instead of a tuple containing three atoms. + + Instead do one of these: + + ```c + return(u3nc(a, b)); // for two atoms + return(u3nt(a, b, c)); // for three atoms + return(u3nq(a, b, c, d)); // for four atoms + ``` + + If you need to return a longer tuple, you can compose your own. Look at the definitions of these three functions and you will see that they are just nested calls to the cell constructor `u3i_cell()`, e.g. + + ```c + u3i_cell(a, u3i_cell(b, u3i_cell(c, d))); + ``` + + This implies that, to create a list instead of a tuple, you will need to append `u3_nul` to the appropriately-sized tuple constructor: + + ```c + return(u3nt(a, b, u3_nul)); // for two atoms as a list + ``` +2. **Memory allocation.** Understanding the memory model, allocation, freeing, and ownership ('transfer' vs 'retain' semantics) is important. More information is available in [the “Nouns” docs](nouns.md). + +### Pills <a href="#pills" id="pills"></a> + +A _pill_ is a Nock “binary blob”, really a parsed Hoon abstract syntax tree. Pills are used to bypass the bootstrapping procedure for a new ship, and are particularly helpful when jetting code in `hoon.hoon`, `%zuse`, `%lull`, or the main Arvo vanes. + +An Urbit ship has to boot into the Arvo kernel—a Nock core with a particular interface. While it would be possible to make some ad-hoc procedure to initialize Arvo, it would be a drastic layering violation and couple Urbit to all sorts of internal implementation details of Arvo and Hoon. In contrast, a pill is basically a serialized set of declarative steps to initialize Arvo. + +You don't strictly need to use pills in producing jets in `/lib`, but they can speed up your development cycle significantly. However, you _must_ use pills when working on the core kernel (`hoon.hoon`, `zuse.hoon`, `arvo.hoon`). 
+ +#### Producing a Pill <a href="#producing-a-pill" id="producing-a-pill"></a> + +Having edited the C code, you now need to compile it to build a new runtime executable. + +```sh +$ cd ~/jetting/vere +$ bazel build :urbit +``` + +You need to compile this in C and in Hoon, however. When the Urbit executable runs, the first thing it does is load the complete Arvo operating system. That step is much faster if it can load a jammed pill, where all of the Hoon has already been parsed from a text file into a Hoon abstract syntax tree, and then compiled from the Hoon into the Nock equivalent. + +Critically, this means that if you edit `hoon.hoon`, `zuse.hoon`, `arvo.hoon`, `lull.hoon`, `/sys/vane/ames.hoon`, etc., and then restart the executable, **you are not running your new code**. + +The only way to run the new code is to follow this process: + +1. Start up a new fakeship (typically `~zod`) which knows where your edited Arvo files are (although it will not execute them, as discussed above). +2. From the Dojo command line, load the Hoon files and compile them into a `pill` file: + + ```hoon + > .pill +pill/solid + %solid-start + %solid-loaded + %solid-parsed + %solid-compiled + %solid-arvo + [%solid-kernel 0x6aa7.627e] + %arvo-assembly + [%solid-veer p=%$ q=/zuse] + [%tang /~zod/home/~2018.7.25..20.47.51..0027/sys/zuse ~mondyr-rovmes] + ``` + + If this is successful, then you are ready to move forwards. Otherwise, correct the syntax errors and iterate. +3. Exit the ship with `Ctrl+D` or `|exit`. +4. Save the pill file. + + ```sh + $ cd ~/jetting + $ cp zod/.urb/put/.pill ./mypill.pill + ``` + +**Run the Compiled C/Compiled Hoon Pill** + +5. Prepare a new fakezod (you can't use a backup fakezod here because the point is to boot from scratch again): + + ```sh + $ cd ~/jetting + $ rm -rf zod + $ /path/to/new/urbit -F zod -B ~/jetting/mypill.pill + ``` + +If booting takes more than about 90 seconds, you may have created a ‘poison pill’, which hangs things. 
Try booting without the `-B` flag, and/or reverting your Hoon changes, generating a new pill based on that, and launching `urbit` with the known-clean pill. If these boots finish in under 90 seconds, but a boot with a pill created from your own Hoon does not, you have a Hoon bug of some sort. + +Hoon bugs that disable booting can be as simple as the wrong number of spaces. Many, but not all of them, will result in compile errors during the `.pill +pill/solid` step. If booting takes more than 90 seconds, abort it and debug your Hoon code. + +* Inside the Dojo, `|commit` the changed `hoon.hoon` or other system file. It should automatically recompile if correct. + +You now have created a galaxy fakezod, on its own detached network, running your own strange variant of the OS. + +6. Run and test your jetting code, e.g. `(ccc:bbb:aaa 1 2 3)`. + +(As an aside, should you see “biblical” names like `noah`, this means that you are using a feature of the kernel in a core before it is available. You'll need to move things to a later point in the file or change your code if that happens.) + +### Testing Jets <a href="#testing-jets" id="testing-jets"></a> + +All nontrivial code should be thoroughly tested to ensure software quality. To rigorously verify the jet's behavior and performance, we will combine live testing in a single Urbit session, comparative behavior between a reference Urbit binary and our modified binary, and unit testing. + +1. Live spot checks rely on you modifying the generator `trig-rs.hoon` and observing whether the jet works as expected. + + When producing a library, one may use the `-build-file` thread to build and load a library core through a face. Two fakezods can be operated side-by-side in order to verify consistency between the Hoon and C code. + + ```hoon + > =trig-rs -build-file %/lib/trig-rs/hoon + + > (exp:trig-rs .5) + ``` +2. 
Comparison to the reference Urbit binary can be done with a second development ship and the same Hoon library and generator. +3. Unit tests rely on using the `-test` thread as covered in [Hoon School](../hoon-school/I-testing.md) and the [testing guide](../userspace/unit-tests.md). + + ```hoon + > -test %/tests/lib/trig-rs ~ + ``` +4. One of the arguments to the C function registration forces comparison of the results of the Hoon/Nock code and the C jet. + + It can take value `c3n` (which means to verify the jet's behavior at run time) or `c3y` (which means to not verify). If not present, it will not verify. + + Why is `c3y` ("yes") used to turn OFF verification? Because the flag is actually asking, “Is this jet already known to be correct?” +5. There are integration tests available for the Urbit repository; you should investigate the now-current standard of practice for implementing and including these with your jetted code submission. + +### Et Cetera <a href="#et-cetera" id="et-cetera"></a> + +We omit from the current discussion a few salient points: + +1. Reference counting with transfer and retain semantics. (For everything the new developer does outside of real kernel shovel work, one will use transfer semantics.) These are discussed in [the “Noun” docs](nouns.md). +2. The structure of memory: the loom, with outer and inner roads. This is discussed in [the “Noun” docs](nouns.md). +3. Many details of C-side atom declaration and manipulation from the `u3` library. These are discussed in [the API docs](api.md). +4. Debugging output should be written with `fprintf()` to `stderr`. Use both `\n` and `\r` to achieve line feed (move cursor down one line) and carriage return (move it to the left). You can also use `u3l_log`, which does not require `\r`, but it should not be used in cases where the IO drivers have not yet been initialized or can no longer be relied upon, e.g. crashing or shutdown. +5. 
A jet can be partial: it can solve certain cases efficiently but leave others to the Hoon implementation. A `u3w_*` jet interface function takes the entire core as one noun argument and returns a `u3_weak` result. If the return value is `u3_none` (distinct from `u3_nul`, `~` null), the core is evaluated; otherwise the resulting noun is produced in place of the nock. diff --git a/docs/runtime/nouns.md b/docs/runtime/nouns.md new file mode 100644 index 0000000..6ec82d1 --- /dev/null +++ b/docs/runtime/nouns.md @@ -0,0 +1,311 @@ +--- +description: "Comprehensive guide to u3 noun processing in C. Memory model, reference counting, road system architecture." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# Land of Nouns + +The division between `c3` and `u3` is that you could theoretically imagine using `c3` as just a generic C environment. Anything to do with nouns is in `u3`. + +## u3: a map of the system {#u3-a-map-of-the-system} + +These are the symbols you'll need to know about to program in `u3`. All files listed below are found in the [`pkg/noun`](https://github.com/urbit/vere/tree/develop/pkg/noun) directory. 
Symbols follow this pattern: + +``` +prefix purpose .h .c +------------------------------------------------------------------- +u3a_ allocation allocate.h allocate.c +u3e_ persistence events.h events.c +u3h_ hashtables hashtable.h hashtable.c +u3i_ noun construction imprison.h imprison.c +u3j_ jet control jets.h jets.c +u3l_ logging log.h log.c +u3m_ system management manage.h manage.c +u3n_ nock computation nock.h nock.c +u3o_ command-line options options.h options.c +u3r_ noun access (error returns) retrieve.h retrieve.c +u3s_ noun serialization serial.h serial.c +u3t_ profiling trace.h trace.c +u3u_ urth (memory management) urth.h urth.c +u3v_ arvo vortex.h vortex.c +u3x_ noun access (error crashes) xtract.h xtract.c +u3z_ memoization zave.h zave.c +u3k[a-g] jets (transfer, C args) jets/k.h jets/[a-g]/*.c +u3q[a-g] jets (retain, C args) jets/q.h jets/[a-g]/*.c +u3w[a-g] jets (retain, nock core) jets/w.h jets/[a-g]/*.c +``` + +Additionally, various noun type definition are found in `pkg/noun/types.h`. + +## u3: noun internals {#u3-noun-internals} + +A noun is a `u3_noun` - currently defined as a 32-bit `c3_w`. (This is zero-indexed so bit `31` is the high bit.) + +If your `u3_noun` is less than `(1 << 31)`, it's a direct atom. Every unsigned integer between `0` and `0x7fffffff` inclusive is its own noun. + +If bit `31` is set in a `u3_noun` and bit `30` is `1` the noun is an indirect cell. If bit `31` is set and bit `30` is `0` the noun is an indirect atom. Bits `29` through `0` are a word pointer into the loom - see below. The structures are: + +```c +typedef struct { + c3_w mug_w; + c3_w len_w; + c3_w buf_w[0]; // actually [len_w] +} u3a_atom; + +typedef struct { + c3_w mug_w; + u3_noun hed; + u3_noun tel; +} u3a_cell; +``` + +The only thing that should be mysterious here is `mug_w`, which is a 31-bit lazily computed nonzero short hash ([Murmur3](https://github.com/PeterScott/murmur3)). If `mug_w` is 0, the hash is not yet computed. 
We also hijack this field for various hacks, such as saving the new address of a noun when copying over. + +Also, the value `0xffffffff` is `u3_none`, which is never a valid noun. Use the type `u3_weak` to express that a noun variable may be `u3_none`. + +## u3: reference counts {#u3-reference-counts} + +The only really essential thing you need to know about `u3` is how to handle reference counts. Everything else, you can skip and just get to work. + +u3 deals with reference-counted, immutable, acyclic nouns. Unfortunately, we are not Apple and can't build reference counting into your C compiler, so you need to count by hand. + +Every allocated noun (or any allocation object, because our allocator is general-purpose) contains a counter which counts the number of references to it - typically variables with type `u3_noun`. When this counter goes to 0, the noun is freed. + +To tell `u3` that you've added a reference to a noun, call the function `u3a_gain()` or its shorthand `u3k()`. (For your convenience, this function returns its argument.) To tell `u3` that you've destroyed a reference, call `u3a_lose()` or `u3z()`. + +(If you screw up by decrementing the counter too much, `u3` will dump core in horrible ways. If you screw up by incrementing it too much, `u3` will leak memory. To check for memory leaks, set the `bug_o` flag in `u3e_boot()` - eg, run `vere` with `-g`. Memory leaks are difficult to debug - the best way to handle leaks is just to revert to a version that didn't have them, and look over your code again.) + +(You can gain or lose a direct atom. It does nothing.) + +## u3: reference protocols {#u3-reference-protocols} + +**THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION.** + +The key question when calling a C function in a refcounted world is what the function will do to the noun refcounts - and, if the function returns a noun, what it does to the return. + +There are two semantic patterns, `transfer` and `retain`. 
In `transfer` semantics, the caller "gives" a use count to the callee, which "gives back" any return. For instance, if I have + +```c + { + u3_noun foo = u3i_string("foobar"); + u3_noun bar; + + bar = u3f_futz(foo); + [...] + u3z(bar); + } +``` + +Suppose `u3f_futz()` has `transfer` semantics. At `[...]`, my code holds one reference to `bar` and zero references to `foo` - which has been freed, unless it's part of `bar`. My code now owns `bar` and gets to work with it until it's done, at which point a `u3z()` is required. + +On the other hand, if `u3f_futz()` has `retain` semantics, we need to write + +```c + { + u3_noun foo = u3i_string("foobar"); + u3_noun bar; + + bar = u3f_futz(foo); + [...] + u3z(foo); + } +``` + +because calling `u3f_futz()` does not release our ownership of `foo`, which we have to free ourselves. + +But if we free `bar`, we are making a great mistake, because our reference to it is not in any way registered in the memory manager (which cannot track references in C variables, of course). It is normal and healthy to have these uncounted C references, but they must be treated with care. + +The bottom line is that it's essential for the caller to know the refcount semantics of any function which takes or returns a noun. (In some unusual circumstances, different arguments or returns in one function may be handled differently.) + +Broadly speaking, as a design question, retain semantics are more appropriate for functions which inspect or query nouns. For instance, `u3h()` (which takes the head of a noun) retains, so that we can traverse a noun tree without constantly incrementing and decrementing. + +Transfer semantics are more appropriate for functions which make nouns, which is obviously what most functions do. + +In general, though, in most places it's not worth thinking about what your function does. There is a convention for it, which depends on where it is, not what it does. Follow the convention. 
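The difference between the two protocols can be sketched with a toy refcounted object. This is only an illustration under stated assumptions: `toy_noun`, `toy_make()`, `toy_gain()`, `toy_lose()`, `toy_succ_transfer()` and `toy_same_retain()` are hypothetical names invented for this example, and the code uses plain `malloc()` rather than the real `u3` allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted noun. NOT the real u3 allocator. */
typedef struct toy_noun {
  int use_i;   /* reference count */
  int val_i;   /* payload */
} toy_noun;

static int live_i = 0;   /* number of toy nouns currently allocated */

static toy_noun* toy_make(int val_i) {
  toy_noun* non = malloc(sizeof(*non));
  assert(non != NULL);
  non->use_i = 1;
  non->val_i = val_i;
  live_i++;
  return non;
}

/* Add a reference; returns its argument for convenience. */
static toy_noun* toy_gain(toy_noun* non) { non->use_i++; return non; }

/* Drop a reference; frees the object when the count reaches zero. */
static void toy_lose(toy_noun* non) {
  if (0 == --non->use_i) { free(non); live_i--; }
}

/* Transfer: consumes the caller's reference to `non`; the caller owns
** the result and must eventually lose it. */
static toy_noun* toy_succ_transfer(toy_noun* non) {
  toy_noun* pro = toy_make(non->val_i + 1);
  toy_lose(non);
  return pro;
}

/* RETAIN: leaves the caller's reference alone and returns an
** uncounted (borrowed) pointer, which the caller must not lose. */
static toy_noun* toy_same_retain(toy_noun* non) { return non; }

/* Runs both protocols and returns the number of leaked toy nouns. */
static int toy_demo(void) {
  /* transfer: foo is consumed by the call; we own bar. */
  toy_noun* foo = toy_make(41);
  toy_noun* bar = toy_succ_transfer(foo);
  toy_lose(bar);

  /* retain: we still own foo; bar is only borrowed. */
  foo = toy_make(41);
  bar = toy_same_retain(foo);
  (void)bar;
  toy_lose(foo);

  return live_i;
}
```

If the retain half of `toy_demo()` called `toy_lose()` on both `foo` and `bar`, the count would be decremented twice for a single counted reference, which is the "great mistake" described above. To keep a borrowed result past the lifetime of its owner, the caller would first have to count it with `toy_gain()`.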
+ +## u3: reference conventions {#u3-reference-conventions} + +The `u3` convention is that, unless otherwise specified, **all functions have transfer semantics** - with the exception of the prefixes: `u3r`, `u3x`, `u3z`, `u3q` and `u3w`. Also, within jet directories `a` through `f` (but not `g`), internal functions retain (for historical reasons). + +If functions outside this set have retain semantics, they need to be commented, both in the `.h` and `.c` file, with `RETAIN` in all caps. Yes, it's this important. + +## u3: system architecture {#u3-system-architecture} + +If you just want to tinker with some existing code, it might be enough to understand the above. If not, it's probably worth taking the time to look at `u3` as a whole. + +`u3` is designed to work as a persistent event processor. Logically, it computes a function of the form + +``` + f(event, old state) -> (actions, new state) +``` + +Obviously almost any computing model - including, but not limited to, Urbit - can be defined in this form. To create the illusion of a computer that never loses state and never fails, we: + +- log every event externally before it goes into u3 +- keep a single reference to a permanent state noun. +- can abort any event without damaging the permanent state. +- snapshot the permanent state periodically, and/or prune logs. + +## u3: the road model {#u3-the-road-model} + +`u3` uses a memory design which I'm sure someone has invented somewhere before, because it's not very clever, but I've never seen it anywhere in particular. + +Every allocation starts with a solid block of memory, which `u3` calls the `loom`. How do we allocate on the loom? 
You're probably familiar with the Unix heap-stack design, in which the stack grows downward and the heap (malloc arena) grows upward: + +``` + 0 brk ffff + | heap | stack | + |------------#################################+++++++++++++| + | | | + 0 sp ffff +``` + +A road is a normal heap-stack system, except that the heap and stack can point in **either direction**. Therefore, inside a road, we can nest another road in the **opposite direction**. + +When the opposite road completes, its heap is left on top of the opposite heap's stack. It's no more than the normal behavior of a stack machine for all subcomputations to push their results on the stack. + +The performance tradeoff of "leaping" - reversing directions in the road - is that if the outer computation wants to preserve the results of the inner one, not just use them for temporary purposes, it has to **copy them**. + +This is a trivial cost in some cases, a prohibitive cost in others. The upside, of course, is that all garbage accrued in the inner computation is discarded at zero cost. + +The goal of the road system is the ability to **layer** memory models. If you are allocating on a road, you have no idea how deep within a nested road system you are - in other words, you have no idea exactly how durable your result may be. But free space is never fragmented within a road. + +Roads do not reduce the generality or performance of a memory system, since even the most complex GC system can be nested within a road at no particular loss of performance - a road is just a block of memory. + +Each road (`u3a_road` to be exact) uses four pointers: `rut` is the bottom of the arena, `hat` the top of the arena, `mat` the bottom of the stack, `cap` the top of the stack. (Bear in mind that the road "stack" is not actually used as the C function-call stack, though it probably should be.) 
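As a rough sketch of the four-pointer layout, here is a toy road oriented like the Unix picture above, with the heap growing up from the low end and the stack growing down from the high end. The names `toy_post`, `toy_road` and the `toy_*` functions are hypothetical, and the plain bump allocation shown here is an assumption made for illustration; the real `u3a_road` and allocator carry considerably more state.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_post;   /* word offset into a toy loom */

typedef struct {
  toy_post rut;   /* bottom of the arena (heap) */
  toy_post hat;   /* top of the arena */
  toy_post mat;   /* bottom of the stack */
  toy_post cap;   /* top of the stack */
} toy_road;

/* Empty road over [low, high): heap starts low, stack starts high. */
static toy_road toy_road_init(toy_post low, toy_post high) {
  toy_road rod = { low, low, high, high };
  return rod;
}

/* Free space is whatever lies between the heap top and the stack top. */
static toy_post toy_free_words(const toy_road* rod) {
  return rod->cap - rod->hat;
}

/* Heap allocation: bump `hat` upward. Returns 1 on success. */
static int toy_heap_alloc(toy_road* rod, toy_post len, toy_post* pos) {
  if (toy_free_words(rod) < len) return 0;
  *pos = rod->hat;
  rod->hat += len;
  return 1;
}

/* Stack push: bump `cap` downward. Returns 1 on success. */
static int toy_stack_push(toy_road* rod, toy_post len, toy_post* pos) {
  if (toy_free_words(rod) < len) return 0;
  rod->cap -= len;
  *pos = rod->cap;
  return 1;
}
```

Because both regions draw on the same gap between `hat` and `cap`, the free space inside a road stays contiguous in this model, which is what leaves room to nest another road inside it.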
+ +A "north" road has the stack high and the heap low: + +``` + 0 rut hat ffff + | | | | + |~~~~~~~~~~~~-------##########################+++++++$~~~~~| + | | | | + 0 cap mat ffff +``` + +A "south" road is the other way around: + +``` + 0 mat cap ffff + | | | | + |~~~~~~~~~~~~$++++++##########################--------~~~~~| + | | | | + 0 hat rut ffff +``` + +Legend: `-` is durable storage (heap); `+` is temporary storage (stack); `~` is deep storage (immutable); `$` is the allocation frame; `#` is free memory. + +Pointer restrictions: pointers stored in `+` can point anywhere. Pointers in `-` can only point to `-` or `~`; pointers in `~` only point to `~`. + +To "leap" is to create a new inner road in the `###` free space, but in the reverse direction, so that when the inner road "falls" (terminates), its durable storage is left on the temporary storage of the outer road. + +`u3` keeps a global variable, `u3_Road` or its alias `u3R`, which points to the current road. (If we ever run threads in inner roads - see below - this will become a thread-local variable.) Relative to `u3R`, `+` memory is called `junior` memory; `-` memory is `normal` memory; `~` is `senior` memory. + +## u3: explaining the road model {#u3-explaining-the-road-model} + +But... why? + +We're now ready to understand why the road system works so logically with the event and persistence model. + +The key is that **we don't update refcounts in senior memory.** A pointer from an inner road to an outer road is not counted. Also, the outermost, or `surface`, road is the only part of the image that gets checkpointed. + +So the surface road contains the entire durable state of `u3`. When we process an event, or perform any kind of complicated or interesting calculation, **we process it in an inner road**. If its results are saved, they need to be copied.
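The leap and fall described above can be modelled in a few lines. This is a toy sketch, not the actual `u3m_leap()` / `u3m_fall()` logic: the `toy_road` struct and the `toy_*` functions are hypothetical, and the allocation frame (`$`) is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_post;   /* word offset into a toy loom */

typedef struct {
  toy_post rut, hat;   /* heap: bottom and top */
  toy_post mat, cap;   /* stack: bottom and top */
} toy_road;

/* Leap from a north road: carve a south road out of its free space. */
static toy_road toy_leap_south(const toy_road* out) {
  toy_road inn;
  inn.rut = inn.hat = out->cap;   /* inner heap grows down from the outer stack top */
  inn.mat = inn.cap = out->hat;   /* inner stack grows up from the outer heap top */
  return inn;
}

/* Heap allocation on a south road: bump `hat` downward. */
static int toy_south_alloc(toy_road* inn, toy_post len, toy_post* pos) {
  if ((toy_post)(inn->hat - inn->cap) < len) return 0;
  inn->hat -= len;
  *pos = inn->hat;
  return 1;
}

/* Fall: the inner heap stays in place and is now the top of the outer
** stack. The outer heap (`hat`) is untouched. */
static void toy_fall(toy_road* out, const toy_road* inn) {
  out->cap = inn->hat;
}
```

After `toy_fall()`, whatever the inner road allocated sits in the outer road's temporary storage. In this model, keeping it durably would mean copying it into the outer heap, which is the copying cost the text refers to, while anything the inner road pushed on its own stack is simply abandoned.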
+ +Since processing in an inner road does not touch surface memory, (a) we can leave the surface road in a read-only state and not mark its pages dirty; (b) we can abort an inner calculation without screwing up the surface; and (c) because inner results are copied onto the surface, the surface doesn't get fragmented. + +All of (a), (b) and (c) are needed for checkpointing to be easy. It might be tractable otherwise, but easy is even better. + +Moreover, while the surface is most definitely single-threaded, we could easily run multiple threads in multiple inner roads (as long as the threads don't have pointers into each others' memory, which they obviously shouldn't). + +Moreover, in future, we'll experiment more with adding road control hints to the programmer's toolbox. Reference counting is expensive. We hypothesize that in many - if not most - cases, the programmer can identify procedural structures whose garbage should be discarded in one step by copying the results. Then, within the procedure, we can switch the allocator into `sand` mode, and stop tracking references at all. + +## u3: rules for C programming {#u3-rules-for-c-programming} + +There are two levels at which we program in C: (1) above the interpreter; (2) within the interpreter or jets. These have separate rules which need to be respected. + +## u3: rules above the interpreter {#u3-rules-above-the-interpreter} + +In its relations with Unix, Urbit follows a strict rule of "call me, I won't call you." We do of course call Unix system calls, but only for the purpose of actually computing. + +Above Urbit, you are in a normal C/Unix programming environment and can call anything in or out of Urbit. Note that when using `u3`, you're always on the surface road, which is not thread-safe by default. Generally speaking, `u3` is designed to support event-oriented, single-threaded programming. 
+ +If you need threads which create nouns, you could use `u3m_hate()` and `u3m_love()` to run these threads in subroads. You'd need to make the global road pointer, `u3R`, a thread-local variable instead. This seems perfectly practical, but we haven't done it because we haven't needed to. + +## u3: rules within the interpreter {#u3-rules-within-the-interpreter} + +Within the interpreter, your code can run either in the surface road or in a deep road. You can test this by testing + +```c + (&u3H->rod_u == u3R) +``` + +ie: does the pier's home road equal the current road pointer? Normally in this context you assume you're obeying the rules of running on an inner road, ie, "deep memory." Remember, however, that the interpreter **can** run on surface memory - but anything you can do deep, you can do on the surface. The converse is by no means the case. + +In deep memory, think of yourself as if in a signal handler. Your execution context is extremely fragile and may be terminated without warning or cleanup at any time (for instance, by `Ctrl+C`). + +For instance, you can't call `malloc` (or C++ `new`) in your C code, because you don't have the right to modify data structures at the global level, and will leave them in an inconsistent state if your inner road gets terminated. (Instead, use our drop-in replacements, `u3a_malloc()`, `u3a_free()`, `u3a_realloc()`.) + +A good example is the different meaning of `c3_assert()` inside and outside the interpreter. At either layer, you can use regular assert(), which will just kill your process. On the surface, `c3_assert()` will just... kill your process. + +In deep execution, `c3_assert()` will issue an exception that queues an error event, complete with trace stack, on the Arvo event queue. Let's see how this happens. + +## u3: exceptions {#u3-exceptions} + +You produce an exception with + +```c + /* u3m_bail(): bail out. Does not return. 
+ ** + ** Bail motes: + ** + ** %exit :: semantic failure + ** %evil :: bad crypto + ** %intr :: interrupt + ** %fail :: execution failure + ** %foul :: assert failure + ** %need :: network block + ** %meme :: out of memory + ** %time :: timed out + ** %oops :: assertion failure + */ + c3_i + u3m_bail(c3_m how_m); +``` + +Broadly speaking, there are two classes of exception: internal and external. An external exception begins in a Unix signal handler. An internal exception begins with a call to longjmp() on the main thread. + +There are also two kinds of exception: mild and severe. An external exception is always severe. An internal exception is normally mild, but some (like `c3__oops`, generated by `c3_assert()`) are severe. + +Either way, exceptions come with a stack trace. The `u3` nock interpreter is instrumented to retain stack trace hints and produce them as a printable `(list tank)`. + +Mild exceptions are caught by the first virtualization layer and returned to the caller, following the behavior of the Nock virtualizer `+mock` (in `hoon.hoon`). + +Severe exceptions, or mild exceptions at the surface, terminate the entire execution stack at any depth and send the cumulative trace back to the `u3` caller. + +For instance, `vere` uses this trace to construct a `%crud` event, which conveys our trace back toward the Arvo context where it crashed. This lets any UI component anywhere, even on a remote node, render the stacktrace as a consequence of the user's action - even if its direct cause was (for instance) a Unix SIGINT or SIGALRM. + +## u3: C structures on the loom {#u3-c-structures-on-the-loom} + +Normally, all data on the loom is nouns. Sometimes we break this rule just a little, though - eg, in the `u3h` hashtables. + +To point to non-noun C structs on the loom, we use a `u3_post`, which is just a loom word offset.
A macro lets us declare this as if it was a pointer: + +```c + typedef c3_w u3_post; + #define u3p(type) u3_post +``` + +Some may regard this as clever, others as pointless. Anyway, use `u3to()` and `u3of()` to convert to and from pointers. + +When using C structs on the loom - generally a bad idea - make sure anything which could be on the surface road is structurally portable, eg, won't change size when the pointer size changes. (Note also: we consider little-endian, rightly or wrongly, to have won the endian wars.) diff --git a/docs/runtime/u3.md b/docs/runtime/u3.md new file mode 100644 index 0000000..5f1457f --- /dev/null +++ b/docs/runtime/u3.md @@ -0,0 +1,34 @@ +--- +description: "Introduction to u3, the Nock runtime system. C library for making and storing nouns. Solid-state interpreter." +layout: + title: + visible: true + description: + visible: false + tableOfContents: + visible: true + outline: + visible: true + pagination: + visible: true +--- + +# U3 + +The Urbit interpreter is built on a Nock runtime system written in C, `u3`. This section is a relatively complete description. + +You should keep reading if (a) you're planning to work on the Urbit interpreter; (b) you're a language implementation geek; or (c) you don't really understand anything until you've seen the actual structs. + +## u3: Noun processing in C {#u3-noun-processing-in-c} + +`u3` is the C library that makes Urbit work. If it wasn't called `u3`, it might be called `libnoun` - it's a library for making and storing nouns. + +What's a noun? A noun is either a cell or an atom. A cell is an ordered pair of any two nouns. An atom is an unsigned integer of any size. + +To the C programmer, this is not a terribly complicated data structure, so why do you need a library for it? + +One: nouns have a well-defined computation kernel, Nock, whose spec fits on a page and gzips to 340 bytes. But the only arithmetic operation in Nock is increment. 
So it's nontrivial to compute both efficiently and correctly. + +Two: `u3` is designed to be a "solid-state interpreter," ie, a single-level store which is transparently snapshotted. This implies a specialized memory-management model, etc, etc. + +(Does `u3` depend on the higher levels of Urbit, Arvo and Hoon? Yes and no. `u3` expects you to load something shaped like an Arvo kernel, and use it as an event-processing function. But you don't need to use this feature if you don't want, and your kernel doesn't have to be Arvo proper - just Arvo-compatible. Think of `u3` as the BIOS and Arvo as the boot kernel. And there are no dependencies at all between Hoon the language and `u3`.)