# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is Vere, the Urbit runtime environment - the lowest layer of the Urbit stack. It includes the Nock virtual machine, I/O drivers, event log, and snapshotting system. The codebase is written in C and uses Zig as the build system.

**Active Project**: We are porting Vere from C to OCaml 5.x with Eio, using a hybrid approach where jets remain in C and are called via FFI.

## Current Vere (C) Documentation

### Build System

#### Primary Commands

- **Build native debug binary**: `zig build`
- **Run specific tests**: `zig build --summary all`
  - Available tests: `nock-test`, `ames-test`, `palloc-test`, `equality-test`, `hashtable-test`, `hamt-test`, `jets-test`, `retrieve-test`, `serial-test`, `boot-test`, `newt-test`, `vere-noun-test`, `unix-test`, `pact-test`, `tracy-test`
- **Build for all supported targets**: `zig build -Dall`
- **Release build**: `zig build -Drelease`
- **Build with optimization**: `zig build -Doptimize=ReleaseFast`

#### Common Build Options

- `-Dtarget=`: Cross-compile (e.g., `aarch64-linux-musl`, `x86_64-macos`)
- `-Doptimize=`: Debug (default), ReleaseSafe, ReleaseSmall, ReleaseFast
- `-Dpace=`: Release train - once (default), live, soon, edge
- `-Dcopt=`: Additional compiler flags (can be specified multiple times)
- `-Dasan`: Enable address sanitizer (native only, requires llvm@19)
- `-Dubsan`: Enable undefined behavior sanitizer (native only, requires llvm@19)
- `-Dtracy`: Enable Tracy profiler integration
- `-Dcpu-dbg`: Enable CPU debug mode (-DU3_CPU_DEBUG)
- `-Dmem-dbg`: Enable memory debug mode (-DU3_MEMORY_DEBUG)

#### Build Output

Built binaries are placed in `zig-out//` (e.g., `zig-out/x86_64-linux-musl/urbit`)

### Architecture

#### Package Structure

The runtime is organized into distinct packages in the `pkg/` directory:

- **pkg/c3**: Basic utilities for Urbit's C style (types, macros, portability)
- **pkg/ent**: Cross-platform entropy source wrapper (`getentropy(2)`)
- **pkg/ur**: Jam/cue implementation (Urbit's bitwise noun serialization)
- **pkg/noun**: Nock virtual machine, memory management, jets, and snapshotting
- **pkg/past**: Parser for Urbit's bytecode format
- **pkg/vere**: I/O drivers, event log (LMDB), main event loop, pier management

#### Key Components

**Nock VM (pkg/noun)**:

- `nock.c`: Nock interpreter and execution
- `jets.c`: Jet registration and acceleration (optimized implementations of Nock formulas)
- `allocate.c`: Loom memory management (arena allocator)
- `manage.c`: Noun memory lifecycle and garbage collection
- `events.c`: Event processing and persistence
- `jets/`: Directory tree of jet implementations organized by Hoon library structure

**I/O System (pkg/vere/io)**:

- `ames.c`: UDP networking driver (Urbit's P2P protocol)
- `behn.c`: Timer driver
- `term.c`: Terminal I/O
- `http.c`: HTTP server/client (using h2o)
- `unix.c`: Filesystem synchronization
- `conn.c`: IPC connections
- `lick.c`: Inter-process communication
- `mesa.c`: Alternative networking protocol
- `cttp.c`: HTTP client effects

**Event Persistence**:

- `pkg/vere/disk.c`: Event log management using LMDB
- Events are persisted to disk and can be replayed for crash recovery

**Main Entry Points**:

- `pkg/vere/main.c`: CLI argument parsing and runtime initialization
- `pkg/vere/lord.c`: "Lord" process management (worker process coordination)
- `pkg/vere/pier.c`: Pier (ship instance) lifecycle

#### Memory Management

Urbit uses a custom memory model called the "loom":

- Fixed-size arena allocator (2GB or 4GB depending on architecture)
- All Urbit data structures (nouns) live in the loom
- Snapshot-based persistence allows complete memory dumps
- Reference counting and mark-and-sweep GC for cleanup

### Development Workflow

#### Working with Fake Ships

Always develop on fake ships, not live network ships. Fake ships use deterministic keys and communicate over local loopback.
**Boot a new fake ship**:

```console
zig build
./zig-out//urbit -F zod
```

**Boot with development pill** (faster):

```console
./zig-out//urbit -F zod -B solid.pill
```

**Launch existing fake ship**:

```console
./zig-out//urbit zod
```

#### Debugging

**GDB debugging**:

```bash
zig build
gdb --args ./zig-out//urbit zod
```

In GDB, set:

```gdb
set follow-fork-mode child
handle SIGSEGV nostop noprint
```

**macOS lldb debugging**:

On macOS, you must configure lldb to handle Mach exceptions properly. Start the ship with the `-t` flag when debugging, or attach after starting. Then run:

```lldb
p (void)darwin_register_mach_exception_handler()
pro hand -p true -s false -n false SIGBUS
pro hand -p true -s false -n false SIGSEGV
```

### Git Workflow

#### Branch Naming

All branches for review must follow: `i//` where `` is the GitHub issue number.

#### Commit Style

- Use imperative mood for commit messages
- Include a short description (required) and an optional long description

#### Pull Request Format

```markdown
### Description

Resolves #.

[Thorough description of changes]

### Related

[Related issues, links, papers, etc.]
```

#### Branch Structure

- `develop` (default): edge train - for runtime developers
- `release`: soon train - for early adopters
- `master`: live train - for production

PRs should target `develop` by default.

### Testing

Tests are colocated with implementation code:

- `pkg/noun/*_tests.c`: Noun system tests
- `pkg/vere/*_tests.c`: Vere I/O and persistence tests
- `pkg/ur/tests.c`, `pkg/ent/tests.c`: Package-specific tests

Run individual test suites with `zig build --summary all`.
### Dependencies

External libraries (managed by Zig build system in `ext/`):

- **GMP**: Multi-precision arithmetic
- **OpenSSL**: Cryptography
- **libuv**: Async I/O event loop
- **LMDB**: Memory-mapped database for event log
- **h2o**: HTTP server
- **curl**: HTTP client
- **libsigsegv**: Signal handling for memory protection
- **urcrypt**: Urbit cryptographic primitives
- **wasm3**: WebAssembly interpreter
- **natpmp**: NAT port mapping
- **zlib**: Compression

### Important Defines

- `U3_OS_osx` / `U3_OS_linux` / `U3_OS_windows`: Platform detection
- `U3_CPU_aarch64`: ARM64 architecture
- `U3_CPU_DEBUG`: Enable CPU debugging
- `U3_MEMORY_DEBUG`: Enable memory debugging
- `U3_GUARD_PAGE`: Enable guard pages for loom
- `U3_SNAPSHOT_VALIDATION`: Validate snapshots on load
- `C3DBG`: Enable debug assertions

### Code Style

The codebase uses Urbit-specific C conventions:

- Custom types: `c3_w` (word), `c3_y` (byte), `c3_o` (loobean), etc.
- Naming: `u3_` prefix for public APIs, `u3X_` for module X
- Heavy use of macros for memory management and control flow
- Arena-based allocation rather than malloc/free

---

## OCaml Port Plan

### Executive Summary

This is a phased approach to porting Urbit's runtime (Vere) from C to OCaml 5.x with Eio. The port targets ~32,500 lines of C code (excluding jets). To limit scope, jets remain in C and are called from OCaml via FFI.

### Why OCaml?

1. **Functional alignment**: Hoon (Urbit's language) is functional, so OCaml's paradigm matches it better than imperative C does
2. **Safety**: Strong typing, exhaustive pattern matching, and immutability by default prevent entire classes of bugs
3. **Performance**: OCaml compiles to efficient native code. OCaml 5.x adds multicore domains and effect handlers, which Eio uses for direct-style concurrent I/O
4. **GC integration**: OCaml's GC can manage most nouns directly, replacing much of the manual reference counting that C Vere performs
5. **Maintenance**: Algebraic data types and pattern matching make the codebase easier to maintain

### Hybrid Strategy: Keep Jets in C

**Key Decision**: Keep jets in C and call them via FFI from OCaml. This dramatically reduces scope:

- **Don't port**: 187 jet files (~15k LOC)
- **Don't port**: urcrypt, wasm3, softfloat dependencies
- **Do port**: Core noun system, Nock interpreter, I/O drivers

#### Why Keep Jets in C?

1. **Proven implementations**: Jets are highly optimized and battle-tested
2. **Crypto dependencies**: Ed25519, ECDSA, etc. already use C libraries (urcrypt)
3. **WebAssembly**: wasm3 integration (3k LOC jet) stays in C
4. **Floating point**: softfloat dependency for IEEE compliance
5. **Reduced risk**: No need to reimplement and revalidate 187 jets
6. **Performance**: C jets are already fast. FFI call overhead is expected to be small relative to typical jet work, but noun conversion costs still need measuring (see FFI Complexity under Risk Assessment)

### Target OCaml Structure

```
urbit-ocaml/
├── dune-project                 # Project metadata
├── dune-workspace               # Workspace config
│
├── lib/                         # OCaml libraries
│   ├── noun/                    # Core noun system (port from pkg/noun)
│   │   ├── dune
│   │   ├── types.ml[i]          # Noun ADTs
│   │   ├── loom.ml[i]           # Memory management
│   │   ├── jam.ml[i]            # Serialization
│   │   ├── nock.ml[i]           # Nock interpreter
│   │   ├── jets_ffi.ml[i]       # FFI to C jets
│   │   └── jets_registry.ml[i]  # Jet dispatch
│   │
│   ├── runtime/                 # Runtime I/O (port from pkg/vere)
│   │   ├── dune
│   │   ├── db.ml[i]             # Event log
│   │   ├── pier.ml[i]           # Pier management
│   │   ├── io/
│   │   │   ├── ames.ml[i]       # UDP networking
│   │   │   ├── http.ml[i]       # HTTP server/client
│   │   │   ├── term.ml[i]       # Terminal
│   │   │   ├── unix.ml[i]       # Filesystem
│   │   │   └── behn.ml[i]       # Timers
│   │   └── king.ml[i]           # Main orchestrator
│   │
│   └── c_bridge/                # C FFI bridge
│       ├── dune                 # Links against existing C code
│       ├── noun_ffi.ml[i]       # Noun <-> C noun conversion
│       ├── jets_ffi.ml[i]       # Call C jets
│       └── stubs/               # C stubs for FFI
│           ├── noun_stubs.c
│           └── jet_stubs.c
│
├── bin/                         # Executables
│   ├── dune
│   └── urbit.ml                 # Main entry point
│
├── test/                        # Tests
│   ├── dune
│   ├── test_noun.ml             # Noun system tests
│   ├── test_nock.ml             # Nock interpreter tests
│   ├── test_jets.ml             # Jet FFI tests
│   └── test_integration.ml      # End-to-end tests
│
├── bench/                       # Benchmarks
│   ├── dune
│   └── bench_nock.ml
│
├── c/                           # Keep existing C code
│   ├── pkg/noun/                # Copied from vere
│   ├── pkg/ur/
│   └── ext/                     # External deps (gmp, urcrypt, etc.)
│
└── doc/                         # Documentation
    └── architecture.md
```

---

## Phase-by-Phase Implementation Plan

### Phase 0: Foundation & Development Environment

**Goal**: Set up OCaml development environment and validate approach

#### Prerequisites

```bash
# OCaml 5.2+ (for multicore/effects)
opam switch create vere-ocaml 5.2.1
eval $(opam env)

# Core build tools (formatter and LSP for editor support)
opam install dune ocamlformat ocaml-lsp-server

# Essential libraries:
#   eio_main         effects-based I/O
#   zarith           bignum arithmetic
#   cmdliner         CLI parsing
#   logs             structured logging
#   fmt              formatting/pretty-printing
#   alcotest         testing
#   qcheck           property-based testing
#   qcheck-alcotest  QCheck integration for Alcotest
#   bechamel         benchmarking
#   ctypes           FFI to C
#   ctypes-foreign   dynamic FFI
# (Comments cannot follow a trailing backslash, so they are listed here.)
opam install eio_main zarith cmdliner logs fmt \
  alcotest qcheck qcheck-alcotest bechamel \
  ctypes ctypes-foreign

# Optional but recommended:
#   odoc       documentation generation
#   utop       better REPL
#   landmarks  profiling
opam install odoc utop landmarks
```

#### Initial Project Setup

```bash
# Create directory outside existing vere repo
cd ~/code/urbit
mkdir vere-ocaml && cd vere-ocaml

# Initialize dune project
cat > dune-project << 'EOF'
(lang dune 3.16)
(name urbit)
(version 0.1.0)
(generate_opam_files true)

(package
 (name urbit)
 (synopsis "Urbit runtime in OCaml")
 (description "Urbit's Nock VM and I/O drivers implemented in OCaml with Eio")
 (depends
  (ocaml (>= 5.2.0))
  dune
  eio_main
  zarith
  cmdliner
  logs
  fmt
  ctypes
  ctypes-foreign
  (alcotest :with-test)
  (qcheck :with-test)
  (qcheck-alcotest :with-test)))

(using ctypes 0.3)
EOF
```

**Deliverables**:

- Working `dune build` and `dune test`
- FFI examples calling C Vere functions from OCaml
- CI/CD pipeline (GitHub Actions)

---

### Phase 1: Core Noun System

**Goal**: Implement the foundational noun data structures and basic operations

#### 1.1: Noun Type System

**Implementation**:

```ocaml
(* types.ml *)
type noun =
  | Direct of int          (* Small atom; C Vere's direct atoms are 31-bit *)
  | Indirect of Z.t        (* Arbitrary-precision atom (Zarith) *)
  | Cell of noun * noun    (* Pair [head tail] *)

(* Smart constructor: atoms of at most 31 bits are always Direct, so each
   value has one representation and structural equality is sound. *)
let atom (z : Z.t) : noun =
  if Z.sign z < 0 then invalid_arg "Types.atom: atoms are non-negative"
  else if Z.numbits z <= 31 then Direct (Z.to_int z)
  else Indirect z

let is_atom = function Direct _ | Indirect _ -> true | Cell _ -> false
```

#### 1.2: Jam/Cue Serialization

**Files to port**:

- `pkg/ur/serial.c` (~500 LOC)

**Strategy**:

- Maintain wire-format compatibility with C version
- Optimize for OCaml's GC characteristics

#### 1.3: Memory Management

**OCaml approach**:

```ocaml
(* loom.mli *)
module Loom : sig
  type t

  val create : size:int -> t
  val allocate : t -> Types.noun -> Types.noun  (* Intern in loom *)
  val snapshot : t -> bytes                     (* For persistence *)
  val restore : bytes -> t
end
```

**Strategy**: Hybrid approach. Use the OCaml GC for most nouns, and reserve the loom for snapshot/restore.

---

### Phase 2: Nock Interpreter

**Goal**: Implement a working Nock interpreter that can execute Nock formulas

#### 2.1: Basic Interpreter

**Implementation**:

```ocaml
(* nock.ml: Nock 4K formulas decoded into an ADT *)
type formula =
  | Autocons of formula * formula        (* [[b c] d]: cell of two results *)
  | Axis of int                          (* 0: tree address *)
  | Const of Types.noun                  (* 1: constant *)
  | Eval of formula * formula            (* 2: evaluate *)
  | Is_cell of formula                   (* 3: cell test *)
  | Inc of formula                       (* 4: increment *)
  | Eq of formula * formula              (* 5: equality *)
  | If of formula * formula * formula    (* 6: if-then-else *)
  | Compose of formula * formula         (* 7: compose *)
  | Push of formula * formula            (* 8: extend subject *)
  | Invoke of int * formula              (* 9: arm axis, core formula *)
  | Edit of int * formula * formula      (* 10: axis, value, target *)
  | Hint of hint * formula               (* 11: hint, then body *)

and hint =
  | Static of Types.noun                 (* [11 b c]: atom tag b *)
  | Dynamic of Types.noun * formula      (* [11 [b c] d]: tag b, hint formula c *)

(* Exposed in nock.mli as:
   val nock : subject:Types.noun -> formula:Types.noun -> Types.noun *)
```

#### 2.2: Jet Infrastructure & FFI

**FFI Strategy**:

```ocaml
(* jets_ffi.ml *)
open Ctypes
open Foreign

module C = struct
  (* A C u3_noun is a 32-bit word *)
  type c_noun = Unsigned.UInt32.t

  (* Convert OCaml noun to C u3_noun (copies into the loom; not yet implemented) *)
  let to_c_noun : Types.noun -> c_noun = fun _ -> failwith "to_c_noun: TODO"

  (* Convert C u3_noun to OCaml noun (not yet implemented) *)
  let of_c_noun : c_noun -> Types.noun = fun _ -> failwith "of_c_noun: TODO"

  (* Foreign function binding, resolved from the linked C runtime.
     Assumes the C loom has been initialized before any call. *)
  let u3qa_add = foreign "u3qa_add" (uint32_t @-> uint32_t @-> returning uint32_t)
end

(* High-level wrapper *)
let add a b =
  let a_c = C.to_c_noun a in
  let b_c = C.to_c_noun b in
  let result_c = C.u3qa_add a_c b_c in
  C.of_c_noun result_c
```

---

### Phase 3: I/O System with Eio

**Goal**: Port I/O drivers to Eio's structured concurrency model

#### 3.1: Event Log & Persistence

**Implementation**:

```ocaml
(* db.mli *)
module EventLog : sig
  type t

  val open_ : path:string -> t
  val append : t -> event:Types.noun -> unit
  val read : t -> from:int -> Types.noun Seq.t
  val snapshot : t -> Types.noun -> unit
end
```

#### 3.2: Ames (UDP Networking)

```ocaml
(* io/ames.mli *)
module Ames : sig
  val start :
    sw:Eio.Switch.t ->
    net:_ Eio.Net.t ->
    port:int ->
    on_packet:(Types.noun -> unit) ->
    unit
end
```

#### 3.3: HTTP Server/Client

**Strategy**: Use OCaml-native HTTP (cohttp-eio or dream) instead of porting h2o. Dream is built on Lwt, so using it from Eio would require a bridge such as `lwt_eio`.

#### 3.4: Other I/O Drivers

- Terminal I/O using Lambda-Term or Notty
- Unix filesystem using `Eio.Path`
- Timers using `Eio.Time`
- IPC using Unix domain sockets

---

### Phase 4: Pier Management & Orchestration

**Goal**: Implement high-level runtime orchestration

```ocaml
(* pier.mli *)
module Pier : sig
  type t

  val boot :
    sw:Eio.Switch.t -> env:Eio_unix.Stdenv.base -> path:string -> pill:Types.noun -> t

  val resume : sw:Eio.Switch.t -> env:Eio_unix.Stdenv.base -> path:string -> t
  val poke : t -> Types.noun -> unit
  val scry : t -> path:Types.noun -> Types.noun option
end
```

---

### Phase 5: Performance & Polish

**Goal**: Match or exceed C performance and prepare for production

#### Key Optimizations

1. Noun allocation/deallocation
2. Nock interpreter inner loop
3. Jet dispatch via FFI
4. Serialization (jam/cue)
5. Hash table operations

#### Production Readiness

- Comprehensive error handling
- Structured logging
- Documentation (odoc)
- Network compatibility testing with C Vere
- Distribution packaging

---

## Testing Strategy

### 1. Unit Tests (Alcotest)

```ocaml
(* test/test_noun.ml *)
let test_atom_small () =
  let n = Types.atom (Z.of_int 42) in
  Alcotest.(check bool) "is atom" true (Types.is_atom n)

let () =
  Alcotest.run "noun"
    [ ("atoms", [ Alcotest.test_case "small atom" `Quick test_atom_small ]) ]
```

### 2. Property Tests (QCheck)

```ocaml
(* Roundtrip property: cue(jam(x)) = x *)
let noun_gen : Types.noun QCheck.Gen.t =
  QCheck.Gen.(sized @@ fix (fun self size ->
    let atom = map (fun i -> Types.atom (Z.of_int i)) nat in
    if size <= 1 then atom
    else frequency [ (1, atom); (2, map2 (fun h t -> Types.Cell (h, t)) (self (size / 2)) (self (size / 2))) ]))

(* Structural (=) is sound only if Jam.cue builds atoms via Types.atom *)
let prop_jam_cue_roundtrip =
  QCheck.Test.make ~name:"jam/cue roundtrip" (QCheck.make noun_gen)
    (fun n -> Jam.cue (Jam.jam n) = n)
```

### 3. FFI Validation

Compare C jet output against OCaml Nock interpretation for all jet calls.

### 4. Cross-Validation

Test binary compatibility with existing C Vere for:

- Network protocols
- Event logs
- Snapshots
- Pills

---

## Success Criteria

### Milestone 1

- [ ] Can execute basic Nock programs
- [ ] Jam/cue roundtrip works
- [ ] Jets callable via FFI
- [ ] Performance within 2x of C version

### Milestone 2

- [ ] Can boot a fake ship
- [ ] Event log persistence works
- [ ] Basic I/O (Ames, HTTP, terminal) functional
- [ ] Can process simple pokes

### Milestone 3

- [ ] Full feature parity with C Vere
- [ ] Performance at or better than C version
- [ ] Production-ready (error handling, logging, monitoring)
- [ ] Network-compatible with C Vere

---

## Risk Assessment & Mitigation

### High Risks

1. **Memory Model Mismatch**
   - **Risk**: OCaml GC vs C loom semantics
   - **Mitigation**: Hybrid approach, extensive testing, gradual migration
2. **Performance Regression**
   - **Risk**: OCaml slower than hand-tuned C
   - **Mitigation**: Benchmark-driven development, optimization phase, compiler flags

### Medium Risks

3. **FFI Complexity**
   - **Risk**: Noun conversion overhead between OCaml and C
   - **Mitigation**: Optimize conversion layer, batch operations
4. **I/O Performance**
   - **Risk**: Eio maturity, performance characteristics
   - **Mitigation**: Benchmarks, fallback to Lwt if needed

---

## Revised Timeline

**Phases**:

- Foundation & Environment
- Core Noun System
- Nock Interpreter & Jet FFI
- I/O System
- Pier Management
- Performance & Polish

---

## Next Steps

1. Set up OCaml development environment
2. Create project structure with dune
3. Implement basic noun types
4. Create FFI bridge to C jets
5. Begin porting jam/cue serialization

---

*This plan is a living document. Update as we learn from implementation.*
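The Nock interpreter is central to Phase 2. To make its semantics concrete, here is a minimal, self-contained sketch of Nock 4K evaluation in OCaml. It is an illustration under stated assumptions, not the planned `nock.ml`:

- Atoms are native `int`s rather than Zarith bignums.
- Formulas are matched directly as nouns instead of being decoded into the `formula` ADT.
- Nock 11 hints are evaluated and discarded.
- There is no jet dispatch.

The `cell` list helper is hypothetical. It exists only to make formulas easier to write.

```ocaml
(* Standalone Nock 4K evaluator sketch.
   Assumption: atoms are native OCaml ints (no bignums), so large atoms and
   overflow are out of scope. Hints (11) are evaluated and dropped, and there
   is no jet dispatch. *)
type noun = Atom of int | Cell of noun * noun

exception Nock_error of string

(* Hypothetical helper: cell [a; b; c] builds the right-nested cell [a [b c]] *)
let rec cell = function
  | [] -> invalid_arg "cell: empty list"
  | [ x ] -> x
  | x :: rest -> Cell (x, cell rest)

(* /[axis noun]: axis 1 is the whole noun; 2n is the head and 2n+1 the tail
   of the subtree at axis n *)
let rec slot axis n =
  if axis < 1 then raise (Nock_error "slot: axis must be >= 1")
  else if axis = 1 then n
  else
    match slot (axis / 2) n with
    | Cell (h, t) -> if axis mod 2 = 0 then h else t
    | Atom _ -> raise (Nock_error "slot: cannot descend into an atom")

(* #[axis value target]: replace the subtree at [axis] in [target] with [value] *)
let rec edit axis value target =
  if axis < 1 then raise (Nock_error "edit: axis must be >= 1")
  else if axis = 1 then value
  else
    let parent =
      if axis mod 2 = 0 then Cell (value, slot (axis + 1) target)
      else Cell (slot (axis - 1) target, value)
    in
    edit (axis / 2) parent target

let rec nock subject formula =
  match formula with
  (* Autocons: [[b c] d] evaluates both sides and pairs the results *)
  | Cell ((Cell _ as b), d) -> Cell (nock subject b, nock subject d)
  | Cell (Atom 0, Atom b) -> slot b subject
  | Cell (Atom 1, b) -> b
  | Cell (Atom 2, Cell (b, c)) -> nock (nock subject b) (nock subject c)
  | Cell (Atom 3, b) -> (
      match nock subject b with Cell _ -> Atom 0 | Atom _ -> Atom 1)
  | Cell (Atom 4, b) -> (
      match nock subject b with
      | Atom n -> Atom (n + 1)
      | Cell _ -> raise (Nock_error "4: cannot increment a cell"))
  | Cell (Atom 5, Cell (b, c)) ->
      if nock subject b = nock subject c then Atom 0 else Atom 1
  | Cell (Atom 6, Cell (b, Cell (c, d))) -> (
      match nock subject b with
      | Atom 0 -> nock subject c
      | Atom 1 -> nock subject d
      | _ -> raise (Nock_error "6: condition is not a loobean"))
  | Cell (Atom 7, Cell (b, c)) -> nock (nock subject b) c
  | Cell (Atom 8, Cell (b, c)) -> nock (Cell (nock subject b, subject)) c
  | Cell (Atom 9, Cell (Atom b, c)) ->
      let core = nock subject c in
      nock core (slot b core)
  | Cell (Atom 10, Cell (Cell (Atom b, c), d)) ->
      edit b (nock subject c) (nock subject d)
  | Cell (Atom 11, Cell (Cell (_, c), d)) ->
      ignore (nock subject c);
      nock subject d
  | Cell (Atom 11, Cell (Atom _, d)) -> nock subject d
  | _ -> raise (Nock_error "invalid formula")
```

For example, `nock (Atom 42) (cell [ Atom 4; Atom 0; Atom 1 ])` evaluates the formula `[4 0 1]` (increment the whole subject) and returns `Atom 43`. This sketch decodes the formula noun on every evaluation step to stay short. The `formula` ADT planned for `nock.ml` would let that decoding happen once per formula.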