# The Loom Question: Memory Architecture in Neovere

## What is Vere's Loom?

Vere allocates a **contiguous arena** of memory (originally 2GB, now expandable) called the "loom". All Urbit nouns live within this arena. The loom is:

1. **Pre-allocated**: grabs 2GB (or more) of virtual memory at startup
2. **Contiguous**: all nouns sit at known offsets within this range
3. **Self-contained**: the entire Urbit state fits in one memory region
4. **Snapshottable**: the whole loom can be written to disk for persistence
5. **Fixed-size** (traditionally): can't grow beyond the initial allocation

### Why Did Vere Choose This?

Vere is written in C, which has no garbage collector. The loom provides:

**1. Memory Management**
- Custom allocator for nouns
- Reference counting for cleanup
- Memory pools for different noun sizes
- Compaction/defragmentation

**2. Persistence**
- Snapshot = copy the loom to disk
- Resume = load the loom from disk
- Extremely fast (essentially a `memcpy`)
- Deterministic memory layout

**3. Performance**
- Nouns are 32-bit offsets into the loom (compact!)
- Pointer arithmetic for tree navigation
- Cache-friendly sequential allocation
- No malloc/free overhead

**4. Determinism**
- Same events → same memory layout
- Reproducible across machines
- Critical for consensus/jets

**5. Jets**
- C code can use raw pointers to nouns
- No GC to worry about
- Direct memory access for performance

### The Loom's Limitations

**Hard Limit**: 2GB was the original ceiling
- Large ships hit this and crashed
- Recent Vere builds expanded it to 4GB, 8GB, etc.
- But still fundamentally limited

**Fragmentation**: memory can't be reclaimed easily
- Dead nouns leave holes
- Compaction is expensive
- Can run out of contiguous space

**Complexity**: manual memory management
- Reference-counting bugs
- Memory leaks
- Segfaults

## Sword's Revolution: Loom-Free Architecture

Sword (the Rust implementation) took a radically different approach:

**No Fixed Arena**
- Nouns live on Rust's heap
- No size limit (grows with actual usage)
- Standard Rust allocator

**Hash-Consing**
- Every noun gets a unique hash
- Duplicate nouns share memory automatically
- Efficient deduplication

**Indirect References**
- Nouns reference each other via handles/IDs, not raw pointers
- Relocatable (for GC)

**Persistence**
- Uses jam/cue serialization
- Snapshot = jam the entire state
- Resume = cue from disk
- Slower than memcpy but more flexible

### Sword's Wins

- ✅ **No Memory Limit**: ships can grow arbitrarily large
- ✅ **Simpler**: Rust's allocator handles the complexity
- ✅ **Safer**: no manual memory-management bugs
- ✅ **Still Deterministic**: hash-consing ensures consistency

### Sword's Tradeoffs

- ❌ **Slower Snapshots**: jam/cue vs a raw memory copy
- ❌ **Larger Memory Overhead**: Rust allocator + hashes
- ❌ **More Complex Jets**: can't use raw pointers

But the wins far outweigh the costs.

## OCaml's Position: Best of Both Worlds?

OCaml gives us unique advantages:

### What We Already Have

**1. Automatic GC**
- OCaml's generational GC manages memory
- No manual allocation/deallocation
- Mature, well-tested, fast

**2. Zarith (GMP) for Atoms**
- Arbitrary-precision integers
- Efficient representation
- Automatic memory management

**3. Jam/Cue Serialization**
- Already implemented and working!
- Can serialize the entire kernel
- Proven approach (Sword uses it)

**4. No Segfaults**
- Memory safety by default
- The type system prevents many bugs
- Runtime checks where needed

### Three Possible Approaches

#### Option 1: Pure OCaml Heap (Current - Recommended!)
**Implementation:**

```ocaml
type noun =
  | Atom of Z.t          (* Zarith handles memory *)
  | Cell of noun * noun  (* OCaml GC handles cells *)
```

**Pros:**
- ✅ Simple, already working
- ✅ No memory limit
- ✅ Automatic memory management
- ✅ Type-safe
- ✅ Persistence via jam/cue (working!)

**Cons:**
- ❌ Larger memory overhead than the loom
- ❌ Slower snapshots (jam/cue vs memcpy)
- ❌ GC pauses (though the generational GC helps)

**Status:** This is what we have now, and it works well.

#### Option 2: OCaml Loom (Not Recommended)

**Implementation:**

```ocaml
(* Allocate a 2GB Bigarray as the arena *)
let loom =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout
    (2 * 1024 * 1024 * 1024)

type noun_offset = int  (* Offset into the loom *)
```

**Pros:**
- ✅ Matches Vere's architecture
- ✅ Fast snapshots (write the Bigarray to disk)
- ✅ Compact representation

**Cons:**
- ❌ 2GB hard limit (or whatever size we choose)
- ❌ Manual memory management (complex!)
- ❌ Fights OCaml's GC
- ❌ Unsafe (an out-of-bounds index is a crash)
- ❌ Fragmentation issues
- ❌ Complex to implement correctly

**Status:** Possible, but not worth it: we'd be reimplementing malloc in OCaml!

#### Option 3: Hybrid with Hash-Consing

**Implementation:**

```ocaml
(* Note: polymorphic (=) and Hashtbl.hash work on Z.t because
   Zarith installs custom comparison and hash hooks. *)
module NounTable = Hashtbl.Make (struct
  type t = noun
  let equal = (=)
  let hash = Hashtbl.hash
end)

let noun_cache : noun NounTable.t = NounTable.create 100_000

let make_cell h t =
  let candidate = Cell (h, t) in
  match NounTable.find_opt noun_cache candidate with
  | Some existing -> existing  (* Reuse the shared noun *)
  | None ->
    NounTable.add noun_cache candidate candidate;
    candidate
```

**Pros:**
- ✅ Automatic deduplication
- ✅ Memory savings (like Sword)
- ✅ Still no size limit
- ✅ Deterministic (same nouns = same structure)

**Cons:**
- ❌ Hash-table overhead
- ❌ GC pressure from large hash tables
- ❌ Need to decide when to clear the cache

**Status:** Optional future optimization; not needed yet.

## Recommendation: Stay Loom-Free

**Thesis:** The loom is a C-ism that doesn't make sense in OCaml (or Rust).

### Why No Loom?
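The heart of the argument fits in a few lines. A minimal, self-contained sketch (using `int` atoms in place of `Z.t` so it runs without Zarith): build a large noun tree on the ordinary OCaml heap and walk it. There is no arena, offset table, or free list to maintain; the GC reclaims whatever becomes unreachable.

```ocaml
(* Sketch: nouns on the plain OCaml heap, no arena.
   `int` stands in for `Z.t` here so the sketch has no dependencies. *)
type noun =
  | Atom of int
  | Cell of noun * noun

(* Build the list-shaped noun [n n-1 ... 1 0]. *)
let rec build n =
  if n = 0 then Atom 0
  else Cell (Atom n, build (n - 1))

(* Count cells by walking the tail spine; no manual frees anywhere. *)
let rec cells = function
  | Atom _ -> 0
  | Cell (_, t) -> 1 + cells t

let () =
  let big = build 10_000 in
  assert (cells big = 10_000)
```

The same property carries the rest of the argument: memory management is entirely the GC's job, so none of the loom's machinery has to be reimplemented.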
1. **OCaml has a GC** - using it is simpler and safer than fighting it
2. **No size limit** - ships can grow as large as system RAM allows
3. **Jam/cue works** - we already have working persistence
4. **Type safety** - OCaml prevents many memory bugs automatically
5. **Simpler code** - no custom allocator to maintain

### Counter: "But Vere snapshots are instant!"

True, but:
- Our jam is fast enough (1.2s for the solid pill)
- Most ships snapshot infrequently
- Correctness > raw speed for snapshots
- We can optimize jam/cue later if needed

### Counter: "But the loom is deterministic!"

OCaml is deterministic too:
- Same nouns = same structure (referential transparency)
- Jam output is deterministic (same noun = same bytes)
- The GC doesn't affect logical structure
- Hash-consing can ensure physical equality if needed

### Counter: "But jets need raw pointers!"

Not in modern languages:
- Sword jets work with indirect references
- OCaml jets can operate on the `noun` type directly
- Type safety > raw speed (until a bottleneck is proven)
- We can drop to unsafe code where absolutely necessary

## Practical Implications

### Current Architecture (Keep It!)

**Allocation:**

```ocaml
(* Atoms: Zarith manages memory *)
let a = Atom (Z.of_int 42)

(* Cells: the OCaml GC manages memory *)
let c = Cell (a, a)
```

**Persistence:**

```ocaml
(* Snapshot: jam to bytes, write to a file.
   Uses the standard In_channel/Out_channel modules (OCaml >= 4.14). *)
let snapshot kernel =
  let bytes = Serial.jam kernel in
  Out_channel.with_open_bin "kernel.jam" (fun oc ->
    Out_channel.output_bytes oc bytes)

(* Resume: read the file, cue back to a noun *)
let resume () =
  let s = In_channel.with_open_bin "kernel.jam" In_channel.input_all in
  Serial.cue (Bytes.of_string s)
```

**Memory Growth:**
- Automatic: the GC expands the heap as needed
- System RAM is the only limit
- Large ships work fine

### Future Optimizations (If Needed)

**1. Hash-Consing** (memory deduplication)
- Add a global noun cache
- Reuse identical nouns
- Reduces the memory footprint

**2. Incremental Snapshots**
- Snapshot diffs instead of the full kernel
- Faster save/resume
- More complex, but doable

**3. Memory-Mapped I/O**
- Use mmap for large nouns
- The OS handles paging
- Transparent to our code

**4. Generational Snapshots**
- Keep old/new state separated
- Only snapshot the changed parts
- Faster incremental saves

## Comparison Table

| Feature | Vere Loom | Sword Heap | Neovere (Current) |
|---------|-----------|------------|-------------------|
| Size limit | 2-8GB | None | None |
| Memory management | Manual | Rust allocator | OCaml GC |
| Snapshot speed | Instant (memcpy) | Slow (jam/cue) | Slow (jam/cue) |
| Memory safety | Unsafe (C) | Safe (Rust) | Safe (OCaml) |
| Complexity | High | Medium | Low |
| Deduplication | Manual | Hash-consing | Optional |
| Growing ships | Hard limit | ✓ Works | ✓ Works |

## Conclusion

**We should NOT implement a loom in OCaml.** Instead:

1. ✅ Use OCaml's GC (already working)
2. ✅ Use jam/cue for persistence (already working)
3. ✅ Let memory grow naturally (no limits)
4. ⏳ Add hash-consing later if needed (optimization)
5. ⏳ Optimize jam/cue if snapshots become a bottleneck

The loom made sense for C in 2013. It doesn't make sense for OCaml in 2025.

**Follow Sword's lead:** modern languages with GC don't need manual memory arenas.

## Open Questions

1. **Should we add hash-consing now or later?**
   - Later. The current approach works fine.
   - Add it when memory usage becomes a concern.

2. **How do we handle very large ships (100GB+ kernel)?**
   - Cross that bridge when we come to it
   - System RAM is cheap
   - Incremental snapshots if needed

3. **What about jets that need performance?**
   - Start with safe OCaml
   - Profile to find bottlenecks
   - Use unsafe code/`Obj.magic` only where proven necessary
   - Still better than C's unsafety

4. **Can we beat Vere's snapshot speed?**
   - Probably not for a raw memcpy
   - But we can get close with optimized jam/cue
   - And we win on flexibility/safety

## References

- Vere loom: `pkg/noun/allocate.c`, `pkg/noun/manage.c`
- Sword architecture: `rust/sword/src/mem.rs`, `rust/sword/src/noun.rs`
- OCaml GC: https://v2.ocaml.org/manual/gc.html
- Zarith: https://github.com/ocaml/Zarith

## Decision

**Status: LOOM-FREE ARCHITECTURE ✅**

We will continue using OCaml's heap and GC. No loom needed.