Diffstat (limited to 'ocaml/LOOM.md')
| -rw-r--r-- | ocaml/LOOM.md | 361 |
1 files changed, 361 insertions, 0 deletions
diff --git a/ocaml/LOOM.md b/ocaml/LOOM.md
new file mode 100644
index 0000000..4cc95a3
--- /dev/null
+++ b/ocaml/LOOM.md
@@ -0,0 +1,361 @@

# The Loom Question: Memory Architecture in Neovere

## What is Vere's Loom?

Vere allocates a **contiguous arena** of memory (originally 2GB, now expandable to larger sizes) called the "loom". All Urbit nouns live within this arena. The loom is:

1. **Pre-allocated**: Grab 2GB (or more) of virtual memory at startup
2. **Contiguous**: All nouns at known offsets within this range
3. **Self-contained**: The entire Urbit state fits in one memory region
4. **Snapshotable**: Can write the whole loom to disk for persistence
5. **Fixed-size** (traditionally): Can't grow beyond initial allocation

### Why Did Vere Choose This?

Vere is written in C, which has no garbage collector. The loom provides:

**1. Memory Management**
- Custom allocator for nouns
- Reference counting for cleanup
- Memory pools for different noun sizes
- Compaction/defragmentation

**2. Persistence**
- Snapshot = copy loom to disk
- Resume = load loom from disk
- Extremely fast (just `memcpy`)
- Deterministic memory layout

**3. Performance**
- Nouns are 32-bit offsets into loom (compact!)
- Pointer arithmetic for tree navigation
- Cache-friendly sequential allocation
- No malloc/free overhead

**4. Determinism**
- Same events → same memory layout
- Reproducible across machines
- Critical for consensus/jets

**5. Jets**
- C code can use raw pointers to nouns
- No GC to worry about
- Direct memory access for performance

### The Loom's Limitations

**Hard Limit**: 2GB was the original ceiling
- Large ships hit this and crashed
- Recent Vere expanded to 4GB, 8GB, etc.
- But still fundamentally limited

**Fragmentation**: Memory can't be reclaimed easily
- Dead nouns leave holes
- Compaction is expensive
- Can run out of contiguous space

**Complexity**: Manual memory management
- Reference counting bugs
- Memory leaks
- Segfaults

## Sword's Revolution: Loom-Free Architecture

Sword (the Rust implementation) took a radically different approach:

**No Fixed Arena**
- Nouns live on Rust's heap
- No size limit (grows with actual usage)
- Standard Rust allocator

**Hash-Consing**
- Every noun gets a unique hash
- Duplicate nouns share memory automatically
- Efficient deduplication

**Indirect References**
- Nouns reference each other via handles/IDs
- Not raw pointers
- Relocatable (for GC)

**Persistence**
- Use jam/cue serialization
- Snapshot = jam the entire state
- Resume = cue from disk
- Slower than memcpy but more flexible

### Sword's Wins

✅ **No Memory Limit**: Ships can grow arbitrarily large
✅ **Simpler**: Rust's allocator handles complexity
✅ **Safer**: No manual memory management bugs
✅ **Still Deterministic**: Hash-consing ensures consistency

### Sword's Tradeoffs

❌ **Slower Snapshots**: jam/cue vs raw memory copy
❌ **Larger Memory Overhead**: Rust allocator + hashes
❌ **More Complex Jets**: Can't use raw pointers

But the wins far outweigh the costs!

## OCaml's Position: Best of Both Worlds?

OCaml gives us unique advantages:

### What We Already Have

**1. Automatic GC**
- OCaml's generational GC manages memory
- No manual allocation/deallocation
- Mature, well-tested, fast

**2. Zarith (GMP) for Atoms**
- Arbitrary-precision integers
- Efficient representation
- Automatic memory management

**3. Jam/Cue Serialization**
- Already implemented and working!
- Can serialize the entire kernel
- Proven approach (Sword uses it)

**4. No Segfaults**
- Memory safety by default
- Type system prevents many bugs
- Runtime checks where needed
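
The jam/cue claim can be spot-checked with a round-trip. This is only a sketch: it assumes the `noun` constructors and the `Serial.jam`/`Serial.cue` functions sketched later in this document, so treat the exact names as illustrative rather than a fixed API.

```ocaml
(* Round-trip sketch: build a noun from Zarith atoms, serialize with jam,
   recover with cue. Assumes the noun type and Serial module shown later
   in this document; names are illustrative, not a fixed API. *)
let jam_cue_roundtrip () =
  let n = Cell (Atom (Z.of_int 42),
                Atom (Z.of_string "123456789012345678901234567890")) in
  Serial.cue (Serial.jam n) = n  (* expected: true *)
```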

### Three Possible Approaches

#### Option 1: Pure OCaml Heap (Current - Recommended!)

**Implementation:**
```ocaml
type noun =
  | Atom of Z.t          (* Zarith handles memory *)
  | Cell of noun * noun  (* OCaml GC handles cells *)
```

**Pros:**
- ✅ Simple, already working
- ✅ No memory limit
- ✅ Automatic memory management
- ✅ Type-safe
- ✅ Persistence via jam/cue (working!)

**Cons:**
- ❌ Larger memory overhead than loom
- ❌ Slower snapshots (jam/cue vs memcpy)
- ❌ GC pauses (though generational GC helps)

**Status:** This is what we have now and it works great!

#### Option 2: OCaml Loom (Not Recommended)

**Implementation:**
```ocaml
(* Allocate a 2GB Bigarray as the arena *)
let loom =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout
    (2 * 1024 * 1024 * 1024)

type noun_offset = int (* Offset into loom *)
```

**Pros:**
- ✅ Matches Vere architecture
- ✅ Fast snapshots (write Bigarray to disk)
- ✅ Compact representation

**Cons:**
- ❌ 2GB hard limit (or whatever we choose)
- ❌ Manual memory management (complex!)
- ❌ Fights OCaml's GC
- ❌ Unsafe (index out of bounds = crash)
- ❌ Fragmentation issues
- ❌ Complex to implement correctly

**Status:** Possible but not worth it - we'd be reimplementing malloc in OCaml!

#### Option 3: Hybrid with Hash-Consing

**Implementation:**
```ocaml
module NounTable = Hashtbl.Make(struct
  type t = noun
  let equal = (=)
  let hash = Hashtbl.hash
end)

let noun_cache : noun NounTable.t = NounTable.create 100000

let make_cell h t =
  let candidate = Cell (h, t) in
  match NounTable.find_opt noun_cache candidate with
  | Some existing -> existing (* Reuse! *)
  | None ->
    NounTable.add noun_cache candidate candidate;
    candidate
```

**Pros:**
- ✅ Automatic deduplication
- ✅ Memory savings (like Sword)
- ✅ Still no size limit
- ✅ Deterministic (same nouns = same structure)

**Cons:**
- ❌ Hash table overhead
- ❌ GC pressure from large hash tables
- ❌ Need to decide when to clear cache

**Status:** Optional future optimization, not needed yet

## Recommendation: Stay Loom-Free

**Thesis:** The loom is a C-ism that doesn't make sense in OCaml (or Rust).

### Why No Loom?

1. **OCaml has a GC** - Using it is simpler and safer than fighting it
2. **No size limit** - Ships can grow as large as system RAM allows
3. **Jam/Cue works** - We already have working persistence
4. **Type safety** - OCaml prevents many memory bugs automatically
5. **Simpler code** - No custom allocator to maintain

### Counter: "But Vere snapshots are instant!"

True, but:
- Our jam is fast enough (1.2s for solid pill)
- Most ships snapshot infrequently
- Correctness > raw speed for snapshots
- Can optimize jam/cue later if needed

### Counter: "But the loom is deterministic!"

OCaml is deterministic too:
- Same nouns = same structure (referential transparency)
- Jam output is deterministic (same noun = same bytes)
- GC doesn't affect logical structure
- Hash-consing can ensure physical equality if needed
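
The last point is what the `make_cell` cache from Option 3 would buy us. A minimal usage sketch, assuming that cache is in place: structurally equal cells come back as the same heap object, so `==` (physical equality) can stand in for a full structural comparison.

```ocaml
(* Usage sketch for the make_cell cache from Option 3: building the "same"
   cell twice yields one shared value, so physical equality (==) holds. *)
let () =
  let x = make_cell (Atom (Z.of_int 1)) (Atom (Z.of_int 2)) in
  let y = make_cell (Atom (Z.of_int 1)) (Atom (Z.of_int 2)) in
  assert (x == y)  (* deduplicated: both names point at the same cell *)
```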

### Counter: "But jets need raw pointers!"

Not in modern languages:
- Sword jets work with indirect references
- OCaml jets can operate on `noun` type directly
- Type safety > raw speed (until proven bottleneck)
- Can use unsafe when absolutely necessary

## Practical Implications

### Current Architecture (Keep It!)

**Allocation:**
```ocaml
(* Atoms: Zarith manages memory *)
let a = Atom (Z.of_int 42)

(* Cells: OCaml GC manages memory *)
let c = Cell (a, a)
```

**Persistence:**
```ocaml
(* Snapshot: jam to bytes, write to file *)
let snapshot kernel =
  let bytes = Serial.jam kernel in
  write_file "kernel.jam" bytes

(* Resume: read file, cue to noun *)
let resume () =
  let bytes = read_file "kernel.jam" in
  Serial.cue bytes
```

**Memory Growth:**
- Automatic! GC expands heap as needed
- System RAM is the only limit
- Large ships work fine

### Future Optimizations (If Needed)

**1. Hash-Consing** (memory deduplication)
- Add global noun cache
- Reuse identical nouns
- Reduces memory footprint

**2. Incremental Snapshots**
- Snapshot diffs instead of full kernel
- Faster save/resume
- More complex but doable

**3. Memory-Mapped I/O**
- Use mmap for large nouns
- OS handles paging
- Transparent to our code

**4. Generational Snapshots**
- Keep old/new separated
- Only snapshot changed parts
- Faster incremental saves

## Comparison Table

| Feature | Vere Loom | Sword Heap | Neovere (Current) |
|---------|-----------|------------|-------------------|
| Size limit | 2-8GB | None | None |
| Memory management | Manual | Rust allocator | OCaml GC |
| Snapshot speed | Instant (memcpy) | Slow (jam/cue) | Slow (jam/cue) |
| Memory safety | Unsafe (C) | Safe (Rust) | Safe (OCaml) |
| Complexity | High | Medium | Low |
| Deduplication | Manual | Hash-consing | Optional |
| Growing ships | Hard limit | ✓ Works | ✓ Works |

## Conclusion

**We should NOT implement a loom in OCaml.**

Instead:
1. ✅ Use OCaml's GC (already working)
2. ✅ Use jam/cue for persistence (already working)
3. ✅ Let memory grow naturally (no limits)
4. ⏳ Add hash-consing later if needed (optimization)
5. ⏳ Optimize jam/cue if snapshots become a bottleneck

The loom made sense for C in 2013. It doesn't make sense for OCaml in 2025.

**Follow Sword's lead:** Modern languages with GC don't need manual memory arenas.

## Open Questions

1. **Should we add hash-consing now or later?**
   - Later. Current approach works fine.
   - Add when memory usage becomes a concern.

2. **How do we handle very large ships (100GB+ kernel)?**
   - Cross that bridge when we come to it
   - System RAM is cheap
   - Incremental snapshots if needed

3. **What about jets that need performance?**
   - Start with safe OCaml
   - Profile to find bottlenecks
   - Use unsafe/Obj.magic only where proven necessary
   - Still better than C's unsafety

4. **Can we beat Vere's snapshot speed?**
   - Probably not for raw memcpy
   - But we can get close with optimized jam/cue
   - And we win on flexibility/safety

## References

- Vere loom: `pkg/noun/allocate.c`, `pkg/noun/manage.c`
- Sword architecture: `rust/sword/src/mem.rs`, `rust/sword/src/noun.rs`
- OCaml GC: https://v2.ocaml.org/manual/gc.html
- Zarith: https://github.com/ocaml/Zarith

## Decision

**Status: LOOM-FREE ARCHITECTURE ✅**

We will continue using OCaml's heap and GC. No loom needed.
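
As a closing illustration of open question 3 (and the jets counter-argument above), here is a hypothetical sketch of what a jet could look like in safe OCaml: it pattern-matches on the `noun` type directly and returns `None` when it does not apply, with no raw pointers involved. The name `jet_dec` and the option-returning convention are assumptions for illustration, not an existing Neovere interface.

```ocaml
(* Hypothetical jet sketch: a fast path for decrement that works directly
   on the noun type. Returning None means "not handled here", i.e. fall
   back to evaluating the Nock formula. Names are illustrative only. *)
let jet_dec (sample : noun) : noun option =
  match sample with
  | Atom z when Z.gt z Z.zero -> Some (Atom (Z.pred z))
  | _ -> None
```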
