summaryrefslogtreecommitdiff
path: root/ocaml/CLAUDE.md
diff options
context:
space:
mode:
authorpolwex <polwex@sortug.com>2025-10-05 22:57:55 +0700
committerpolwex <polwex@sortug.com>2025-10-05 22:57:55 +0700
commitc4b71435d9afdb67450f320f54fb7aa99dcae85e (patch)
treea08c4c2f7965a95fcfe6dda09629d3f103d25a0b /ocaml/CLAUDE.md
parentfcedfddf00b3f994e4f4e40332ac7fc192c63244 (diff)
fixed jamcue
Diffstat (limited to 'ocaml/CLAUDE.md')
-rw-r--r--ocaml/CLAUDE.md278
1 files changed, 278 insertions, 0 deletions
diff --git a/ocaml/CLAUDE.md b/ocaml/CLAUDE.md
new file mode 100644
index 0000000..2577dee
--- /dev/null
+++ b/ocaml/CLAUDE.md
@@ -0,0 +1,278 @@
+# Overe - OCaml Port of Vere (Urbit Runtime)
+
+## Project Goal
+
+Port the Vere (Urbit C runtime) to OCaml piece by piece, starting with core components and comparing performance against the original C implementation.
+
+## Completed Work
+
+### 1. Nock Interpreter (`lib/nock.ml`)
+
+**Status**: ✅ Complete and tested
+
+- Implemented complete Nock interpreter with all 12 opcodes
+- Direct port of `_n_nock_on` from C implementation in `pkg/noun/nock.c`
+- Uses Zarith (GMP-backed) for arbitrary-precision integers
+- All test cases pass (`test/test_nock.ml`)
+
+**Key Functions**:
+- `nock_on`: Main interpreter loop
+- `slot`: Tree addressing using bit notation
+- All 12 opcodes: 0 (slot), 1 (constant), 2 (nock), 3 (cell test), 4 (increment), 5 (equality), 6 (if-then-else), 7 (compose), 8 (push), 9 (call), 10 (hint), 11 (wish/deprecated)
+
+### 2. Performance Benchmarking
+
+**Status**: ✅ Complete with detailed results
+
+Created comprehensive benchmark suite comparing C vs OCaml:
+- `bench_nock.ml`: OCaml benchmarks
+- `bench_simple.c`: Standalone C implementation for comparison
+- `compare.sh`: Comparison script
+- `BENCHMARKS.md`: Full results documentation
+
+**Key Findings**:
+- C is 2-5x faster on average
+- OCaml is competitive or faster on allocation-heavy operations
+- OCaml shows more consistent performance (less variance)
+- For decrement: C 1.39x faster
+- For tree navigation: C 4.87x faster
+- For nock(decrement): OCaml 1.12x faster
+- For deep recursion: C 2.41x faster
+
+### 3. Noun Type System (`lib/noun.ml`)
+
+**Status**: ✅ Complete
+
+```ocaml
+type noun =
+ | Atom of Z.t
+ | Cell of noun * noun
+```
+
+Helper functions:
+- `atom`, `cell`: Constructors
+- `is_atom`, `is_cell`: Type predicates
+- `head`, `tail`: Cell accessors
+- `slot`: Tree addressing (axis addressing)
+- `atom_to_int`, `noun_to_string`: Conversions
+
+### 4. Bitstream Utilities (`lib/bitstream.ml`)
+
+**Status**: ✅ Complete
+
+Implements bit-level reading/writing for jam/cue serialization:
+
+**Writer**:
+```ocaml
+type writer = {
+ buf: bytes ref;
+ mutable bit_pos: int;
+}
+```
+- `writer_create()`: Initialize writer
+- `write_bit`: Write single bit
+- `write_bits`: Write multiple bits from a Z.t value
+- `writer_to_bytes`: Extract final bytes
+- `writer_ensure`: Dynamic buffer resizing
+
+**Reader**:
+```ocaml
+type reader = {
+ bytes: bytes;
+ mutable bit_pos: int;
+}
+```
+- `reader_create`: Initialize from bytes
+- `read_bit`: Read single bit
+- `read_bits`: Read multiple bits into Z.t
+- `reader_pos`: Get current bit position
+
+**Critical Fix**: OCaml operator precedence issue
+- Problem: `!(w.buf)` parsed as `(!w).buf`
+- Solution: Use intermediate variable pattern:
+ ```ocaml
+ let buf_ref : bytes ref = w.buf in
+ let buf : bytes = !buf_ref in
+ ```
+
+### 5. Jam/Cue Serialization (`lib/serial.ml`)
+
+**Status**: 🔴 IN PROGRESS - Has bugs
+
+Implementation of noun serialization with backreference compression.
+
+**Jam Encoding Format**:
+- Atoms: tag bit `0`, then mat-encoded value
+- Cells: tag bits `01`, then recursively encode head and tail
+- Backrefs: tag bits `11`, then mat-encoded position
+
+**Mat Encoding** (variable-length integers):
+- For 0: single `0` bit
+- For n > 0:
+ - a = bit-width of n (`Z.numbits n`)
+ - b = bit-width of a
+ - Write b `1`-bits, then one `0`-bit
+ - Write a in b-1 bits
+ - Write n in a bits
+
+**Current Implementation**:
+```ocaml
+let mat_encode w n =
+ if Z.equal n Z.zero then
+ write_bit w false
+ else begin
+ let a = Z.numbits n in
+ let b = Z.numbits (Z.of_int a) in
+ for _i = 1 to b do write_bit w true done;
+ write_bit w false;
+ write_bits w (Z.of_int a) (b - 1); (* Write a, not a-1 *)
+ write_bits w n a
+ end
+```
+
+**Known Bugs**:
+1. ✅ Fixed: mat_encode for 0 wrote `true` instead of `false`
+2. ✅ Fixed: Was writing `a-1` in `b-1` bits, changed to `a` based on C code analysis
+3. 🔴 **CURRENT BUG**: Roundtrip test passes for 0,1,2,3 but fails at 4
+ - Test shows: `FAIL: 4 -> 0`
+ - Need to compare against actual Urbit jam output to find discrepancy
+
+**Test Status** (`test/test_serial.ml`):
+- Atom roundtrip: FAILS at value 4
+- Basic jam/cue: Not yet fully validated
+
+### 6. Python Urbit Interface (`urb.py`)
+
+**Status**: ✅ Updated to Python 3
+
+Script to communicate with running Urbit ship via lens port. Can be used to get correct jam/cue test vectors.
+
+**Changes Made**:
+- Shebang: `#!/usr/bin/env python3`
+- String decoding: `.encode().decode('unicode_escape')`
+- File write: mode `'wb'` for binary
+- Logging: `warn` → `warning`
+- Idiom: `'X' not in dict`
+
+**Usage** (needs running Urbit ship):
+```bash
+python3 urb.py -d "(jam 42)" # Get jam encoding of 42
+python3 urb.py -d "(cue 42)" # Decode jam encoding
+```
+
+## Project Structure
+
+```
+ocaml/
+├── dune-project # Project configuration (package: overe)
+├── lib/
+│ ├── dune # Library stanza
+│ ├── noun.ml # Noun type system
+│ ├── nock.ml # Nock interpreter
+│ ├── bitstream.ml # Bit-level I/O
+│ └── serial.ml # Jam/cue serialization
+├── test/
+│ ├── dune # Test executables
+│ ├── test_nock.ml # Nock tests (all passing)
+│ └── test_serial.ml # Jam/cue tests (failing at 4)
+├── tmp/ # Temporary test files
+├── urb.py # Python script to query Urbit ship
+├── BENCHMARKS.md # Performance comparison results
+└── CLAUDE.md # This file
+```
+
+## Next Steps
+
+### Immediate Priority: Fix Jam/Cue Bug
+
+1. **Get test vectors from Urbit**:
+ ```bash
+ python3 urb.py -d "(jam 0)"
+ python3 urb.py -d "(jam 1)"
+ python3 urb.py -d "(jam 2)"
+ python3 urb.py -d "(jam 3)"
+ python3 urb.py -d "(jam 4)" # This is where OCaml fails
+ ```
+
+2. **Compare OCaml output**:
+ - Create test program to print hex output of jam for 0-10
+ - Compare against Urbit's output byte-by-byte
+ - Identify exact discrepancy in mat encoding
+
+3. **Fix mat_decode**:
+ - Ensure it matches the corrected mat_encode formula
+ - Currently may be reading `a-1` when it should read `a`
+
+### Future Work
+
+1. **Complete Jam/Cue**:
+ - Fix current bugs
+ - Add comprehensive tests
+ - Benchmark performance vs C
+
+2. **Memory Management**:
+ - Implement noun hash-consing/deduplication
+ - Add reference counting or GC integration
+ - Study u3 memory system from C implementation
+
+3. **Jets**:
+ - Port jet dashboard system
+ - Implement critical jets for performance
+ - Add jet registration and lookup
+
+4. **Persistence**:
+ - Implement snapshot system
+ - Port event log handling
+ - Add checkpoint/restore functionality
+
+## Build Instructions
+
+```bash
+# Build library and tests
+dune build
+
+# Run Nock tests (all passing)
+dune exec test/test_nock.exe
+
+# Run jam/cue tests (currently failing at 4)
+dune exec test/test_serial.exe
+
+# Query running Urbit (requires ship running on port 12323 or with .http.ports)
+python urb.py -d "(jam 42)"
+```
+
+## Technical Notes
+
+### OCaml Challenges Encountered
+
+1. **Operator precedence with refs**: `!(r.field)` doesn't work, need intermediate variable
+2. **Zarith numbits**: Returns actual bit count, not bit index (e.g., 4 needs 3 bits, `numbits` returns 3)
+3. **Mutable fields**: Need to use `mutable` keyword and assignment operator `:=`
+4. **Bytes are mutable**: Use `Bytes.sub` to copy, not share
+
+### Mat Encoding Details from C Code
+
+From `pkg/noun/serial.c` line 126:
+```c
+// Write a_w (the width) in b_w-1 bits
+u3r_chop(0, 0, b_w - 1, 0, &a_w, &bits);
+```
+
+This writes `a` (the width), NOT `a-1`. The OCaml implementation has been updated to match this.
+
+### References
+
+- Vere C implementation: `/home/y/code/urbit/vere/pkg/`
+- Nock specification: https://urbit.org/docs/nock/reference/
+- Jam/cue format: See official Urbit docs (provided in conversation)
+- u3 system architecture: See docs on king/serf, jets, persistence
+
+## Performance Philosophy
+
+Goal is not to beat C in raw speed, but to:
+- Provide maintainable, type-safe implementation
+- Enable experimentation with runtime features
+- Achieve competitive performance on real workloads
+- Leverage OCaml's strengths (allocation, GC, pattern matching)
+
+Current results show this is achievable: OCaml is within 2-5x of C, and actually faster for some allocation-heavy operations.