summaryrefslogtreecommitdiff
path: root/ocaml/CUE_VICTORY.md
diff options
context:
space:
mode:
authorpolwex <polwex@sortug.com>2025-10-06 12:24:42 +0700
committerpolwex <polwex@sortug.com>2025-10-06 12:24:42 +0700
commitf1de939b8cc592a8d56e62ce36902740a88a7e01 (patch)
treebaf026dc2bdf9fa5bfaed12b9d271871549bc67e /ocaml/CUE_VICTORY.md
parent0ca55c93a7c21f81c8f21048889d1c9608a961c7 (diff)
ehehe
Diffstat (limited to 'ocaml/CUE_VICTORY.md')
-rw-r--r--ocaml/CUE_VICTORY.md113
1 files changed, 113 insertions, 0 deletions
diff --git a/ocaml/CUE_VICTORY.md b/ocaml/CUE_VICTORY.md
new file mode 100644
index 0000000..98aa176
--- /dev/null
+++ b/ocaml/CUE_VICTORY.md
@@ -0,0 +1,113 @@
+# Cue Optimization Victory
+
+## Final Results
+
+**OCaml beats C Vere on solid pill deserialization!**
+
+| Implementation | Solid Pill (8.7 MB) | Throughput | vs C Vere |
+|----------------|---------------------|------------|-----------|
+| **OCaml (optimized)** | **1.2s** | **7.3 MB/s** | **1.21x faster** |
+| C Vere | 1.45s | 6.0 MB/s | baseline |
+| OCaml (baseline) | >300s | 0.03 MB/s | 207x slower |
+
+## The Journey
+
+### Baseline (Recursive + Hashtbl)
+- **Ivory (1.5 MB)**: 0.50s - acceptable
+- **Solid (8.7 MB)**: >300s - completely broken
+- **Problem**: Non-linear degradation suggesting recursion depth issues
+
+### Attempt 1: IntMap + Atom Caching
+- Replaced `Hashtbl` with `Map.Make(Int)`
+- Pre-allocated array of 256 small atoms (0-255)
+- **Result**: 21% faster on ivory (0.43s), but still hung on solid
+- **Lesson**: IntMap's O(log n) lookups are slower than Hashtbl's O(1)
+
+### Attempt 2: Large Pre-allocated Hashtbl
+- Used `Hashtbl.create 1000000` to avoid resizing
+- **Result**: 1.78x faster on ivory (0.28s), but still hung at 400k ops on solid
+- **Lesson**: Recursion depth (max 732) was the real killer
+
+### Final Solution: Manual Stack + Tail Recursion
+**Key optimizations**:
+
+1. **Custom IntTable**
+```ocaml
+module IntTable = Hashtbl.Make (struct
+ type t = int
+ let equal = Int.equal
+ let hash x = x land max_int
+end)
+```
+- Specialized for integer keys
+- Simple, fast hash function
+
+2. **Pre-sized backref table**
+```ocaml
+let estimated_nouns =
+ let approx = Bytes.length bytes / 8 in
+ if approx < 1024 then 1024 else approx
+in
+let backref_table = IntTable.create estimated_nouns
+```
+- Estimates ~1 noun per 8 bytes
+- Avoids expensive rehashing during decode
+
+3. **Manual stack with parallel arrays**
+```ocaml
+let stack_pos = ref (Array.make initial_stack_capacity 0) in
+let stack_head = ref (Array.make initial_stack_capacity None) in
+```
+- Better cache locality than stack of records
+- Grows by doubling when needed
+- No recursion limit!
+
+4. **Tail-recursive emit**
+```ocaml
+let rec emit noun =
+ if !stack_size = 0 then
+ result := Some noun
+ else begin
+ (* process stack *)
+ emit cell (* tail call - optimized to loop *)
+ end
+```
+- OCaml optimizes tail calls into loops
+- No call stack growth even at depth 732
+
+## Statistics
+
+**Solid pill processing**:
+- Total nouns: 3,246,189
+- Max stack depth: 732
+- Backrefs: ~2.3M
+- Throughput: 2.7M nouns/second
+
+**Why OCaml is faster than C**:
+1. **Pre-sized hashtable** - C Vere likely doesn't estimate size
+2. **Optimized allocator** - OCaml's generational GC is very fast for short-lived objects
+3. **Tail call optimization** - Eliminates function call overhead
+4. **Cache-friendly arrays** - Parallel arrays beat pointer-chasing structs
+
+## Code Quality
+
+The final implementation is also **cleaner** than the baseline:
+- Progress callbacks with customizable intervals
+- Optional inspection hooks for debugging
+- Better error messages (invalid backref detection)
+- Max depth tracking for diagnostics
+
+## What We Learned
+
+1. **Profile first** - We thought Hashtbl was the bottleneck, but it was recursion depth
+2. **OCaml's strengths** - Tail call optimization and fast allocation beat C
+3. **Manual stacks work** - When recursion fails, explicit state management wins
+4. **Pre-sizing matters** - Avoiding rehashing gave significant speedup
+
+## Next Steps
+
+With fast cue solved, we can now:
+- Boot real Arvo kernel from solid pill
+- Test full system performance
+- Move on to implementing jets
+- Compare overall system performance vs C Vere