From d21900836f89b2bf9cd55ff1708a4619c8b89656 Mon Sep 17 00:00:00 2001
From: polwex
Date: Mon, 20 Oct 2025 13:13:39 +0700
Subject: neoinit

---
 ocaml/SERIALIZATION.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 ocaml/SERIALIZATION.md

diff --git a/ocaml/SERIALIZATION.md b/ocaml/SERIALIZATION.md
new file mode 100644
index 0000000..d02da54
--- /dev/null
+++ b/ocaml/SERIALIZATION.md
@@ -0,0 +1,15 @@
+# Serialization Notes
+
+## References
+- Vere’s C implementation (`pkg/noun/serial.c`) defines the canonical jam/cue behaviour. I followed its control flow (Mat encoding, tag bits, backreference strategy) to make sure we stay protocol-compatible.
+- Sword’s Rust code (`sword/rust/sword/src/serialization.rs`) mirrors the same rules but with clearer structure; I skimmed it for sanity checks (sign handling, backref caching).
+- The previous OCaml port (`ocaml-old/lib/serial.ml`) served as a guardrail. Although I didn’t lift code, I used it to confirm interface expectations and test vectors produced by the old implementation.
+
+## Implementation Choices
+- **Fresh bitstream**: I reintroduced a minimal `Bitstream` module so we can pack bits exactly like Vere. This keeps jam/cue self-contained and avoids depending on the old monolithic implementation.
+- **Mat encoding**: Mat logic is copied conceptually from Vere: atoms are written with `0` + mat(number); cells get `10`; backrefs get `11`. The helpers are small enough for direct inspection.
+- **Backreference table**: We hold a `Hashtbl` of bit positions (just like Vere/Sword) to recognise repeated sub-nouns and emit compact `11` entries.
+- **Tests first**: I ported basic jam/cue test vectors (0, 1, `[0 0]`, shared structures, large atoms) to confirm behaviour before layering on more features.
+
+## Why Not Copy `ocaml-old/lib/serial.ml`?
+The old file is large, full of profiling hooks, progress callbacks, and optimisation that we don’t need yet. Starting from scratch keeps the surface small and makes it obvious how the encoding works while still honouring the canonical semantics checked against Vere/Sword output.
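
The Mat encoding, tag bits, and backreference table described in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: atoms are plain non-negative `int`s rather than arbitrary-precision values, the `writer` record stands in for the `Bitstream` module, and every name here (`writer`, `write_bit`, `write_bits`, `bit_length`, `mat`, `jam`) is hypothetical. Tag bits are shown in stream order, least-significant bit first. As I understand the reference jam, an atom that was already seen is still written in full when that is no longer than the backreference would be, and the sketch follows that rule.

```ocaml
(* Sketch only: atoms are small non-negative ints here; a real port
   would need arbitrary-precision atoms. *)
type noun = Atom of int | Cell of noun * noun

(* Hypothetical stand-in for a Bitstream writer: bits are kept
   newest-first, and [pos] counts the bits written so far. *)
type writer = { mutable bits : bool list; mutable pos : int }

let write_bit w b =
  w.bits <- b :: w.bits;
  w.pos <- w.pos + 1

(* Write the low [n] bits of [v], least-significant bit first. *)
let write_bits w n v =
  for i = 0 to n - 1 do
    write_bit w ((v lsr i) land 1 = 1)
  done

(* Number of significant bits in [v]; bit_length 0 = 0. *)
let rec bit_length v = if v = 0 then 0 else 1 + bit_length (v lsr 1)

(* Mat: a self-delimiting, length-prefixed encoding of an atom. *)
let mat w a =
  if a = 0 then write_bit w true
  else begin
    let b = bit_length a in      (* length of the atom in bits *)
    let c = bit_length b in      (* length of that length in bits *)
    write_bits w c 0;            (* c zero bits ... *)
    write_bit w true;            (* ... terminated by a one *)
    write_bits w (c - 1) b;      (* low c-1 bits of b; its top bit is implied *)
    write_bits w b a             (* the atom itself *)
  end

let jam noun =
  let w = { bits = []; pos = 0 } in
  (* Maps each noun already emitted to the bit position where it starts. *)
  let seen : (noun, int) Hashtbl.t = Hashtbl.create 16 in
  let rec go n =
    match Hashtbl.find_opt seen n, n with
    | Some p, Atom a when bit_length a <= bit_length p ->
      (* Writing the atom again is no longer than a backreference. *)
      write_bit w false; mat w a
    | Some p, _ ->
      (* Backreference: tag bits 1,1 then mat(position). *)
      write_bit w true; write_bit w true; mat w p
    | None, Atom a ->
      Hashtbl.add seen n w.pos;
      write_bit w false; mat w a
    | None, Cell (h, t) ->
      Hashtbl.add seen n w.pos;
      (* Cell: tag bits 1,0 then head, then tail. *)
      write_bit w true; write_bit w false;
      go h; go t
  in
  go noun;
  (* The list head is the most recently written, hence highest, bit. *)
  List.fold_left (fun acc b -> (acc lsl 1) lor (if b then 1 else 0)) 0 w.bits
```

With these definitions, `jam (Atom 0)` evaluates to `2`, `jam (Atom 1)` to `12`, and `jam (Cell (Atom 0, Atom 0))` to `41`, matching the standard jam values for `0`, `1`, and `[0 0]`. The table is keyed on structural equality through the polymorphic `Hashtbl`, which keeps the sketch short; it makes no claim about how the commit's table is keyed.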