Diffstat (limited to 'ocaml/SERIALIZATION.md')
| -rw-r--r-- | ocaml/SERIALIZATION.md | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/ocaml/SERIALIZATION.md b/ocaml/SERIALIZATION.md
new file mode 100644
index 0000000..d02da54
--- /dev/null
+++ b/ocaml/SERIALIZATION.md
@@ -0,0 +1,15 @@
+# Serialization Notes
+
+## References
+- Vere’s C implementation (`pkg/noun/serial.c`) defines the canonical jam/cue behaviour. I followed its control flow (Mat encoding, tag bits, backreference strategy) to make sure we stay protocol-compatible.
+- Sword’s Rust code (`sword/rust/sword/src/serialization.rs`) mirrors the same rules but with clearer structure; I skimmed it for sanity checks (sign handling, backref caching).
+- The previous OCaml port (`ocaml-old/lib/serial.ml`) served as a guardrail. Although I didn’t lift code, I used it to confirm interface expectations and test vectors produced by the old implementation.
+
+## Implementation Choices
+- **Fresh bitstream**: I reintroduced a minimal `Bitstream` module so we can pack bits exactly like Vere. This keeps jam/cue self-contained and avoids depending on the old monolithic implementation.
+- **Mat encoding**: Mat logic is copied conceptually from Vere: atoms are written with `0` + mat(number); cells get `10`; backrefs get `11`. The helpers are small enough for direct inspection.
+- **Backreference table**: We hold a `Hashtbl` of bit positions (just like Vere/Sword) to recognise repeated sub-nouns and emit compact `11` entries.
+- **Tests first**: I ported basic jam/cue test vectors (0, 1, `[0 0]`, shared structures, large atoms) to confirm behaviour before layering on more features.
+
+## Why Not Copy `ocaml-old/lib/serial.ml`?
+The old file is large, full of profiling hooks, progress callbacks, and optimisation that we don’t need yet. Starting from scratch keeps the surface small and makes it obvious how the encoding works while still honouring the canonical semantics checked against Vere/Sword output.
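
As a reading aid for the Mat-encoding and backreference bullets in this diff, here is a minimal OCaml sketch of jam over native integers. It is not the code from this commit: the `noun` type, the `writer` record, and the helpers `push`, `push_bits`, `bit_length`, `mat`, and `jam` are all hypothetical names, atoms are assumed to be non-negative native `int`s, and the result is returned as an `int` (so it only works for small nouns) rather than as a packed byte buffer. Tag bits are shown in stream order, least-significant bit first.

```ocaml
(* Hypothetical noun type; atoms are limited to small non-negative ints. *)
type noun = Atom of int | Cell of noun * noun

(* Bits are collected most-recent-first; [pos] is the current bit offset. *)
type writer = { mutable bits : bool list; mutable pos : int }

let push w b =
  w.bits <- b :: w.bits;
  w.pos <- w.pos + 1

(* Write the low [n] bits of [v], least-significant bit first. *)
let push_bits w n v =
  for i = 0 to n - 1 do
    push w ((v lsr i) land 1 = 1)
  done

(* Number of significant bits in [v]; 0 when v = 0. *)
let rec bit_length v = if v = 0 then 0 else 1 + bit_length (v lsr 1)

(* mat: self-delimiting, length-prefixed encoding of a non-negative int. *)
let mat w a =
  if a = 0 then push w true
  else begin
    let b = bit_length a in   (* width of the atom *)
    let c = bit_length b in   (* width of that width *)
    push_bits w c 0;          (* c zero bits ... *)
    push w true;              (* ... terminated by a 1 *)
    push_bits w (c - 1) b;    (* b without its top bit *)
    push_bits w b a           (* the atom's own bits *)
  end

let jam (n : noun) : int =
  let w = { bits = []; pos = 0 } in
  (* Maps each noun already written to the bit offset where it starts.
     This sketch keys on structural equality for simplicity. *)
  let seen : (noun, int) Hashtbl.t = Hashtbl.create 16 in
  let rec go n =
    match Hashtbl.find_opt seen n, n with
    | Some p, Cell _ ->
      push w true; push w true; mat w p       (* backref: 1,1 + mat(offset) *)
    | Some p, Atom a when bit_length a > bit_length p ->
      push w true; push w true; mat w p       (* backref only if it is shorter *)
    | Some _, Atom a ->
      push w false; mat w a                   (* repeat the atom directly *)
    | None, Atom a ->
      Hashtbl.add seen n w.pos;
      push w false; mat w a                   (* atom: 0 + mat(value) *)
    | None, Cell (h, t) ->
      Hashtbl.add seen n w.pos;
      push w true; push w false;              (* cell: 1,0 then head, tail *)
      go h; go t
  in
  go n;
  (* The head of [w.bits] is the last bit written, i.e. the highest bit. *)
  List.fold_left (fun acc b -> (acc lsl 1) lor (if b then 1 else 0)) 0 w.bits
```

With these definitions, `jam (Atom 0)` evaluates to `2`, `jam (Atom 1)` to `12`, and `jam (Cell (Atom 0, Atom 0))` to `41`. A real implementation would use arbitrary-precision atoms and write into a byte buffer instead of folding the bits into an `int`.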
