path: root/ocaml/NEED_FOR_SPEED.md
Let's make this runtime the fastest freaking Nock runtime ever

Claude's advice:

  Why it's slow:

  1. Pure interpreter - We're executing every single Nock instruction one by one. No jets (hand-optimized stdlib functions)
  2. 8 events processing HUGE state - That 8MB pill expands into gigabytes of noun trees in memory. Each event modifies massive state
  3. Parallelism doesn't help much - All 32 cores are fighting over the same shared event_list noun, causing GC contention. This is memory-bound, not CPU-bound
  4. No optimization - C Vere has:
    - Jets (bypass Nock for stdlib)
    - Hash-consing (deduplicate nouns)
    - Years of tuning

  How slow is it really?

  Let's see single-core performance:

  dune exec test/test_solid_sequential.exe

  This will show the actual speed of nock.ml, nock_iter.ml, and nock_tail.ml without parallel overhead. My guess: each lifecycle takes several seconds even on one core.

  The real solutions:

  1. Jets - Implement fast paths for common operations (Hoon stdlib)
  2. Hash-consing - Deduplicate nouns to save memory
  3. Better GC tuning - Reduce allocation pressure
  4. Compile hot paths - JIT common formulas
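  Of these, jets (item 1) are conceptually simple: before running a formula through the general interpreter, look it up in a table of native fast paths keyed by its hash. A minimal sketch in OCaml, where the `noun` type, the `dec_jet` body, and the placeholder table key are all hypothetical stand-ins for the real runtime's types:

  ```ocaml
  (* Sketch of jet dispatch: check a table of hand-written fast paths
     before falling back to the general Nock loop. All names here are
     illustrative, not this runtime's actual API. *)

  type noun =
    | Atom of int            (* the real runtime uses arbitrary-precision atoms *)
    | Cell of noun * noun

  (* A jet: native code equivalent to some Nock formula (here, decrement).
     Returning [None] means "I don't apply; use the interpreter". *)
  let dec_jet (subject : noun) : noun option =
    match subject with
    | Atom n when n > 0 -> Some (Atom (n - 1))
    | _ -> None

  (* Jet table keyed by a hash of the formula (e.g. its mug). *)
  let jets : (int, noun -> noun option) Hashtbl.t = Hashtbl.create 64
  let () = Hashtbl.replace jets 0xdec0de dec_jet   (* placeholder key *)

  (* Wrap the interpreter: try a registered jet first, fall back otherwise. *)
  let nock_with_jets interpret formula_mug subject formula =
    match Hashtbl.find_opt jets formula_mug with
    | Some jet ->
        (match jet subject with
         | Some result -> result              (* fast path hit *)
         | None -> interpret subject formula) (* jet declined *)
    | None -> interpret subject formula
  ```

  The key invariant is that a jet must be semantically identical to the formula it shadows; the table lookup only changes speed, never the result.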

  Throwing 32 cores at an interpreted workload with heavy memory contention won't magically make it fast. The interpreter itself needs optimization first! 🤷

> then I guess we'd better get working on those real solutions. Which is more approachable in our current state? Hash-consing?

● Hash-consing is definitely more approachable! Here's why:

  What you already have:

  - ✅ Mug (hash) computation working perfectly
  - ✅ Clean noun type definition
  - ✅ All noun construction centralized through Noun.atom and Noun.cell

  What hash-consing does:

  (* Without hash-consing *)
  let a = Noun.cell (Noun.atom 42) (Noun.atom 1) in
  let b = Noun.cell (Noun.atom 42) (Noun.atom 1) in
  (* a and b are different objects in memory *)

  (* With hash-consing *)
  let a = Noun.cell (Noun.atom 42) (Noun.atom 1) in
  let b = Noun.cell (Noun.atom 42) (Noun.atom 1) in
  (* a and b point to THE SAME object in memory! *)

  Benefits:

  1. Massive memory savings - That 8MB solid pill has tons of repeated structure
  2. O(1) equality - Just pointer comparison instead of tree traversal
  3. Better cache locality - Shared structures stay hot
  4. Less GC pressure - Fewer objects to track

  Implementation approach:

  1. Add a global weak hash table (via Weak.Make; weak references let the GC still collect unused nouns)
  2. Modify Noun.atom and Noun.cell to check the table first
  3. Return existing noun if found, or insert new one
  4. Use mug as the hash key
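  The four steps above can be sketched with the stdlib's Weak.Make functor, which builds exactly this kind of weak hash set. The `noun` type and `mug` stand-in below are illustrative; the real constructors would live inside the Noun module:

  ```ocaml
  (* Sketch of hash-consed constructors. [mug] is a stand-in for the
     runtime's real mug function; [Hashtbl.hash] just makes the sketch
     self-contained. *)

  type noun =
    | Atom of int
    | Cell of noun * noun

  let mug : noun -> int = Hashtbl.hash

  (* Weak set of canonical nouns: entries can be collected once nothing
     else holds a reference to them. *)
  module NounWeak = Weak.Make (struct
    type t = noun
    let equal = ( = )   (* structural equality on candidate nouns *)
    let hash = mug
  end)

  let table = NounWeak.create 4096

  (* Hash-consing constructors: [merge] returns the existing canonical
     copy if one is in the table, otherwise inserts the candidate. *)
  let atom n = NounWeak.merge table (Atom n)
  let cell h t = NounWeak.merge table (Cell (h, t))
  ```

  Because children are themselves built with `atom`/`cell`, structurally equal nouns come out physically equal, so equality checks collapse to a pointer comparison (`a == b`) instead of a tree walk.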

  Want me to implement hash-consing? It's maybe 100-200 lines of code and will give you huge wins!