# The Loom Question: Memory Architecture in Neovere
## What is Vere's Loom?
Vere allocates a **contiguous arena** of memory (originally 2GB, now expandable to larger sizes) called the "loom". All Urbit nouns live within this arena. The loom is:
1. **Pre-allocated**: Grab 2GB (or more) of virtual memory at startup
2. **Contiguous**: All nouns at known offsets within this range
3. **Self-contained**: The entire Urbit state fits in one memory region
4. **Snapshotable**: Can write the whole loom to disk for persistence
5. **Fixed-size** (traditionally): Can't grow beyond initial allocation
### Why Did Vere Choose This?
Vere is written in C, which has no garbage collector. The loom provides:
**1. Memory Management**
- Custom allocator for nouns
- Reference counting for cleanup
- Memory pools for different noun sizes
- Compaction/defragmentation
**2. Persistence**
- Snapshot = copy loom to disk
- Resume = load loom from disk
- Extremely fast (just `memcpy`)
- Deterministic memory layout
**3. Performance**
- Nouns are 32-bit offsets into loom (compact!)
- Pointer arithmetic for tree navigation
- Cache-friendly sequential allocation
- No malloc/free overhead
**4. Determinism**
- Same events → same memory layout
- Reproducible across machines
- Critical for consensus/jets
**5. Jets**
- C code can use raw pointers to nouns
- No GC to worry about
- Direct memory access for performance
### The Loom's Limitations
**Hard Limit**: 2GB was the original ceiling
- Large ships hit this and crashed
- Recent Vere releases expanded the loom to 4GB, 8GB, and beyond
- But still fundamentally limited
**Fragmentation**: Memory can't be reclaimed easily
- Dead nouns leave holes
- Compaction is expensive
- Can run out of contiguous space
**Complexity**: Manual memory management
- Reference counting bugs
- Memory leaks
- Segfaults
## Sword's Revolution: Loom-Free Architecture
Sword (Rust implementation) took a radically different approach:
**No Fixed Arena**
- Nouns live on Rust's heap
- No size limit (grows with actual usage)
- Standard Rust allocator
**Hash-Consing**
- Every noun gets a unique hash
- Duplicate nouns share memory automatically
- Efficient deduplication
**Indirect References**
- Nouns reference each other via handles/IDs
- Not raw pointers
- Relocatable (for GC)
**Persistence**
- Use jam/cue serialization
- Snapshot = jam the entire state
- Resume = cue from disk
- Slower than memcpy but more flexible
### Sword's Wins
✅ **No Memory Limit**: Ships can grow arbitrarily large
✅ **Simpler**: Rust's allocator handles complexity
✅ **Safer**: No manual memory management bugs
✅ **Still Deterministic**: Hash-consing ensures consistency
### Sword's Tradeoffs
❌ **Slower Snapshots**: jam/cue vs raw memory copy
❌ **Larger Memory Overhead**: Rust allocator + hashes
❌ **More Complex Jets**: Can't use raw pointers
But the wins far outweigh the costs!
## OCaml's Position: Best of Both Worlds?
OCaml gives us unique advantages:
### What We Already Have
**1. Automatic GC**
- OCaml's generational GC manages memory
- No manual allocation/deallocation
- Mature, well-tested, fast
**2. Zarith (GMP) for Atoms**
- Arbitrary-precision integers
- Efficient representation
- Automatic memory management
**3. Jam/Cue Serialization**
- Already implemented and working!
- Can serialize entire kernel
- Proven approach (Sword uses it)
**4. No Segfaults**
- Memory safety by default
- Type system prevents many bugs
- Runtime checks where needed
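As a quick illustration of points 1 and 2 above, Zarith atoms allocate exactly the space they need and are reclaimed by the GC automatically; there is no size ceiling (a sketch assuming the `zarith` library is linked):

```ocaml
(* A 200-bit atom: Zarith allocates what is needed,
   and the OCaml GC reclaims it automatically. *)
let big = Z.pow (Z.of_int 2) 200

let () =
  assert (Z.numbits big = 201);          (* 2^200 occupies 201 bits *)
  assert (Z.equal (Z.succ (Z.pred big)) big)
```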
### Three Possible Approaches
#### Option 1: Pure OCaml Heap (Current - Recommended!)
**Implementation:**
```ocaml
type noun =
| Atom of Z.t (* Zarith handles memory *)
| Cell of noun * noun (* OCaml GC handles cells *)
```
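A quick sketch of what this buys us: nouns are ordinary values and tree traversal is plain pattern matching. The `count_atoms` helper is illustrative, and the type is repeated to keep the sketch self-contained:

```ocaml
type noun =
  | Atom of Z.t
  | Cell of noun * noun

(* The Urbit cell [42 1 2]: cells nest to the right *)
let example =
  Cell (Atom (Z.of_int 42), Cell (Atom Z.one, Atom (Z.of_int 2)))

(* Count the atoms (leaves) of a noun tree *)
let rec count_atoms = function
  | Atom _ -> 1
  | Cell (h, t) -> count_atoms h + count_atoms t
```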
**Pros:**
- ✅ Simple, already working
- ✅ No memory limit
- ✅ Automatic memory management
- ✅ Type-safe
- ✅ Persistence via jam/cue (working!)
**Cons:**
- ❌ Larger memory overhead than loom
- ❌ Slower snapshots (jam/cue vs memcpy)
- ❌ GC pauses (though generational GC helps)
**Status:** This is what we have now and it works great!
#### Option 2: OCaml Loom (Not Recommended)
**Implementation:**
```ocaml
(* Allocate a 2GB Bigarray arena (needs a 64-bit OCaml) *)
let loom =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout
    (2 * 1024 * 1024 * 1024)

type noun_offset = int (* Offset into the loom *)
```
**Pros:**
- ✅ Matches Vere architecture
- ✅ Fast snapshots (write Bigarray to disk)
- ✅ Compact representation
**Cons:**
- ❌ 2GB hard limit (or whatever we choose)
- ❌ Manual memory management (complex!)
- ❌ Fights OCaml's GC
- ❌ Unsafe (index out of bounds = crash)
- ❌ Fragmentation issues
- ❌ Complex to implement correctly
**Status:** Possible but not worth it - we'd be reimplementing malloc in OCaml!
#### Option 3: Hybrid with Hash-Consing
**Implementation:**
```ocaml
(* Sketch: hash-cons cells so structurally equal nouns share one value.
   Zarith installs custom equality and hashing for Z.t, so polymorphic
   (=) and Hashtbl.hash are safe on atoms. *)
module NounTable = Hashtbl.Make (struct
  type t = noun
  let equal = (=)
  let hash = Hashtbl.hash
end)

let noun_cache : noun NounTable.t = NounTable.create 100_000

let make_cell h t =
  let candidate = Cell (h, t) in
  match NounTable.find_opt noun_cache candidate with
  | Some existing -> existing (* Reuse the shared copy *)
  | None ->
      NounTable.add noun_cache candidate candidate;
      candidate
```
**Pros:**
- ✅ Automatic deduplication
- ✅ Memory savings (like Sword)
- ✅ Still no size limit
- ✅ Deterministic (same nouns = same structure)
**Cons:**
- ❌ Hash table overhead
- ❌ GC pressure from large hash tables
- ❌ Need to decide when to clear cache
**Status:** Optional future optimization, not needed yet
## Recommendation: Stay Loom-Free
**Thesis:** The loom is a C-ism that doesn't make sense in OCaml (or Rust).
### Why No Loom?
1. **OCaml has a GC** - Using it is simpler and safer than fighting it
2. **No size limit** - Ships can grow as large as system RAM allows
3. **Jam/Cue works** - We already have working persistence
4. **Type safety** - OCaml prevents many memory bugs automatically
5. **Simpler code** - No custom allocator to maintain
### Counter: "But Vere snapshots are instant!"
True, but:
- Our jam is fast enough (1.2s for the solid pill)
- Most ships snapshot infrequently
- Correctness > raw speed for snapshots
- Can optimize jam/cue later if needed
### Counter: "But the loom is deterministic!"
OCaml is deterministic too:
- Same nouns = same structure (referential transparency)
- Jam output is deterministic (same noun = same bytes)
- GC doesn't affect logical structure
- Hash-consing can ensure physical equality if needed
### Counter: "But jets need raw pointers!"
Not in modern languages:
- Sword jets work with indirect references
- OCaml jets can operate on the `noun` type directly
- Type safety > raw speed (until proven bottleneck)
- Can use unsafe when absolutely necessary
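For instance, a decrement jet can be written directly against the `noun` type. This is a hypothetical sketch: the `jet_dec` name and the option-based fallback convention are illustrative, and the type is repeated to keep it self-contained:

```ocaml
type noun =
  | Atom of Z.t
  | Cell of noun * noun

(* A jet as an optional fast path: return Some result when the
   sample matches, None to fall back to interpreting the Nock. *)
let jet_dec : noun -> noun option = function
  | Atom z when Z.sign z > 0 -> Some (Atom (Z.pred z))
  | _ -> None
```

No `Obj.magic`, no raw pointers: the jet is ordinary, type-checked OCaml.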
## Practical Implications
### Current Architecture (Keep It!)
**Allocation:**
```ocaml
(* Atoms: Zarith manages memory *)
let a = Atom (Z.of_int 42)
(* Cells: OCaml GC manages memory *)
let c = Cell (a, a)
```
**Persistence:**
```ocaml
(* Snapshot: jam the kernel to bytes and write them to disk *)
let snapshot kernel =
  let bytes = Serial.jam kernel in
  Out_channel.with_open_bin "kernel.jam" (fun oc ->
      Out_channel.output_bytes oc bytes)

(* Resume: read the file back and cue it into a noun *)
let resume () =
  In_channel.with_open_bin "kernel.jam" In_channel.input_all
  |> Bytes.of_string
  |> Serial.cue
```
**Memory Growth:**
- Automatic! GC expands heap as needed
- System RAM is the only limit
- Large ships work fine
### Future Optimizations (If Needed)
**1. Hash-Consing** (memory deduplication)
- Add global noun cache
- Reuse identical nouns
- Reduces memory footprint
**2. Incremental Snapshots**
- Snapshot diffs instead of full kernel
- Faster save/resume
- More complex but doable
**3. Memory-Mapped I/O**
- Use mmap for large nouns
- OS handles paging
- Transparent to our code
**4. Generational Snapshots**
- Keep old/new separated
- Only snapshot changed parts
- Faster incremental saves
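For flavor, the memory-mapped I/O idea above can be sketched with the stdlib's `Unix.map_file` (the `map_snapshot` name is illustrative; requires linking the `unix` library):

```ocaml
(* Map a file into a char Bigarray; the OS pages it in lazily. *)
let map_snapshot path =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  let len = (Unix.fstat fd).Unix.st_size in
  let arr =
    Unix.map_file fd Bigarray.char Bigarray.c_layout false [| len |]
    |> Bigarray.array1_of_genarray
  in
  Unix.close fd; (* the mapping outlives the descriptor *)
  arr
```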
## Comparison Table
| Feature | Vere Loom | Sword Heap | Neovere (Current) |
|---------|-----------|------------|-------------------|
| Size limit | 2-8GB | None | None |
| Memory management | Manual | Rust allocator | OCaml GC |
| Snapshot speed | Instant (memcpy) | Slow (jam/cue) | Slow (jam/cue) |
| Memory safety | Unsafe (C) | Safe (Rust) | Safe (OCaml) |
| Complexity | High | Medium | Low |
| Deduplication | Manual | Hash-consing | Optional |
| Growing ships | Hard limit | ✓ Works | ✓ Works |
## Conclusion
**We should NOT implement a loom in OCaml.**
Instead:
1. ✅ Use OCaml's GC (already working)
2. ✅ Use jam/cue for persistence (already working)
3. ✅ Let memory grow naturally (no limits)
4. ⏳ Add hash-consing later if needed (optimization)
5. ⏳ Optimize jam/cue if snapshots become a bottleneck
The loom made sense for C in 2013. It doesn't make sense for OCaml in 2025.
**Follow Sword's lead:** Modern languages with GC don't need manual memory arenas.
## Open Questions
1. **Should we add hash-consing now or later?**
- Later. Current approach works fine.
- Add when memory usage becomes a concern.
2. **How do we handle very large ships (100GB+ kernel)?**
- Cross that bridge when we come to it
- System RAM is cheap
- Incremental snapshots if needed
3. **What about jets that need performance?**
- Start with safe OCaml
- Profile to find bottlenecks
- Use unsafe/Obj.magic only where proven necessary
- Still better than C's unsafety
4. **Can we beat Vere's snapshot speed?**
- Probably not for raw memcpy
- But we can get close with optimized jam/cue
- And we win on flexibility/safety
## References
- Vere loom: `pkg/noun/allocate.c`, `pkg/noun/manage.c`
- Sword architecture: `rust/sword/src/mem.rs`, `rust/sword/src/noun.rs`
- OCaml GC: https://v2.ocaml.org/manual/gc.html
- Zarith: https://github.com/ocaml/Zarith
## Decision
**Status: LOOM-FREE ARCHITECTURE ✅**
We will continue using OCaml's heap and GC. No loom needed.