# System & UI Audit — 2026-06-10

*Synthesis of a 15-domain read-only audit of `/home/ubuntu/quivent/render` (GH200, render/inference stack).*

---

## Executive Summary

The repository is **two products fused into one checkout** — a deterministic, model-free geometry renderer (`cli/render` → `source/gpu_*.py`) and a resident-FLUX diffusion motion studio (`ui/inference/spiral/*`) — and almost every problem flows from never having separated them or chosen a single owner for shared concerns.

**What is healthy.** Git/build hygiene is the cleanest domain: the `.gitignore` is thorough and correct, and the 10.5MB render binary, 654MB `dist/`, ~700MB of `node_modules`, and 831MB of runtime logs are all properly **ignored and untracked**. Of 3.0GB on disk, only ~60MB (intentional museum media) is in version control, so the repo would clone clean. The core render pipeline is also clean: 224 registry entries map to exactly 107 PyTorch scripts with **zero orphans and zero missing references**. `mm_server.py` (:8188) correctly refuses to load weights: it is catalog-only and is NOT the VRAM holder, so the premise of "mm_server holding 34GB" is wrong (the 34GB is held by `spiral/worker.py`). Credential hygiene is good: no hardcoded secrets in source.

**The 4 root causes of the "chaos":**

1. **The dev server IS production, with no event-loop isolation.** comfort-ui's `vite dev` on :9731 hosts the entire ~40-route `/api/render/*` backend inside Vite plugins (`devServer.ts` 3,173 lines + `renderStream.ts` 1,889 lines). render-ui already extracted this to a `standalone.ts` precisely because "UI and API died together"; comfort-ui **regressed by omission** and still has no standalone server.
2. **Multiple uncoordinated daemons exist to PEG the GPU at 100%, by design.** The so-called "GPU governor" (`saturator.py`) is actually a *saturation pump* whose stated job is to hold a utilization FLOOR (SAT_TARGET=70). Three independent queue producers (`refill.sh`, `refill.py`, `saturator.py`), two restart authorities (`guardian.sh` + `saturator.py`, on top of `Restart=always`), and an in-process governor thread (`model_manager.py:_governor_loop`) all independently keep the GPU busy with no shared arbiter. `explore --loop` (`explore.go:243`) adds an infinite, sleepless, VRAM-unaware respawn loop on top.
3. **Whole-app duplication.** `render-ui` is a complete React→SolidJS rewrite of `comfort-ui` (788 files / 154k lines vs 463 / 114k), kept in lockstep by hand; both were edited TODAY. 58 framework-agnostic files are byte-identical forks (~11.8k lines), the ~3k-line `devServer.ts` backend is forked, and two `node_modules` trees (~700MB) exist for one app. This is the structural source of the "3 Vite servers" observation.
4. **No resource governance + unbounded growth.** Zero of 17 `Restart=always` systemd units carry any `MemoryMax`/`CPUQuota`/`TasksMax`. Two large VRAM consumers (FLUX ~34GB + llama-server ~24GB, `-ngl 99`) share 98GB with no per-process cap, no MPS, no semaphore. Outputs grow forever by explicit "NEVER deletes" design: `spiral/renders` = 27GB, render-store logs = 831MB (incl. a stale 272MB `.bloat` backup), `out/assets` = 42,985 dirents.

**Security perimeter.** The host has **no firewall** (`ufw inactive`) while `wire-server.elf` binds `0.0.0.0:8092`, NFS/rpcbind are exposed, and `render.influx.vision` proxies an unauthenticated vite dev server publicly — anyone on the internet can drive the GPU. `model_manager.py` exposes 49 unauthenticated POST endpoints, several of which `pkill`, `systemctl restart`, `git clone`, and Popen-spawn infinite `render explore --loop` saturators.

Counts below: **6 critical**, **24 high**.

---

## Critical & High Issues

| Issue | Sev | Category | Domain | Evidence | Recommendation |
|---|---|---|---|---|---|
| Vite dev server IS production — entire render API lives in dev-only Vite plugins | critical | Architecture | data-flow | `vite.config.ts:147-166` registers ~30 plugins mounting ~40 `/api/render/*` routes in `devServer.ts` (3173 lines); `deploy.sh` only builds static dist with ZERO API routes; Go CLI POSTs to `:9731` | Extract routes into a standalone HTTP server (render-ui already did at `server/standalone.ts`); never run `vite dev` as prod |
| comfort-ui never got the event-loop-isolation fix render-ui's own code documents | critical | Architecture | data-flow | render-ui `standalone.ts` header: "wedged the whole surface — UI and API died together"; comfort-ui has NO `server/standalone.ts` | Port render-ui's standalone split to comfort-ui or unify backends |
| Unauthenticated endpoints Popen-spawn `render explore --loop --all-native --parallel 6` | critical | Process-chaos | inference | `model_manager.py:3328-3348` CPU/GPU_TASKS argv; `:3509`/`:3570` `subprocess.Popen(start_new_session=True)` with NO auth; CORS `allow_origins=['*']` | Split supervisor endpoints out, gate behind auth + one-instance lock; never register infinite `--loop` as a button |
| Repo systemd docs are fictional; the units actually saturating the GPU are undocumented & unversioned | critical | Process-chaos | docs | `spiral/systemd/README.md` documents 4 units (dir holds 6); `systemctl --user` shows a DIFFERENT enabled family (controller/guardian/saturator) living ONLY in `~/.config/systemd/user/`, not in repo or git | Commit the real units into repo; write one `SERVICES.md` matching `systemctl --user list-unit-files` |
| `render explore --loop` — infinite tight respawn, no sleep/backoff/VRAM check | critical | Process-chaos | render-cli | `explore.go:243` loop body has no sleep/throttle/nvidia/backoff (grep empty) | Add inter-round sleep + GPU/VRAM headroom check; crash-loop guard on 0 successful renders |
| No VRAM cap on any GPU process — FLUX ~34GB + llama ~24GB blindly share 98GB | critical | Resource | gpu-gov | No `set_per_process_memory_fraction`/`max_memory`/`device_map` anywhere; `worker.py:143-154` loads FLUX to CUDA uncapped; llama `-ngl 99` | Set explicit VRAM budget per process or adopt CUDA MPS / single-consumer admission |
| render-ui is a full duplicate of comfort-ui (React→Solid) maintained in parallel | high | Duplication | render-ui / cross-ui | `package.json` "Solid port of comfort-ui"; 446 same-path files (58 byte-identical = ~11.8k lines); render-ui LARGER (788/154k) than original (463/114k); both edited 2026-06-10 | Pick ONE framework; freeze/archive the loser; tear down `solid.influx.vision` |
| Framework-agnostic logic (api/ws/types/midi/state graphs) forked verbatim into both UIs | high | Duplication | cross-ui | 58 byte-identical files: `lib/api.ts`,`lib/ws.ts`, entire `wan/io/midi/*`, `wan/state/*Graph.ts`; 388 of 446 already diverged | Extract `@ui/core` workspace package consumed by both |
| Three independent producers feed one spiral queue → uncoordinated overproduction | high | Process-chaos | spiral | `refill.sh` (POST kick q+c<4), `refill.py` (LOW=8), `saturator.py` (HIGH=10/MAX=28) all target `/home/ubuntu/spiral/queue`; live log "util~100% pending=9 → +3 (top up)" | Collapse to ONE producer with one water-mark; gate enqueue on real demand |
| Two services can each restart spiral-worker — competing authorities, no shared lock | high | Process-chaos | spiral / orchestration | `guardian.sh:44` + `saturator.py:198` both `systemctl --user restart spiral-worker`; unit also `Restart=always`; guardian fired at 05:31:58 | One owner of worker liveness; share a flock/cooldown |
| Self-reinforcing in-process governor thread auto-respawns explore loops on idle | high | Process-chaos | inference | `model_manager.py:3693-3712` `_governor_loop` dispatches cpu/gpu-explore every 20s when idle; started at import when `MM_SERVE=1` | Designate ONE saturation owner; default auto-dispatch to OFF and persist that setting so it survives a reboot |
| spiral controller closed loop permanently railed — 56% no-op cycles, can never converge | high | Architecture | spiral | `calibration.json` all 4 actuators at clamp; 314/561 ledger rows identical "within band — no change"; measured diff 0.0286 vs target 0.0105 | Detect saturation, warn/stop instead of 314 identical rows; re-scale actuator authority |
| spiral output grows unbounded by design — 27GB / 29,052 entries, never pruned | high | Resource-leak | spiral | `renders/` 27GB; `guardian.sh:5,52` "NEVER deletes anything", disk "warn-only" | Add retention (last N / X GB) reaper; rotate `spiral.jsonl` + logs |
| render-store append-only logs ~870MB, fully read+JSON.parse synchronously on every start | high | Resource-leak | data-flow / bloat | `jobs.log` 303-317M, `renders.log` 208-217M, `assets.log` 49M, `jobs.log.bloat` 272-285M; `renderStream.ts:195-213` `readFileSync().split().map(JSON.parse)` | Log rotation/compaction or SQLite; stream replay; delete `.bloat` backup |
| spiral source-of-truth forked across 3 locations; repo `systemd/` points at nonexistent venv | high | Duplication | spiral | (A) deployed `~/.config`, (B) repo `ops/`, (C) repo `systemd/`; set (C) ExecStart points at `.../ui/inference/.venv/bin/python`, which does NOT exist; live uses `/home/ubuntu/venvs/mm` | Delete stale `systemd/` set; fix `ops/` venv path; version the runtime |
| Live runtime root `/home/ubuntu/spiral` is untracked & OUTSIDE the repo | high | Architecture | orchestration | Live worker/controller/guardian run from `/home/ubuntu/spiral/*`; not a git repo; editing repo `ops/*.sh` has zero effect | Make `/home/ubuntu/spiral` a checkout or symlink scripts from repo; ONE source of truth |
| 17 `Restart=always` units, 0 with MemoryMax/CPUQuota/TasksMax | high | Process-chaos | gpu-gov / config | `grep MemoryMax\|CPUQuota\|TasksMax` over all `*.service` → empty; RestartSec 2-5s | Add MemoryMax/TasksMax + StartLimit backoff; pick one restart authority |
| 30 components poll `/api/render` every 1.5-2.5s; ~77 pollers spawn nvidia-smi / scan /tmp per poll | high | Architecture | comfort-ui / data-flow | 30 files ref `/api/render`; `useVisiblePolling.ts` comment "~40 raw setInterval pollers hammering every 1.5-6s... each poll spawns nvidia-smi and scans /tmp"; telemetry spawns nvidia-smi 2x/sample in Vite loop | One server-side sampler + SSE/WS; never spawn a subprocess per HTTP request |
| 3-deep self-exec fan-out per render (explore→broadcast→form→python3/ffmpeg) | high | Architecture | render-cli | `explore.go:333,347` exec `render broadcast`; `comfort_cmd.go:367-371` re-execs `render <form>`; one CPU form = parent + child + grandchild | Collapse chain into in-process calls; reserve subprocess only for python GPU forms |
| Two `while true` keepalive wrappers respawn explore forever with pkill restarts | high | Process-chaos | render-cli / orchestration | `cpu_loop.sh:16-22` + `gpu-experiment-daemon.sh:82-89` `while true; do render explore --loop...; sleep 5; done`; pkill-by-pattern | One supervisor (systemd `Restart=on-failure`); drop nested while-true + pkill |
| spiral/worker.py holds 34GB VRAM: fp16 FLUX + T5-xxl, no cap/offload, fp8 path unused | high | Resource-leak | inference | nvidia-smi pid 720264 = 34398 MiB; `worker.py:61-65` `tf-dev-fp16`+`te-t5-fp16` resident; `model_manager.py:302-307` has a qfloat8 path that would ~halve it | Run with fp8 transformer or offload idle T5; set memory fraction |
| Only "arbiter" is a file-based priority yield honored by 1 of N GPU consumers | high | Architecture | gpu-gov | `gpu-experiment-daemon.sh:38-66` `.gpu-priority` file; worker.py/saturator.py/llama never check it | Promote to a real broker every consumer checks before allocating, or one queue/lock |
| 49 unauthenticated POST endpoints; several pkill/systemctl/git clone/rsync/aws | high | Security | inference | `grep -c '@router.post'` = 49; `:3552` pkill, `:2721` systemctl restart, `:3824` git clone+build, all CORS `*`, no auth | Auth token / CSRF before any process-control endpoint; tighten CORS; narrow pkill to own PIDs |
| Host has no firewall while services bind 0.0.0.0 (incl NFS/rpcbind/wire :8092) | high | Security | config | `ufw status` = inactive; `ss -tlnp` shows `0.0.0.0` :8092,:2049,:111 | Enable ufw allowing only 22/80/443; bind wire-server to 127.0.0.1 |
| Public Caddy route proxies all POST into unauthenticated vite dev server | high | Security | config | `Caddyfile` `render.influx.vision { reverse_proxy 127.0.0.1:9731 }`, `wire @writes method POST → :9731`; no basicauth/token | Caddy basicauth/token in front of `/api`; serve static dist, gate the API |
| 6 god-components >1000 lines; `tuning.tsx` 1451 lines / 40 hooks | high | Architecture | comfort-ui | `tuning.tsx` 1451 (40 useState/Effect/Ref), `output.tsx` 1187, `models.tsx` 1182; 59 files >500 lines | Lift data-fetch into hooks, extract presentational components, ~400-line cap |
| 22 anime skin files (~330KB) auto-bundled via `import.meta.glob` | high | Dead-code | comfort-ui | `anime_gallery.tsx:10-11` globs `anime_art_*`/`anime_concept_*`; 15+7 files, `anime_concept_7.tsx` 44KB | Promote the one used design; delete the other 21 + the glob gallery |
| 633M of webm duplicated between `public/inmotion` and `dist/inmotion` (real copies) | high | Bloat | comfort-ui-bloat | `du` both 633M/1240 files; different inodes; Vite publicDir copies on every build | Serve inmotion via static route/symlink; exclude from publicDir; reclaims ~633M |
| README contradicts the entire FLUX/motion doc set about what the system IS | high | Docs | docs | `README.md:5` "No model, no diffusion, no random seed" vs `FLUX*.md` + spiral docs describing resident 34GB FLUX; `CARTOGRAPHY.md:14-18` admits "two products fused" | State plainly the checkout holds TWO products; stop asserting "no diffusion" repo-wide |
| `out/assets` 42,985 mp4 dirents from runaway broadcast loop (6 unique inodes) | high | Process-chaos | subsystems | `find out/assets -name '*.mp4'` = 42,985; `printf %i \| sort -u` = 6 inodes; ~1,300 dirents/hr over 33h | Stop the broadcast loop; reuse the existing SHA entry; prune to 6 files (gitignored, safe) |
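
The inode check in the last row can be reproduced, and a prune list derived from it, with a short script. This is a minimal sketch rather than the audit's own tooling: `group_by_inode` and `redundant_links` are hypothetical names, and it assumes the duplicate dirents are hard links, which the "6 unique inodes" evidence suggests. It only lists candidates and deletes nothing.

```python
from collections import defaultdict
from pathlib import Path


def group_by_inode(root: Path, pattern: str = "*.mp4") -> dict:
    """Group directory entries under `root` by (device, inode)."""
    groups = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        st = path.lstat()  # lstat: describe the entry itself, not a symlink target
        groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)


def redundant_links(groups: dict) -> list:
    """Return every entry except the first (by sorted path) for each inode."""
    extra = []
    for paths in groups.values():
        extra.extend(paths[1:])
    return extra
```

Removing a redundant hard link frees a directory entry, not data blocks, so the space saving is small; the benefit is a directory that can be listed and scanned quickly again.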

---

## Findings by Category

### Bloat
- **comfort-ui is 2.5G but only 7.4M is source.** The bloat is runtime/generated: render-store logs 841M, `dist/` 654M, `public/` 649M, `node_modules` 378M. `src` = 7.4M / 463 ts(x).
- **633M webm duplicated** between `public/inmotion` and `dist/inmotion` (separate inodes; Vite re-copies every build). Fixing publicDir shrinks `dist/` 654M → ~21M.
- **272M stale `jobs.log.bloat-20260609-112230`** is a manual pre-trim snapshot, regenerable, deletable now.
- **`@react-three/drei` pulls ~84M** of transitive mediapipe/hls.js/stats-gl for only **3 source files** (vs 161 using framer-motion). Investigate replacing with raw `@react-three/fiber`.
- **~700M of `node_modules` across two redundant trees** (comfort-ui 378M + render-ui 319M) for one app; version skew (three 0.172 vs 0.184, vitest 4.x vs 2.x).
- **`model_manager.py` is 3,878 lines / 196KB** mixing catalog, downloads, pipeline build, ffmpeg, rsync/aws, 49 endpoints, governor thread, and llama clone+build — one import error takes down all of :8188.
- **`_keep/` 34M + `maestro/` 6M** committed binary media (force-included via `.gitignore` negation) — the largest tracked blobs.

### Architecture
- **Dev server = production backend** (critical, above). No standalone server in comfort-ui despite render-ui having one.
- **Mega-store coupling:** `useStudio` called at **533 sites**; 7 Zustand stores; 255 `fetch()` (76 inline in design components); 19 hardcoded endpoints (`:8188`, `:8092`, `*.influx.vision`).
- **Sprawling nested tree:** `designs/render/views` has 22 view subfolders; same concept ("universe"/"motion"/"metal") appears as both a top-level design AND a `render/views/*`; `src/design/` vs `src/designs/` is a one-char footgun.
- **17,443 lines of Go in a single flat `package main`** (39 files) with **one** test file.
- **No backpressure** on the spiral queue: file-glob polling, unbounded depth, `FRAME_BATCH=4` with bare try/except OOM fallback.
- **PIPELINE.md (deterministic ray-tracer) and FLUX_ARCHITECTURE.md (diffusion) contradict each other** on the core thesis with no cross-reference.
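
One way to give the spiral queue backpressure is a single producer with hysteresis: it refills only after depth falls below a low water-mark, and never past a high one. The sketch below is illustrative only. `pending_jobs` and `jobs_to_enqueue` are hypothetical names, the `*.json` job-file pattern is an assumption about the queue layout, and the 8/10 values in the usage note are borrowed from the LOW/HIGH constants cited in the issues table.

```python
from pathlib import Path


def pending_jobs(queue_dir: Path, pattern: str = "*.json") -> int:
    """Count queued job files (the file pattern is an assumed layout)."""
    return sum(1 for _ in queue_dir.glob(pattern))


def jobs_to_enqueue(pending: int, low: int, high: int) -> int:
    """Hysteresis: refill up to `high` only once depth drops below `low`."""
    if low > high:
        raise ValueError("low water-mark must not exceed high water-mark")
    if pending >= low:
        return 0  # enough work queued; apply backpressure
    return high - pending
```

With `low=8, high=10`, a queue holding 9 jobs gets nothing added, while a queue holding 3 gets 7. Because only one process would run this check, the depth can never exceed `high`, unlike three producers each applying their own threshold.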

### Process & Orchestration Chaos
- **No single orchestrator.** 8 `while true` shell loops + 17 `Restart=always` units; 4+ daemons (`guardian.sh`, `saturator.py`, `refill.sh`, `gpu-experiment-daemon.sh`) independently force GPU-busy and fight each other.
- **Three "keep the device busy" schemes layered:** spiral FLUX worker, `explore --loop --all-gpu`, `explore --loop --all-native` — coordinated only by an ad-hoc `.gpu-priority` file.
- **`--parallel 12` on the GPU lane is a dead flag** — `explore.go:267` GPU lane is hardcoded serial; `--parallel` only gates the CPU semaphore. Misleading knob in `gpu-experiment-daemon.sh:85`.
- **`install.sh` is broken** — installs `comfort-dev.service` which does not exist; comfort-render is `disabled` yet `active` (started by hand, won't survive reboot).
- **CPU loops hardcode `$HOME/render`** (`cpu_metal_loop.sh`, `shaderlab_loop.sh`) and default `COMFORT_URL=:3174` (dead) instead of `:9731`.
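
If more than one daemon must keep the ability to restart a worker, a shared lock file with a cooldown is a small step toward a single authority. The following sketch uses `fcntl.flock` from the Python standard library; `try_claim_restart`, the lock path, and the cooldown value are all hypothetical and would need to be agreed on by every participant (the shell scripts could use `flock(1)` on the same file).

```python
import fcntl
import os
import time
from pathlib import Path
from typing import Optional


def try_claim_restart(lock_path: Path, cooldown_s: float,
                      now: Optional[float] = None) -> bool:
    """Return True if the caller may restart the worker right now."""
    now = time.time() if now is None else now
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another process holds the lock and is deciding
        raw = os.read(fd, 64).decode().strip()
        try:
            last = float(raw) if raw else 0.0
        except ValueError:
            last = 0.0  # unreadable timestamp: treat as "never restarted"
        if now - last < cooldown_s:
            return False  # someone restarted recently; stand down
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(now).encode())  # record this restart
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

A caller would run the actual restart only when this returns `True`. The cooldown is what prevents two authorities from restarting the same unit seconds apart.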

### Duplication
- **render-ui ↔ comfort-ui**: whole-app fork (the single biggest duplication). 446 same-path files, 58 byte-identical (~11.8k lines), forked ~3k-line `devServer.ts`.
- **metal/ vs motion/ design trees**: 6 duplicated 600-line section files; `processing.tsx` 610 lines with only 8 diff lines (99% identical).
- **≥4 parallel "experiments" implementations** (`experiments.tsx`, `flux_experiments.tsx`, `universe/sections/experiments.tsx`, `render/views/experiments/index.tsx`).
- **`style.go` triplicated** across the 3 Go binaries; go directives split (1.23 vs 1.25).
- **Duplicate `fleet-join.sh`** (116 vs 45 lines); duplicate byte-identical `server/README.md` in both UIs; `shaders.data.ts` byte-identical across both UIs.
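
The same-path and byte-identical counts can be tracked over time with a small comparison script, which is useful for confirming that an `@ui/core` extraction actually shrinks the fork. This is a sketch under assumptions: `compare_trees` and its helpers are hypothetical names, and the excluded directory names are a guess at what should be skipped.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to keep memory flat."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def relative_files(root: Path, exclude_dirs: frozenset) -> set:
    """Relative POSIX paths of all files under root, skipping excluded dirs."""
    out = set()
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if p.is_file() and not exclude_dirs.intersection(rel.parts):
            out.add(rel.as_posix())
    return out


def compare_trees(a: Path, b: Path,
                  exclude_dirs: frozenset = frozenset({"node_modules", "dist"})):
    """Return (identical, diverged) lists of paths present in both trees."""
    same_path = sorted(relative_files(a, exclude_dirs) & relative_files(b, exclude_dirs))
    identical, diverged = [], []
    for rel in same_path:
        if file_digest(a / rel) == file_digest(b / rel):
            identical.append(rel)
        else:
            diverged.append(rel)
    return identical, diverged
```

Run against the two `src` trees, `len(identical) + len(diverged)` corresponds to the same-path count and `len(identical)` to the byte-identical count.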

### Dead Code
- **22 anime skins (~330KB)** auto-bundled (above).
- **`anima.tsx` + `overtone.tsx`** have 0 importers; `'anima'` is a phantom `DesignKey` with no registry entry; 38 `designs/*.tsx` files but only 17 registered.
- **14 "DOA" renderers** (tesseract, solid, mandala, …) crash on import yet remain registered, kept alive by a deny-list in `explore.go:510-516`.
- **`lumen/`** — near-abandoned parallel Python CLI re-wrapping the same `source/` corpus; only **1 of 161** renderers ported; no external caller.
- **`ui/anime.productions`** — never `npm install`ed, no Caddy vhost, 0 importers, pins react ^19 / vite ^8; abandoned third app.
- **`solid.influx.vision` → :9732 is down** while Caddy still proxies → 502 (dead vhost).
- **Enigma `.ls`/`.lion` shader DSL is dead on Linux** ("emits invalid GLSL"); the served shaders are hand-written GLSL pretending to be compiler output.

### Resource & GPU
- No per-process VRAM cap, no MPS, no semaphore (critical, above).
- The "governor" is a saturation pump (`saturator.py` docstring: "keeping the GPU busy"); SAT_TARGET=70 floor.
- 27GB unbounded `spiral/renders`; 870MB render-store logs; 42,985 `out/assets` dirents.
- OOM silently swallowed (`worker.py:416-423` bare except → per-frame fallback, no backoff, no signal).
- `saturator.gpu_util()` returns 100.0 on nvidia-smi error → "assume busy" disables stall detection exactly when monitoring breaks.
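
For the last point, a sampler can report "unknown" instead of pretending the GPU is busy, which lets the caller distinguish a stalled worker from broken monitoring. The sketch below is not the code in `saturator.py`; the function names are hypothetical, and only the `nvidia-smi` query flags are taken from the real CLI.

```python
import subprocess
from typing import Optional


def parse_gpu_util(stdout: str) -> Optional[float]:
    """Parse the first GPU's utilization percentage, or None if unreadable."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        return float(lines[0])
    except ValueError:
        return None  # e.g. "[N/A]"


def gpu_util(timeout_s: float = 5.0) -> Optional[float]:
    """Return utilization in percent, or None when nvidia-smi fails."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=timeout_s, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # missing binary, non-zero exit, or timeout
    return parse_gpu_util(proc.stdout)
```

A caller would treat `None` as a monitoring fault to log and alert on, and would neither enqueue work nor restart anything on that reading.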

### Build & Git Hygiene
- **Healthiest domain.** `.gitignore` correctly ignores build binary, `out/`, `dist/`, `node_modules`, render-store logs, `public/inmotion`. Of 3.0G on disk, ~60MB tracked.
- One leak: `jobs.log.bak.1780457369` (1.27MB) committed-then-deleted, now permanent in the 67.56MiB pack (low; optional `filter-repo`).
- 4 uncommitted modified `.tsx` (live Motion-session edits) — commit/stash before refactor. 361 of 503 commits in last 7 days, direct-to-main.

### Security
- No firewall + 0.0.0.0 binds (`:8092` wire, NFS `:2049`, rpcbind `:111`).
- Public unauthenticated vite dev server behind `render.influx.vision`.
- 49 unauthenticated process-control POST endpoints (pkill/systemctl/git clone).
- `flux_server.py:612` + `fleet_agent.py:38` default to `0.0.0.0` with no auth.
- `cloudflared/jupyter-tunnel.json` is **world-readable** (`-rw-r--r--`) — `chmod 600` + rotate.
- 8 stale `/etc/caddy/Caddyfile.bak.*`; repo Caddyfile drifted from deployed.
- **No hardcoded secrets in source** (positive).
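
As a stopgap for the process-control endpoints, a shared bearer token checked in constant time is about the smallest gate that can be added. The sketch below is framework-agnostic and uses only the standard library; `bearer_token` and `is_authorized` are hypothetical names, and how the expected token is provisioned (for example an environment variable) is left as an assumption.

```python
import hmac
from typing import Optional


def bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(authorization: Optional[str], expected_token: str) -> bool:
    """Constant-time comparison; an empty expected token never authorizes."""
    token = bearer_token(authorization)
    if token is None or not expected_token:
        return False
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

Each POST handler that kills, restarts, clones, or spawns would call `is_authorized` on the request header and reject the call on `False`. This does not replace the firewall and CORS fixes above; it only removes the "anyone can drive the GPU" default.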

### Docs
- 1040 md on disk, **971 in node_modules**; only ~68 tracked project docs (~823KB).
- README "no diffusion" thesis contradicts the running FLUX system (high, above).
- Repo systemd docs describe a fictional topology; real units undocumented (critical, above).
- ~300KB of AI-generated `research/*.md` with leaked LLM scaffolding ("Let me write the final report") on a tangential subject.
- Auto-appended session journals (`observation-log.md` 28KB, every 5 min) committed as docs.
- Stale counts: README "161 scripts" vs CARTOGRAPHY "107" (actual 161); README "260" vs "224" subcommands in the same file; `HANDOFF.md` references nonexistent `/home/ubuntu/comfort/comfort-ui`.

---

## Quantified Bloat

| Item | Size / Count | Tracked in git? | Notes |
|---|---|---|---|
| comfort-ui total | 2.5G | partial | src only 7.4M / 463 ts(x) |
| → render-store logs | 841M | no (ignored) | jobs.log 303M + renders.log 208M + assets.log 49M + **bloat backup 272M** + posters 11M |
| → dist/ | 654M | no | mirrors public/; inmotion 633M, fixable → ~21M |
| → public/ | 649M | no | inmotion 633M, universe 16M |
| → node_modules (comfort-ui) | 378M | no | drei tail ~84M for 3 files |
| render-ui total | 333M | partial | node_modules 319M, src 9.2M (348 .test files), dist 4.8M |
| node_modules (both UIs) | ~700M | no | two trees, one app, version-skewed |
| spiral/renders | 27G | no | 29,052 entries / 9,651 flat webm; never pruned |
| out/ | 38M | no | out/assets 42,985 dirents, 6 unique inodes |
| _keep/ | 34M (16 files) | **yes** | force-kept museum mp4/png |
| maestro/ | 6.3M (7 files) | **yes** | source clips, live |
| build/render | 10.5M | no | ARM Go ELF, never committed |
| .git | 88M | — | 67.56MiB pack; ~60MB intentional media |
| md files | 1040 / **69** real | partial | 971 in node_modules |
| comfort-ui src | 114,479 lines / 466 files | yes | 59 >500 lines, 6 >1000 |
| render-ui src | 154,284 lines / 788 files | yes | larger than the original it ports |
| cli/render | 17,443 lines / 39 .go | yes | 1 flat `package main`, 1 test |
| Immediately reclaimable | ~1.0–1.7G | — | 633M inmotion dup + 272M bloat + 84M drei ≈ 1.0G; pruning the 27G `spiral/renders` is additional and not counted in this range |
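
The `spiral/renders` row ("never pruned") needs a standing policy rather than a one-off delete. A minimal sketch of a size-budget retention selector follows; `select_for_deletion` is a hypothetical name, the policy (keep newest files until a byte budget is used up) is one option among several, and the function only returns candidates so it can be run as a dry run first.

```python
from pathlib import Path


def select_for_deletion(root: Path, max_bytes: int, keep_last: int = 0) -> list:
    """Keep the newest files within max_bytes; return older files to delete.

    The newest `keep_last` files are always kept, even if over budget.
    """
    entries = []
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            st = path.stat()
            entries.append((st.st_mtime, str(path), st.st_size))
    entries.sort(reverse=True)  # newest first; path string breaks mtime ties

    doomed = []
    kept_bytes = 0
    over_budget = False
    for idx, (_, name, size) in enumerate(entries):
        if idx < keep_last:
            kept_bytes += size
            continue
        if not over_budget and kept_bytes + size <= max_bytes:
            kept_bytes += size
        else:
            over_budget = True  # everything older than this point goes
            doomed.append(Path(name))
    return doomed
```

Once the budget is exceeded, every older file is selected, so retention is strictly by age and a small old file cannot survive just because it happens to fit.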

---

## Recommended Actions

### P0 — Stop the chaos & close the perimeter (do first)

1. **(safe)** Close the security perimeter: `ufw` allow only 22/80/443; `chmod 600 /etc/cloudflared/jupyter-tunnel.json` and rotate; bind `wire-server`, `flux_server`, `fleet_agent` to `127.0.0.1`. Add Caddy basicauth/token in front of `render.influx.vision` `/api`.
2. **(safe, reversible)** Pick **ONE** GPU-saturation owner. Disable the redundant ones: stop/disable `spiral-refill` (overlaps saturator), `gpu-experiment-daemon.sh`, the `cpu_loop.sh`/metal/shaderlab while-loops, and the `model_manager._governor_loop` (confirm `governor.json` defaults OFF). This alone should let the GPU idle.
3. **(safe)** Pick ONE restart authority for `spiral-worker` — remove restart logic from either `guardian.sh` or `saturator.py`; keep systemd `Restart`.
4. **(safe)** Add a minimum inter-round sleep, a GPU/VRAM headroom check, and a crash-loop guard to the `--loop` path in `explore.go`; reject or warn on `--parallel` combined with `--all-gpu`, since the GPU lane is serial.
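
The loop discipline in item 4 is sketched below. `explore.go` is Go, so this Python version only illustrates the control flow (sleep between rounds, back off when rounds produce nothing, stop after repeated empty rounds, wait when there is no headroom). Every name and default value here is hypothetical.

```python
from typing import Callable


def next_sleep_s(empty_rounds: int, base_s: float = 5.0, cap_s: float = 300.0) -> float:
    """Exponential backoff: base sleep doubles per empty round, up to a cap."""
    return min(cap_s, base_s * (2 ** min(empty_rounds, 30)))


def run_loop(run_round: Callable[[], int],
             has_headroom: Callable[[], bool],
             sleep: Callable[[float], None],
             max_empty_rounds: int = 5,
             max_iterations: int = 1000) -> str:
    """Run rounds until the crash-loop guard trips or the iteration limit hits."""
    empty_rounds = 0
    for _ in range(max_iterations):
        if not has_headroom():
            sleep(next_sleep_s(empty_rounds))  # wait for GPU/VRAM headroom
            continue
        successes = run_round()  # number of renders that succeeded this round
        empty_rounds = 0 if successes > 0 else empty_rounds + 1
        if empty_rounds >= max_empty_rounds:
            return "guard-tripped"  # repeated zero-success rounds: stop, do not respawn
        sleep(next_sleep_s(empty_rounds))
    return "iteration-limit"
```

Passing `sleep` and `has_headroom` in as callables keeps the loop testable without a GPU; in production they would be `time.sleep` and a real VRAM check.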

### P1 — Governance, leaks & isolation

5. **(moderate)** Port render-ui's `standalone.ts` to comfort-ui so the API has its own event loop; serve built `dist/` statically and stop running `vite dev` as production.
6. **(safe)** Set per-process VRAM budgets (`set_per_process_memory_fraction` for FLUX, cap llama ctx/ngl) and add `MemoryMax`/`TasksMax`/`StartLimit` backoff to all 17 units.
7. **(safe)** Reclaim disk: delete `jobs.log.bloat-*` (272M); prune `out/assets` to its 6 unique files; add log rotation for jobs/renders/spiral.jsonl; add a `renders/` retention reaper (last N or X GB). Separately, to reclaim VRAM rather than disk, run the fp8 transformer or offload the idle T5 to free ~12-16GB.
8. **(moderate)** Make `/home/ubuntu/spiral` a checkout/symlink of the repo (single source of truth); commit the real `~/.config/systemd/user/spiral-*.service` into the repo; delete the stale `ui/inference/spiral/systemd/` set; write one `SERVICES.md` matching `systemctl --user`.
9. **(safe)** Replace the 30 per-component `/api/render` polls with one shared SSE/WS subscription gated on tab visibility; one server-side nvidia-smi sampler (never spawn a subprocess per request).
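
For the PyTorch side of item 6, a per-process budget can be expressed as a fraction of device memory. The sketch below assumes a PyTorch process such as the FLUX worker; `vram_fraction` and `apply_vram_budget` are hypothetical helpers, and the budget figure is up to the operator. `torch.cuda.set_per_process_memory_fraction` and `torch.cuda.get_device_properties` are real PyTorch calls, and `torch` is imported lazily so the arithmetic can be checked on a machine without it.

```python
def vram_fraction(budget_gib: float, total_gib: float) -> float:
    """Convert an absolute VRAM budget into a fraction of device memory."""
    if budget_gib <= 0 or total_gib <= 0:
        raise ValueError("budget and total must be positive")
    return min(1.0, budget_gib / total_gib)


def apply_vram_budget(budget_gib: float, device: int = 0) -> float:
    """Cap this process's PyTorch allocator at roughly `budget_gib` on `device`."""
    import torch  # lazy import: only needed where the cap is actually applied

    total_gib = torch.cuda.get_device_properties(device).total_memory / 2**30
    fraction = vram_fraction(budget_gib, total_gib)
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return fraction
```

The cap applies only to the PyTorch caching allocator in the calling process. It turns an overrun into an out-of-memory error inside that process instead of starving a neighbor, and it does nothing for llama-server, which needs its own limit through its context and offload settings.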

### P2 — Duplication, dead code & docs

10. **(decision)** Choose React (comfort-ui, live) or Solid (render-ui) and **freeze/archive the loser**; until then, no dual-porting. Extract framework-agnostic `lib/`, `wan/io/midi`, `wan/state/*Graph` into one `@ui/core` workspace package; convert `ui/` into a single npm/pnpm workspace.
11. **(safe)** Dedupe metal/ vs motion/ section files into one parameterized module; collapse the 4 experiments implementations into one feature module.
12. **(safe)** Delete dead code: 21 unused anime skins + glob gallery, `anima.tsx`/`overtone.tsx`, phantom `'anima'` key, `lumen/`, `ui/anime.productions` (or move to own repo), remove the `solid.influx.vision` vhost. Remove/fix the 14 DOA renderers instead of the deny-list.
13. **(safe)** Fix README to state the checkout holds two products; move `research/*.md` AI drafts and session journals into `archive/`; fix stale counts/paths.
14. **(moderate)** Split `model_manager.py` and the flat Go `package main` into modules; add tests for the explore lane-splitting and store slices.

---

## Per-Domain Appendix

**comfort-ui architecture** — 114,479 lines / 466 files; 59 >500 lines, 6 >1000 (`tuning.tsx` 1451/40 hooks). metal/motion forks (`processing.tsx` 8 diff lines), 22 anime skins ~330KB glob-loaded, ≥4 experiments impls, 533 `useStudio` sites, 30 pollers @1.5-2.5s, dead `anima`/`overtone`.

**comfort-ui bloat & deps** — 2.5G; logs 841M (272M stale bloat), dist 654M = public dup (inmotion 633M), node_modules 378M (drei tail ~84M for 3 files). ~1.0-1.7G reclaimable. Deps otherwise lean.

**render-ui (+anime.productions, contract)** — Solid port: 333M, 980 src (348 .test), 633 same-path as comfort-ui. `:9732` DOWN, Caddy 502. anime.productions abandoned (no node_modules, 0 importers). contract = doc-only, fine but stale path.

**UI cross-project duplication** — render-ui full fork (788/154k vs 463/114k); 446 same-path, 58 byte-identical (~11.8k lines); forked `devServer.ts` (3173 vs 2819); ~697M node_modules x2; 4 Vite apps.

**UI ↔ backend data flow** — Vite dev IS prod; ~40 `/api/render/*` routes in plugins; render-store ~870MB read synchronously on start; explore loop + 77 pollers; comfort-ui lacks render-ui's `standalone.ts` isolation fix.

**render binary & CLI** — 10.5M Go ELF (gitignored, clean). `explore --loop` no throttle; 3-deep self-exec; two while-true wrappers; `--parallel 12` GPU = no-op; 17,443 lines flat `main`, 1 test; 240K committed `arrange.html`; 14 DOA renderers.

**inference / model-manager** — mm_server clean catalog-only (NOT the VRAM holder). worker.py holds 34GB fp16 (fp8 path unused). `model_manager.py` 3878 lines, 49 unauth endpoints, in-process governor; spiral worker + saturator + Restart=always self-reinforce.

**spiral-\* family** — 5-service closed loop to peg GPU. 3 producers, 2 restart authorities, controller railed (56% no-ops), 27GB renders unbounded, source forked across 3 locations (one points at nonexistent venv).

**process & shell orchestration** — 8 while-true loops, 17 Restart=always, no orchestrator. `/home/ubuntu/spiral` untracked & live. 2 conflicting spiral unit families. `install.sh` broken. CPU loops hardcode `$HOME/render` / wrong `:3174`.

**GPU & resource governance** — NO arbiter; "governor" is a saturation pump (floor 70). No VRAM cap (FLUX 34GB + llama 24GB / 98GB). 17 units, 0 MemoryMax, 0 MPS. Priority-yield honored by 1 of N consumers. OOM swallowed.

**smaller subsystems** — transport/framestats/focus/audio/web/examples/overtone all ALIVE & wired (keep). `out/assets` 42,985 dirents (6 inodes) from broadcast loop. `lumen/` dead refactor (1/161). `_keep`+maestro ~40M committed media.

**shaders & render pipeline** — pipeline clean (224 entries → 107 scripts, 0 orphan/missing). `.ls`/`.lion` DSL dead on Linux; served GLSL hand-written. `gen_shaders_gallery.py` writes to nonexistent `~/comfort`. PIPELINE.md vs FLUX_ARCHITECTURE.md contradict.

**git & build hygiene** — cleanest domain. `.gitignore` correct; ~60MB of 3.0G tracked. One stale committed `jobs.log.bak` (1.27MB) in pack. 4 uncommitted .tsx. Repo would clone clean.

**documentation sprawl** — 1040 md (971 in node_modules); ~68 real / ~823KB. README contradicts FLUX docs; repo systemd docs fictional; ~300KB AI-filler research docs; auto-appended journals; stale counts/paths.

**config, services & security** — no hardcoded secrets (good). No firewall; 0.0.0.0 binds; unauth public vite dev server; 49 unauth process-control endpoints; world-readable cloudflared creds; 0 resource limits on 7 user services.
