Hypernym is the reality substrate for the post-Bitter-Lesson era.
Five reasoning models, four ideation rounds, ~20 outputs. What the panel converged on: the unit of product is the Persistent Domain Schema; the empirical proof that 75% of attention is structural noise demands a new architecture; and Hypercore + Modulum together compose the first commercial Grounded State Compiler.
An LLM that routes attention through a PDS is a different kind of system than an LLM that searches a token field. Five models with very different priors arrived at the same architecture under five different names: Reality Substrate (Claude), Grounded World Kernel (Codex), Verifiable Causal Engine (Grok), Bicameral Architecture (Gemma), Hypernym World Model (Gemini). One concept, five mouths. That convergence is the actual market signal pushing through.
Five compounding zero-to-ones.
Each step exists today; each enables the next. The stack is what makes Hypernym non-cloneable: every layer below has to exist before the layer above is even legible.
Most attention is noise.
Across four architectures (Llama 3.1 8B, MiniMax M2.5 228B, plus two others), approximately 75% of attention heads contribute structural noise while 25% carry signal. The implication is architectural, not an engineering detail, and it demands publication-grade replication.
Persistent Domain Schema.
The unit of product. Entities, facts, confidence, provenance, embeddings, vocab window. Compiled from a corpus, mounted as substrate at inference time. Convergent across all four rounds and 5 of 5 models. Karpathy markdown vaults, Mem0, OpenHarness MEMORY.md are downstream of this. Hypernym's wedge is grounded PDS: mechanical confidence with structural provenance.
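The fields named above suggest one possible shape for a PDS record. A minimal sketch follows; the type names (`PdsFact`, `Provenance`, `isGrounded`) and field layout are illustrative assumptions, not the shipped schema.

```typescript
// Hypothetical sketch of the Persistent Domain Schema unit of product.
// Field names are assumptions drawn from the prose, not a published spec.
interface Provenance {
  sourceDoc: string;      // where the fact was extracted from
  extractionRun: string;  // e.g. an Omnifact trial-batch id
}

interface PdsFact {
  entity: string;
  predicate: string;
  value: string;
  confidence: number;     // mechanical confidence in [0, 1]
  provenance: Provenance; // structural provenance, never optional
}

interface PersistentDomainSchema {
  entities: string[];
  facts: PdsFact[];
  vocabWindow: string[];             // canonical token vocabulary for the domain
  embeddings: Map<string, number[]>; // entity -> vector
}

// A grounded PDS admits only facts with provenance and in-range confidence.
function isGrounded(pds: PersistentDomainSchema): boolean {
  return pds.facts.every(
    (f) => f.confidence >= 0 && f.confidence <= 1 && f.provenance.sourceDoc.length > 0
  );
}
```

The point of the sketch is the wedge itself: confidence and provenance are structural fields on every fact, not optional metadata.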
Grounded State Compiler.
Takes Hypercore truth (Omnifact 60-trial extraction, HyperRemember reranking, Compressed Repo Analyze 87%) and outputs a Modulum-loadable state. Already the beachhead in Forge: dispatch-core/src/hypernym.ts ships S37/S38 cache-first FORGE.md compression in production. The concept compiles outward to every domain.
Reality Substrate.
The agent's attention routes through the PDS, not a token field. Every claim has provenance and confidence. Five models named the same architecture five different ways: Reality Substrate, Grounded World Kernel, Verifiable Causal Engine, Bicameral Architecture, Hypernym World Model. The convergence is the signal.
Bicameral Architecture.
Substrate and LLM as two cooperating minds. The substrate guarantees mechanical truth (PDS, confidence, provenance); the LLM does inference. Reference implementation: Modulum-7B-Native, continued pretrain of Qwen 3.5-7B or Llama 3.1-8B with attention-modification objectives. Four of five panel models picked this option. Once it exists, the PDS spec becomes load-bearing infrastructure.
Concepts that ≥3 of 5 models named.
Different words, same building blocks. Not products: building blocks. The primitives compose; the products are downstream compositions of them.
- Substrate Router compress → ground → CBAS → route on every dispatch. The trunk line: until it exists, every other Hypernym primitive is advisory.
- CBAS gate The executor refuses high-cost tools (rm -rf, deploy_to_prod) if confidence falls below threshold. Per-track τ via replay calibration.
- Memory Router remember, recall, reconcile, replay. Confidence-gated writes. Vendor-neutral. Closes the "memory stores but does not surface" failure.
- Substrate diffing A disputes edge between dissenting reviewers turns round-2 disambiguation from human re-reading into machine-readable diff.
- Substrate-aware tooling Bash during BUILD sees Grind-mode rules; the same call during PLAN sees Pivot-mode rules. PDS conditions tool selection.

The big matrix.
All ~60 ideas. X axis: distance to ship (now to 2-year lab). Y axis: research depth (engineering to frontier). Color: surface (Hypercore-now / Hypercore-future / Modulum-now / Modulum-future / Lab / Outlier). Size: panel convergence (1 to 5 of 5 models). This is a matrix to process, not a pre-filtered top five.
Where each idea lives.
Hypercore surfaces convergent product-grade work. Modulum surfaces the architecture that proves the thesis. The Lab is where world-model and simulation experiments stress-test the substrate.
- Hypernym Vault Grounded memory for Goose / OpenHarness / Cline. 5/5 panel · the cash and distribution flywheel
- SectorPack API Sector-tuned Omnifact (legal, medical, finance, academic). 5/5 panel · monetizes a latent sector param
- GroundedNotes / Receipt Vault Obsidian-first, B2C prosumer. 5/5 panel · Karpathy-vault but grounded
- DomainForge CLI domainforge compile --source ./docs --output my.pds --sector legal. 3/5 panel · the "git" for knowledge compilation
- Omnifact-arXiv Public fact-graph + SPARQL. CC-BY 4.0. 5/5 panel · NeurIPS / ICLR Verifiability Reports
- CBAS Runtime Confidence-bound action gating before tool execution.
- Magic plugin v1 Already shipping; extends to substrate-diff in CR panels.
- Compressed Repo Analyze 87% compression. Bootstraps any repo into a PDS.
- Goose Expert Pack Exchange Marketplace. Stripe pack, Terraform pack, FDA pack. Codex outlier · network effect
- Sector PDS Hub Vertical SKUs. Legal, medical, finance, energy, research.
- PDS-to-LoRA pipeline Customer uploads PDS; service generates synthetic Q&A; LoRA. 3/5 panel
- Grounded Expertise Cartridge H2 outlier. Pre-built per-domain bundles.
- Memetic PDS GitHub for fact-graphs. Anime canon, D&D, Wiki-style.
- arXiv Citation Standard "Verifiability Report per accepted paper." Standard capture.
- LongMemEval-Grounded Cert + HLI Category capture. Pairs with EU AI Act Art. 52.
- Hyper-Synthetix PDS-driven synthetic-data factory for vertical model trainers.
- Living Corpus / EchoStream Streaming PDS with freshness half-lives.
- Magic plugin v2 Cross-doc contradiction surfacing in real time.
- PDS.md spec DESIGN.md analog. Format-as-product.
- RepoTwin Compressed Repo Analyze + Dreamer 4 to pre-flight agent commits.
- 75%-Attention-Is-Noise Empirical study across 4 architectures. The structural finding the rest stands on.
- 3.04× Decode Speedup Demonstrated on Modulum substrate, no weights modified.
- Infinite Context via Attention Modification Software-only; ships through inference-time substrate-loading.
- Cache-First FORGE.md Compression Forge S37/S38, already production. dispatch-core/src/hypernym.ts
- HyperRemember Embeddings + fact-based reranking. Substrate query layer when Modulum runtime is mocked.
- Omnifact 60-trial Quorum Stochastic semantic-fact extraction with confidence. A Semantic-Quorum primitive in production.
- Substrate-Mounted Inference (SW) Pure software composition (HyperRemember + clever prompting). Reproduces Modulum substrate behavior on any base. Crafter MVP uses this.
- Modulum-7B-Native Continued pretrain of Qwen 3.5-7B / Llama 3.1-8B with attention-modification objectives. 4/5 panel · 12 weeks · category-defining
- Substrate-Native Harness LoRA + pre-attention substrate-loading hooks via SGLang RadixAttention or vLLM. OSS Python package.
- Substrate-1 (from-scratch) 1B–4B with 25%-signal-heads-only attention. The "we know the architecture is wasteful, build the right one" play.
- Modulum Runtime (hosted) 3-of-5 R7.7 outlier convergence: skip-training-build-runtime. modulum serve <any-hf-model>.
- 25%-Signal-Heads Attention First architectural implementation of the noise-pruned head class.
- PDS-as-Shareable-Prefix Mass-shareable SGLang prefix cache. 40-60% token-cost reduction vs RAG.
- KV-as-Substrate Treat KV cache as queryable substrate, not ephemeral compression.
- In-Place TTT × PDS Fast-weights as memory; PDS as the structured TTT input.
- Distill-Frontier-into-Substrate-Native Gemma outlier. Claude / GPT-4o teacher into Phi-4 student via SFT. $30–50K pre-flight.
- Modulum chip narrative Hardware ceiling rising (Flash Attention 4 at 1605 TFLOPs/s). Modulum kernels ship via vLLM.
- Crafter v1 substrate-mounting MVP 5/5 panel pick. 21 days · ~$40K · publication-grade falsifier
- SWE-Bench Verified v2 5/5 outlier convergence. 6–8 weeks after Crafter. Real-codebase substrate test.
- Dreamer 4 grounded variant Diamonds-in-Minecraft from offline data. PDS-grounded equivalent loads compiled domain facts.
- Genie 3 textual analog DeepMind ships visual world models; Hypernym's lane is textual / factual.
- NetHack lifetime-learning + PDS Persistent expertise across deaths.
- Brax / MuJoCo physics-grounded Physics-domain PDS: every fact has a physical referent.
- AgentGym-RL RL-train agents with PDS for compounding session expertise.
- Gym-Anything (200 apps) Software-as-environment paradigm. Hypercore facts pre-populate every env.
- WebArena substrate Browser-task agent benchmark; PDS for "browser tasks at site X."
- ml-intern HF post-training Substrate-mounted research-loop agent.
- MiroShark cross-pollination Swarm-intelligence sim engine; agent influence leaderboard scored by Omnifact.
- TimesFM time-series PDS Specialized PDS for time-series forecasting; loadable into Modulum.
- Heaviside-style domain foundation models EM 800,000× faster than commercial sim. Direct PDS-for-physics analogy.
- CORAL multi-agent discovery Self-evolving agents need state continuity; the Grounded State Compiler is the missing piece.
- Counterfactual Futures Market Branching PDS forks. Foundation for HLI.
- Hypernym Court Multi-agent arbitration mechanism.
- Contradiction Atlas Catalogues active disputes across docs, models, time.
- VeriBrand Algorithmically verifiable marketing: brand uploads spec PDS, content auto-checked.
- ProvenanceShield / ShieldRuntime Anti-hallucination streaming proxy; SSE-token-retract via MCP.
- APN Receipts Cryptographically signed reviewer verdicts.
- Knowability Replay Time-indexed reconstruction of what was knowable when.
- Self-Evolving Harness Hypercore profiles harness logs and proposes patches.
The cleanest place to falsify the thesis.
Schmidhuber's Neural World Model Boom essay positions world models as the substrate competitor to next-token LLMs. Hypernym's lane is the textual / factual world model, the one with provenance. Five models on the panel, asked to design a 3-week MVP, picked the same environment.
Crafter v1 substrate-mounting MVP UNANIMOUS
Open-world Minecraft-like with 22 achievements as built-in benchmarks. Symbolic state (grid, inventory, vitals) maps 1:1 to PDS entities. Published baseline (Hafner 2022 geomean; DreamerV3 ~14%, SPRING/GPT-4 ~27%). Clean A/B isolates the substrate layer.
- Target. 1.5× sample efficiency over 100 train eps; ≥+8 absolute Crafter points; 95% bootstrap CI excludes zero.
- Hardest falsifier (Codex). A baseline given a hand-written 20-fact static checklist reproduces the gain — meaning the PDS does no runtime work. Pre-registered.
- Vocab window. ~90–120 canonical tokens (17 actions + 22 achievements + ~20 objects + ~12 predicates).
- Fact tiers. Repo-rule (Omnifact bootstrap, conf 0.90–0.97); transition (env-observed, 0.95); strategy (post-episode distillation, 0.65–0.90).
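The fact tiers above imply a confidence-band policy per tier. A minimal sketch, assuming the bands listed in the spec; the type and function names are illustrative, not Hypercore's API.

```typescript
// Hypothetical encoding of the three Crafter fact tiers and the
// confidence bands from the MVP spec. Names are illustrative.
type FactTier = "repo-rule" | "transition" | "strategy";

const TIER_BANDS: Record<FactTier, [number, number]> = {
  "repo-rule": [0.90, 0.97], // Omnifact bootstrap
  transition: [0.95, 0.95],  // environment-observed, fixed
  strategy: [0.65, 0.90],    // post-episode distillation
};

// Clamp a raw extraction score into its tier's admissible band.
function tierConfidence(tier: FactTier, raw: number): number {
  const [lo, hi] = TIER_BANDS[tier];
  return Math.min(hi, Math.max(lo, raw));
}
```

The band ceiling matters: a strategy fact can never claim repo-rule certainty, no matter how confident the extraction run is.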
SWE-Bench Verified UNANIMOUS
Every model on the panel was asked for a "wildly different" outlier MVP. All five picked SWE-Bench. The reason is structural: it is the highest-blast-radius LLM-agent benchmark, and Compressed Repo Analyze finally earns its keep on real code (Crafter does not really need it).
- Per-repo PDS. Entities = files, functions, tests, issues. Facts = call graphs, deps, type sigs, tracebacks.
- Target. +8–15% absolute pass rate over baseline (SOTA is 3–6%, so +8% is publication-worthy).
- Sequence. Crafter v1 (3wk); if positive, fund SWE-Bench v2 (6–8wk).
Environment evaluation
Why Crafter wins.
| Environment | Hypercore-ingestable | Clean A/B | Failure decomposable | Panel verdict |
|---|---|---|---|---|
| Crafter | Yes, symbolic state | Yes, published baseline | Yes, per-achievement | 5 / 5 — PICK |
| SWE-Bench Verified | Yes, repo PDS | Yes, pass rate | Yes, per-test | 5 / 5 — v2 outlier |
| AgentGym-RL | Multi-ontology friction | RL confounds | Mixed | Rejected, confound risk |
| NetHack LE | Extreme partial obs. | Strong | Hard | v3+ candidate |
| Brax / MuJoCo | Continuous physics | Strong | Numeric | Wrong domain story |
| Gym-Anything (200 apps) | Yes, software state | Per-app | Strong | Too broad for v1 |
| WebArena | DOM-shaped | Strong | Per-task | v3+ candidate |
| SCADA / clinical sim | Domain-shaped | No published baseline | Hard | Customer story, not v1 |
Surrounding research the substrate work composes with
What we lift, what we extend, what we co-position with.
Schmidhuber — Neural World Model Boom
Authoritative survey from the field's founder. Positions WMs as the substrate competitor to next-token LLMs. Modulum's PDS-into-attention is structurally the same bet, but with the grounded provenance Schmidhuber does not address.
Dreamer 4
Diamonds in Minecraft purely from offline data (Hafner). PDS-grounded equivalent: load a domain's Hypercore-compiled facts as the offline corpus, then deterministic agent expertise.
Project Genie 3
DeepMind's interactive 3D world model. Visual WM lane is taken; textual / factual WM lane is open. Co-positioning, not collision.
PufferLib 2.0
RL at 1M steps/s with vectorized environments. PDS-into-Modulum becomes the "expertise prior" PufferLib environments load.
In-Place Test-Time Training (ICLR 2026 oral)
Continual learning via projection-matrix fast weights. PDS-into-fast-weights is a plausible product wedge: structured TTT input that is also human-auditable.
CALM (Continuous Autoregressive)
Replaces next-token with vector-prediction at K-token chunks. Hypernym's attention-DB-query frame aligns: the structured chunk is a PDS-block.
SGLang RadixAttention
25K stars, 400K GPUs deployed. Prefix-cache and KV-share ecosystem; PDS as mass-shareable prefix. The cache plane Modulum substrate slots into.
TurboQuant / RocketKV / ChunkKV / Expected Attention
Training-free KV compression at production scale. ChunkKV's "semantic chunks" maps cleanly to "fact units" in Hypercore. Expected Attention's future-query awareness implies PDS knows the future-query distribution per domain.
MiroShark
Swarm-intelligence simulation engine; daily autonomous influence leaderboard already scored. Hypernym Omnifact extracts claim-grade facts; Modulum-as-sim-substrate plugs straight in.
TimesFM
Google open-source time-series foundation model, 100B data points zero-shot. Specialized FMs for narrow modalities. PDS analog: time-series persistent expertise loadable into Modulum.
Heaviside
Foundation model for electromagnetism, 800,000× faster than commercial sim. Direct analogy: PDS for physics-domain.
CORAL
Autonomous multi-agent open-ended scientific discovery. Self-evolving agents need state continuity, and the Grounded State Compiler is the missing piece they hand-wave.
Concept-level zero-to-ones.
The list overlaps the §02 primitives but is read at the engineering-architecture grain. Names you could put on a whiteboard and build to.
- Substrate Router compress → ground → CBAS → route as a single chokepoint. Promotes Forge's grounding-gate.ts from observability v1 to enforcement v2.
- Memory Router remember, recall, reconcile, replay. Confidence on writes; substrate-diff on disagreement; lineage walk on replay.
- Disputes edge A disputes edge between dissenting reviewers. The open-disagreements query becomes the seed for the next sprint.

What we lift, what we extend.
110 OSS items mapped from a 286-item bookmark sweep plus targeted X-trends search. Filtered for relevance to Hypercore (comprehension and grounding) and Modulum (PDS-into-attention).
World Models & Simulation
- PufferLib 2.0 RL at 1M steps/s. PDS-into-Modulum as the expertise prior PufferLib envs load.
- Crafter 22-achievement open-world. 5/5 panel pick for substrate-mounting MVP.
- NetHack Learning Env Lifetime-learning. Persistent expertise across deaths via PDS.
- Brax / MuJoCo Playground Physics-grounded. PDS for physics-domain; every fact has a physical referent.
- AgentGym-RL Long-horizon multi-turn LLM-agent RL. RL-train agents with PDS for compounding session expertise.
- Gym-Anything (CMU) 200-app software-as-environment. Hypercore facts pre-populate every Gym-Anything env.
- Dreamer 4 (Hafner) Diamonds in Minecraft from offline data. PDS-grounded equivalent: load Hypercore-compiled facts as the offline corpus.
- Genie 3 (DeepMind) Interactive 3D world model. Visual WM lane occupied; textual / factual lane open.
- Heaviside EM foundation model, 800,000× faster than sim. Direct PDS-for-physics analogy; specialized FM template.
- TimesFM Google OSS time-series foundation model. Time-series persistent expertise loadable into Modulum.
- CORAL Multi-agent autonomous discovery. Self-evolving agents need state continuity; GSC is the missing piece.
- MiroShark Swarm-intelligence sim engine + influence leaderboard. Modulum-as-sim-substrate plugs straight in.
- Schmidhuber WM Boom essay Authoritative WM survey. Direct positioning opportunity; Hypernym = grounded WM.
- Hitchhiker's Guide to World Models Field-consolidating survey. Right moment for a "grounded WM" thesis paper.
Agent Harnesses & Runtime
- Goose (Block) 35K stars, most-loved OSS harness. Day-1 distribution channel for Hypernym Vault.
- OpenHarness (HKUDS) 9.1K stars, MEMORY.md, MCP, ReactTUI. PDS is the missing "compiled domain" layer.
- Pi-mono Simplest harness, highest cache hit, lowest tokens. Reference impl for thin harness; Hypernym = fat-skill source.
- Hermes Agent v0.12 (Nous) Multi-agent Kanban. Each kanban task carries a PDS preload.
- Hermes self-evolution $2 to rewrite own brain. Auto-evolving agents need durable state ground-truth = Hypernym.
- Flue First headless TS agent harness. Magic-plugin-style PDS injection is the complementary layer.
- AutoAgent #1 SpreadsheetBench, top GPT-5 TerminalBench. Meta-agent rewrites harness overnight; PDS shouldn't be re-derived.
- Anthropic Managed Agents Hosted harness, "Dreaming" feature. Anthropic ships sessions as durable state; Hypernym's grounded dream is the differentiator.
- OpenClaw + Knowledge System Knowledge templates. Lacks grounded provenance; Omnifact adds it.
- Claude Artifacts (open-sourced) Sandboxed iframe. "Show me the PDS" B2C surface.
- Adaptive Passport Agent acquires its own API keys. Now agents acquire their own PDS bundles.
- Browser Harness Self-healing, edits helpers.py on the fly. PDS for "browser tasks at site X" directly applies.
Attention / KV-cache / Inference
- SGLang 25K stars, 400K GPUs deployed (RadixAttention). Prefix-cache + KV-share; PDS = mass-shareable prefix.
- vLLM Day-0 MiniMax M2.7 support, multi-agent orchestration. Modulum kernels could ship via vLLM.
- CALM (Tencent + Tsinghua) Continuous Autoregressive Language Models — vector-prediction at K-token chunks. Structured chunk = PDS-block; attention-DB-query alignment.
- TurboQuant (ICLR 2026) 6× memory, 8× faster on H100, training-free. Hypernym should ship a PDS-aware variant.
- RocketKV 400× compression, 32.6% peak memory reduction. Aggressive eviction + sparse attention combo.
- ChunkKV Semantic-chunk compression, +26.5% throughput. "Semantic chunks" maps cleanly to "fact units" in Hypercore.
- Expected Attention KV compression by estimating future-query attention. PDS knows the future-query distribution by domain.
- Self-Indexing KVCache 1-bit vector quantization unifies compression and retrieval. Compression-as-index = PDS-as-query-target.
- SALS (Sparse Attention in Latent Space) 6.4× compression, 5.7× speedup. PDS-into-attention is structurally compatible.
- Multi-head Latent Attention (MLA) Winning at scale. Latent attention is the substrate trend; PDS lives natively in latent.
- Flash Attention 4 1605 TFLOPs/s on Blackwell. Hardware ceiling rising; Modulum chip narrative benefits.
- OpenMythos Recurrent-depth transformer reverse-engineering of Mythos. Looped transformer = compute-adaptive depth; PDS routes depth selection.
- Atomic Chat TurboQuant Gemma 4 at 25 tok/s on 16GB MacBook Air. Same wedge Modulum cuts.
Memory / Persistent State
- Karpathy markdown-vault thesis "AI files itself." Most-cited memory-pattern of Q2 2026; Hypernym is its grounded backend.
- claude-mem ~50K stars, persistent context across sessions. Memory plugin gold rush; Hypernym = grounded version.
- Mem0 84.23% LongMemEval. Direct competitor; Hypernym differentiates on grounding.
- Byterover 92% accuracy claim, Git-like hierarchy, 50–70% token savings. Reproducibility is the bar.
- Zep · Letta · Cognee · Honcho Top-6 agent-memory frameworks 2026. Hypernym benchmarks against all on LongMemEval.
- Graphify 71.5× fewer tokens per query, no vector DB. LLM-knowledge-graph trend; Hypercore is the grounded version.
- llm-wiki (nvk) Persistent personal KB. Wiki pattern + Hypernym substrate.
- Single Brain Vector DB ingests Slack/CRM/calls every 15min. "Company brain" pattern; PDS is the unit.
- MemMachine Ground-truth-preserving memory. Direct conceptual sibling to Hypernym ground-truth.
- Persistent Identity in AI Agents Multi-anchor identity. PDS = identity-of-domain.
- Icarus inside Obsidian Hermes memory as readable notes + graph. Mechanical-confidence overlays directly.
Standards & Protocols
- DESIGN.md (Stitch by Google) Apache 2.0 open spec, 5.2K stars in 72hr. Reference for "PDS.md" specification proposal.
- MCP 97M downloads, Anthropic + OpenAI + Google + Microsoft adoption. Tool protocol settled; PDS-as-MCP-resource is a path.
- UCP (Tobi/Shopify) Universal Commerce Protocol. PDS could be the equivalent for domain expertise.
- C2PA 6,000 members, AI provenance global ref. Mechanical-confidence is the LLM-content peer.
- Pricing.md trend Auth0, Resend, WorkOS. .md-as-API standardization expanding.
- OWASP Top 10 Agentic 2026 Agent security framework. PDS provenance maps to ASI controls.
- Agentic Risk Standard DeepMind + MS + Columbia + Virtuals. Risk-rating standard.
- Resolver.md (Garry Tan) Routing-table-in-markdown, 200 lines replaces 20K. PDS-routing primitive.
- x402 / ERC-8004 / ERC-8183 stack On-chain agentic-economy standards. Modulum-as-service inference micropayments path.
Post-training / Fine-tuning
- LlamaFactory Unified efficient fine-tuning of 100+ LLMs/VLMs. Distribution surface for PDS-to-LoRA pipeline.
- ml-intern Automates HuggingFace post-training team. End-to-end research-loop agent; composes with GSC.
- In-Place Test-Time Training (ICLR 2026) Fast-weights as memory. PDS-into-fast-weights as a product wedge.
- SHL0MS Autoreason Agent-debate reasoning method. Consistency arbitration is exactly Hypercore's mechanic.
- Trinity (Arcee) Agent-coherence-tuned model. PDS-tuned models is the dual.
- Carnice-9b Qwen3.5-9b harness-tuned. Harness-specific fine-tunes; PDS-conditioned variants are the next step.
- Budgeted LoRA Distillation as structured compute allocation. PDS-conditioned LoRAs as a product line.
- SLAD Shared LoRA adapters for task-specific distillation. One PDS, many LoRA.
Compression / Encoding
- Compressed Repo Analyze (Hypernym) 87% compression. Already shipping in Forge S37/S38.
- Hypernym Omnifact 60-trial fact extraction. Already shipping.
- Bonsai 8B (PrismML) 1-bit intelligence-density paradigm (1.06/GB vs 0.10/GB). Framing wedge for Modulum's compression story.
- Google KV-cache 6× compression Same quality. Direct competitor in compression; differentiate on persistent expertise.
- Anthropic compaction 4 levels, disk-backed task list, CLAUDE.md memory. Compaction-as-product; Hypernym Magic plugin is the production version.
B2C / Dev-tool Surface
- CodeWiki (Google) Paste GitHub repo, get interactive guide. Compressed Repo Analyze is the substrate.
- Karpathy second-brain Claude Skill B2C distribution channel.
- GBrain v0.10 RESOLVER.md + SOUL.md + ACLs, 24 fat skills. Personal-OS reference impl.
- Garry Tan fat-skills/fat-code/thin-harness Dominant agent-engineering thesis. Hypernym is the source of fat-skill content.
- Glass by Ramp Every-employee AI with one-click setup. Enterprise-rollout pattern needing pre-built domain PDS.
- Claude Doctor Reads ~/.claude, writes CLAUDE.md rules. Self-healing config; Magic plugin can do the same with grounded facts.
- SKILLIFY pattern (Garry Tan) Skill-creation loop. PDS-ify is the structural cousin.
- Bud (AI Human Emulator) Full computer + comms. Human-emulator agents demand durable identity = PDS.
Eval / Benchmark
- LongMemEval Memory-vendor benchmark. 87%+ target for Hypernym Vault publication.
- Crafter benchmark (Hafner) 22-achievement geomean. 5/5 panel MVP target.
- SWE-Bench Verified Real-codebase agent benchmark. Unanimous 5/5 v2 outlier.
- LegalBench / FinanceBench / MedQA SectorPack falsifiers. +15 F1 / +12 F1 deltas.
- LongBench-v2 / RULER-128K / NIAH-extreme Long-context substrate benchmarks. Modulum-7B-Native targets.
- NeurIPS / ICLR Verifiability Reports Standard-capture mechanism for arXiv service.
- TerminalBench (Stanford + Laude) 89 tasks, harness+model pair eval. PDS is harness-orthogonal IP.
Research-loop / Self-Improvement
- Karpathy Autoresearch Self-improving research loops. Verify-run-loop is PDS-validation-loop.
- Meta-Harness (Stanford) Karpathy Autoresearch on steroids. Reference impl.
- Agentic Harness Engineering paper Automated harness evolution. PDS as a learnable component.
- DeepAgents Harness Profiles Model-harness-task fit. Versioning the PDS around the harness.
- Tinker for autoresearch Autoresearch tooling category.
Forge × Hypernym, file-cited.
Where the Hypernym primitives compound into Forge's research infrastructure. Each upgrade is implementable today on top of the existing dispatch-core/src/hypernym.ts beachhead. Effort: S (1–2 days), M (~1 week), L (2–4 weeks).
F1 · Substrate Router. The trunk line. dispatchThrough(substrateRouter, envelope) chains compress → groundClaims → CBAS gate → routeProviderChain. Every other Hypernym primitive remains advisory until this lands. Promotes grounding-gate.ts from observability v1 to enforcement v2.
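A minimal sketch of the chokepoint, assuming simple stage signatures; the envelope shape, stage implementations, and the 0.85 gate value here are illustrative placeholders, not the dispatch-core interfaces.

```typescript
// Illustrative sketch of the F1 chokepoint: every dispatch flows
// compress -> groundClaims -> CBAS gate -> routeProviderChain.
// Stage signatures are assumptions, not the dispatch-core API.
interface Envelope {
  payload: string;
  confidence?: number;
  trace: string[]; // which stages touched this envelope, in order
}

type Stage = (e: Envelope) => Envelope;

function dispatchThrough(stages: Stage[], envelope: Envelope): Envelope {
  return stages.reduce((e, stage) => stage(e), envelope);
}

const compress: Stage = (e) => ({ ...e, trace: [...e.trace, "compress"] });
const groundClaims: Stage = (e) => ({ ...e, confidence: 0.9, trace: [...e.trace, "ground"] });
const cbasGate: Stage = (e) => {
  if ((e.confidence ?? 0) < 0.85) throw new Error("CBAS: confidence below gate");
  return { ...e, trace: [...e.trace, "cbas"] };
};
const routeProviderChain: Stage = (e) => ({ ...e, trace: [...e.trace, "route"] });

const substrateRouter: Stage[] = [compress, groundClaims, cbasGate, routeProviderChain];
```

The enforcement property is the order itself: routing cannot run on claims that were never grounded and gated.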
F2 · Substrate-mounted evals. The measurement flywheel. Adds a seventh suite running the same OutcomeEvalCase set twice: once with the existing dispatcher, once with a substrate-mounted dispatcher. Quantifies the substrate's value-add. Without it, the rest of the backlog becomes a belief system.
F3 · CBAS gate. Smallest effort, largest behavioral change. The executor refuses high-cost tools (rm -rf, deploy_to_prod) if Omnifact confidence falls below τ (default 0.85). Per-track τ via replay calibration. Kills the "burn budget on low-confidence retries" failure mode mechanically.
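The gate logic fits in a few lines. A sketch under the stated defaults; the policy shape, `allowAction` name, and track names are assumptions, only the tool names and default τ come from the text above.

```typescript
// Sketch of the F3 confidence-bound action gate. Tool names and the
// default tau come from the prose; the API shape is an assumption.
const HIGH_COST_TOOLS = new Set(["rm -rf", "deploy_to_prod"]);

interface CbasPolicy {
  defaultTau: number;                  // 0.85 per the spec above
  perTrackTau: Record<string, number>; // calibrated via replay
}

function allowAction(
  policy: CbasPolicy,
  track: string,
  tool: string,
  confidence: number
): boolean {
  if (!HIGH_COST_TOOLS.has(tool)) return true; // only gate high-cost tools
  const tau = policy.perTrackTau[track] ?? policy.defaultTau;
  return confidence >= tau;
}
```

Per-track τ is the interesting part: a track whose replays show expensive false positives can be tightened without touching the global default.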
F4 · Memory Router. Four primitives alongside store/search/get: remember(scope,key,val,{confidence,provenance}), recall(query,{minConfidence}), reconcile(key), replay(entryId). Vendor-neutral; every provider gets an adapter. Closes the "memory stores but does not surface" failure.
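A minimal in-memory sketch of three of the four primitives (replay, which walks lineage, is omitted). The class and entry shapes are assumptions about the adapter surface, not a vendor API.

```typescript
// Hypothetical in-memory sketch of the F4 primitives.
interface MemoryEntry {
  key: string;
  value: string;
  confidence: number;
  provenance: string;
}

class MemoryRouter {
  private store = new Map<string, MemoryEntry[]>();

  remember(scope: string, entry: MemoryEntry): void {
    const entries = this.store.get(scope) ?? [];
    entries.push(entry);
    this.store.set(scope, entries);
  }

  // Only entries at or above minConfidence surface: the mechanical fix
  // for the "memory stores but does not surface" failure.
  recall(scope: string, minConfidence: number): MemoryEntry[] {
    return (this.store.get(scope) ?? []).filter((e) => e.confidence >= minConfidence);
  }

  // reconcile: keep the highest-confidence entry for a key.
  reconcile(scope: string, key: string): MemoryEntry | undefined {
    return this.recall(scope, 0)
      .filter((e) => e.key === key)
      .sort((a, b) => b.confidence - a.confidence)[0];
  }
}
```

The design choice worth noting: confidence is a write-time field, so recall can filter without re-scoring anything.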
F5 · Substrate diffing. Adds a disputes edge relation. New endpoint GET /api/v1/cxdb/disputes/:entryId. New CLI forge convergence diff <sha> treats per-reviewer findings as N substrates and emits exactly which semantic facts differ. Round-2 disambiguation becomes machine-readable.
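The core of the diff is a keyed comparison over fact assertions. A sketch assuming facts encode as key/value pairs; the `Substrate` encoding and function name are illustrative.

```typescript
// Sketch of F5: treat per-reviewer findings as substrates and emit
// exactly which semantic facts differ. The key/value encoding is an
// illustrative assumption, not the CXDB representation.
type Substrate = Map<string, string>; // factKey -> asserted value

function substrateDiff(a: Substrate, b: Substrate): string[] {
  const disputed: string[] = [];
  for (const [key, val] of a) {
    // Only facts both reviewers asserted, with conflicting values, are disputes.
    if (b.has(key) && b.get(key) !== val) disputed.push(key);
  }
  return disputed.sort();
}
```

Facts asserted by only one reviewer are coverage gaps, not disputes; the sketch deliberately excludes them.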
F6 · APN receipts. Each reviewer verdict and findingsDigest is signed with reviewer-keyed material via Keychain (the existing credential broker). Adds signatures[].receipt: APNReceipt. The audit log becomes non-repudiable, a precondition for ever exposing review verdicts off-host.
F7 · Checkpoint provenance / knowability replay. CheckpointPreview gains {confidence, provenance: {transitionEventId, attestationPath, dispatchId, instructionsHash}}. forge replay <sha> reconstructs "what was epistemically available when this transition fired." Post-mortem becomes a one-liner.
F8 · CXDB confidence attestations. Adds confidence REAL + quorum_trials INT. autoAttest calls runSemanticQuorum(entry, model), which submits the entry text through Hypernym N times. Stable attractors score high; drifting attractors score low. Drives F4's reconcile and recall.
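One plausible stability metric behind such a quorum is modal frequency: run N trials, then score how strongly outputs cluster on a single attractor. This is a sketch under that assumption; the real Omnifact scoring is not specified in this document.

```typescript
// Hypothetical stability score for a semantic quorum: submit the same
// entry N times and measure how often the modal output recurs.
// Modal frequency as the metric is an assumption.
function attractorStability(trials: string[]): number {
  const counts = new Map<string, number>();
  for (const t of trials) counts.set(t, (counts.get(t) ?? 0) + 1);
  const modal = Math.max(...counts.values());
  return modal / trials.length; // 1.0 = perfectly stable attractor
}
```

A confidence column populated this way is cheap to recompute and directly comparable across entries, which is what F4's recall filter needs.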
F9 · Hypothesis quorum. Insert quorumCheck(proposal) before autoApprove(). Submit each proposal's hypothesis text through Hypernym N=60 times. Require attractor stability ≥ τ before the council vote. Catches hypotheses that are linguistically smooth but semantically incoherent.
F10 · Reality Bus events. ForgeEvent gains confidence?: number, groundedBy?: string[], semanticHash?: string. POST /api/v1/events becomes a fact-ingest endpoint. External agents subscribe to facts, not just events. No new infra; the pipes already exist.
F11 · PDS for FORGE.md. Promote FORGE.md to a PDS manifest. Each section gets a factId, confidence, and provenance pointer. The envelope ships only the sections relevant to the current FSM node, picked by the existing dispatch metadata. Constitutional Clauses carry nonDeletable: true.
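The section-selection rule can be sketched directly. The `ManifestSection` shape and `relevantNodes` field are assumptions; only factId, confidence, provenance, and nonDeletable come from the text above.

```typescript
// Sketch of F11: FORGE.md sections as PDS manifest entries, filtered
// per FSM node, with nonDeletable Constitutional Clauses always kept.
interface ManifestSection {
  factId: string;
  confidence: number;
  provenance: string;
  relevantNodes: string[]; // FSM nodes this section applies to (assumed field)
  nonDeletable?: boolean;  // Constitutional Clauses
}

function sectionsForNode(sections: ManifestSection[], node: string): ManifestSection[] {
  // Constitutional Clauses survive every filter; everything else is
  // shipped only when the current FSM node needs it.
  return sections.filter((s) => s.nonDeletable === true || s.relevantNodes.includes(node));
}
```

The nonDeletable flag is what makes the compression safe: no envelope, however aggressively trimmed, can drop a Constitutional Clause.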
F12 · Self-evolving harness. A new script reads events.jsonl + per-iter retros + dispatch-metrics.jsonl, compresses via Hypernym, and ranks "bottleneck facts" (e.g. "Codex round 2 finds bypass-class bugs that round-1 prompts miss"). Output goes to .forge/artifacts/harness-patch-proposals/{date}.md; the next sprint SEED reads it.
F13 · CBAS-weighted cost. Cost reports gain a confidence field. New budget rule: "do not spend more than $X on dispatches where substrate confidence is below τ." The pre-dispatch query returns 402 when violated. Three weeks of memory feedback say we burn budget on low-confidence retries; this kills it mechanically.
F14 · Disputes edge. Adds disputes to the edge relations. When the outliers sidecar records a dissentFromConsensus, ingest creates a disputes edge. Combined with semantic-quorum confidence (F8), the AUDIT phase ranks the "highest-stakes open disagreement" automatically. A seed for the next sprint, surfaced for free.
F15 · Substrate-aware ToolSearch. Each classified intent carries substrateProfile: 'pivot' | 'grind' | 'neutral', derived from the FSM node. Tool calls inside a dispatch get the substrate annotation: a Bash call during BUILD sees Grind-mode rules; the same call during PLAN sees Pivot-mode rules. This is the dynamic injection the telemetry note describes.
Compose order · foundations first, then the substrate plane, then close the loop
- Foundations (S, parallelizable). F7 checkpoint provenance, F10 Reality Bus events, F13 CBAS-weighted cost, F14 disputes edge. All additive schema plus one handler each.
- Confidence math (M, sequential). F8 CXDB attestations gain confidence; F4 Memory Router primitives consume it.
- Substrate plane (L, builds on 1+2). F1 Substrate Router, F11 PDS for FORGE.md, F15 substrate-aware ToolSearch.
- Loop closure (M, builds on plane). F9 hypothesis quorum, F5 substrate diffing, F12 self-evolving harness, F7 knowability retros, F2 substrate-mounted evals, F6 APN receipts.
High-variance R&D bets.
Single-model proposals or 3-of-5 outlier convergence. Outliers reset categories. The 3-of-5 "skip-training-build-runtime" convergence in R7.7 is the strongest non-recommended signal in the entire panel.
Modulum Runtime (hosted). Three of five R7.7 models independently raised this as their outlier. modulum serve <any-hf-model> applies substrate-mounting at inference time without changing weights. Sized at $200K parallel runtime + $550K B-track for a $750K total inside the $1M envelope. The strongest signal not voted as primary, and the cleanest hedge against B-track failure.
HLI. Quarterly Moody's-style rating of the top-40 AI products by per-domain fabrication rate. Pairs with EU AI Act Art. 52 (Aug 2026). Pulls demand for every Hypernym SKU. The category-defining move: convert Hypernym from "another memory vendor" into the scorekeeper.
LongMemEval-Grounded Cert. Free benchmark + paid certification. Hypernym sets the scoreboard for the memory-vendor category. Open-source-from-day-one as the methodology defense. Highest-leverage outlier in the round; closely related to HLI; together they reframe procurement.
Counterfactual Futures Market. Branching PDS forks for "what if F had been different?" Substrate diffing applied to alternative timelines. Enterprise strategists, regulators. Speculative; long-horizon; the foundation for HLI's underlying mechanism.
Hypernym Court. When two agents disagree, who decides? Mechanical fact-graph diff and contradiction resolution. Creates a market category that does not yet exist. Multi-agent ops at scale will need it within 18 months.
Distill-Frontier-into-Substrate-Native. Use Claude or GPT-4o as teacher; distill PDS reasoning into Phi-4 or a 1B small model via SFT. $30–50K, 2 weeks. A pre-test for the B-track: validate the substrate thesis before committing $550K. Possibly the single highest-ROI experiment in the deck.
Hypernym contributes PDS dataset, IP, and $100–200K; partner provides compute and co-authors a paper. De-risks B-track at 30–50% cost share. Different IP terms but cleaner distribution if it works.
Memetic PDS. Anime canon, D&D rules, Wiki-style domains. Confidence-as-leaderboard social mechanic. Plausibly viral; lower direct revenue but a unique enterprise funnel. The B2C Hypernym-as-platform play.
Financially-incentivized adversarial fact-disputing platform. Users stake on truth claims; the byproduct is an invaluable training corpus. Wild but plausible: data asset over revenue.
VeriBrand. Brands upload a product spec PDS; all marketing content is auto-checked. FTC and EU compliance. Clean B2B revenue line; underdeveloped in the round but an obvious surface for HLI customers.
RepoTwin. Compressed Repo Analyze + Dreamer 4 to simulate agent commits before they land. Aligns with the branchable-counterfactual theme. Long-game; high-value enterprise infrastructure once the Modulum runtime exists.
Hyper-Synthetix. PDS-driven synthetic-data generation for vertical model trainers. Vertical SLMs need 3K–30K training examples; PDS shortcuts the example-generation phase. Composes with the PDS-to-LoRA pipeline.
Agent due-diligence service. Upload traces, receive a competence envelope. Cloud-uptime / security-attestation analog for the agent era. Long-game strategic positioning.
If C-track (from-scratch substrate-native) is the right answer at some point, you would need 2 architecture-and-alignment researchers to spec it. Worth keeping a year-out option open even without funding it now.
What R7 did not resolve.
Each item needs another round, a customer conversation, a partnership conversation, or a focused experiment. Listed here so the team picks them up, not so they sit unowned.