Data Availability
Where heavy data lives — and how the chain verifies it without trusting whoever stores it.
In one paragraph
- Three anchor types: weights_root (per model), context_root (per agent), and generic data roots for everything else.
- Fetch-and-verify: nodes never trust raw bytes — they fetch by root and verify against the on-chain anchor.
- Correctness ≠ availability: integrity is cryptographic; availability is what the DA layer’s economics ensure.
- Sampling-based DA checks: Substrate Off-chain Workers (OCWs) periodically sample data referenced by anchored roots, and providers are slashed for unavailability.
On-chain anchors
pallet_store maintains the canonical mapping between on-chain identifiers and off-chain data. It exposes three kinds of anchor:
weights_root
Content-addressed root for model parameters. Stored per model in pallet_models. Provers fetch weights by root and verify integrity before running inference.
context_root
Content-addressed root for agent execution context and logs. Updated by pallet_agents as agents execute, anchoring the AKG (Agent Knowledge Graph).
Generic data roots
For other domain-specific data referenced by agents or contracts. Same fetch-and-verify pattern.
Data flow: fetch and verify
No participant in the system blindly trusts data retrieved from TheseusStore; every participant verifies it against the on-chain root before use.
- A model is registered with its weights_root anchored on-chain.
- When assigned an inference job, a prover fetches model weights from TheseusStore by root and verifies integrity (Merkle / Verkle proofs).
- After agent execution, the updated context is stored in TheseusStore and the new context_root is anchored on-chain.
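The fetch-and-verify step can be sketched in plain Rust: recompute a Merkle-style root over the fetched chunks and accept them only if it matches the anchored root. This is a simplified sketch — `DefaultHasher` stands in for a cryptographic hash, and the real system uses Merkle/Verkle proofs rather than re-hashing everything.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash; a real implementation would use a cryptographic hash.
fn h(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

/// Fold chunk hashes pairwise into a single Merkle-style root.
fn merkle_root(chunks: &[Vec<u8>]) -> u64 {
    let mut level: Vec<u64> = chunks.iter().map(|c| h(c)).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                // Duplicate the last node when the level has odd length.
                let mut buf = pair[0].to_le_bytes().to_vec();
                buf.extend_from_slice(&pair.get(1).unwrap_or(&pair[0]).to_le_bytes());
                h(&buf)
            })
            .collect();
    }
    level[0]
}

/// Fetch-and-verify: accept fetched bytes only if they re-derive
/// the root anchored on-chain.
fn fetch_and_verify(fetched: &[Vec<u8>], anchored_root: u64) -> bool {
    merkle_root(fetched) == anchored_root
}

fn main() {
    let weights: Vec<Vec<u8>> = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    let anchored = merkle_root(&weights); // registered on-chain at step 1
    assert!(fetch_and_verify(&weights, anchored));

    let mut tampered = weights.clone();
    tampered[1] = vec![9, 9]; // a malicious store altered a chunk
    assert!(!fetch_and_verify(&tampered, anchored));
}
```

The key property: a tampering storage provider cannot forge data that matches the anchored root, so verification fails locally before the bytes are ever used.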
Correctness vs availability
A critical property of the DA design: correctness does not depend on the DA layer’s honesty. It only depends on its availability.
Consensus nodes never trust raw data from TheseusStore. For anything that matters for execution, they fetch it via the content-addressed root and verify integrity against the on-chain anchor (Merkle / Verkle proofs). As long as at least one honest full node can obtain the data and verify it against the root, the on-chain state transition remains correct.
The DA layer’s economic incentives and slashing mechanisms exist to ensure liveness (data stays available), not integrity (data is correct). Integrity is guaranteed by the cryptographic commitments anchored on-chain.
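The separation between the two guarantees shows up as two distinct failure modes: tampered data is caught cryptographically by any node, while missing data is a liveness failure handled by slashing. A minimal sketch, with illustrative names (`FetchOutcome`, `fetch`) and a toy hash standing in for the real cryptographic root:

```rust
/// The two independent failure modes of the DA design.
#[derive(Debug, PartialEq)]
enum FetchOutcome {
    /// Bytes returned and they re-derive the anchored root.
    Verified(Vec<u8>),
    /// Bytes returned, but they do not match the anchor (integrity).
    IntegrityFailure,
    /// The store returned nothing (availability / liveness).
    Unavailable,
}

/// Classify a fetch attempt against the on-chain anchored root.
fn fetch(store: Option<&[u8]>, anchored_root: u64, hash: impl Fn(&[u8]) -> u64) -> FetchOutcome {
    match store {
        None => FetchOutcome::Unavailable,
        Some(bytes) if hash(bytes) == anchored_root => FetchOutcome::Verified(bytes.to_vec()),
        Some(_) => FetchOutcome::IntegrityFailure,
    }
}

fn main() {
    // Toy hash for the sketch; real anchors are cryptographic commitments.
    let hash = |b: &[u8]| b.iter().map(|&x| x as u64).sum::<u64>();
    let data = [1u8, 2, 3];
    let root = hash(&data);

    assert_eq!(fetch(Some(&data), root, hash), FetchOutcome::Verified(data.to_vec()));
    assert_eq!(fetch(Some(&[9u8, 9, 9]), root, hash), FetchOutcome::IntegrityFailure);
    assert_eq!(fetch(None, root, hash), FetchOutcome::Unavailable);
}
```

Only the `Unavailable` case needs economic enforcement; `IntegrityFailure` is self-evident to every verifying node.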
Sampling-based availability checks
The protocol supports bounded, sampling-based availability checks via Substrate Off-chain Workers (OCWs):
- OCWs periodically sample data referenced by anchored roots.
- Unsigned transactions record evidence of unavailability when sampled data cannot be retrieved.
- Misbehaving storage providers are slashed via the staking pallet.
These DA checks are orthogonal to inference verification. The chain’s correctness doesn’t depend on them. They strengthen the guarantee that off-chain data remains available and consistent over time.
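The sampling loop above can be sketched as a simulation. Everything here is illustrative — `Provider`, `SLASH`, and the functions are hypothetical stand-ins, and the unsigned-transaction step is modeled as simply returning the evidence set rather than going through Substrate's OCW transaction pool.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative storage provider with a stake and a root -> bytes store.
struct Provider {
    stake: u64,
    data: HashMap<u64, Vec<u8>>,
}

/// Hypothetical per-root slash amount.
const SLASH: u64 = 10;

/// One OCW pass: attempt to retrieve each sampled root and collect
/// evidence for the roots the provider cannot serve.
fn sample(provider: &Provider, sampled_roots: &[u64]) -> HashSet<u64> {
    sampled_roots
        .iter()
        .filter(|r| !provider.data.contains_key(r))
        .copied()
        .collect()
}

/// Apply slashing for each piece of unavailability evidence,
/// standing in for the staking pallet's slash call.
fn slash(provider: &mut Provider, evidence: &HashSet<u64>) {
    provider.stake = provider.stake.saturating_sub(SLASH * evidence.len() as u64);
}

fn main() {
    let mut p = Provider {
        stake: 100,
        data: HashMap::from([(1, vec![0u8; 4]), (2, vec![1u8; 4])]),
    };
    let evidence = sample(&p, &[1, 2, 3]); // root 3 is missing
    slash(&mut p, &evidence);
    assert_eq!(p.stake, 90);
}
```

Note that the sampler only tests retrievability, never correctness: a provider serving wrong bytes is caught by the fetch-and-verify path, not by this loop.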
Why this layout works
Splitting heavy data off-chain is the only way to support modern model sizes in a blockchain system. Doing so without weakening integrity relies on the same property that makes Tensor Commits work: the on-chain anchor commits to the data, and anyone holding the data can produce a proof against that anchor. The chain doesn’t need to store the data to verify claims about it.
That’s the same principle that lets a prover serve inference for a 120 GB model from a single GPU and still have every validator independently verify the result in milliseconds.