Data Availability

Where heavy data lives — and how the chain verifies it without trusting whoever stores it.

In one paragraph

A 120-billion-parameter model’s weights can exceed 200 GB — far too large to replicate across every validator. Agent execution contexts and logs can grow significantly too. Theseus uses the standard pattern: heavy data lives off-chain in TheseusStore (a dedicated DA layer), and the on-chain runtime stores only compact, content-addressed roots that serve as cryptographic commitments. Correctness depends only on the roots; the DA layer provides availability.
  • Three anchor types: weights_root (per model), context_root (per agent), generic data roots for everything else.
  • Fetch-and-verify: nodes never trust raw bytes — they fetch by root and verify against the on-chain anchor.
  • Correctness ≠ availability: integrity is cryptographic; availability is what the DA layer’s economics ensure.
  • Sampling-based DA checks: Substrate Off-chain Workers (OCWs) periodically sample data referenced by anchored roots and slash unavailability.

On-chain anchors

pallet_store maintains the canonical mapping between on-chain identifiers and off-chain data. It exposes three kinds of anchor:

weights_root

Content-addressed root for model parameters. Stored per model in pallet_models. Provers fetch weights by root and verify integrity before running inference.

context_root

Content-addressed root for agent execution context and logs. Updated by pallet_agents as agents execute, anchoring the AKG (Agent Knowledge Graph).

Generic data roots

For other domain-specific data referenced by agents or contracts. Same fetch-and-verify pattern.
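The three anchor kinds share one shape: an identifier mapped to a content-addressed root. A minimal sketch of that mapping, using illustrative names and a plain map in place of a real FRAME StorageMap (the actual pallet_store types may differ):

```rust
use std::collections::HashMap;

/// Illustrative anchor kinds; the real pallet_store types may differ.
#[derive(Clone, Debug, PartialEq)]
enum Anchor {
    /// Content-addressed root of a model's parameters.
    WeightsRoot([u8; 32]),
    /// Root of an agent's execution context / AKG.
    ContextRoot([u8; 32]),
    /// Any other domain-specific data root.
    DataRoot([u8; 32]),
}

/// In a real pallet this would be a StorageMap; a plain HashMap conveys
/// the shape: on-chain id -> compact cryptographic commitment.
struct Store {
    anchors: HashMap<u64, Anchor>,
}

impl Store {
    fn new() -> Self {
        Store { anchors: HashMap::new() }
    }
    fn anchor(&mut self, id: u64, a: Anchor) {
        self.anchors.insert(id, a);
    }
    fn root_of(&self, id: u64) -> Option<&Anchor> {
        self.anchors.get(&id)
    }
}

fn main() {
    let mut store = Store::new();
    // Register a model: only the 32-byte root goes on-chain,
    // never the (possibly 200 GB) weights themselves.
    store.anchor(1, Anchor::WeightsRoot([0xAB; 32]));
    assert_eq!(store.root_of(1), Some(&Anchor::WeightsRoot([0xAB; 32])));
}
```

The point of the sketch: whatever the data is, the chain stores a fixed-size commitment keyed by an id, so on-chain state stays small regardless of off-chain data size.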

Data flow: fetch and verify

No participant in the system blindly trusts data retrieved from TheseusStore; every consumer verifies the bytes against the on-chain root before using them.

  1. A model is registered with its weights_root anchored on-chain.
  2. When assigned an inference job, a prover fetches model weights from TheseusStore by root and verifies integrity (Merkle / Verkle proofs).
  3. After agent execution, updated context is stored in TheseusStore and the new context_root is anchored on-chain.
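The fetch-and-verify step above can be sketched as follows. This is a toy: it uses Rust's std SipHash in place of a cryptographic hash, a fixed chunk size, and a recomputed-root check rather than the Merkle/Verkle proofs the real system uses; the function names are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash; a real node would use a cryptographic hash (e.g. blake2).
fn h(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

fn h2(a: u64, b: u64) -> u64 {
    let mut s = DefaultHasher::new();
    a.hash(&mut s);
    b.hash(&mut s);
    s.finish()
}

/// Recompute a binary Merkle root over fixed-size chunks.
fn merkle_root(data: &[u8], chunk: usize) -> u64 {
    let mut level: Vec<u64> = data.chunks(chunk).map(h).collect();
    if level.is_empty() {
        level.push(h(&[]));
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { h2(p[0], p[1]) } else { p[0] })
            .collect();
    }
    level[0]
}

/// Fetch-and-verify: accept bytes only if they match the on-chain anchor.
fn fetch_and_verify(fetched: &[u8], anchored_root: u64) -> Result<&[u8], &'static str> {
    if merkle_root(fetched, 4) == anchored_root {
        Ok(fetched)
    } else {
        Err("root mismatch: reject data")
    }
}

fn main() {
    let weights = b"model-weights-bytes";
    let anchored = merkle_root(weights, 4); // anchored on-chain at registration
    assert!(fetch_and_verify(weights, anchored).is_ok());
    assert!(fetch_and_verify(b"tampered-bytes!!!!!", anchored).is_err());
}
```

Whoever served the bytes is irrelevant: if they hash to the anchored root, they are the registered data; if not, they are rejected.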

Correctness vs availability

A critical property of the DA design: correctness does not depend on the DA layer’s honesty. It only depends on its availability.

Consensus nodes never trust raw data from TheseusStore. For anything that matters for execution, they fetch it via the content-addressed root and verify integrity against the on-chain anchor (Merkle / Verkle proofs). As long as at least one honest full node can obtain the data and verify it against the root, the on-chain state transition remains correct.

The DA layer’s economic incentives and slashing mechanisms exist to ensure liveness (data stays available), not integrity (data is correct). Integrity is guaranteed by the cryptographic commitments anchored on-chain.
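Why integrity survives an untrusted DA layer: a verifier holding only the on-chain root can check an inclusion proof for any chunk a provider serves, without the full dataset. A minimal sketch of such a check, again with a std hash standing in for a cryptographic one and illustrative names:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hash; a real node would use a cryptographic hash.
fn h(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

fn h2(a: u64, b: u64) -> u64 {
    let mut s = DefaultHasher::new();
    a.hash(&mut s);
    b.hash(&mut s);
    s.finish()
}

/// Verify that `leaf` sits at `index` under `root`, given the sibling
/// hashes bottom-up. The verifier needs only the on-chain root; it never
/// needs to trust the provider or hold the full data.
fn verify_inclusion(leaf: &[u8], mut index: usize, siblings: &[u64], root: u64) -> bool {
    let mut acc = h(leaf);
    for &sib in siblings {
        acc = if index % 2 == 0 { h2(acc, sib) } else { h2(sib, acc) };
        index /= 2;
    }
    acc == root
}

fn main() {
    // Four chunks; the root is what gets anchored on-chain.
    let chunks: Vec<&[u8]> = vec![b"c0", b"c1", b"c2", b"c3"];
    let l: Vec<u64> = chunks.iter().map(|c| h(c)).collect();
    let root = h2(h2(l[0], l[1]), h2(l[2], l[3]));

    // A storage provider serves chunk 2 plus its sibling path.
    let proof = vec![l[3], h2(l[0], l[1])];
    assert!(verify_inclusion(b"c2", 2, &proof, root));
    // A forged chunk fails against the same anchored root.
    assert!(!verify_inclusion(b"bad", 2, &proof, root));
}
```

This is the sense in which integrity is cryptographic while availability is economic: a dishonest provider can withhold data (an availability failure, handled by slashing) but cannot substitute different data that still verifies.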

Sampling-based availability checks

The protocol supports bounded, sampling-based availability checks via Substrate Off-chain Workers (OCWs):

  • OCWs periodically sample data referenced by anchored roots.
  • Unsigned transactions record evidence of unavailability when sampled data cannot be retrieved.
  • Misbehaving storage providers are slashed via the staking pallet.

These DA checks are orthogonal to inference verification: the chain's correctness does not depend on them. Their role is to strengthen the guarantee that off-chain data remains available and consistent over time.
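A minimal sketch of one sampling pass, with illustrative names and a plain struct standing in for a real storage provider. In the actual protocol, the OCW would sample a bounded random subset of roots, and the evidence would be submitted on-chain via an unsigned transaction and fed into slashing through the staking pallet:

```rust
use std::collections::HashMap;

/// Toy storage provider, keyed by anchored root (illustrative).
struct Provider {
    data: HashMap<u64, Vec<u8>>,
}

impl Provider {
    fn fetch(&self, root: u64) -> Option<&Vec<u8>> {
        self.data.get(&root)
    }
}

/// One sampling pass: try each anchored root and collect evidence for
/// roots whose data cannot be retrieved. A real OCW would also verify
/// retrieved bytes against the root before counting them as available.
fn sample(anchored_roots: &[u64], provider: &Provider) -> Vec<u64> {
    anchored_roots
        .iter()
        .copied()
        .filter(|r| provider.fetch(*r).is_none())
        .collect()
}

fn main() {
    let mut data = HashMap::new();
    data.insert(1, b"available".to_vec()); // root 1 is served
    let provider = Provider { data };      // root 2 is withheld
    let evidence = sample(&[1, 2], &provider);
    assert_eq!(evidence, vec![2]); // unavailability evidence for root 2
}
```

Because unavailability evidence is objective (a root is anchored but its data cannot be produced), it can travel in unsigned transactions and still be validated before it triggers slashing.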

Why this layout works

Moving heavy data off-chain is the only way to support modern model sizes in a blockchain. Doing so without weakening integrity relies on the same property that makes Tensor Commits work: the on-chain anchor commits to the data, and anyone holding the data can produce a proof against that anchor. The chain doesn't need to store the data to verify claims about it.

That’s the same principle that lets a prover serve inference for a 120 GB model from a single GPU and still have every validator independently verify the result in milliseconds.