Data Storage
How research files are stored, content-addressed, versioned, and permanently persisted across the decentralised storage stack
How Storage Works
Every file uploaded to an Onchain Lab passes through a layered storage pipeline that encrypts, distributes, versions, and references data across multiple decentralised systems. The result is a file that is encrypted before it leaves the researcher's browser (unless its access level is Public), pinned to a content-addressed network for retrieval, persisted permanently independent of any single provider, versioned with full provenance history, and referenced on-chain in the Lab's Token Bound Account.
Upload Flow
When a researcher uploads a file to a Lab, the following sequence occurs:
Authenticate. The researcher authenticates with their wallet (Privy JWT) or a service token, establishing a session with the Molecule API that permits uploads to the target Lab.
Prepare. The Client SDK computes a content hash (checksum) of the raw file. If the file's access level is Token-Holder or Admin-Only, the SDK opts into encryption when calling initiateCreateOrUpdateFileV2: the backend issues a fresh per-file data encryption key (DEK) and returns both the single-use plaintext DEK and its wrapped form. The SDK builds the file's on-chain access conditions, encrypts the file client-side with AES-256-GCM via Web Crypto, and stores the wrapped DEK, IV, content hash, and access conditions as the file's encryption metadata on Kamu (ODF). The encrypted blob replaces the original file in memory. If the access level is Public, no encryption is applied and the raw file proceeds directly to upload. See Data Privacy & Access for the full Onchain-Verified Envelope Encryption model.
Upload. The Client SDK initiates the upload through the Molecule API, which reserves an upload slot and generates a pre-signed URL via Filebase (the S3-compatible upload gateway). The file (encrypted, or raw for Public files) is then uploaded directly from the browser to the pre-signed URL, with progress tracking.
Commit. Once the upload completes, Kamu fetches the file from staging, creates a new version record (recording the timestamp, content hash, author, and data room path), and commits the file to IPFS via the pinning service. IPFS returns a content identifier (CID) derived from the file's contents. The file is also persisted to Arweave for permanent availability. The temporary staging file is deleted.
Reference. The CID and associated metadata are written on-chain to the Lab's Token Bound Account. This creates a tamper-evident, publicly auditable link between the Lab's onchain identity and the off-chain file. The transaction is timestamped and signed, becoming part of the Lab's activity log.
Record. Kamu stores the full file record (the file DID, the Lab identifier, encryption metadata, access level, content hash, and version information) in its provenance database. The Molecule API stores a corresponding application record linking the file to the Lab's data room.
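The Prepare step's branching on access level can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Client SDK's code: SHA-256 is assumed as the content-hash algorithm, the access-condition dict shape is hypothetical, and `issue_dek` and `encrypt` are hypothetical hooks standing in for the backend's DEK issuance on initiateCreateOrUpdateFileV2 and for AES-256-GCM encryption via Web Crypto.

```python
import hashlib
import secrets
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class AccessLevel(Enum):
    PUBLIC = "public"
    TOKEN_HOLDER = "token_holder"
    ADMIN_ONLY = "admin_only"


@dataclass
class EncryptionMetadata:
    wrapped_dek: bytes  # DEK as wrapped by the key custodian; safe to store
    iv: bytes           # 96-bit AES-GCM nonce, unique per encryption
    content_hash: str   # SHA-256 (hex) of the plaintext file (assumed algorithm)
    conditions: dict    # on-chain access conditions (hypothetical shape)


@dataclass
class PreparedFile:
    payload: bytes                          # the bytes that will be uploaded
    content_hash: str
    encryption: Optional[EncryptionMetadata]


# Hypothetical hooks: IssueDek returns a fresh (plaintext DEK, wrapped DEK) pair;
# Encrypt stands in for AES-256-GCM: (dek, iv, plaintext) -> ciphertext.
IssueDek = Callable[[], Tuple[bytes, bytes]]
Encrypt = Callable[[bytes, bytes, bytes], bytes]


def prepare_file(raw: bytes, level: AccessLevel, conditions: dict,
                 issue_dek: IssueDek, encrypt: Encrypt) -> PreparedFile:
    content_hash = hashlib.sha256(raw).hexdigest()
    if level is AccessLevel.PUBLIC:
        # Public files skip encryption and are uploaded as-is.
        return PreparedFile(raw, content_hash, None)
    plaintext_dek, wrapped_dek = issue_dek()
    iv = secrets.token_bytes(12)
    ciphertext = encrypt(plaintext_dek, iv, raw)
    # Only the wrapped DEK goes into the metadata; the plaintext DEK is not persisted.
    meta = EncryptionMetadata(wrapped_dek, iv, content_hash, conditions)
    return PreparedFile(ciphertext, content_hash, meta)
```

Injecting the two hooks keeps the sketch dependency-free and makes the invariant easy to see: the recorded metadata holds the wrapped DEK, IV, content hash, and conditions, never the plaintext DEK.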
Content Addressing
Every file stored through the pipeline receives a content identifier (CID): a self-describing identifier that wraps a cryptographic hash of the file's contents in IPFS's multihash format. The CID serves as both the file's address and its integrity proof: content returned for a CID by any IPFS node can be re-hashed and checked against that CID, so the client does not need to trust the node that served it. If even a single byte of the file changes, the result has a different CID that no longer matches the on-chain reference in the Lab's TBA, making tampering immediately detectable.
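Because CIDs are deterministic, identical bytes added with the same settings (CID version, hash function, chunking) always produce the same identifier, regardless of who uploads them or when. This property enables deduplication across the network and allows independent verification of data integrity without trusting any specific storage provider. Note that encrypted files use a fresh DEK and IV on every upload, so the same plaintext uploaded twice yields different ciphertext and therefore different CIDs; deduplication applies to identical stored bytes.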
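To make content addressing concrete, the sketch below builds a CIDv1 for a single block of bytes using the raw codec and a sha2-256 multihash, encoded as lowercase base32 (multibase prefix `b`). It assumes the file fits in one block; IPFS tooling chunks larger files into a UnixFS DAG, and CIDv0 or other add settings change the encoding, so real CIDs for such files will differ from this output. The helper names are illustrative.

```python
import base64
import hashlib

CID_V1 = 0x01     # CID version
RAW_CODEC = 0x55  # multicodec code for raw binary
SHA2_256 = 0x12   # multihash function code for sha2-256


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def cid_v1_raw(data: bytes) -> str:
    """CIDv1 (raw codec, sha2-256) for a single block of bytes."""
    digest = hashlib.sha256(data).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    cid_bytes = varint(CID_V1) + varint(RAW_CODEC) + multihash
    # Multibase "b": RFC 4648 base32, lowercase, no padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def matches(data: bytes, expected_cid: str) -> bool:
    """True if these bytes hash to the expected CID (integrity check)."""
    return cid_v1_raw(data) == expected_cid
```

Because the identifier is a pure function of the bytes, `matches` can verify any copy without trusting where it came from; changing one byte produces a different CID.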
Versioning and Provenance
Kamu maintains a complete, append-only history of every dataset in every Lab. Whenever a file is uploaded or updated, Kamu creates a new version record that preserves the previous version's CID alongside the new one. No version is ever overwritten or deleted; the full lineage is permanently retrievable.
Each version record includes the content hash, the timestamp, the author's decentralised identifier (DID) linked to their wallet address, the data room path, and a reference to the previous version. This creates a verifiable provenance chain from the current state of any dataset back to its original upload. When a collaborator, funder, or reviewer needs to verify when data was created, who created it, or how it evolved over time, the evidence is in Kamu's version graph.
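A minimal model of the version chain described above, assuming a simple in-memory structure rather than Kamu's actual schema: each record links to the previous version of the same data room path, and the log exposes no update or delete operation. `VersionLog` and `VersionRecord` are illustrative names, not Kamu APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VersionRecord:
    version_id: int
    cid: str
    content_hash: str
    timestamp: int                   # Unix seconds
    author_did: str                  # DID linked to the author's wallet
    path: str                        # data room path
    previous_version: Optional[int]  # None for the original upload


class VersionLog:
    """Append-only: records can be added, never replaced or removed."""

    def __init__(self) -> None:
        self._records: List[VersionRecord] = []
        self._heads: Dict[str, int] = {}  # path -> id of the latest version

    def append(self, cid: str, content_hash: str, timestamp: int,
               author_did: str, path: str) -> VersionRecord:
        record = VersionRecord(len(self._records), cid, content_hash, timestamp,
                               author_did, path, self._heads.get(path))
        self._records.append(record)
        self._heads[path] = record.version_id
        return record

    def lineage(self, path: str) -> List[VersionRecord]:
        """Walk from the current version back to the original upload."""
        chain: List[VersionRecord] = []
        vid = self._heads.get(path)
        while vid is not None:
            record = self._records[vid]
            chain.append(record)
            vid = record.previous_version
        return chain

    def records(self) -> Tuple[VersionRecord, ...]:
        return tuple(self._records)  # immutable view of the full history
```

Indexing records by position rather than by CID lets identical content be re-uploaded as a new version (for example, reverting a file) without colliding with the earlier record.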
Kamu also records activity events β file access, metadata changes, announcements, and other Lab actions β providing a broader context for the dataset's history beyond just version changes.
Permanent Persistence
IPFS provides content-addressed retrieval, but it does not guarantee permanent availability on its own. If every node that pins a file goes offline, the file becomes unreachable: the CID remains a valid address, but nothing answers the request.
Onchain Labs solve this by persisting files to Arweave in addition to IPFS. Arweave is a permanent, pay-once storage network β once a file is written, it remains available indefinitely regardless of whether any individual node or service continues operating. This dual-storage approach means files are retrievable from IPFS for fast, everyday access, and backed by Arweave for permanent, censorship-resistant availability.
Even if a file record is removed from a Lab's data room index, the underlying content persists on IPFS (for as long as it remains pinned) and on Arweave (permanently). For Token-Holder and Admin-Only files, this persistence does not compromise confidentiality: the stored bytes are ciphertext, publicly retrievable but unreadable without the plaintext DEK, which is released only after the file's on-chain access conditions are re-verified against live chain state. Public files are stored unencrypted and remain readable by anyone.
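The dual-storage design suggests a read path that prefers IPFS and falls back to Arweave, verifying whatever comes back. In the sketch below each source is a hypothetical zero-argument fetcher (for example, one wrapping an IPFS gateway request by CID and one wrapping an Arweave request by transaction ID), and `expected_sha256` is assumed to be the SHA-256 of the stored bytes, which for encrypted files is the ciphertext.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: returns the stored bytes, returns None if the source
# has no copy, or raises OSError on a network-style failure.
Fetcher = Callable[[], Optional[bytes]]


def fetch_verified(expected_sha256: str,
                   sources: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Return (source name, bytes) from the first source whose copy verifies."""
    for name, fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # source unreachable: try the next one
        if data is None:
            continue  # source has no copy
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return name, data
        # Hash mismatch: treat this copy as corrupt or tampered and move on.
    raise LookupError("no source returned content matching the expected hash")
```

Ordering the sources IPFS first and Arweave second mirrors the split described above: fast everyday reads, with the permanent copy as the backstop. Verification runs on every source, so a misbehaving gateway cannot substitute content.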
[Figure: E2E Upload Flow]
Storage Summary
| Data | Storage | Managed by | Properties |
| --- | --- | --- | --- |
| Research files (encrypted unless Public) | IPFS + Arweave | Filebase (upload), Kamu (versioning) | Decentralised, permanent, content-addressed |
| On-chain data references | Lab's Token Bound Account | Smart contract | Tamper-evident CID pointers and metadata |
| tokenURI pointer | On-chain (Lab TBA) | Lab smart contract | Permanent reference to IP-NFT metadata |
| File versions | Kamu provenance DB | Kamu | Append-only version history and audit trail |
| Encryption keys | Wrapped per-file DEKs | Onchain-Verified Envelope Encryption | Per-file AES-256 DEK wrapped by a protocol-operated custodian (a BLS threshold operator network is on the roadmap). The plaintext DEK is released only after on-chain conditions are re-verified. Legacy files continue to resolve through Lit Protocol until migrated. |
| Access conditions | Encryption metadata on Kamu (ODF) | AccessResolver (on-chain) | Defines who can decrypt each file |
| File provenance | Kamu provenance DB | Kamu | DID-based authorship, timestamps, lineage |
| Activity events | Kamu provenance DB + on-chain | Kamu + Lab TBA | Access logs, metadata changes, announcements |
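The Encryption keys and Access conditions rows together describe a gate: the wrapped DEK is unwrapped only after a fresh check of chain state. The sketch below is minimal and assumes a single hypothetical token-balance condition. `read_balance` stands in for an on-chain read (for example, through the AccessResolver) and `unwrap` stands in for the key custodian; neither reflects the actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessCondition:
    # Hypothetical condition shape: hold at least `min_balance` of `token`.
    token: str
    min_balance: int


def release_dek(wrapped_dek: bytes, condition: AccessCondition, requester: str,
                read_balance: Callable[[str, str], int],
                unwrap: Callable[[bytes], bytes]) -> bytes:
    """Unwrap the DEK only if a fresh chain read satisfies the condition."""
    # Re-read live chain state on every request; no earlier decision is cached.
    balance = read_balance(condition.token, requester)
    if balance < condition.min_balance:
        raise PermissionError(f"{requester} does not satisfy the access condition")
    return unwrap(wrapped_dek)
```

Re-reading state on every call means access tracks the current chain: a requester who no longer meets the condition is refused on the next request, although a DEK released earlier cannot be recalled.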