Designing Institutional-Grade Custody Architecture
MPC, HSM, and Multi-Party Signing for Digital Asset Firms — a complete technical teardown of the protocols, tradeoffs, and production architectures that govern how institutions secure billions in digital assets.
In traditional finance, custody is understood. Regulated custodians hold bearer instruments in vaulted facilities under well-established legal frameworks, with decades of court precedent establishing what control means and who bears liability when it is lost. In digital assets, custody is a cryptographic problem wearing a compliance uniform — and the engineering decisions made at its foundation determine not just operational risk, but whether assets can be recovered at all.
The custody stack for a regulated digital asset firm managing significant AUM must satisfy a set of requirements that are individually demanding and collectively contradictory: keys must be completely inaccessible to any single insider at any moment, yet the firm must be able to sign transactions in seconds to meet trading desk SLAs. The system must survive the simultaneous compromise of multiple infrastructure components, yet maintain availability that satisfies institutional clients used to 99.99% uptime. It must be auditable enough to satisfy regulators, yet private enough to protect clients from targeted attacks.
This analysis examines how the leading institutional custody architectures — and the protocols that underpin them — navigate these contradictions.
The Threat Model: What Institutional Custody Must Defeat
Before evaluating any custody technology, a firm must specify what it is defending against. The threat landscape for institutional digital asset custody is broader than it first appears, and different custody architectures optimise for different subsets of it.
External Threat Vectors
- Network intrusion: Compromise of signing infrastructure via software vulnerabilities, supply chain attacks, or zero-days in dependencies
- Social engineering: Targeted attacks on operators and administrators to extract key material or authorise fraudulent transactions
- Side-channel attacks: Timing, power analysis, or electromagnetic attacks on HSM hardware, particularly relevant for cold signing paths
- Supply chain compromise: Malicious firmware, compromised HSMs, or backdoored cryptographic libraries introduced before deployment
Internal Threat Vectors
- Rogue employee: Single authorised operator exfiltrating key material or self-approving fraudulent withdrawals
- Collusion: Multiple insiders coordinating to meet signing thresholds without legitimate business purpose
- Coercion: Physical or legal pressure on key holders to sign without proper governance process
- Operational error: Accidental key deletion, backup corruption, or procedural failures that result in permanent loss
Every custody architecture must be evaluated against two distinct failure modes: theft (keys exfiltrated, assets stolen) and loss (keys destroyed or inaccessible, assets permanently locked). These failure modes are in direct tension — measures that reduce theft risk often increase loss risk, and vice versa. The engineering task is calibrating this tradeoff for the firm's specific risk tolerance and regulatory context.
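The tension between theft and loss can be made concrete with a little arithmetic. The sketch below assumes each of the n shares is independently compromised, or independently destroyed, with a fixed probability. The independence assumption and the 1% figures are illustrative rather than measured, and the function names are hypothetical.

```python
from math import comb

def at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def theft_and_loss(t: int, n: int, p_compromise: float, p_loss: float):
    """Return (theft probability, loss probability) for a t-of-n scheme."""
    # Theft: the attacker obtains at least t shares.
    theft = at_least(t, n, p_compromise)
    # Loss: at least n - t + 1 shares are destroyed, so fewer than t survive.
    loss = at_least(n - t + 1, n, p_loss)
    return theft, loss

for t in (2, 3, 4):
    theft, loss = theft_and_loss(t, 5, p_compromise=0.01, p_loss=0.01)
    print(f"{t}-of-5: theft={theft:.2e} loss={loss:.2e}")
```

Raising t in a fixed 5-share scheme lowers the theft probability and raises the loss probability, which is the calibration problem described above.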
TSS vs. On-Chain Multisig: The Foundational Choice
The first architectural decision is whether multi-party control of assets will be implemented via threshold signature schemes (TSS), a cryptographic technique, or on-chain multisig, a smart contract or protocol-level construct.
| Dimension | TSS / MPC | On-Chain Multisig |
|---|---|---|
| Key Architecture | No full private key ever exists. Key shares held by separate parties; signing via cryptographic protocol. | N private keys exist independently. Smart contract or script requires M-of-N signatures to spend. |
| On-Chain Footprint | Indistinguishable from single-sig on-chain. No disclosure of threshold configuration. Lower fees. | Multisig structure publicly visible on-chain. M and N values disclosed. Higher fees on some chains. |
| Chain Compatibility | Chain-agnostic — works for any chain supporting the underlying signature scheme. Requires per-chain integration work. | Chain-specific — each chain has different multisig primitives. UTXO chains, EVM chains, and others require separate implementations. |
| Smart Contract Risk | No smart contract risk. Signing is off-chain cryptography; no exploitable on-chain logic. | Smart contract bugs are a material attack surface. Multisig contracts have been exploited (Parity wallet, $150M frozen). |
| Latency | 3–10 rounds of network communication between signing nodes. Adds latency; must be engineered for SLA. | Each signer signs independently. Aggregation is trivial. Low coordination latency. |
| Key Refresh | Key shares can be refreshed proactively — parties compute new shares without changing the underlying key. Old shares become useless. | Key rotation requires new address generation and asset migration. Expensive and creates operational risk. |
| Regulatory Optics | Preferred by most institutional regulators. No on-chain disclosure of custody structure. Harder to target. | On-chain transparency can be a compliance asset (auditability) or a liability (attack surface disclosure). |
The institutional consensus has moved decisively toward TSS/MPC for most use cases. The combination of no single point of cryptographic failure, chain-agnosticism, and on-chain indistinguishability makes it the superior architecture for firms managing diverse asset portfolios under regulatory scrutiny.
The private key is the asset. Any architecture where a full private key exists — even transiently, even in hardware — has a single point of cryptographic failure. MPC's value proposition is not that it distributes risk. It is that it eliminates the key as a singular attack target entirely.
(Design principle, institutional custody engineering)
The MPC-CMP Protocol: How Modern MPC Signing Works
The cryptographic engine underneath modern institutional MPC custody is typically an implementation of MPC-CMP (Multi-Party Computation for ECDSA, CMP variant), published by Ran Canetti, Rosario Gennaro, Steven Goldfeder, Nikolaos Makriyannis, and Udi Peled in 2020. MPC-CMP improved on earlier threshold ECDSA protocols, which needed more rounds of live interaction per signature and could not reliably identify the party responsible for an aborted session.
Key Generation (Distributed Key Generation — DKG)
The protocol begins with distributed key generation: a process by which n parties jointly compute a public key and receive individual secret shares — without any party or external coordinator ever seeing the full private key. The DKG procedure in MPC-CMP uses Feldman's verifiable secret sharing, ensuring that each party's share is consistent with the others and that the resulting public key is verifiable without revealing the private key.
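The Feldman verification step described above can be sketched in a few lines. This is a minimal illustration rather than the MPC-CMP DKG itself: it uses a toy multiplicative group instead of an elliptic curve, omits the Schnorr proofs and share encryption, and every name in it (`deal`, `verify`, `dkg`, `lagrange_at_zero`) is hypothetical.

```python
import secrets

# Toy group (assumption): P = 2Q + 1 with P, Q prime and G of order Q.
# These sizes are for illustration only and offer no security.
P, Q, G = 2039, 1019, 4

def deal(secret, t, n):
    """Shamir-share `secret` with threshold t; return (shares, Feldman commitments)."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]
    shares = {j: sum(c * pow(j, k, Q) for k, c in enumerate(coeffs)) % Q
              for j in range(1, n + 1)}
    commitments = [pow(G, c, P) for c in coeffs]  # G^coefficient for each term
    return shares, commitments

def verify(j, share, commitments):
    """Check G^share == prod_k C_k^(j^k) mod P without learning the polynomial."""
    expected = 1
    for k, c in enumerate(commitments):
        expected = expected * pow(c, pow(j, k, Q), P) % P
    return pow(G, share, P) == expected

def dkg(t, n):
    """Every party deals once; party j sums the shares addressed to it."""
    deals = [deal(secrets.randbelow(Q), t, n) for _ in range(n)]
    for shares, commitments in deals:
        if not all(verify(j, s, commitments) for j, s in shares.items()):
            raise ValueError("share failed verification")
    x = {j: sum(shares[j] for shares, _ in deals) % Q for j in range(1, n + 1)}
    pk = 1
    for _, commitments in deals:
        pk = pk * commitments[0] % P  # product of G^(s_i) equals G^sk
    return x, pk

def lagrange_at_zero(points):
    """Interpolate f(0) from {index: share}. Used here only to check the demo."""
    total = 0
    for i, y_i in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        total = (total + y_i * num * pow(den, -1, Q)) % Q
    return total

x, pk = dkg(t=2, n=3)
# Reconstructing sk is done here purely as a sanity check; a real system never does this.
print(pow(G, lagrange_at_zero({1: x[1], 3: x[3]}), P) == pk)
```

Any two of the three shares interpolate to the same secret, and a tampered share fails `verify` against the dealer's public commitments.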
// n parties jointly generate a keypair. No party sees sk.
// Feldman VSS with Schnorr proofs of knowledge; q is the order of the curve group
Round 1 → Each party Pᵢ samples secret sᵢ ← ℤ_q
          Commits to polynomial fᵢ(x) of degree t-1 where fᵢ(0) = sᵢ
          Broadcasts Feldman commitments Cᵢ to the coefficients + Schnorr proof
Round 2 → Each Pᵢ sends share fᵢ(j) to party Pⱼ (encrypted)
          Parties verify shares against commitments
          Abort if verification fails → reshare with honest parties
Output  → Party Pᵢ holds secret share xᵢ = Σⱼ fⱼ(i) mod q
          Public key pk = g^(Σⱼ sⱼ), the product of the constant-term commitments, is known to all
          sk = Σⱼ sⱼ (equivalently Σ λᵢ·xᵢ over any t shares, with λᵢ the Lagrange coefficients) is NEVER assembled by any party
Threshold Signing: The CMP Improvement
The signing protocol in MPC-CMP achieves ECDSA threshold signing in 4 rounds. The first three do not depend on the message, so they can be run ahead of time, leaving a single round once the transaction is known. The multiplication of secret nonce shares, the part of ECDSA that is hardest to distribute, is handled with Paillier homomorphic encryption and accompanying zero-knowledge proofs rather than a trusted dealer.
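The Paillier-based multiplication step described above is usually called MtA (multiplicative-to-additive): two parties holding secrets a and b end up with additive shares of a·b without revealing either input. The sketch below is a toy version under loud assumptions: a tiny Paillier modulus built from two Mersenne primes, a small prime `Q` standing in for the curve order, and no zero-knowledge range proofs, which a real protocol requires. All names are hypothetical.

```python
import secrets
from math import gcd

# Toy parameters (assumptions). Production Paillier moduli are about 2048 bits.
P1, P2 = 2**31 - 1, 2**61 - 1
N = P1 * P2
N2 = N * N
LAM = (P1 - 1) * (P2 - 1) // gcd(P1 - 1, P2 - 1)  # lcm(P1 - 1, P2 - 1)
MU = pow(LAM, -1, N)
Q = 8191  # small prime standing in for the curve order

def enc(m: int) -> int:
    """Paillier encryption with generator 1 + N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return pow(1 + N, m, N2) * pow(r, N, N2) % N2

def dec(c: int) -> int:
    """Paillier decryption: L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def mta(a: int, b: int):
    """Alice holds a, Bob holds b; return (alpha, beta) with alpha + beta = a*b mod Q."""
    c_a = enc(a)                               # Alice sends Enc(a) to Bob
    mask = secrets.randbelow(Q**3)             # Bob's random mask, small enough not to wrap N
    c_resp = pow(c_a, b, N2) * enc(mask) % N2  # Bob returns Enc(a*b + mask)
    alpha = dec(c_resp) % Q                    # Alice's additive share
    beta = -mask % Q                           # Bob's additive share
    return alpha, beta

a, b = secrets.randbelow(Q), secrets.randbelow(Q)
alpha, beta = mta(a, b)
print((alpha + beta) % Q == a * b % Q)
```

Bob only ever sees a ciphertext of a, and Alice only sees a·b blinded by Bob's mask, which is why the product can be shared without either input being disclosed.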
// t parties (of n) sign message m without exposing key shares
// Paillier homomorphic encryption used for nonce multiplication; q is the curve order
Round 1 → Each signer samples nonce shares kᵢ, γᵢ ← ℤ_q
          Broadcasts Paillier encryption: Enc(kᵢ), Enc(γᵢ)
          Commits to elliptic curve points Γᵢ = γᵢ·G
Round 2 → Parties compute MtA (Multiplicative-to-Additive) shares
          kᵢ · γⱼ computed via Paillier without revealing kᵢ or γⱼ
          ZK proofs verify correct Paillier computation
Round 3 → Parties reveal Γᵢ, compute R = k⁻¹·G where k = Σ kᵢ
          Each party computes partial signature σᵢ
          Broadcasts σᵢ with ZK consistency proof
Output  → Any party aggregates: s = Σ σᵢ mod q, with r the x-coordinate of R mod q
          ECDSA signature (r, s) is valid for pk over m
          No party's key share xᵢ was ever revealed
Proactive Secret Sharing and Key Refresh
MPC-CMP supports proactive secret sharing: periodic refresh of all key shares such that the new shares are freshly randomised and cannot be combined with old ones, while the underlying key and its public address remain unchanged. This is critical for institutional security: key shares exfiltrated before a refresh are useless after it, so an attacker must collect a full threshold of shares within a single refresh window. Production systems should run proactive refresh on a schedule: daily or weekly for hot wallets, monthly for warm, on-demand for cold.
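The refresh idea can be illustrated with plain Shamir shares: each party deals a fresh sharing of zero, and everyone adds the pieces they receive to their current share. This is a simplified sketch, not the MPC-CMP refresh protocol (which also rotates auxiliary Paillier keys and carries proofs). The field modulus and all function names are assumptions made for the example.

```python
import secrets

Q = 2**127 - 1  # prime modulus standing in for the curve order (assumption)

def poly_shares(constant: int, t: int, n: int) -> dict:
    """Shares at x = 1..n of a random degree t-1 polynomial with f(0) = constant."""
    coeffs = [constant % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]
    return {j: sum(c * pow(j, k, Q) for k, c in enumerate(coeffs)) % Q
            for j in range(1, n + 1)}

def refresh(shares: dict, t: int) -> dict:
    """Each of the n parties deals a sharing of zero; all pieces are added in."""
    n = len(shares)
    new = dict(shares)
    for _ in range(n):
        delta = poly_shares(0, t, n)
        new = {j: (new[j] + delta[j]) % Q for j in new}
    return new

def reconstruct(points: dict) -> int:
    """Lagrange interpolation at zero. Used here only to check the demo."""
    total = 0
    for i, y_i in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        total = (total + y_i * num * pow(den, -1, Q)) % Q
    return total

secret = 123456789
old = poly_shares(secret, t=2, n=3)
new = refresh(old, t=2)
print(reconstruct({1: new[1], 2: new[2]}) == secret)   # same key after refresh
print(reconstruct({1: old[1], 2: new[2]}) == secret)   # mixing old and new shares fails
```

Because every added polynomial evaluates to zero at x = 0, the shared secret is untouched, while a stolen pre-refresh share no longer combines with any current share.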
HSM Integration: Anchoring MPC in Hardware Trust
MPC alone is a cryptographic protocol running on general-purpose compute. Without hardware roots of trust, the security boundary of each MPC node extends to the entire software stack — operating system, hypervisor, cloud provider infrastructure, and every library in the dependency chain. Hardware Security Modules (HSMs) establish a hardware-enforced boundary within which key material cannot be extracted regardless of software compromise.
HSM Selection Criteria for MPC Nodes
Not all HSMs are equal, and the MPC use case creates specific requirements that differ from traditional HSM deployments:
Required Capabilities
- FIPS 140-2 Level 3 minimum; Level 4 preferred for tier-1 cold signing nodes
- Custom firmware support: MPC protocol primitives may need to run inside the HSM boundary
- High-throughput ECDSA: signing throughput measured in operations per second
- Remote attestation: ability to cryptographically prove firmware integrity to remote verifiers
Vendor Landscape
- Thales Luna Network HSM: dominant in banking; excellent PKCS#11 support; used in Fireblocks node infrastructure
- Utimaco SecurityServer: strong European regulatory track record; BaFin/ECB compliance
- AWS CloudHSM / Azure Dedicated HSM: cloud-native; limited custom firmware options
- Ledger Vault (enterprise): purpose-built for crypto; strong UX for governance workflows
Cloud-hosted HSMs offer operational convenience and elastic scalability, but introduce a dependency on cloud provider infrastructure integrity. For tier-1 cold storage, dedicated on-premise HSMs in physically controlled facilities remain the gold standard. Cloud HSMs are appropriate for warm and hot tiers where the tradeoff between convenience and absolute security has already been made.
Key Ceremony Design: The Most Critical Operational Moment
The key ceremony, the procedure by which the distributed key generation protocol is executed to create a new custody key, is the single most consequential operational event in a custody firm's lifecycle. A ceremony executed correctly creates a key whose security rests on the proven properties of the MPC protocol. A ceremony with procedural flaws can create attack vectors that persist for the lifetime of the key.
Pre-Ceremony Planning & Participant Selection
Define the threshold scheme (t-of-n). Designate signing nodes and their operators — ensuring geographic separation, organisational independence, and background-checked personnel. Document the governance policy. Establish secure communication channels. Engage independent ceremony observers for audit purposes.
Environment Preparation & Verification
Each signing node must be provisioned, its firmware verified against published hashes, and its HSM initialised with tamper-evident seals verified by independent witnesses. Software on ceremony machines should be built from audited source. Enforce network isolation: no internet connectivity during key generation, with air-gapped machines preferred for cold tier ceremonies.
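Checking firmware against a published hash is simple to script. The sketch below assumes the vendor publishes a SHA-256 digest in hex; the digest algorithm, the function names, and the in-memory stand-in for a firmware image are all assumptions made for illustration.

```python
import hashlib
import hmac
import io

def sha256_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def firmware_matches(stream, published_hex: str) -> bool:
    """Compare the computed digest with the published one in constant time."""
    return hmac.compare_digest(sha256_stream(stream), published_hex.strip().lower())

# Stand-in for a firmware image; a ceremony script would open the real file in "rb" mode.
image = io.BytesIO(b"example firmware image")
published = hashlib.sha256(b"example firmware image").hexdigest()
print(firmware_matches(image, published))
```

In a ceremony the published digest should come from a channel independent of the one that delivered the firmware, and the comparison result belongs in the witnessed ceremony log.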
Distributed Key Generation Execution
The DKG protocol runs across all n nodes simultaneously. Each node generates its entropy, commits to its polynomial, exchanges encrypted shares, verifies received shares, and outputs its key share to HSM-protected storage. No node should output its share to any medium other than the designated HSM. The resulting public key is extracted and verified by all participants.
Verification & Test Signing
A test signing round is executed with the new key: a transaction to a dust address is constructed, signed by the minimum required threshold, and broadcast to mainnet. Successful confirmation proves the key is valid. The public key's derivation path is recorded and independently verified. All ceremony logs, hashes, and participant attestations are collected and archived.
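One way to make the archived ceremony log tamper-evident is to chain each entry to the hash of the previous one. The source does not prescribe a log format, so the hash-chain design, the JSON encoding, and the function names below are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _entry_hash(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers both its content and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _entry_hash(prev, event)}
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"step": "firmware verified", "node": "A"})
append_entry(log, {"step": "dkg complete", "node": "A"})
append_entry(log, {"step": "test signature confirmed", "node": "A"})
print(verify_chain(log))
```

Publishing or notarising only the final hash is then enough for an auditor to detect later changes to any earlier entry.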
Backup Share Distribution & Recovery Documentation
Encrypted backup shares are distributed to geographically and organisationally separate custodians. Recovery procedures are documented in detail: who holds which backup share, under what governance conditions they may be used, what verification is required before reconstruction, and how the recovered key is reintegrated into live infrastructure.
Disaster Recovery Architecture: Designing for the Unthinkable
Disaster recovery in custody is not a single procedure — it is a tiered set of procedures corresponding to failure scenarios of different severity. A firm that plans only for "one node goes down" has not planned for disaster recovery. True DR planning must include scenarios that no individual within the organisation wants to contemplate: simultaneous destruction of multiple facilities, death or incapacitation of key personnel, catastrophic infrastructure failure, and hostile legal actions in multiple jurisdictions.
| Scenario | Recovery Mechanism | RTO Target | Governance Required |
|---|---|---|---|
| Single node failure | Automatic failover to standby node with pre-provisioned key share replica | < 30 seconds | None — automated |
| Majority node failure | Cold backup key shares activated; new signing quorum assembled | 4–24 hours | Senior operations + compliance sign-off |
| Full infrastructure loss | Shamir backup shares reconstructed by designated trustees; assets swept to new key | 24–72 hours | Board-level approval + independent trustee coordination |
| Key share compromise | Immediate key refresh; compromised node isolated; audit initiated | 1–4 hours | CISO + external security firm engagement |
| Firm dissolution / insolvency | Client assets swept via court-appointed trustee using escrowed backup shares | Days to weeks | Legal process + independent trustee |
The most difficult DR scenario — firm dissolution — requires backup shares to be held by parties who are completely independent of the custody firm, yet bound by contractual and fiduciary obligations to clients. This is not purely a technical problem. It requires legal agreements, regulatory approval in the applicable jurisdictions, and trustees who themselves have robust custody infrastructure. Regulated trust companies, law firms with specific crypto mandates, and purpose-built independent trustee services all serve this function — each with distinct risk profiles and regulatory implications.
Fireblocks Architecture Teardown
Fireblocks represents the dominant institutional MPC custody infrastructure provider, with over 1,800 institutional clients and $4 trillion in annualised transfer volume. Its architecture makes specific engineering choices that reflect its target market — active trading institutions requiring high-throughput, low-latency signing across hundreds of blockchains.
The MPC-CMP Implementation
Fireblocks implements MPC-CMP with a 2-of-3 default threshold scheme: one share held by the Fireblocks cloud infrastructure, one by the client's mobile device (protected by the device's secure enclave), and one in cold backup storage. This design creates a specific security model: with the backup share kept offline, every routine signing operation requires Fireblocks infrastructure cooperation, which is simultaneously its primary security guarantee and its primary regulatory concern.
- Share 1 of 3: cloud, always available
- Share 2 of 3: iOS Secure Enclave
- Share 3 of 3: encrypted, offline
Intel SGX as Software HSM
Fireblocks uses Intel SGX (Software Guard Extensions) as the hardware root of trust for its cloud-hosted key share, rather than traditional HSMs. SGX enclaves provide memory encryption and attestation — code running inside an enclave cannot be inspected by the host OS, hypervisor, or cloud provider. This is a pragmatic choice for a cloud-native architecture.
The tradeoff is that SGX has a documented history of side-channel and fault-injection vulnerabilities (Foreshadow, Plundervolt, SGAxe, and Spectre-class attacks) that have periodically allowed enclave memory to be read by local attackers. Fireblocks mitigates this through vendor-issued SGX mitigations, regular patching, and the structural guarantee that the SGX share alone is insufficient to sign, but it represents a genuine divergence from the physical tamper-resistance model of FIPS 140-2 Level 3 HSMs.
Policy Engine and Network Architecture
Fireblocks operates a dedicated Fireblocks Network — a permissioned network for inter-institution transfers that allows direct asset movement between Fireblocks clients without on-chain settlement, settling net positions periodically. The policy engine supports complex governance rules: withdrawal limits by asset and amount, whitelisted destination addresses, multi-approver workflows with biometric authentication, and time-lock conditions. These policy rules are enforced at the MPC signing layer — a transaction that violates policy cannot be signed, not merely rejected at the application layer.
Anchorage Architecture Teardown
Anchorage Digital, the first federally chartered digital asset bank in the United States (OCC charter, 2021), represents a fundamentally different architectural philosophy from Fireblocks: rather than a software-first MPC platform built on cloud-hosted enclaves, Anchorage is built around physical infrastructure designed to satisfy the most stringent banking regulators in the world.
The Biometric-MPC Hybrid Model
Anchorage's signature architectural innovation is the integration of biometric authentication directly into the MPC signing quorum. Rather than relying solely on device possession, Anchorage requires biometric verification (fingerprint or face) from designated human approvers, whose biometric data is stored in an on-device secure enclave and never transmitted to Anchorage's servers. The biometric verification unlocks the approver's key share for the duration of the signing session only.
This architecture satisfies a specific regulatory requirement: human intentionality in the signing process. A rogue automated system cannot satisfy the biometric threshold without physical human cooperation, which directly addresses the compromised-automation and stolen-credential scenarios that concern banking regulators. Coercion of a legitimate approver is not stopped by biometrics alone and still depends on quorum and policy controls.
Physical Infrastructure and Regulatory Design
As a nationally chartered bank, Anchorage's custody infrastructure is designed to satisfy OCC Handbook requirements for custodial services, NIST SP 800-57 key management standards, and the examination expectations of federal bank examiners. This creates a design constraint that differs fundamentally from non-bank MPC vendors: every component must be documentable, examinable, and explainable to a non-technical regulatory audience.
Concretely, this means: physical HSMs in OCC-examinable facilities rather than cloud SGX enclaves; documented key ceremonies with notarised witness attestations; explicit chains of custody for all cryptographic material; and signing workflows that map to traditional banking dual-control concepts that regulators already understand.
Policy Engine and Governance Layer Design
The cryptographic layer — MPC signing, HSM protection — solves the key security problem. The governance layer solves the authorisation problem: even if an attacker cannot steal a key, they may be able to manipulate the signing process into authorising a fraudulent transaction. A well-designed policy engine makes the signing infrastructure useless to an attacker who has not also compromised the governance layer.
[01] Role-based access control with principle of least privilege at the operation level, not the account level.
[02] Transaction limits by asset, amount, destination, and time window.
[03] Mandatory destination address whitelisting with time-locked addition of new addresses.
[04] Multi-approver workflows with quorum configuration per policy tier.
[05] Out-of-band approval notification (separate channel from the transaction initiation channel).
[06] Immutable audit log of all approval actions, exported to external SIEM in real time.
[07] Automatic escalation for out-of-policy transactions: not rejection, but human review queue.
[08] Configurable cooling-off periods for large withdrawals.
[09] Geographic and time-of-day restrictions for sensitive operations.
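A few of the rules above (limits, time-locked whitelisting, quorum, and escalation instead of rejection) can be sketched as a small evaluation function. This is an illustrative model only: the class names, fields, and decision values are hypothetical and do not reflect any vendor's policy engine.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"
    PENDING = "pending"      # in policy, waiting for more approvers
    ESCALATE = "escalate"    # out of policy: route to the human review queue

@dataclass
class Policy:
    per_tx_limit: dict       # asset -> maximum amount, in smallest units
    whitelist: dict          # destination -> unix time the address was added
    whitelist_delay_s: int   # time lock before a new address becomes usable
    quorum: int              # number of distinct approvers required

@dataclass
class TxRequest:
    asset: str
    amount: int
    destination: str
    approvers: set
    now: int                 # unix time of evaluation

def evaluate(policy: Policy, tx: TxRequest):
    """Return (Decision, reasons). Out-of-policy requests escalate rather than fail."""
    reasons = []
    if tx.amount > policy.per_tx_limit.get(tx.asset, 0):
        reasons.append("amount exceeds per-transaction limit")
    added_at = policy.whitelist.get(tx.destination)
    if added_at is None:
        reasons.append("destination not whitelisted")
    elif tx.now - added_at < policy.whitelist_delay_s:
        reasons.append("destination still inside whitelist time lock")
    if reasons:
        return Decision.ESCALATE, reasons
    if len(tx.approvers) < policy.quorum:
        return Decision.PENDING, ["quorum not yet met"]
    return Decision.APPROVE, []

policy = Policy({"BTC": 10}, {"addr1": 0}, whitelist_delay_s=86_400, quorum=2)
print(evaluate(policy, TxRequest("BTC", 5, "addr1", {"alice", "bob"}, now=100_000)))
```

In a real deployment this check would run inside the signing path itself, so that a request that does not evaluate to approval never reaches the MPC nodes.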
Architecture Recommendations by Firm Profile
| Firm Profile | Recommended Architecture | Key Considerations |
|---|---|---|
| Exchange / Trading Platform (high volume, hot wallet dominant) | Fireblocks or proprietary MPC-CMP on cloud SGX with dedicated network HSMs for cold tier. 2-of-3 hot, 3-of-5 warm, 4-of-7 cold. | Prioritise throughput and latency on hot path. Rigorous whitelisting on cold path. Dedicated operational security team. |
| Qualified Custodian / Trust Company (regulatory-first, client segregation) | Anchorage-style physical HSM infrastructure or licensed Anchorage partnership. Per-client key isolation. OCC/state trust charter alignment. | Regulatory examination readiness is the primary driver. Every design decision must be documentable to a non-technical examiner. |
| Fund Manager / Family Office (self-custody preference, lower volume) | 3-of-5 MPC with two internal signers, one external trustee, and two cold backup shares. Hardware key shares (Ledger Enterprise or equivalent). | Disaster recovery with trustee independence is paramount. Prioritise backup share custody and recovery testing over throughput. |
| Protocol Treasury / DAO (decentralisation as governance feature) | Gnosis Safe multisig on-chain (transparency as governance) with individual signers using hardware wallets. MPC available as an alternative. | On-chain transparency of multisig structure may be a feature, not a bug. Community auditability of treasury governance outweighs privacy concerns. |
The Architecture Is Never Finished
Institutional custody architecture is not a problem that gets solved at deployment and then maintained. The cryptographic threat landscape evolves: post-quantum migration (toward NIST's ML-DSA and ML-KEM standards, FIPS 204 and FIPS 203) is likely to require a complete key architecture redesign within this decade. The regulatory landscape evolves: the OCC, ECB, MAS, and DFSA are all actively developing custody-specific guidance. The operational landscape evolves: key personnel leave, vendors are acquired, infrastructure providers change their terms.
What distinguishes genuinely institutional-grade custody from custody-adjacent technology is not the sophistication of any individual component but the coherence of the whole: a cryptographic foundation (MPC-CMP on hardware-rooted nodes) that eliminates single points of failure; a governance layer that makes the signing infrastructure useless to an attacker who lacks legitimate authorisation; a disaster recovery framework that has been tested, documented, and explained to regulators; and an operational security culture that treats key ceremony discipline and proactive refresh as core business functions.
The firms that get this right will hold digital assets safely for decades. The ones that mistake a vendor integration for a custody architecture will discover the difference at the worst possible moment.
Institutional Custody FAQs
What is MPC custody, and how does it differ from traditional multisig?
MPC (Multi-Party Computation) custody uses threshold signature schemes to distribute key shares across multiple parties such that no full private key ever exists. Traditional multisig requires N independent private keys, with M-of-N signatures needed to spend. MPC is chain-agnostic, hides the threshold configuration on-chain, supports proactive key refresh, and eliminates smart contract risk — making it the preferred architecture for institutional custodians.
Why do institutional firms prefer TSS over on-chain multisig?
TSS/MPC offers four advantages institutions value: (1) no on-chain disclosure of custody structure, reducing attack surface; (2) chain-agnostic operation across hundreds of blockchains; (3) proactive key refresh without address migration; and (4) no smart contract risk. On-chain multisig retains a role for DeFi treasuries where transparency is a governance feature, but for regulated custodians, TSS is the consensus choice.
What is the role of HSMs in digital asset custody?
Hardware Security Modules establish a hardware-enforced boundary for key material. Even if the operating system, hypervisor, or cloud provider is compromised, keys inside a FIPS 140-2 Level 3 HSM cannot be extracted. In MPC custody, each node holds its key share inside an HSM, meaning an attacker must simultaneously compromise multiple HSMs across different jurisdictions to reconstruct the key — a dramatically harder problem than software-only key storage.
How does Fireblocks' architecture differ from Anchorage's?
Fireblocks is a software-first, cloud-native MPC platform using Intel SGX enclaves as its hardware root of trust. It prioritises throughput, multi-chain support, and rapid onboarding — ideal for trading institutions. Anchorage is a federally chartered digital asset bank built around physical HSMs, biometric-MPC hybrid signing, and OCC-examinable facilities. It prioritises regulatory compliance, human intentionality in signing, and examination readiness — ideal for qualified custodians and trust companies.
Designing Institutional-Grade Custody Architecture · May 2026
Infrastructure analysis for builders · Not financial advice