Custody · Security · 2026 · 25 min read

Designing Institutional-Grade Custody Architecture

MPC, HSM, and Multi-Party Signing for Digital Asset Firms — a complete technical teardown of the protocols, tradeoffs, and production architectures that govern how institutions secure billions in digital assets.

$200B+ · Assets under institutional digital custody, 2025
3–5s · Target signing latency for MPC production systems
2-of-3 · Minimum threshold scheme for institutional governance
0 · Full private keys ever assembled in MPC custody

In traditional finance, custody is understood. Regulated custodians hold bearer instruments in vaulted facilities under well-established legal frameworks, with decades of court precedent establishing what control means and who bears liability when it is lost. In digital assets, custody is a cryptographic problem wearing a compliance uniform — and the engineering decisions made at its foundation determine not just operational risk, but whether assets can be recovered at all.

The custody stack for a regulated digital asset firm managing significant AUM must satisfy a set of requirements that are individually demanding and collectively contradictory: keys must be completely inaccessible to any single insider at any moment, yet the firm must be able to sign transactions in seconds to meet trading desk SLAs. The system must survive the simultaneous compromise of multiple infrastructure components, yet maintain availability that satisfies institutional clients used to 99.99% uptime. It must be auditable enough to satisfy regulators, yet private enough to protect clients from targeted attacks.

This analysis examines how the leading institutional custody architectures — and the protocols that underpin them — navigate these contradictions.

01 · Security

The Threat Model: What Institutional Custody Must Defeat

Before evaluating any custody technology, a firm must specify what it is defending against. The threat landscape for institutional digital asset custody is broader than it first appears, and different custody architectures optimise for different subsets of it.

External Threat Vectors

  • Network intrusion: Compromise of signing infrastructure via software vulnerabilities, supply chain attacks, or zero-days in dependencies
  • Social engineering: Targeted attacks on operators and administrators to extract key material or authorise fraudulent transactions
  • Side-channel attacks: Timing, power analysis, or electromagnetic attacks on HSM hardware, particularly relevant for cold signing paths
  • Supply chain compromise: Malicious firmware, compromised HSMs, or backdoored cryptographic libraries introduced before deployment

Internal Threat Vectors

  • Rogue employee: Single authorised operator exfiltrating key material or self-approving fraudulent withdrawals
  • Collusion: Multiple insiders coordinating to meet signing thresholds without legitimate business purpose
  • Coercion: Physical or legal pressure on key holders to sign without proper governance process
  • Operational error: Accidental key deletion, backup corruption, or procedural failures that result in permanent loss

The Dual Failure Mode Problem

Every custody architecture must be evaluated against two distinct failure modes: theft (keys exfiltrated, assets stolen) and loss (keys destroyed or inaccessible, assets permanently locked). These failure modes are in direct tension — measures that reduce theft risk often increase loss risk, and vice versa. The engineering task is calibrating this tradeoff for the firm's specific risk tolerance and regulatory context.
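
The tension between theft and loss can be made concrete with a deliberately crude model. The sketch below assumes each share is compromised or destroyed independently with a fixed probability (both numbers are illustrative assumptions, not measured rates) and computes the chance that an attacker gathers t shares versus the chance that fewer than t shares survive.

```python
from math import comb


def at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def theft_and_loss(t: int, n: int, p_compromise: float, p_lost: float) -> tuple[float, float]:
    """Theft needs t compromised shares; loss needs more than n - t destroyed shares."""
    theft = at_least(t, n, p_compromise)
    loss = at_least(n - t + 1, n, p_lost)
    return theft, loss


if __name__ == "__main__":
    # Hypothetical per-share probabilities, chosen only to show the trend.
    for t, n in [(2, 3), (3, 5), (5, 5)]:
        theft, loss = theft_and_loss(t, n, p_compromise=0.01, p_lost=0.01)
        print(f"{t}-of-{n}: theft={theft:.2e}  loss={loss:.2e}")
```

Raising the threshold toward n-of-n drives the theft probability down and the loss probability up, which is the calibration problem described above. Real shares are not independent, so the model understates correlated failures such as a shared vendor or facility.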

02 · Cryptography

TSS vs. On-Chain Multisig: The Foundational Choice

The first architectural decision is whether multi-party control of assets will be implemented via threshold signature schemes (TSS), a cryptographic technique, or on-chain multisig, a smart contract or protocol-level construct.

| Dimension | TSS / MPC | On-Chain Multisig |
| --- | --- | --- |
| Key Architecture | No full private key ever exists. Key shares held by separate parties; signing via cryptographic protocol. | N private keys exist independently. Smart contract or script requires M-of-N signatures to spend. |
| On-Chain Footprint | Indistinguishable from single-sig on-chain. No disclosure of threshold configuration. Lower fees. | Multisig structure publicly visible on-chain. M and N values disclosed. Higher fees on some chains. |
| Chain Compatibility | Chain-agnostic: works for any chain supporting the underlying signature scheme. Requires per-chain integration work. | Chain-specific: each chain has different multisig primitives. UTXO chains, EVM chains, and others require separate implementations. |
| Smart Contract Risk | No smart contract risk. Signing is off-chain cryptography; no exploitable on-chain logic. | Smart contract bugs are a material attack surface. Multisig contracts have been exploited (Parity wallet, $150M frozen). |
| Latency | 3–10 rounds of network communication between signing nodes. Adds latency; must be engineered for SLA. | Each signer signs independently. Aggregation is trivial. Low coordination latency. |
| Key Refresh | Key shares can be refreshed proactively: parties compute new shares without changing the underlying key. Old shares become useless. | Key rotation requires new address generation and asset migration. Expensive and creates operational risk. |
| Regulatory Optics | Preferred by most institutional regulators. No on-chain disclosure of custody structure. Harder to target. | On-chain transparency can be a compliance asset (auditability) or a liability (attack surface disclosure). |

The institutional consensus has moved decisively toward TSS/MPC for most use cases. The combination of no single point of cryptographic failure, chain-agnosticism, and on-chain indistinguishability makes it the superior architecture for firms managing diverse asset portfolios under regulatory scrutiny.

The private key is the asset. Any architecture where a full private key exists — even transiently, even in hardware — has a single point of cryptographic failure. MPC's value proposition is not that it distributes risk. It is that it eliminates the key as a singular attack target entirely.

— Design principle, institutional custody engineering

03 · Protocol

The MPC-CMP Protocol: How Modern MPC Signing Works

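The cryptographic engine underneath modern institutional MPC custody is typically an implementation of MPC-CMP, the threshold ECDSA protocol introduced by Ran Canetti, Nikolaos Makriyannis, and Udi Peled in 2020 and published jointly with Rosario Gennaro and Steven Goldfeder as "UC Non-Interactive, Proactive, Threshold ECDSA with Identifiable Aborts". MPC-CMP improved on earlier threshold ECDSA protocols, which needed roughly twice as many signing rounds, required the message to be known throughout, and in their original form could not identify the party responsible for an aborted signing session.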

Key Generation (Distributed Key Generation — DKG)

The protocol begins with distributed key generation: a process by which n parties jointly compute a public key and receive individual secret shares — without any party or external coordinator ever seeing the full private key. The DKG procedure in MPC-CMP uses Feldman's verifiable secret sharing, ensuring that each party's share is consistent with the others and that the resulting public key is verifiable without revealing the private key.

Protocol · MPC-CMP DKG — Distributed Key Generation
// n parties jointly generate keypair. No party sees sk.
// Feldman VSS + Schnorr proofs; q = group order, t = threshold

Round 1  →  Each party Pᵢ samples secret sᵢ ← ℤ_q
           Picks polynomial fᵢ(x) of degree t−1 where fᵢ(0) = sᵢ
           Broadcasts Feldman commitments Cᵢ (g raised to each coefficient) + Schnorr proof

Round 2  →  Each Pᵢ sends share fᵢ(j) to party Pⱼ (encrypted)
           Parties verify shares against commitments
           Abort if verification fails → reshare with honest parties

Output   →  Party Pᵢ holds secret share xᵢ = Σⱼ fⱼ(i) mod q
           Public key pk = g^(Σⱼ sⱼ) is known to all
           sk = Σⱼ sⱼ = Σᵢ λᵢ·xᵢ (Lagrange, any t shares) is NEVER assembled by any party
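
The Feldman-style flow above can be sketched in a few lines of Python. This is a single-process simulation under loud assumptions: a toy Schnorr group (`P = 2039`, `Q = 1019`, `G = 4`) stands in for the elliptic curve, the Schnorr proofs and share encryption are omitted, and every function name is illustrative rather than taken from any MPC-CMP implementation.

```python
import secrets

# Toy Schnorr group, far too small for real use: P = 2*Q + 1 and G has order Q.
P, Q, G = 2039, 1019, 4


def poly_eval(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial with these coefficients at x, mod Q (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc


def deal(t: int, n: int) -> tuple[list[int], dict[int, int]]:
    """One party's contribution: Feldman commitments and a share for each party."""
    coeffs = [secrets.randbelow(Q) for _ in range(t)]   # coeffs[0] is the secret s_i
    commitments = [pow(G, c, P) for c in coeffs]        # C_k = g^(a_k)
    shares = {j: poly_eval(coeffs, j) for j in range(1, n + 1)}
    return commitments, shares


def verify_share(j: int, share: int, commitments: list[int]) -> bool:
    """Check g^share == prod_k C_k^(j^k): the share lies on the committed polynomial."""
    expected = 1
    for k, c in enumerate(commitments):
        expected = expected * pow(c, pow(j, k, Q), P) % P
    return pow(G, share, P) == expected


def run_dkg(t: int, n: int) -> tuple[int, dict[int, int]]:
    """Simulate all n parties in one process; return (pk, {party: x_i})."""
    deals = [deal(t, n) for _ in range(n)]
    for commitments, shares in deals:
        for j, s in shares.items():
            if not verify_share(j, s, commitments):
                raise ValueError(f"share for party {j} failed verification")
    x = {j: sum(shares[j] for _, shares in deals) % Q for j in range(1, n + 1)}
    pk = 1
    for commitments, _ in deals:
        pk = pk * commitments[0] % P                    # pk = g^(sum of s_i)
    return pk, x


def lagrange_at_zero(i: int, ids: list[int]) -> int:
    """Lagrange coefficient for party i when interpolating at 0 over ids, mod Q."""
    num, den = 1, 1
    for j in ids:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q
```

A consistency check that never assembles the private key is to combine the public values `g^x_i` from any t parties using `lagrange_at_zero` as exponents and compare the product with `pk`.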

Threshold Signing — The CMP Improvement

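The signing protocol in MPC-CMP achieves threshold ECDSA signing in 4 rounds, down from 8 in the previous state of the art. Only the final round depends on the message, so the first three can be run ahead of time as preprocessing, leaving a single non-interactive round on the hot path. As in earlier protocols, Paillier homomorphic encryption handles the multiplication of secret-shared values (the nonce share against the mask and against the key share), which is the step that makes plain ECDSA awkward to distribute. CMP's contributions are the round reduction, identification of misbehaving parties, and built-in proactive key refresh.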

Protocol · MPC-CMP Signing — t-of-n Threshold Signing
// t parties (of n) sign message m without exposing key shares
// Simplified. Paillier homomorphic encryption used for share multiplication; q = curve order

Round 1  →  Each signer samples nonce share kᵢ, γᵢ ← ℤ_q
           Broadcasts Paillier encryption: Enc(kᵢ), Enc(γᵢ)
           Commits to elliptic curve points Γᵢ = γᵢ·G

Round 2  →  Parties compute MtA (Multiplicative-to-Additive) shares
           kᵢ · γⱼ and kᵢ · xⱼ computed via Paillier without revealing kᵢ, γⱼ, or xⱼ
           ZK proofs verify correct Paillier computation

Round 3  →  Parties reveal Γᵢ and their shares δᵢ of δ = k·γ
           All compute R = δ⁻¹·(Σ Γᵢ) = k⁻¹·G and r = R.x mod q
           Each party computes partial signature σᵢ
           Broadcasts σᵢ with ZK consistency proof

Output   →  Any party aggregates: s = Σ σᵢ mod q
           ECDSA signature (r, s) is valid for pk over m
           No party's key share xᵢ was ever revealed
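
The MtA step in Round 2 is the least intuitive part of the protocol, so a toy version may help. The sketch below assumes textbook Paillier with tiny primes, a toy share modulus `Q`, and both parties simulated in one process; it omits the zero-knowledge range proofs and the wide statistical masking that a real implementation needs. The class and function names are hypothetical.

```python
import secrets
from math import gcd

Q = 1019  # toy group order; the shares a and b live in Z_Q


class Paillier:
    """Textbook Paillier with generator n + 1 (toy primes, illustration only)."""

    def __init__(self, p: int = 10007, q: int = 10009):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def _rand_unit(self) -> int:
        """Random r in [1, n) that is coprime to n."""
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if gcd(r, self.n) == 1:
                return r

    def enc(self, m: int) -> int:
        # (1 + n)^m == 1 + m*n (mod n^2), blinded by r^n
        blind = pow(self._rand_unit(), self.n, self.n2)
        return (1 + m * self.n) % self.n2 * blind % self.n2

    def dec(self, c: int) -> int:
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n


def mta(a: int, b: int, alice_key: Paillier) -> tuple[int, int]:
    """Return (alpha, beta) with alpha + beta == a * b (mod Q)."""
    c_a = alice_key.enc(a)                       # Alice -> Bob: Enc(a)
    mask = secrets.randbelow(Q)                  # Bob's random mask
    # Bob computes Enc(a*b + mask) homomorphically, without learning a.
    c_b = pow(c_a, b, alice_key.n2) * alice_key.enc(mask) % alice_key.n2
    alpha = alice_key.dec(c_b) % Q               # Alice's additive share
    beta = -mask % Q                             # Bob's additive share
    return alpha, beta
```

With these sizes `a*b + mask` stays below the Paillier modulus, so decryption recovers the exact integer before reduction mod `Q`. Production protocols choose parameters and range proofs specifically to guarantee that property against a malicious counterparty.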

Proactive Secret Sharing and Key Refresh

MPC-CMP supports proactive secret sharing: periodic refresh of all key shares such that the new shares are mathematically unrelated to the old ones (while the underlying key remains unchanged). This is critical for institutional security: key shares exfiltrated before a refresh are useless after it. Production systems should run proactive refresh on a schedule — daily or weekly for hot wallets, monthly for warm, on-demand for cold.
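
The idea behind refresh can be illustrated with plain Shamir shares: every party deals shares of a random polynomial whose constant term is zero, and each holder adds what it receives to its existing share. The sketch below is a simplified single-process model with a toy prime field and hypothetical function names; it is not the MPC-CMP refresh procedure itself, which also rotates auxiliary material such as Paillier keys.

```python
import secrets

Q = 2**127 - 1  # a Mersenne prime, used here as a toy field modulus


def eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial at x, mod Q (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % Q
    return acc


def split(secret: int, t: int, n: int) -> dict[int, int]:
    """Shamir-split secret into n shares with threshold t."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t - 1)]
    return {i: eval_poly(coeffs, i) for i in range(1, n + 1)}


def refresh(shares: dict[int, int], t: int) -> dict[int, int]:
    """Each party deals a random polynomial with constant term 0; holders add the results."""
    new = dict(shares)
    for _dealer in shares:
        zero_poly = [0] + [secrets.randbelow(Q) for _ in range(t - 1)]
        for i in new:
            new[i] = (new[i] + eval_poly(zero_poly, i)) % Q
    return new


def reconstruct(points: dict[int, int]) -> int:
    """Lagrange interpolation at 0. Used here only to check the sketch."""
    total = 0
    for i, y in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        total = (total + y * num * pow(den, -1, Q)) % Q
    return total
```

Any t refreshed shares still interpolate to the original secret, while a mix of one pre-refresh share and one post-refresh share does not. That is why a share stolen before the refresh is of no use afterwards.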

04 · Hardware

HSM Integration: Anchoring MPC in Hardware Trust

MPC alone is a cryptographic protocol running on general-purpose compute. Without hardware roots of trust, the security boundary of each MPC node extends to the entire software stack — operating system, hypervisor, cloud provider infrastructure, and every library in the dependency chain. Hardware Security Modules (HSMs) establish a hardware-enforced boundary within which key material cannot be extracted regardless of software compromise.

// HSM-Anchored MPC Node Architecture
Policy Engine / Transaction Authorisation
        │
MPC Protocol Runtime (userspace)
        │ PKCS#11 / vendor API
HSM Boundary: Key Share Storage · Crypto Engine · Attestation / RNG
FIPS 140-2 Level 3 / CC EAL4+ · Tamper-responsive enclosure · Zeroisation on attack

HSM Selection Criteria for MPC Nodes

Not all HSMs are equal, and the MPC use case creates specific requirements that differ from traditional HSM deployments:

Required Capabilities

  • FIPS 140-2 Level 3 minimum — Level 4 preferred for tier-1 cold signing nodes
  • Custom firmware support — MPC protocol primitives may need to run inside the HSM boundary
  • High-throughput ECDSA — signing throughput measured in operations per second
  • Remote attestation — ability to cryptographically prove firmware integrity to remote verifiers

Vendor Landscape

  • Thales Luna Network HSM — dominant in banking; excellent PKCS#11 support; used in Fireblocks node infrastructure
  • Utimaco SecurityServer — strong European regulatory track record; BaFin/ECB compliance
  • AWS CloudHSM / Azure Dedicated HSM — cloud-native; limited custom firmware options
  • Ledger Vault (enterprise) — purpose-built for crypto; strong UX for governance workflows

The Cloud HSM Tradeoff

Cloud-hosted HSMs offer operational convenience and elastic scalability, but introduce a dependency on cloud provider infrastructure integrity. For tier-1 cold storage, dedicated on-premise HSMs in physically controlled facilities remain the gold standard. Cloud HSMs are appropriate for warm and hot tiers where the tradeoff between convenience and absolute security has already been made.

05 · Operations

Key Ceremony Design: The Most Critical Operational Moment

The key ceremony, the procedure by which the distributed key generation protocol is executed to create a new custody key, is the single most consequential operational event in a custody firm's lifecycle. A ceremony executed correctly creates a key whose security rests on the proven guarantees of the MPC protocol and the integrity of each node. A ceremony with procedural flaws can create attack vectors that persist for the lifetime of the key.

I · Pre-Ceremony Planning & Participant Selection

Define the threshold scheme (t-of-n). Designate signing nodes and their operators — ensuring geographic separation, organisational independence, and background-checked personnel. Document the governance policy. Establish secure communication channels. Engage independent ceremony observers for audit purposes.

II · Environment Preparation & Verification

Each signing node must be provisioned, its firmware verified against published hashes, and its HSM initialised with tamper-evident seals verified by independent witnesses. Software on ceremony machines should be built from audited source. Network isolation is mandatory: no internet connectivity during key generation, with air-gapped machines preferred for cold tier ceremonies.
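
Verifying an image against a published hash is simple enough to script and record in the ceremony log. A minimal sketch, assuming the vendor publishes a SHA-256 digest as a hex string; the function names and file path are placeholders:

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, published_hex: str) -> bool:
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())
```

The published digest should reach the ceremony room through a channel independent of the one that delivered the image, otherwise an attacker who can swap the image can swap the hash as well.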

III · Distributed Key Generation Execution

The DKG protocol runs across all n nodes simultaneously. Each node generates its entropy, commits to its polynomial, exchanges encrypted shares, verifies received shares, and outputs its key share to HSM-protected storage. No node should output its share to any medium other than the designated HSM. The resulting public key is extracted and verified by all participants.

IV · Verification & Test Signing

A test signing round is executed with the new key: a transaction to a dust address is constructed, signed by the minimum required threshold, and broadcast to mainnet. Successful confirmation proves the key is valid. The public key's derivation path is recorded and independently verified. All ceremony logs, hashes, and participant attestations are collected and archived.

V · Backup Share Distribution & Recovery Documentation

Encrypted backup shares are distributed to geographically and organisationally separate custodians. Recovery procedures are documented in detail: who holds which backup share, under what governance conditions they may be used, what verification is required before reconstruction, and how the recovered key is reintegrated into live infrastructure.

06 · Resilience

Disaster Recovery Architecture: Designing for the Unthinkable

Disaster recovery in custody is not a single procedure — it is a tiered set of procedures corresponding to failure scenarios of different severity. A firm that plans only for "one node goes down" has not planned for disaster recovery. True DR planning must include scenarios that no individual within the organisation wants to contemplate: simultaneous destruction of multiple facilities, death or incapacitation of key personnel, catastrophic infrastructure failure, and hostile legal actions in multiple jurisdictions.

| Scenario | Recovery Mechanism | RTO Target | Governance Required |
| --- | --- | --- | --- |
| Single node failure | Automatic failover to standby node with pre-provisioned key share replica | < 30 seconds | None (automated) |
| Majority node failure | Cold backup key shares activated; new signing quorum assembled | 4–24 hours | Senior operations + compliance sign-off |
| Full infrastructure loss | Shamir backup shares reconstructed by designated trustees; assets swept to new key | 24–72 hours | Board-level approval + independent trustee coordination |
| Key share compromise | Immediate key refresh; compromised node isolated; audit initiated | 1–4 hours | CISO + external security firm engagement |
| Firm dissolution / insolvency | Client assets swept via court-appointed trustee using escrowed backup shares | Days to weeks | Legal process + independent trustee |
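
One way to keep these tiers testable is to encode them as data and score recovery drills against them. The sketch below is a hypothetical structure: it uses the upper bound of each RTO range from the table, invents short role labels for the governance column, and leaves out the insolvency tier because it has no numeric target.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTier:
    scenario: str
    rto: timedelta                 # upper bound of the RTO target
    approvers: tuple[str, ...]     # sign-offs required before recovery proceeds


TIERS = {
    "single_node_failure": RecoveryTier(
        "Single node failure", timedelta(seconds=30), ()),
    "majority_node_failure": RecoveryTier(
        "Majority node failure", timedelta(hours=24), ("senior_operations", "compliance")),
    "full_infrastructure_loss": RecoveryTier(
        "Full infrastructure loss", timedelta(hours=72), ("board", "independent_trustee")),
    "key_share_compromise": RecoveryTier(
        "Key share compromise", timedelta(hours=4), ("ciso", "external_security_firm")),
}


def drill_passed(tier_key: str, elapsed: timedelta, signoffs: set[str]) -> bool:
    """A drill passes if it met the RTO and collected every required sign-off."""
    tier = TIERS[tier_key]
    return elapsed <= tier.rto and set(tier.approvers) <= signoffs
```

Recording drill results in this form gives examiners a direct mapping from each documented scenario to evidence that it was exercised.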

The Trustee Independence Problem

The most difficult DR scenario — firm dissolution — requires backup shares to be held by parties who are completely independent of the custody firm, yet bound by contractual and fiduciary obligations to clients. This is not purely a technical problem. It requires legal agreements, regulatory approval in the applicable jurisdictions, and trustees who themselves have robust custody infrastructure. Regulated trust companies, law firms with specific crypto mandates, and purpose-built independent trustee services all serve this function — each with distinct risk profiles and regulatory implications.

07 · Case Study

Fireblocks Architecture Teardown

Fireblocks represents the dominant institutional MPC custody infrastructure provider, with over 1,800 institutional clients and $4 trillion in annualised transfer volume. Its architecture makes specific engineering choices that reflect its target market — active trading institutions requiring high-throughput, low-latency signing across hundreds of blockchains.

The MPC-CMP Implementation

Fireblocks implements MPC-CMP with a 2-of-3 default threshold scheme: one share held by the Fireblocks cloud infrastructure, one by the client's mobile device (protected by the device's secure enclave), and one in cold backup storage. This design creates a specific security model: every hot-path signing operation requires Fireblocks infrastructure cooperation, because the only other pairing that meets the threshold involves the offline backup share. That dependency is simultaneously its primary security guarantee and its primary regulatory concern.

// Fireblocks Three-Share Architecture (Simplified)
Share [1 of 3] · Fireblocks SGX Enclave · Cloud · Always available
Share [2 of 3] · Client Mobile / HSM · iOS Secure Enclave
Share [3 of 3] · Cold Backup Share · Encrypted · Offline
2-of-3 required to sign · Fireblocks always participates in hot-path signing

Intel SGX as Software HSM

Fireblocks uses Intel SGX (Software Guard Extensions) as the hardware root of trust for its cloud-hosted key share, rather than traditional HSMs. SGX enclaves provide memory encryption and attestation — code running inside an enclave cannot be inspected by the host OS, hypervisor, or cloud provider. This is a pragmatic choice for a cloud-native architecture.

The tradeoff is that SGX has a documented history of side-channel and fault-injection vulnerabilities (Foreshadow, Plundervolt, SGAxe, and Spectre-class attacks) that have periodically allowed enclave memory to be read by local attackers. Fireblocks mitigates this through microcode and platform updates, regular patching, and the structural guarantee that the SGX share alone is insufficient to sign. It nonetheless represents a genuine divergence from the physical tamper-resistance model of FIPS 140-2 Level 3 HSMs.

Policy Engine and Network Architecture

Fireblocks operates a dedicated Fireblocks Network — a permissioned network for inter-institution transfers that allows direct asset movement between Fireblocks clients without on-chain settlement, settling net positions periodically. The policy engine supports complex governance rules: withdrawal limits by asset and amount, whitelisted destination addresses, multi-approver workflows with biometric authentication, and time-lock conditions. These policy rules are enforced at the MPC signing layer — a transaction that violates policy cannot be signed, not merely rejected at the application layer.

08 · Case Study

Anchorage Architecture Teardown

Anchorage Digital, the first federally chartered digital asset bank in the United States (OCC charter, 2021), represents a fundamentally different architectural philosophy from Fireblocks: rather than a software-first MPC platform built on cloud-hosted enclaves, Anchorage is built around physical infrastructure designed to satisfy the most stringent banking regulators in the world.

The Biometric-MPC Hybrid Model

Anchorage's signature architectural innovation is the integration of biometric authentication directly into the MPC signing quorum. Rather than relying solely on device possession, Anchorage requires biometric verification (fingerprint or face) from designated human approvers, whose biometric data is stored in an on-device secure enclave and never transmitted to Anchorage's servers. The biometric verification unlocks the approver's key share for the duration of the signing session only.

This architecture satisfies a specific regulatory requirement: human intentionality in the signing process. A rogue automated system cannot satisfy the biometric threshold without physical human cooperation, which directly addresses the compromised-automation and stolen-credential scenarios that concern banking regulators. Biometrics alone do not stop a coerced or rogue approver; that risk is limited by requiring a quorum of independent approvers.

Physical Infrastructure and Regulatory Design

As a nationally chartered bank, Anchorage's custody infrastructure is designed to satisfy OCC Handbook requirements for custodial services, NIST SP 800-57 key management standards, and the examination expectations of federal bank examiners. This creates a design constraint that differs fundamentally from non-bank MPC vendors: every component must be documentable, examinable, and explainable to a non-technical regulatory audience.

Concretely, this means: physical HSMs in OCC-examinable facilities rather than cloud SGX enclaves; documented key ceremonies with notarised witness attestations; explicit chains of custody for all cryptographic material; and signing workflows that map to traditional banking dual-control concepts that regulators already understand.

09 · Governance

Policy Engine and Governance Layer Design

The cryptographic layer — MPC signing, HSM protection — solves the key security problem. The governance layer solves the authorisation problem: even if an attacker cannot steal a key, they may be able to manipulate the signing process into authorising a fraudulent transaction. A well-designed policy engine makes the signing infrastructure useless to an attacker who has not also compromised the governance layer.

Governance Layer Requirements Checklist

[01] Role-based access control with principle of least privilege at the operation level, not the account level.

[02] Transaction limits by asset, amount, destination, and time window.

[03] Mandatory destination address whitelisting with time-locked addition of new addresses.

[04] Multi-approver workflows with quorum configuration per policy tier.

[05] Out-of-band approval notification (separate channel from the transaction initiation channel).

[06] Immutable audit log of all approval actions, exported to external SIEM in real time.

[07] Automatic escalation for out-of-policy transactions: not rejection, but human review queue.

[08] Configurable cooling-off periods for large withdrawals.

[09] Geographic and time-of-day restrictions for sensitive operations.
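
Several of these requirements reduce to a pure function from a transaction request and a policy to a decision. The sketch below covers items 02, 03, 04, and 07 only. All class names, fields, and default values are hypothetical, and amounts are integers in the asset's smallest unit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"  # routed to a human review queue, not silently rejected


@dataclass
class Policy:
    per_tx_limit: dict[str, int]       # asset -> maximum amount per transaction
    whitelist: dict[str, datetime]     # destination -> time it was whitelisted
    whitelist_delay: timedelta = timedelta(hours=24)
    quorum: int = 2                    # approvers required, excluding the initiator


@dataclass
class TxRequest:
    asset: str
    amount: int
    destination: str
    initiator: str
    approvers: set[str]


def evaluate(tx: TxRequest, policy: Policy, now: datetime) -> tuple[Decision, list[str]]:
    """Return the decision plus every reason the request fell outside policy."""
    reasons: list[str] = []
    limit = policy.per_tx_limit.get(tx.asset)
    if limit is None or tx.amount > limit:
        reasons.append("amount over limit or asset not configured")
    added = policy.whitelist.get(tx.destination)
    if added is None:
        reasons.append("destination not whitelisted")
    elif now - added < policy.whitelist_delay:
        reasons.append("whitelist entry still time-locked")
    if len(tx.approvers - {tx.initiator}) < policy.quorum:
        reasons.append("approval quorum not met")
    return (Decision.ESCALATE if reasons else Decision.APPROVE), reasons
```

Returning the full list of reasons rather than the first failure gives the review queue and the audit log a complete picture of why a request was escalated.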

10 · Strategy

Architecture Recommendations by Firm Profile

| Firm Profile | Recommended Architecture | Key Considerations |
| --- | --- | --- |
| Exchange / Trading Platform (high volume, hot wallet dominant) | Fireblocks or proprietary MPC-CMP on cloud SGX with dedicated network HSMs for cold tier. 2-of-3 hot, 3-of-5 warm, 4-of-7 cold. | Prioritise throughput and latency on hot path. Rigorous whitelisting on cold path. Dedicated operational security team. |
| Qualified Custodian / Trust Company (regulatory-first, client segregation) | Anchorage-style physical HSM infrastructure or licensed Anchorage partnership. Per-client key isolation. OCC/state trust charter alignment. | Regulatory examination readiness is the primary driver. Every design decision must be documentable to a non-technical examiner. |
| Fund Manager / Family Office (self-custody preference, lower volume) | 3-of-5 MPC with two internal signers, one external trustee, and two cold backup shares. Hardware key shares (Ledger Enterprise or equivalent). | Disaster recovery with trustee independence is paramount. Prioritise backup share custody and recovery testing over throughput. |
| Protocol Treasury / DAO (decentralisation as governance feature) | Gnosis Safe multisig on-chain (transparency as governance) with individual signers using hardware wallets. MPC available as an alternative. | On-chain transparency of multisig structure may be a feature, not a bug. Community auditability of treasury governance outweighs privacy concerns. |

Conclusion

The Architecture Is Never Finished

Institutional custody architecture is not a problem that gets solved at deployment and then maintained. The cryptographic threat landscape evolves — post-quantum migration (NIST's ML-DSA and ML-KEM standards) will require complete key architecture redesign within this decade. The regulatory landscape evolves — the OCC, ECB, MAS, and DFSA are all actively developing custody-specific guidance. The operational landscape evolves — key personnel leave, vendors are acquired, infrastructure providers change their terms.

What distinguishes genuinely institutional-grade custody from custody-adjacent technology is not the sophistication of any individual component but the coherence of the whole: a cryptographic foundation (MPC-CMP on hardware-rooted nodes) that eliminates single points of failure; a governance layer that makes the signing infrastructure useless to an attacker who lacks legitimate authorisation; a disaster recovery framework that has been tested, documented, and explained to regulators; and an operational security culture that treats key ceremony discipline and proactive refresh as core business functions.

The firms that get this right will hold digital assets safely for decades. The ones that mistake a vendor integration for a custody architecture will discover the difference at the worst possible moment.

Frequently Asked Questions

Institutional Custody FAQs

What is MPC custody, and how does it differ from traditional multisig?

MPC (Multi-Party Computation) custody uses threshold signature schemes to distribute key shares across multiple parties such that no full private key ever exists. Traditional multisig requires N independent private keys, with M-of-N signatures needed to spend. MPC is chain-agnostic, hides the threshold configuration on-chain, supports proactive key refresh, and eliminates smart contract risk — making it the preferred architecture for institutional custodians.

Why do institutional firms prefer TSS over on-chain multisig?

TSS/MPC offers four advantages institutions value: (1) no on-chain disclosure of custody structure, reducing attack surface; (2) chain-agnostic operation across hundreds of blockchains; (3) proactive key refresh without address migration; and (4) no smart contract risk. On-chain multisig retains a role for DeFi treasuries where transparency is a governance feature, but for regulated custodians, TSS is the consensus choice.

What is the role of HSMs in digital asset custody?

Hardware Security Modules establish a hardware-enforced boundary for key material. Even if the operating system, hypervisor, or cloud provider is compromised, keys inside a FIPS 140-2 Level 3 HSM cannot be extracted. In MPC custody, each node holds its key share inside an HSM, meaning an attacker must simultaneously compromise multiple HSMs across different jurisdictions to reconstruct the key — a dramatically harder problem than software-only key storage.

How does Fireblocks' architecture differ from Anchorage's?

Fireblocks is a software-first, cloud-native MPC platform using Intel SGX enclaves as its hardware root of trust. It prioritises throughput, multi-chain support, and rapid onboarding — ideal for trading institutions. Anchorage is a federally chartered digital asset bank built around physical HSMs, biometric-MPC hybrid signing, and OCC-examinable facilities. It prioritises regulatory compliance, human intentionality in signing, and examination readiness — ideal for qualified custodians and trust companies.

Designing Institutional-Grade Custody Architecture · May 2026

Infrastructure analysis for builders · Not financial advice
