Owner: Chief Operating Officer (COO) and Chief Information Security Officer (CISO)
Co‑Owners: Head of Platform/SRE, Company Secretary, Protocol Custodian, Regional Operations Leads, DPO, GC, Finance Controller
Review cadence: Semi‑annual (and after material incidents/changes)
Purpose. Ensure the Nexus ecosystem (SNC, NatCos, Program SPVs) can withstand, operate through, and recover from disruptive events. This Annex establishes RTO/RPO targets, a vendor criticality matrix, code/data escrow, and a published exercise calendar. It integrates with Annex C (Privacy), D (SDZ/Transfers), E (Security/SLSA/SBOM), F (Incident Response), I (PVAS/Third‑Party Risk), J (Sanctions/Export), N (Records/Dual‑Logging), and O (Employment & Whistleblowing).
1) Scope & Equal‑Treatment Baseline
Applies equally to all regions (APAC, Middle East, East Africa, Southern Africa, EU/France, USA, Canada, Brazil/LatAm, Senegal/West Africa, Switzerland). Where host‑law or donor conditions prescribe stricter continuity or disclosure requirements, the most restrictive requirement applies. Privacy‑first restoration: no PII is exposed outside SDZs during recovery (Annex D).
2) Roles & Governance
- COO (Owner): BCP sponsor; approves exercise calendar; chairs Resilience Steering Group (RSG).
- CISO (Owner): DR lead; owns backup/restore, failover design, and cyber resilience; approves recovery runbooks.
- Head of Platform/SRE: Operates DR tooling; executes failover/failback; maintains telemetry.
- Protocol Custodian: Ensures Nexus Ledger continuity; manages signer keys; executes Emergency Record process (Annex N) if required.
- Company Secretary: Ensures Class A/B records and QPP disclosures per Annex N.
- DPO/GC: Validate privacy/legal aspects of recovery and communications.
- Regional Leads: Local coordination with authorities and PoRs.
- Business Owners: Maintain process‑level playbooks and minimum viable operations (MVO) procedures.
- PVAS Board: Oversees Tier‑1 vendor resilience attestations and joint tests (Annex I).
3) Business Impact Analysis (BIA) & Service Tiering
- BIA cadence: Refresh annually and after major changes.
- Service tiers:
– Tier 0 — Mission‑critical / Safety‑of‑life: Ledger/registry posting, EWS alerts, payout timers, SDZ boundary controls, identity/SSO, network core.
– Tier 1 — Critical business: Program management, NEXQ workflows, payments telemetry, calc‑agent tooling, customer support.
– Tier 2 — Important: Analytics workbenches, reporting, training environments.
– Tier 3 — Standard: Non‑urgent back‑office tools, docs sites.
BIA outputs: Maximum Tolerable Downtime (MTD), Recovery Time Objective (RTO), Recovery Point Objective (RPO), Minimum Operating Capacity (MOC), dependencies, manual workarounds.
4) Targets — RTO/RPO & Availability (by tier)
| Tier | RTO (target) | RPO (target) | Availability SLO | Notes |
|---|---|---|---|---|
| 0 | ≤ 4 hours (hot/warm) | ≤ 15 minutes | ≥ 99.95% | Dual‑region/zone; quorum keys available; emergency offline mode per Annex N. |
| 1 | ≤ 8 hours | ≤ 1 hour | ≥ 99.9% | Warm standby, automated infra as code (IaC) rebuild. |
| 2 | ≤ 24 hours | ≤ 4 hours | ≥ 99.5% | Cold/warm; manual steps acceptable. |
| 3 | ≤ 72 hours | ≤ 24 hours | Best effort | Restore from backups; manual workarounds. |
Review: Validate targets at each semi‑annual review and after major incidents.
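Illustrative (non‑normative) sketch: the tier targets above can be held as data so that drill results and downtime budgets are checked mechanically. The names `TierTarget`, `TIER_TARGETS`, `annual_downtime_budget_hours`, and `meets_targets` are hypothetical, and the downtime budget assumes the availability SLO is measured over a 365‑day year; the Annex itself does not fix the measurement window.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_YEAR = 365 * 24  # assumption: SLO measured over a non-leap year


@dataclass(frozen=True)
class TierTarget:
    tier: int
    rto_hours: float
    rpo_minutes: float
    availability_slo: Optional[float]  # None represents "best effort"


# Values copied from the table in this section.
TIER_TARGETS = {
    0: TierTarget(0, rto_hours=4, rpo_minutes=15, availability_slo=0.9995),
    1: TierTarget(1, rto_hours=8, rpo_minutes=60, availability_slo=0.999),
    2: TierTarget(2, rto_hours=24, rpo_minutes=240, availability_slo=0.995),
    3: TierTarget(3, rto_hours=72, rpo_minutes=1440, availability_slo=None),
}


def annual_downtime_budget_hours(target: TierTarget) -> Optional[float]:
    """Hours of downtime per year permitted by the availability SLO."""
    if target.availability_slo is None:
        return None
    return (1.0 - target.availability_slo) * HOURS_PER_YEAR


def meets_targets(tier: int, observed_rto_hours: float,
                  observed_rpo_minutes: float) -> bool:
    """True when an observed recovery stays within both RTO and RPO."""
    t = TIER_TARGETS[tier]
    return (observed_rto_hours <= t.rto_hours
            and observed_rpo_minutes <= t.rpo_minutes)
```

Under these assumptions the Tier 0 budget is roughly 4.4 hours per year and the Tier 1 budget roughly 8.8 hours, so a single outage that runs to the full RTO would consume most of the annual availability budget for that tier. This may be worth weighing when targets are validated at review.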
5) Resilience Patterns & Runbooks
- Active‑active / Multi‑AZ: For Tier 0 components where latency allows.
- Pilot‑light / Warm standby: For Tier 1; replicate data continuously with minimal pre‑provisioned capacity.
- Cold standby with IaC: For Tier 2/3 on low‑cost footprint.
- SDZ isolation: Ability to unilaterally disconnect external links (Annex D) while sustaining in‑country operations.
- Key management continuity: HSM/KMS redundancy within jurisdiction; break‑glass with dual approval and audit.
- Identity resilience: Secondary IdP / emergency local accounts; offline MFA token packs under seal; rotation after use.
- Network & DNS: Secondary DNS provider; pre‑staged cut‑over runbook; BGP/peering contingencies where applicable.
- Data integrity: Append‑only logs; hash/manifest verification against ledger; quarantine on mismatch.
- Ledger/Registry: Offline Emergency Record (Annex N §11) if ledger unavailable; post back within 72h.
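Illustrative (non‑normative) sketch of the data‑integrity pattern above: restored files are hashed and compared with a manifest, and any mismatch or missing file is quarantined rather than returned to service. SHA‑256, the manifest shape (relative path mapped to hex digest), and the function names are assumptions for illustration; how the manifest is retrieved from the ledger is out of scope here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(root: str,
                            manifest: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (verified, quarantined) lists of relative paths.

    A path is quarantined when the file is missing or its digest
    differs from the value recorded in the manifest.
    """
    verified: List[str] = []
    quarantined: List[str] = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = Path(root) / rel_path
        if candidate.is_file() and sha256_of(candidate) == expected:
            verified.append(rel_path)
        else:
            quarantined.append(rel_path)
    return verified, quarantined
```

In a restore drill, a non‑empty quarantine list would be treated as a runbook deviation and investigated before the restored data is promoted.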
Standard runbooks (minimum set):
A) Cloud provider region outage
B) SDZ unilateral disconnect
C) Ransomware/crypto‑locker (restore & re‑key)
D) KMS/HSM loss or compromise
E) IdP outage / auth failure
F) Database corruption
G) DNS/PKI compromise
H) Payment Partner‑of‑Record outage
I) Critical SaaS outage (ticketing, comms)
J) Natural disaster affecting facilities/personnel
6) Backups — Policy & Testing
- 3‑2‑1(+1) rule: 3 copies, 2 media, 1 off‑site, +1 immutable/air‑gapped.
- Scope: Source, databases, object stores, secrets, Terraform state, container registries, ledger/registry artifacts, collaboration repos.
- Encryption: At rest/in transit; keys jurisdiction‑pinned; escrowed per §8.
- Schedules: Tier 0: near‑real‑time + hourly snaps; Tier 1: hourly; Tier 2: 4‑hourly; Tier 3: daily. Each backup interval must not exceed the tier's RPO in §4.
- Testing cadence: Quarterly restore drills for Tier 0/1; semi‑annual for Tier 2; annual for Tier 3. Record restore time and data loss vs targets.
- Retention: Align to legal/contractual retention; minimum 30 days point‑in‑time; long‑term archives where required.
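Illustrative (non‑normative) sketch: the 3‑2‑1(+1) rule can be checked per system from the Backup Matrix (P‑3). The `BackupCopy` fields and the function name are hypothetical, and the sketch assumes the list contains every copy the policy counts towards the rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackupCopy:
    location: str    # e.g. a site or region label
    medium: str      # e.g. "disk", "object-store", "tape"
    offsite: bool    # held away from the primary site
    immutable: bool  # immutable or air-gapped copy


def check_3_2_1_plus_1(copies: List[BackupCopy]) -> Dict[str, bool]:
    """Evaluate each clause of the 3-2-1(+1) rule separately."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_immutable": any(c.immutable for c in copies),
    }


def is_compliant(copies: List[BackupCopy]) -> bool:
    """True only when every clause of the rule is satisfied."""
    return all(check_3_2_1_plus_1(copies).values())
```

Returning the result per clause, rather than a single flag, lets an exception entry name exactly which part of the rule a system fails.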
7) Vendor Criticality Matrix & Joint Testing
- Tiering: Align vendor tiers to Annex I (Tier 1 Critical; Tier 2 Important; Tier 3 Standard).
- Minimum evidence for Tier 1: SOC2/ISO, BCP/DR plan, last 12‑month test reports, RTO/RPO declarations, geographic redundancy, incident notice ≤24h, and commitment to joint exercises.
- Contractual clauses: DR SLOs, data residency (Annex D), key custody, export controls (Annex J), audit/attest rights, termination for sustained failure.
- Concentration risk: Track critical vendor concentration index; require alternates or exit plans where risk is high.
- Joint exercises: At least annually with each Tier‑1 vendor on a realistic failover scenario.
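Illustrative (non‑normative) sketch: this Annex does not define how the critical vendor concentration index is calculated. One possible approach, shown purely as an assumption, is a Herfindahl‑Hirschman‑style index over the share of critical services that depend on each vendor. The function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Dict


def concentration_index(service_to_vendor: Dict[str, str]) -> float:
    """Sum of squared vendor shares across critical services.

    Ranges from 1/n (services spread evenly over n vendors) up to 1.0
    (every critical service depends on a single vendor).
    """
    counts = Counter(service_to_vendor.values())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in counts.values())
```

For example, four critical services with three on one vendor and one on another give 0.75² + 0.25² = 0.625. Any threshold for "high" risk would need to be set by the RSG; none is implied here.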
8) Code & Data Escrow (Continuity Assurances)
- Code escrow: For critical proprietary components; quarterly deposits, independent escrow agent, and annual restore test.
- Data escrow: Quarterly signed snapshots (schema‑stable, hashed) for customer portability (Annex D §8).
- Release triggers: Insolvency, extended SLA breach, regulator order, or security failure; tested in tabletop every 6 months.
9) Communications & Stakeholder Management
- Declare/Stand‑down: Incident Commander (Annex F) declares DR event; COO confirms business continuity posture.
- Internal comms: Situation reports (SITREPs) at defined cadence; single source of truth channel.
- External comms: Customers/PoRs/regulators per Annex F/C/J; no PII; provide RTO/RPO expectations and updates.
- Public disclosures: QPP metrics and narrative (Annex N).
- After‑action: Post‑mortem within 10 business days, with remediation owners/dates.
10) Exercise Calendar & Scoring
Minimum annual plan (rolling 12 months):
- Quarterly tabletops: Alternate scenarios (A–J in §5).
- Semi‑annual live failover: One Tier‑0 and one Tier‑1 scenario, including SDZ unilateral disconnect.
- Annual full‑restore test: From immutable backups to clean environment (prove RPO).
- Vendor joint test: Each Tier‑1 vendor at least once per year.
- People & facilities drills: Evacuation/safety, loss of facility, pandemic roster.
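Illustrative (non‑normative) sketch: a rolling 12‑month plan can be checked against the minimum counts above. The exercise kind labels and function name are hypothetical. Vendor joint tests and people/facilities drills are left out because their minimums depend on the vendor list and local arrangements.

```python
from collections import Counter
from typing import Dict, Iterable

# Minimum counts per rolling 12 months, taken from the list above.
MINIMUM_PER_YEAR = {
    "tabletop": 4,       # quarterly tabletops
    "live_failover": 2,  # semi-annual live failover
    "full_restore": 1,   # annual full-restore test
}


def calendar_gaps(planned_kinds: Iterable[str]) -> Dict[str, int]:
    """Return the shortfall per exercise kind; empty when the plan suffices."""
    counts = Counter(planned_kinds)
    return {
        kind: required - counts[kind]
        for kind, required in MINIMUM_PER_YEAR.items()
        if counts[kind] < required
    }
```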
Scoring & success criteria:
- RTO attained vs target; RPO attained (minutes of data loss); time to declare; time to first customer comms; time to steady state; number of runbook deviations; residual risk rating.
- Colour score (Green/Amber/Red) with actions and deadlines; track in Resilience Scorecard.
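Illustrative (non‑normative) sketch: a colour score could be derived from how far the attained RTO/RPO fall from target. The 25% amber margin and the function name are assumptions; this Annex does not set the thresholds, and a full score would also weigh the other criteria listed above.

```python
def colour_score(rto_target_h: float, rto_actual_h: float,
                 rpo_target_min: float, rpo_actual_min: float,
                 amber_margin: float = 0.25) -> str:
    """Score an exercise on the worse of its RTO and RPO overruns.

    Green: both targets met.
    Amber: worst overrun within `amber_margin` of target (assumed 25%).
    Red:   worst overrun beyond that margin.
    """
    worst_ratio = max(rto_actual_h / rto_target_h,
                      rpo_actual_min / rpo_target_min)
    if worst_ratio <= 1.0:
        return "Green"
    if worst_ratio <= 1.0 + amber_margin:
        return "Amber"
    return "Red"
```

For a Tier 0 exercise (4‑hour RTO, 15‑minute RPO), recovery in 3.5 hours with 10 minutes of data loss scores Green under these assumptions, while recovery in 6 hours scores Red.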
11) Change Management & Triggers
DR plans must be updated when any of the following occur:
- New country/region or SDZ; new Tier‑1 vendor; material architectural change; key rotation scheme change; new PoR; major product release; significant regulatory change; notable incident/near‑miss.
12) KPIs, Reporting & Assurance
- KPIs: RTO/RPO attainment rate; restore success rate; exercise completion %; aged actions; critical vendor concentration; escrow test pass rate; incident MTTR; % services with current runbooks.
- Reporting: Quarterly to RSG and Board; public metrics in QPP (Annex N).
- Assurance: Internal audit annually; optional third‑party DR assessment every 24 months.
13) Exceptions & Waivers
Document in Resilience Exception Register with risk assessment, compensating controls, expiry, and approval by COO + CISO; material exceptions notified to the Board.
14) Effective Date & Governance
Adopted by the Board(s) of all regional operators on [●] and incorporated by reference into Charters/Bylaws, vendor contracts (as applicable), and operational SOPs. Class B to amend/strengthen; Class A to relax RTO/RPO targets, weaken escrow/testing, or reduce vendor evidence requirements.
Appendices (Templates)
P‑1 — BIA Worksheet & Service Tiering Catalog (fields: process, owner, dependencies, MTD, RTO, RPO, MOC, workarounds)
P‑2 — DR Runbook Template (trigger, assumptions, steps, rollback, comms, owners, success criteria)
P‑3 — Backup Matrix (systems × frequency × retention × encryption × test cadence)
P‑4 — Vendor Criticality Matrix (vendor, tier, function, RTO/RPO, evidence on file, last test date)
P‑5 — Exercise Calendar & Scorecard (scenario, date, participants, targets, results, actions)
P‑6 — Code/Data Escrow Checklist (scope, deposit cadence, agent, release triggers, last restore test)