Privileged Access - “Break-Glass” Design

October 21, 2025


Design a last-resort path to restore control without creating a standing backdoor.

TL;DR: Keep one narrow, offline-verifiable path for emergency admin access. Make it disabled by default, short-lived, multi-party, and fully logged. Store credentials out of band. Drill it every month and run a full live test every quarter. Revoke and rotate immediately after use.

What “break-glass” means

Break-glass is emergency-only elevation, used when normal privileged access paths are unavailable or unsafe. It restores command and control during outages, ransomware incidents, or identity provider failure.

Design goals

Security
  • No standing trust or always-on accounts
  • Out-of-band credential custody
  • Two-person integrity for activation
  • Shortest possible duration and scope
Reliability
  • Works when SSO, MFA, or PAM is down
  • Offline-verifiable procedures
  • Documented runbooks, practiced often
  • Clear ownership and paging

Reference architecture

Minimal viable pattern
  1. Accounts: Pre-provision disabled emergency admin roles in each critical domain (cloud root accounts, IdP break-glass, hypervisor management, backups, CI/CD).
  2. Credentials: Store split-knowledge secrets in two independent HSM-backed vaults. Require a quorum to reconstruct.
  3. Devices: Keep two clean-room laptops with full-disk encryption, no EDR dependency, and offline configuration media.
  4. Paths: Run a separate “resilience plane” network with a bastion, dedicated DNS, and restricted egress. Share no credentials with production.
  5. Controls: Activate via out-of-band comms, a change ticket, and a time-bound policy that auto-expires.
  6. Evidence: Append all steps and decisions to an immutable log. Anchor a daily hash externally.
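
The quorum requirement in step 2 can be illustrated with Shamir secret sharing. This is a minimal sketch, not a production implementation: the function names, the choice of prime field, and the 2-of-3 split are assumptions for illustration, and a real deployment would rely on the vault or HSM vendor's vetted key-splitting feature.

```python
import secrets

# Assumption: the Mersenne prime 2**127 - 1 as the field modulus,
# so secrets must be integers below that value.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them rebuild it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 < threshold <= shares:
        raise ValueError("need 1 < threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def reconstruct(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: three custodian shares, any two reconstruct.
emergency_secret = secrets.randbelow(PRIME)
custodian_shares = split_secret(emergency_secret, threshold=2, shares=3)
recovered = reconstruct(custodian_shares[:2])
```

With a threshold of two, a single share on its own reveals nothing about the secret, which is the property that lets each vault be compromised independently without exposing the credential.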

Activation flow

Step               | Who                | Control                                      | Output
Declare emergency  | Incident Commander | SEV-1 playbook, ticket, paging               | Incident ID
Authorize use      | Two executives     | Two-person approval, callback verification   | Signed approval record
Reconstruct secret | Custodians A+B     | Separate vaults, audit cameras               | Time-limited credential
Enable account     | Responder          | Just-in-time role, max TTL 60–120 min        | Active session
Perform action     | Responder          | Screen capture, command logging              | Change set applied
Revoke and rotate  | Responder          | Auto-expire, rotate secrets, disable account | Access removed
Review             | IR + GRC           | Immutable log, lessons, improvements         | Postmortem
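
The authorize and enable steps can be sketched as a grant that needs two distinct approvers and expires on its own. This is an illustrative sketch only: the `Grant` type, the `authorize` function, and the rule that a responder cannot approve their own access are assumptions, not part of any specific PAM product. The 120-minute cap follows the upper TTL bound in the flow above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL_MINUTES = 120  # upper bound of the 60–120 minute TTL range


@dataclass(frozen=True)
class Grant:
    incident_id: str
    approvers: tuple[str, ...]
    responder: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # Access ends automatically once the TTL has elapsed.
        return now < self.expires_at


def authorize(incident_id: str, approvers: list[str], responder: str,
              ttl_minutes: int, now: datetime) -> Grant:
    """Issue a time-bound grant only under two-person approval."""
    distinct = sorted(set(approvers))
    if len(distinct) < 2:
        raise PermissionError("two distinct approvers are required")
    if responder in distinct:
        # Assumption: self-approval is not allowed.
        raise PermissionError("the responder cannot approve their own access")
    if not 0 < ttl_minutes <= MAX_TTL_MINUTES:
        raise ValueError(f"ttl_minutes must be between 1 and {MAX_TTL_MINUTES}")
    expires_at = now + timedelta(minutes=ttl_minutes)
    return Grant(incident_id, tuple(distinct), responder, expires_at)


# Hypothetical usage with placeholder identities.
issued_at = datetime(2025, 10, 21, 17, 5, tzinfo=timezone.utc)
grant = authorize("INC-EXAMPLE", ["ciso", "vp-engineering"], "responder-1", 90, issued_at)
```

Passing `now` explicitly keeps the expiry check testable during drills and avoids depending on a wall clock that may itself be unreliable during an incident.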

Technical controls checklist

Identity
  • Break-glass roles pre-scoped to least privilege
  • Default disabled, activation via policy toggle
  • No federation dependency for login path
  • Out-of-band MFA tokens stored sealed
Credentials
  • Quorum-based retrieval (e.g., 2-of-3 shares)
  • Short TTL passwords or ephemeral keys
  • Immediate rotation after use
  • No reuse across planes or tenants
Access path
  • Dedicated bastion with allowlist rules
  • Command logging on bastion and target
  • Break-glass security group with time lock
  • Emergency DNS and IdP fallback documented
Monitoring & evidence
  • Immutable log of approvals, tokens, and commands
  • Out-of-band alert to executives on activation
  • Video capture on custodian stations
  • Daily hash anchored to a public blockchain or a timestamping authority (TSA)
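
The access-path items (bastion allowlist plus a time lock) can be combined into one admission check. The sketch below is a simplified illustration: the function name, the CIDR ranges, and the idea of evaluating this in application code rather than in a firewall or security group policy are all assumptions.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


def is_allowed(source_ip: str, allowlist: list[str], opened_at: datetime,
               ttl_minutes: int, now: datetime) -> bool:
    """Admit only allowlisted sources, and only while the time lock is open."""
    closes_at = opened_at + timedelta(minutes=ttl_minutes)
    if not (opened_at <= now < closes_at):
        return False  # outside the activation window: deny everything
    address = ipaddress.ip_address(source_ip)
    # Membership is False when address and network IP versions differ.
    return any(address in ipaddress.ip_network(cidr) for cidr in allowlist)


# Hypothetical resilience-plane ranges, for illustration only.
bastion_allowlist = ["10.20.0.0/24", "192.0.2.10/32"]
window_opened = datetime(2025, 10, 21, 17, 5, tzinfo=timezone.utc)
decision = is_allowed("10.20.0.5", bastion_allowlist, window_opened, 90,
                      window_opened + timedelta(minutes=10))
```

Checking the time window first means the rule fails closed: once the window lapses, no source address is admitted, whatever the allowlist says.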

Testing cadence

Drills that keep it real
  • Monthly tabletop: IdP outage and ransomware scenarios
  • Quarterly live test: full path from approval to revoke
  • Credential fire drill: recover from vaults without production SSO
  • Evidence check: verify the ledger and external anchors

Plain words

Keep one safe door for bad days. Two people must agree to open it. It closes itself fast. Everything is written down where no one can quietly change it. Practice until it is boring.

Minimal ledger entry (included for trust)
{
  "event": "break_glass_activation",
  "incident_id": "INC-2025-10-21-042",
  "scope": ["aws:org:prod-landing", "idp:admin-fallback"],
  "reason": "idp-outage",
  "approved_by": ["CISO", "VP-Engineering"],
  "credential_source": ["vault-a:share1", "vault-b:share2"],
  "ttl_minutes": 90,
  "bastion": "resilience-bastion-01",
  "timestamp": "2025-10-21T17:05:12Z",
  "prev_hash": "b5f1...0a",
  "sha3_256": "7a9c...55",
  "sig": "dilithium3:base64..."
}
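
The `prev_hash` and `sha3_256` fields suggest a hash chain, where each entry commits to the one before it. The sketch below shows one way such a chain could be appended to and verified. The canonicalization (sorted-key compact JSON), the all-zero genesis value, and the function names are assumptions, since the entry above does not specify how its hash is computed. Signature handling (`sig`) is left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash placeholder for the first entry


def entry_hash(entry: dict) -> str:
    """SHA3-256 over canonical JSON, excluding the entry's own hash and signature."""
    body = {k: v for k, v in entry.items() if k not in ("sha3_256", "sig")}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


def append(ledger: list[dict], fields: dict) -> dict:
    """Link a new entry to the current chain head and store its hash."""
    prev = ledger[-1]["sha3_256"] if ledger else GENESIS
    entry = {**fields, "prev_hash": prev}
    entry["sha3_256"] = entry_hash(entry)
    ledger.append(entry)
    return entry


def verify(ledger: list[dict]) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev = GENESIS
    for entry in ledger:
        if entry.get("prev_hash") != prev:
            return False
        if entry.get("sha3_256") != entry_hash(entry):
            return False
        prev = entry["sha3_256"]
    return True


# Hypothetical usage with two placeholder events.
ledger: list[dict] = []
append(ledger, {"event": "break_glass_activation", "ttl_minutes": 90})
append(ledger, {"event": "break_glass_revocation"})
chain_ok = verify(ledger)
```

Because each hash covers `prev_hash`, editing any earlier entry invalidates it and every entry after it. The hash of the last entry is therefore a natural value to anchor externally each day.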

FAQ

Break-glass does not replace PAM or JIT. Those are the first choice; break-glass is for when those controls are down or compromised.

Keep one emergency account per critical control plane. Keep each one disabled and tightly scoped.

Two-person approval, out-of-band verification, and real-time executive alerts reduce the risk of misuse. Evidence trails enable a fast response.

Keep access as short as possible. Target 60–120 minutes with auto-expiry and forced rotation.

Takeaway: one narrow, testable emergency path. Two-person control. Short life. Full evidence.