Privileged Access - “Break-Glass” Design
October 21, 2025
Design a last-resort path to restore control without creating a standing backdoor.
TL;DR: Keep one narrow, offline-verifiable path for emergency admin access. Make it opt-in, short-lived, multi-party, and fully logged. Store credentials out of band. Test it every month. Kill it fast after use.
What “break-glass” means
Break-glass is emergency-only privilege elevation, used when normal privileged access paths are unavailable or unsafe. It restores command and control during outages, ransomware incidents, or identity provider failure.
Design goals
Security
- No standing trust or always-on accounts
- Out-of-band credential custody
- Two-person integrity for activation
- Shortest possible duration and scope
Reliability
- Works when SSO, MFA, or PAM is down
- Offline-verifiable procedures
- Documented runbooks, practiced often
- Clear ownership and paging
Reference architecture
Minimal viable pattern
- Accounts: Pre-provision disabled emergency admin roles in each critical domain (cloud roots, IdP break-glass, hypervisor management, backups, CI/CD).
- Credentials: Store split-knowledge secret shares in two independent HSM-backed vaults. Require a quorum to reconstruct.
- Devices: Keep two clean-room laptops with full-disk encryption, no EDR dependency, and offline configuration media.
- Paths: Separate “resilience plane” network with bastion, dedicated DNS, and restricted egress. No shared creds with production.
- Controls: Activation via out-of-band comms, change ticket, and time-bound policy that auto-expires.
- Evidence: Append all steps and decisions to an immutable log. Anchor daily hash externally.
Activation flow
| Step | Who | Control | Output |
|---|---|---|---|
| Declare emergency | Incident Commander | SEV-1 playbook, ticket, paging | Incident ID |
| Authorize use | Two executives | Two-person approval, callback verification | Signed approval record |
| Reconstruct secret | Custodians A+B | Separate vaults, audit cameras | Time-limited credential |
| Enable account | Responder | Just-in-time role, max TTL 60–120 min | Active session |
| Perform action | Responder | Screen capture, command logging | Change set applied |
| Revoke and rotate | Responder | Auto-expire, rotate secrets, disable account | Access removed |
| Review | IR + GRC | Immutable log, lessons, improvements | Postmortem |
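The authorization and expiry steps in the table can be sketched as a small gate that refuses to issue a grant unless the controls hold. This is a minimal illustration, not a production policy engine: the names `Grant`, `authorize`, and `is_active` are hypothetical, and the rule that the responder may not be one of the approvers is an added separation-of-duties assumption, not something the table states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL_MINUTES = 120  # upper bound of the 60-120 minute window in the table


@dataclass(frozen=True)
class Grant:
    incident_id: str
    approvers: tuple[str, ...]
    expires_at: datetime


def authorize(incident_id: str, approvers: list[str], responder: str,
              ttl_minutes: int, now: datetime) -> Grant:
    """Return a time-bound grant, or raise ValueError if a control fails."""
    if not incident_id:
        raise ValueError("no declared incident")
    distinct = set(approvers)
    if len(distinct) < 2:
        raise ValueError("two distinct approvers required")
    if responder in distinct:
        # Assumption: approvers and responder must be different people.
        raise ValueError("responder cannot approve their own access")
    if not 0 < ttl_minutes <= MAX_TTL_MINUTES:
        raise ValueError("ttl outside allowed window")
    return Grant(incident_id, tuple(sorted(distinct)),
                 now + timedelta(minutes=ttl_minutes))


def is_active(grant: Grant, now: datetime) -> bool:
    """A grant expires on its own; no revocation call is needed for it to lapse."""
    return now < grant.expires_at
```

Passing `now` in explicitly keeps the check testable offline, for example `authorize("INC-2025-10-21-042", ["CISO", "VP-Engineering"], "responder-1", 90, datetime.now(timezone.utc))`.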
Technical controls checklist
Identity
- Break-glass roles pre-scoped to least privilege
- Default disabled, activation via policy toggle
- No federation dependency for login path
- Out-of-band MFA tokens stored sealed
Credentials
- Quorum-based retrieval (e.g., 2-of-3 shares)
- Short TTL passwords or ephemeral keys
- Immediate rotation after use
- No reuse across planes or tenants
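One way to get 2-of-3 quorum retrieval is Shamir secret sharing. The article does not name a scheme, so treat this as an illustrative choice: a sketch of the arithmetic only, with a hypothetical field size and function names. A real deployment would rely on the vault's or HSM's own split-key feature rather than hand-rolled code.

```python
import secrets

# Mersenne prime 2^521 - 1, comfortably larger than a 256-bit secret.
# The choice of field is an assumption made for this sketch.
PRIME = 2**521 - 1


def split_2_of_3(secret: int) -> list[tuple[int, int]]:
    """Split `secret` into three shares; any two reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    slope = secrets.randbelow(PRIME)  # random degree-1 coefficient
    # Each share is a point on the line f(x) = secret + slope * x (mod PRIME).
    return [(x, (secret + slope * x) % PRIME) for x in (1, 2, 3)]


def recover(share_a: tuple[int, int], share_b: tuple[int, int]) -> int:
    """Interpolate the line through two shares and evaluate it at x = 0."""
    (x1, y1), (x2, y2) = share_a, share_b
    if x1 == x2:
        raise ValueError("shares must be distinct")
    inv = pow(x2 - x1, -1, PRIME)  # modular inverse, Python 3.8+
    return (y1 * x2 - y2 * x1) * inv % PRIME
```

A single share reveals nothing about the secret, because every candidate secret is consistent with some slope. That is what lets each custodian hold one share without holding the credential.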
Access path
- Dedicated bastion with allowlist rules
- Command logging on bastion and target
- Break-glass security group with time lock
- Emergency DNS and IdP fallback documented
Monitoring & evidence
- Immutable log of approvals, tokens, and commands
- Out-of-band alert to executives on activation
- Video capture on custodian stations
- Daily hash anchored to a public blockchain or a timestamping authority (TSA)
Testing cadence
Drills that keep it real
- Monthly tabletop: IdP outage and ransomware scenarios
- Quarterly live test: full path from approval to revoke
- Credential fire drill: recover from vaults without production SSO
- Evidence check: verify the ledger and external anchors
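For the evidence check, a drill can recompute the day's anchor from the ledger and compare it with the externally published value. The sketch below assumes the anchor is a SHA3-256 digest over the day's entry hashes in order. That construction and the function names are hypothetical; the actual anchoring format depends on the ledger in use.

```python
import hashlib


def daily_anchor(entry_hashes: list[str]) -> str:
    """Fold the day's entry hashes (hex strings, in ledger order) into one digest."""
    h = hashlib.sha3_256()
    for hex_digest in entry_hashes:
        h.update(bytes.fromhex(hex_digest))
    return h.hexdigest()


def anchor_matches(entry_hashes: list[str], published_anchor: str) -> bool:
    """True if the local ledger still reproduces the externally anchored value."""
    return daily_anchor(entry_hashes) == published_anchor.lower()
```

A mismatch means an entry was added, removed, reordered, or altered after the anchor was published, which is exactly what the drill is meant to surface.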
Plain words
Keep one safe door for bad days. Two people must agree to open it. It closes itself fast. Everything is written down where no one can quietly change it. Practice until it is boring.
Minimal ledger entry (included for trust)
{
"event": "break_glass_activation",
"incident_id": "INC-2025-10-21-042",
"scope": ["aws:org:prod-landing", "idp:admin-fallback"],
"reason": "idp-outage",
"approved_by": ["CISO", "VP-Engineering"],
"credential_source": ["vault-a:share1", "vault-b:share2"],
"ttl_minutes": 90,
"bastion": "resilience-bastion-01",
"timestamp": "2025-10-21T17:05:12Z",
"prev_hash": "b5f1...0a",
"sha3_256": "7a9c...55",
"sig": "dilithium3:base64..."
}
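The `prev_hash` and `sha3_256` fields suggest a hash chain, where each entry commits to the one before it. The entry above does not say how the hash is computed, so the canonicalization here (sorted-key compact JSON, excluding `sha3_256` and `sig`) and the all-zero genesis value are assumptions. Signature creation and verification are left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(entry: dict) -> str:
    """SHA3-256 over canonical JSON of the entry, excluding its own hash and signature."""
    body = {k: v for k, v in entry.items() if k not in ("sha3_256", "sig")}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


def append(ledger: list[dict], fields: dict) -> dict:
    """Link a new entry to the current tail and store its hash."""
    prev = ledger[-1]["sha3_256"] if ledger else GENESIS
    entry = {**fields, "prev_hash": prev}
    entry["sha3_256"] = entry_hash(entry)
    ledger.append(entry)
    return entry


def verify(ledger: list[dict]) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev or entry["sha3_256"] != entry_hash(entry):
            return False
        prev = entry["sha3_256"]
    return True
```

Editing any field of an earlier entry changes its hash, which breaks the `prev_hash` link of every entry after it. So `verify` fails unless the whole tail is rewritten, and the external anchor is there to catch that case.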
FAQ
Break-glass does not replace PAM or JIT access, which remain the first choice. It is for when those controls are down or compromised.
Provision one break-glass account per critical control plane. Keep each one disabled and tightly scoped.
Two-person approval, out-of-band verification, and real-time executive alerts reduce the risk of misuse. Evidence trails enable fast response.
Keep access as short as possible. Target 60–120 minutes with auto-expiry and forced rotation.
Takeaway: one narrow, testable emergency path. Two-person control. Short life. Full evidence.