KPMG's 2025 Business Resiliency Survey found that only 50% of organizations have fully implemented resiliency plans even for their critical processes. Just 17% have extended those plans beyond critical processes. And 51% of organizations test and update their plans only once a year, which, if you've worked in infrastructure, you know means the plan is wrong within weeks of the review.
This isn't a knowledge gap. The people responsible for these plans know they should be maintained. They just aren't, because the way organizations approach BCP maintenance is fundamentally broken.
What Actually Happens
Here's the pattern, and if you've been in this role, you'll recognize it immediately.
Somebody — usually a compliance officer, a risk manager, or an IT director who drew the short straw — writes the BCP. It takes weeks. It involves interviewing department heads who give vague answers about which systems they use. It involves drawing architecture diagrams that are roughly correct at the time. It involves estimating recovery time objectives that nobody has actually tested. The resulting document ends up in SharePoint, or Confluence, or a shared drive. It's 40 to 80 pages long. It gets a sign-off from senior leadership.
And then nobody opens it again for a year.
When the annual review comes around, the plan owner sends emails asking if anything has changed. Department heads respond weeks late, or not at all. The plan owner makes a few surface edits — updates a phone number, changes a vendor name — and re-certifies the document. The compliance box is checked. Everyone moves on.
Meanwhile, infrastructure has changed completely. The engineering team migrated two databases, decommissioned a service, onboarded a new payment processor, and moved a workload to a different cloud region. None of those changes made it into the BCP because the people making those changes aren't the people who own the plan, and there's no process connecting deployments to continuity documentation.
The dependency diagram on page 14 hasn't been accurate since three months after it was drawn. Everyone in the room knows this, but nobody says it because redrawing it would take another two weeks of interviews.
The Tabletop Exercise
Most organizations that take BCP seriously run a tabletop exercise at least once a year. In theory, this is where you walk through a disruption scenario and test whether the plan holds up.
In practice, it's where everyone discovers the plan is wrong.
The contact list has three people who left the company. The runbook references a system that was replaced eight months ago. Someone asks "what about the dependency on the third-party auth provider?" and nobody in the room knows the answer because the engineer who set up that integration left last year and the documentation was a Slack thread.
This isn't hypothetical. Reddit experienced a 314-minute outage on Pi Day 2023 because a Kubernetes upgrade broke a Calico networking configuration that had been set up years earlier by engineers who had since left. The configuration wasn't documented anywhere. There was no runbook for it. The knowledge existed only in the heads of people who no longer worked there. Fifty-two million users went dark for over five hours because of a dependency that lived as tribal knowledge.
A managed services provider on r/sysadmin shared an even grimmer story: their custom DNS system failed completely, wiping all domain records. Every person who understood the system had left. Backups were non-functional. They couldn't even contact their 300 employees because the communication systems depended on the DNS that had just died.
These are extreme cases. But the underlying pattern — critical knowledge that exists only in people's heads, documentation that doesn't match production, plans that describe a system that no longer exists — is universal.
Why Plans Drift
The root cause isn't laziness. It's structural.
Infrastructure changes faster than documentation. Engineering teams ship changes daily or weekly. The BCP is reviewed annually, maybe quarterly. That's a 50x to 250x mismatch between the rate of change and the rate of documentation. The plan is stale by design.
The people who change things aren't the people who own the plan. An engineer migrating a database doesn't think about the BCP. Why would they? It's someone else's document in someone else's SharePoint folder. There's no trigger, no workflow, no integration that says "you just changed a critical dependency — the continuity plan needs to be updated." The feedback loop doesn't exist.
Dependency knowledge is distributed and informal. Ask five engineers which services depend on the Redis cluster and you'll get five different answers, all partially correct. The real dependency map lives across dozens of people's mental models, not in any single document. Capturing it accurately requires either exhaustive manual interviews (which nobody has time for) or automated discovery (which most organizations haven't set up).
Testing is expensive and disruptive. A proper disaster recovery test means taking production-like systems offline, executing recovery procedures, and measuring whether you meet your RTOs. That requires coordination, downtime windows, and engineering hours. Most organizations can justify doing it once a year. Once a year isn't enough.
What "Actually Maintaining It" Looks Like
The organizations that keep their BCPs current don't do it by trying harder at the annual review. They change the architecture of how the plan is maintained.
Embed continuity into the change management workflow. Every infrastructure change — new service, decommissioned database, new third-party provider, changed cloud region — should trigger a continuity review. Not a full re-write. A check: does this change affect a critical service? Does it introduce a new dependency? Does it create a single point of failure? If you can't answer those questions for a given change, your plan is already drifting.
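One way to make that check concrete is a small gate in the CI pipeline. The sketch below operates on the JSON plan representation that `terraform show -json` emits (the `resource_changes` structure is real Terraform output), but the `criticality` tag convention is an assumption of this example, not a standard; adapt it to whatever tagging scheme your organization uses.

```python
# Hypothetical CI gate: flag planned infrastructure changes that should
# trigger a business-continuity review before they merge.
# Input: the parsed JSON of `terraform show -json tfplan`.
# The "criticality" tag name is an assumed convention for this sketch.

CRITICAL_TAG = "criticality"  # assumed tagging convention, not a Terraform standard

def changes_needing_continuity_review(plan: dict) -> list[str]:
    """Return addresses of planned changes that warrant a BCP check."""
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = set(change["change"]["actions"])
        if actions == {"no-op"}:
            continue
        after = change["change"].get("after") or {}
        tags = after.get("tags") or {}
        if tags.get(CRITICAL_TAG) in ("high", "critical"):
            # Any change to a critical-tagged resource gets a review.
            flagged.append(change["address"])
        elif "delete" in actions:
            # Decommissions always warrant a look: something may depend on this.
            flagged.append(change["address"])
    return flagged
```

Wired into the pipeline, a non-empty result doesn't block the deploy; it opens a lightweight review task, which keeps the cost low enough that engineers don't route around it.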
Automate dependency discovery. Stop drawing dependency diagrams by hand and then hoping they stay accurate. Infrastructure-as-code configs, network traffic data, and cloud provider APIs already contain the information you need. Tools that build dependency maps from actual infrastructure data produce graphs that reflect reality, not memory. When the infrastructure changes, the map changes with it.
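The core of automated discovery is merging evidence sources you already have. A minimal sketch, assuming two simplified inputs (explicit references pulled from IaC state, and observed network flows, shapes invented for this example rather than any real Terraform or flow-log schema):

```python
# Sketch: merge two existing evidence sources into one service-level
# dependency map. Input shapes are simplified assumptions for illustration.

from collections import defaultdict

def build_dependency_map(iac_refs, traffic_flows):
    """Return {service: set(services it depends on)} from both sources."""
    deps = defaultdict(set)
    for src, dst in iac_refs:              # declared, e.g. ("checkout", "orders-db")
        deps[src].add(dst)
    for src, dst, _byte_count in traffic_flows:  # observed at runtime
        # Traffic reveals dependencies nobody ever wrote down.
        deps[src].add(dst)
    return dict(deps)
```

The interesting output is usually the difference between the two sources: an edge that appears in traffic but not in IaC is exactly the undocumented dependency that bites you during an outage.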
Run simulations regularly, not annually. The point of a simulation isn't to confirm that your plan works. It's to find out where it breaks. Operational resilience modeling lets you test "what happens if this service goes down" against your actual dependency graph, without the overhead of a full tabletop exercise. Run them monthly. Run them before major deployments. Run them when you onboard a new third-party provider. The more frequently you test, the smaller the gap between your plan and reality.
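At its simplest, that kind of simulation is a graph walk: given a dependency map (service to the services it depends on), compute the blast radius of one failure by following reverse edges. A minimal sketch, leaving out RTO timing and partial-degradation modes:

```python
# Sketch: "what happens if this service goes down" as a reverse-dependency
# traversal over a map of {service: set(services it depends on)}.

from collections import deque

def blast_radius(deps, failed):
    """Return every service transitively impacted when `failed` goes down."""
    # Invert the map: for each service, who depends on it?
    dependents = {}
    for svc, targets in deps.items():
        for t in targets:
            dependents.setdefault(t, set()).add(svc)
    impacted, queue = set(), deque([failed])
    while queue:
        svc = queue.popleft()
        for d in dependents.get(svc, ()):
            if d not in impacted:
                impacted.add(d)
                queue.append(d)
    return impacted
```

Because this runs in milliseconds against live data, it can run on every deploy, which is what closes the gap between the plan and reality.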
Make the plan a living system, not a document. A BCP that's a 60-page Word doc will always decay. A BCP that's connected to your infrastructure — where the dependency map updates automatically, where single points of failure are flagged in real time, where simulation results are current — stays accurate because it's built on live data, not last quarter's interviews.
The Bottom Line
The problem with how companies maintain business continuity plans isn't that they lack discipline. It's that they're maintaining a static artifact in a dynamic environment. The plan is a snapshot. The infrastructure is a moving target. Without automated dependency tracking, continuous simulation, and a workflow that connects infrastructure changes to continuity documentation, the snapshot is always out of date.
KPMG found that 52% of organizations haven't even integrated their risk and resilience capabilities into a coordinated structure. That means more than half of organizations are maintaining plans in isolation from the systems those plans are supposed to protect.
If your BCP is a document that someone updates once a year, it's not a plan. It's a historical record. The question is whether you'll discover that during an annual review — or during the outage that the plan was supposed to prepare you for.
Related Reading
- Why Most BCP Software Still Can't Tell You What Breaks When Something Fails
- How to Keep BCP Documentation in Sync with Infrastructure (And Why It Never Is)
- Business Continuity Reports Are Mandatory. Why Are You Still Writing Them in Word?
- What Is Operational Resilience Modeling? From Compliance to Continuous Confidence
- What Is Infrastructure Dependency Mapping? A Complete Guide
- What Is a Minimum Viable Company — And Why Every CTO Needs to Define Theirs
- KPMG — Risk & Resilience Survey 2025
- Databarracks — Data Health Check 2025