cutover-community
Blog
April 28, 2026

Why Major Incident Management Fails to Connect the Dots - and How We Can Fix It

The real secret in major incident management isn't what we document. It's what we don't. As @signulll put it recently: "Secrets are actually the connections between ideas, not the ideas themselves. I can publish all the nodes & still own the graph."

We already publish the nodes: logs, tickets, post-mortems, monitoring dashboards, alert histories, and so on. But the graph stays locked away and nowhere to be found: the tacit, high-context connections that say "this query on that alert pattern + that recent change data + this dependency + tacit team knowledge of the app + historical infra insights = here's how we actually killed the P1 incident in <15 min".

That unspoken graph is exactly why:

  • Human on-call teams still burn minutes (or hours) in the first critical phase of an incident
  • AI agents, despite ingesting every runbook and log, still can't meaningfully accelerate mitigation
  • Mean Time To Mitigation (MTTM) refuses to move, despite all the tools and retrospectives

We're not keeping secrets from our competitors. We're keeping them from our own teams and, in the future, from our own agents.

The fix isn't more documentation of facts. It's making the connections explicit: turning tacit knowledge into a living, queryable, updatable task graph that both humans and agentic systems can traverse in real time.
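To make "queryable, updatable task graph" a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `TaskGraph` class, its `connect` and `playbook` methods, and every node name and reason string are hypothetical and do not describe any particular product or real incident. The point is that each edge carries the "why" that normally lives in someone's head.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Hypothetical directed graph: nodes are signals or tasks, edges carry the reason."""
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def connect(self, source, target, reason):
        """Record a tacit connection as an explicit edge, replacing any older version."""
        # Drop an existing edge to the same target so the graph stays updatable.
        self.edges[source] = [(t, r) for t, r in self.edges[source] if t != target]
        self.edges[source].append((target, reason))

    def playbook(self, start):
        """Breadth-first walk from a starting signal; returns (source, target, reason) steps."""
        seen, queue, steps = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            for target, reason in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    steps.append((node, target, reason))
                    queue.append(target)
        return steps


# Illustrative data only: these names and reasons are made up for the example.
graph = TaskGraph()
graph.connect("alert:checkout-latency", "query:recent-changes",
              "latency spikes usually follow a deploy")
graph.connect("query:recent-changes", "check:payments-dependency",
              "checkout changes often touch payments")
graph.connect("check:payments-dependency", "task:roll-back-release",
              "rollback has been the fastest mitigation")

for source, target, reason in graph.playbook("alert:checkout-latency"):
    print(f"{source} -> {target}: {reason}")
```

A human on call or an agent could start from the alert and walk the same edges. Calling `connect` again with a new reason after a retrospective updates the edge in place, which is what keeps the graph living rather than archived.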

When we finally do that, MTTM doesn't just improve. It becomes predictable.

Who else is seeing this gap? Are you actively mapping the tacit edges, or still relying on "the humans just know"?

I would love to hear how you're bridging this for your human + agentic teams.

Ky Nichol
CEO