The Metric to Anchor Your Agentic SOC Evaluation On

Triage speed is the measurable win. Detection completeness is the next layer. Here's the metric to anchor your evaluation on.

AI is fundamentally changing how we run security operations. I see it inside our own SOC every day. Analysts are going deeper on the work that matters because the routine work moves faster. The agentic wave is the next step in that change, and there's real promise in it. But the way the industry evaluates agentic SOC products hasn't caught up with what those products are doing. That's the gap I want to walk through here.

The category is moving fast and the technology is genuinely impressive. But there's one question that, once it anchors the evaluation, makes the difference between picking a triage tool and picking a detection partner. Getting that question into the conversation is how we make sure this wave delivers its full value.

What the industry-level data does and doesn't tell us

Mandiant's M-Trends 2026 report is based on more than 500,000 hours of incident response work from 2025. The global median dwell time, days from first attacker foothold to the moment someone noticed, came in at 14 days. Up from 11 in 2024. Up from 10 in 2023.

That number needs context before you draw conclusions from it. The 2025 jump isn't a clean technology failure story. Espionage operations and DPRK IT-worker campaigns dragged the median up significantly, running a 122-day median on their own. Cases where organizations detected intrusions internally improved to 9 days. Strip out the long-dwell espionage tail and the picture looks meaningfully better.
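To make the tail effect concrete, here's a toy illustration with hypothetical dwell-time values, not M-Trends data: a small long-dwell subpopulation is enough to move the overall median.

```python
# Illustrative only: hypothetical dwell times in days, not M-Trends data.
from statistics import median

fast_cases = [3, 5, 7, 9, 9, 11, 12, 14]  # e.g. internally detected intrusions
long_dwell = [95, 122, 180]                # e.g. espionage-style campaigns

print(median(fast_cases + long_dwell))  # 11 days with the tail included
print(median(fast_cases))               # 9.0 days with the tail stripped out
```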

What the data does tell us is that the industry-level dwell-time metric is complicated, and that individual product evaluation needs a more direct measurement than a single aggregate number. The 9-day internal-detection figure still means an attacker had over a week in your environment before anyone knew. External notification cases landed at 25 days. That 16-day gap between "we caught it" and "someone told us" is the cost of incomplete detection coverage. Triage speed doesn't close it.

The IBM 2025 Cost of a Data Breach Report puts mean time to identify at around 181 days globally, down from 194 the year before. That's a broader self-reported population, not IR engagement data like Mandiant's. They're not measuring the same thing. But the directional signal is consistent: detection completeness is where the next layer of value lives.

Dwell time is the metric. Days from foothold to detection. Not how fast you closed a ticket. Not how many phish you triaged before lunch. Foothold to detection. Keep that in mind for the next section.
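To be precise about what's being measured, here's a minimal sketch of the computation, assuming per-incident foothold and detection timestamps; the field names are illustrative, not a standard schema.

```python
# Dwell time: days from first attacker foothold to detection, per incident,
# summarized as a median. Field names here are illustrative assumptions.
from datetime import datetime
from statistics import median

incidents = [
    {"foothold_at": datetime(2025, 3, 1), "detected_at": datetime(2025, 3, 10)},
    {"foothold_at": datetime(2025, 4, 2), "detected_at": datetime(2025, 4, 16)},
    {"foothold_at": datetime(2025, 5, 5), "detected_at": datetime(2025, 5, 8)},
]

dwell_days = [(i["detected_at"] - i["foothold_at"]).days for i in incidents]
print(median(dwell_days))  # 9 -- foothold to detection, not ticket open to close
```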

What we measure today, what we don't measure yet

The agentic-SOC category has standardized on a set of performance metrics that are easy to measure cleanly: per-ticket investigation time, alert closure rate, response latency on already-detected incidents. These are real metrics measuring real work, and the wins on them are operationally meaningful.
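Here's a sketch of how those triage-speed metrics fall out of ordinary ticket data; the field names are hypothetical.

```python
# A minimal sketch of the category's standard triage-speed metrics,
# computed from per-ticket timestamps. Field names are hypothetical.
from datetime import datetime

tickets = [
    {"opened": datetime(2025, 6, 1, 9, 0),  "closed": datetime(2025, 6, 1, 9, 2)},
    {"opened": datetime(2025, 6, 1, 9, 5),  "closed": datetime(2025, 6, 1, 9, 9)},
    {"opened": datetime(2025, 6, 1, 9, 10), "closed": None},  # still open
]

closed = [t for t in tickets if t["closed"] is not None]

# Per-ticket investigation time, averaged over closed tickets.
avg_minutes = sum((t["closed"] - t["opened"]).total_seconds() / 60 for t in closed) / len(closed)

# Alert closure rate: share of tickets resolved.
closure_rate = len(closed) / len(tickets)

print(f"{avg_minutes:.1f} min/ticket, {closure_rate:.0%} closed")
# Clean, measurable numbers -- all of them about alerts that already fired.
```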

What that set of metrics doesn't tell you is whether the product finds the intrusion that wasn't in its detection rules yet. The attacker who spent eleven days in your environment before a single alert fired. The low-confidence signal that pattern-matches to an emerging TTP across three different systems. That's a different kind of measurement than triage speed, and it's the one the category hasn't fully built out yet.

Here's the distinction, side by side:

| Metric category | What it measures | What it tells you about the product |
| --- | --- | --- |
| Triage-speed metrics (broadly available) | How fast the system processes alerts that already fired | How efficient your SOC becomes at handling known signal |
| Detection-completeness metrics (still maturing) | Whether the system surfaces threats it didn't already have a rule for | Whether the product is meaningfully shortening attacker dwell time |

Both matter. Triage-speed metrics are the more tractable measurement, and they're where the category has built strong baselines. Detection-completeness metrics are harder. They require longitudinal customer data, before-and-after comparisons, and a willingness to attribute outcomes to a single tool in a multi-tool environment.
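When that data does exist, the core comparison is simple even though the attribution isn't. A naive sketch, with hypothetical numbers, of what a before-and-after dwell comparison looks like:

```python
# Naive before/after comparison of median dwell time around a deployment.
# Hypothetical data. Real attribution has to control for attacker mix,
# other tooling changes, and incident volume -- this is the easy part.
from statistics import median

dwell_before = [12, 9, 21, 14, 30]  # days per incident, pre-deployment
dwell_after = [6, 4, 11, 8, 5]      # days per incident, post-deployment

print(f"median dwell: {median(dwell_before)} -> {median(dwell_after)} days")
# median dwell: 14 -> 6 days
```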

That structural difficulty is why so few products publish detection-completeness data today. The measurement infrastructure for the category hasn't caught up to what the technology is now capable of doing. As the category matures, that's the next layer of measurement to expect.

Where AI triage is winning, and what comes next

I'm not here to hedge on whether AI in the SOC works. I run a SOC that uses it and I've seen what it does well. AI and ML are fundamentally changing detection and response operations, and the wins are real.

Alert volume is a genuine operational problem. Tier-1 deterministic triage, the mechanical work of enriching and closing known-pattern alerts, burns analysts out and crowds out the higher-order work. Automating that layer matters. With NightBeacon, we brought per-ticket investigation time down from 20–30 minutes to roughly two minutes. That's a significant operational win, and it's the kind of efficiency gain the category is correctly proud of. It's why analysts are going deeper on the threats that need them.
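Back-of-envelope, with a hypothetical alert volume (the 100-alerts-per-day figure below is illustrative; the per-ticket times are ours), that win compounds quickly:

```python
# Rough arithmetic on the triage-speed win. Alert volume is hypothetical;
# the before/after per-ticket times are our NightBeacon figures.
alerts_per_day = 100
before_min, after_min = 25, 2  # midpoint of 20-30 min vs. ~2 min

saved_hours = alerts_per_day * (before_min - after_min) / 60
print(f"~{saved_hours:.0f} analyst-hours/day returned to deeper work")  # ~38
```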

That said, there's a meaningful distinction between triage speed and detection completeness, and understanding it helps you get full value from the agentic wave. Our NightBeacon number sits in the triage-speed category. It measures speed on alerts the system already had. It doesn't measure whether we found the low-confidence signal across three systems that pattern-matches to an emerging TTP. Those are different capabilities.
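To make that second capability concrete, here's one generic shape the detection-completeness problem takes. This is an illustrative correlation sketch under assumed data and thresholds, not a description of any product's actual method:

```python
# Individually low-confidence signals, grouped by host: none is worth an
# alert alone, but agreement across independent sources is. Illustrative
# data and threshold; not any vendor's implementation.
from collections import defaultdict

signals = [
    {"host": "srv-12", "source": "edr",  "note": "rare parent-child process"},
    {"host": "srv-12", "source": "dns",  "note": "first-seen domain, low volume"},
    {"host": "srv-12", "source": "auth", "note": "off-hours service-account login"},
    {"host": "wk-07",  "source": "edr",  "note": "unsigned binary execution"},
]

sources_by_host = defaultdict(set)
for s in signals:
    sources_by_host[s["host"]].add(s["source"])

# Three or more independent sources on one host: surface it to an analyst.
suspects = [h for h, srcs in sources_by_host.items() if len(srcs) >= 3]
print(suspects)  # ['srv-12']
```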

An agent that closes false positives faster does not automatically also find the intrusion you don't have a detection rule for yet. The M-Trends 16-day gap between internal detection and external notification is a detection-completeness problem. AI triage is solving the efficiency layer. The next layer is detection coverage, and that's where the most interesting agentic-SOC development is heading.

Gartner predicts more than 40% of agentic AI projects will be canceled by end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The way to make sure the right 60% survives is to measure outcomes that map to the actual threat: how many days did the attacker have before detection, and did that number go down?

The one question to anchor your evaluation

You're going to be in evaluation conversations soon. The presentations will look good. Here's the question to anchor on:

"Show me your customers' median dwell time before deployment and after. Not your alert closure rate. Not your MTTI on tickets the system already had. Dwell time."

Pay attention to what happens next.

If the vendor has the data, even partial data honestly framed, that's a vendor investing in measuring the right things. They're worth a deeper conversation: it means their product roadmap is aimed at the outcome that matters.

If they pivot to triage speed, they're likely early on the measurement maturity curve, which is where most of the category is right now. File the question and ask it again at renewal. The triage wins are real; this is about what the next version of the product should be building toward.

If they say "dwell time is a lagging indicator that's hard to attribute to a single tool," they're being honest about a genuinely hard problem. Attribution is hard. That's a positive signal about how they think. Then ask them what they're building toward on that dimension, because the vendors who acknowledge the measurement challenge and are actively working on it are the ones worth watching.

The answer you get back will tell you more about how the product is being built than the demo will. Products built around that metric are the ones most likely to deliver on what this category can genuinely do: meaningfully shorten the time between an attacker's first move and the moment someone stops them. That outcome is worth investing in carefully.

_______

Next in the series: Attackers Went Agentic First. The adversary side of this shift is already shipping: phishing-as-a-service kits with LLM-assisted social engineering, agent-orchestrated initial access, the 22-second handoff from foothold to lateral movement.