Against 100% automation
Every AI vendor pitches it. Every responsible enterprise should refuse it. The case for confidence-floor escalation as a design constraint, not a quality-of-life feature.
Open any AI-vendor pitch deck. There’s a slide with a number on it. 90%. 95%. 100%. The number is ‘decisions made without a human in the loop’. It’s framed as the destination. We think it’s the wrong destination — and accepting it as the goal is the architectural decision that wrecks an otherwise-promising AI project six months in.
The argument for 100% automation, briefly
Humans are slow and expensive. The model is fast and cheap. If the model is right enough of the time, why slow it down?
This is correct as far as it goes. The problem is the ‘right enough of the time’ part. In a regulated workflow, the cost of the model being wrong is not symmetric. A wrongly approved claim costs the insurer real money and a complaint. A wrongly denied claim costs the customer the procedure they needed, and the insurer its reputation when the appeal goes public. A wrongly cleared AML alert costs the bank a regulatory fine and, quite possibly, its chief compliance officer.
The cost of being wrong is not the cost of any single decision. It’s the cost of the worst plausible failure mode multiplied by how often it can happen.
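To put numbers on that multiplication, here’s a minimal sketch; every figure in it is an illustrative assumption, not data from any deployment:

```python
# Illustrative expected-cost arithmetic for one automated decision class.
# Every number here is a made-up assumption for the sake of the example.

DECISIONS_PER_YEAR = 100_000
ERROR_RATE = 0.02                 # the model is wrong 2% of the time

COST_ROUTINE_ERROR = 5_000        # e.g. a claim paid that shouldn't have been
COST_WORST_CASE_ERROR = 250_000   # e.g. a wrongly denied claim that goes public
WORST_CASE_SHARE = 0.10           # fraction of errors landing on the bad side

errors = DECISIONS_PER_YEAR * ERROR_RATE

# Symmetric view: price every error the same. Looks tolerable.
symmetric_cost = errors * COST_ROUTINE_ERROR

# Asymmetric view: price the worst plausible failure mode by how often
# it can happen. The tail dominates even at a 10% share.
asymmetric_cost = (
    errors * WORST_CASE_SHARE * COST_WORST_CASE_ERROR
    + errors * (1 - WORST_CASE_SHARE) * COST_ROUTINE_ERROR
)

print(f"symmetric view:  ${symmetric_cost:,.0f}")   # $10,000,000
print(f"asymmetric view: ${asymmetric_cost:,.0f}")  # $59,000,000
```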
Why the confidence floor matters more than the model
The architectural answer is a configured confidence threshold (the floor) below which decisions are force-routed to a human reviewer, regardless of what the model said. If the model is confident, the decision goes through. If the model is uncertain, the case routes to a human with the model’s recommendation, rationale, and citations attached. The reviewer isn’t starting from scratch; they’re reviewing the work the model already did.
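A minimal sketch of that primitive; the names, types, and threshold here are hypothetical, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class ModelDecision:
    """What the model produced for one case. Field names are illustrative."""
    recommendation: str        # e.g. "approve" / "deny" / "clear"
    confidence: float          # model-reported score in [0, 1]
    rationale: str             # the model's stated reasoning
    citations: list[str] = field(default_factory=list)

CONFIDENCE_FLOOR = 0.92        # set by the risk owner, not the ML team

def route(decision: ModelDecision) -> str:
    """Force-route anything below the floor to a human reviewer.

    The reviewer receives the recommendation, rationale, and citations,
    so they review the model's work rather than starting from scratch.
    """
    if decision.confidence >= CONFIDENCE_FLOOR:
        return "auto_execute"             # touchless path
    return "escalate_to_reviewer"         # human path, model's work attached
```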
The point isn’t to remove humans from the loop. It’s to remove them from the decisions where they don’t add value, and concentrate them on the decisions where they do.
This is a primitive, not a feature. It doesn’t depend on the model volunteering its uncertainty correctly. It depends on the architecture force-routing low-confidence cases regardless. The model’s confidence score is an input; the routing decision is the system’s.
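One way to read ‘the routing decision is the system’s’ in code: the model’s score is one input among several, and the system can escalate for reasons the model never sees. The case-value rule below is a hypothetical example of such a reason, not something the floor mechanism requires:

```python
CONFIDENCE_FLOOR = 0.92
HIGH_VALUE_LIMIT = 50_000   # hypothetical risk rule owned by the system

def system_route(model_confidence: float, case_value: float) -> str:
    # The model's confidence is an input; the routing decision is the
    # system's. It can escalate for reasons the model cannot waive.
    if case_value > HIGH_VALUE_LIMIT:     # business rule, model-independent
        return "escalate_to_reviewer"
    if model_confidence < CONFIDENCE_FLOOR:
        return "escalate_to_reviewer"
    return "auto_execute"
```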
What this means for the touchless-rate conversation
‘Touchless rate’ is a useful metric. The number that matters is the touchless rate above your confidence floor, not the overall touchless rate. Two systems that report ‘65% touchless’ are very different if one of them is counting cases the model got wrong without knowing it.
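A sketch of the two numbers, assuming per-case records that carry the model’s confidence and whether a human touched the decision (the field names are hypothetical):

```python
def touchless_rates(cases: list[dict], floor: float) -> tuple[float, float]:
    """Overall touchless rate vs. touchless rate above the floor.

    Each case is assumed to carry two fields:
      'confidence': the model's score for that case
      'touchless':  True if no human touched the decision
    """
    touchless = [c for c in cases if c["touchless"]]
    overall = len(touchless) / len(cases)
    above_floor = sum(1 for c in touchless if c["confidence"] >= floor)
    return overall, above_floor / len(cases)

# Two systems can both report 65% on the first number and still differ
# sharply on the second; the second number is the honest one.
```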
What we tell prospects: pick the floor first, then measure the rate. The floor is a risk-management decision, not a technology decision. It comes from the medical director, the credit head, the compliance officer: the person whose name is on the line when things go wrong. The touchless rate is what falls out; the floor is what gets tuned, per drug class, per loan product, per alert typology.
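In configuration terms, ‘pick the floor first’ might look like this; the workflow names and values are placeholders, not recommendations:

```python
# Each floor is owned by the risk owner for that workflow, not the ML team.
# All values below are illustrative placeholders.
CONFIDENCE_FLOORS = {
    "prior_auth/oncology":        0.97,  # medical director's call
    "prior_auth/dermatology":     0.90,
    "lending/unsecured_personal": 0.95,  # credit head's call
    "aml/structuring_alerts":     0.99,  # compliance officer's call
}

def floor_for(workflow: str) -> float:
    # Fail closed: an unknown workflow escalates everything to a human.
    return CONFIDENCE_FLOORS.get(workflow, 1.0)
```

Failing closed on unknown workflows keeps the default on the safe side: a workflow nobody has made a risk decision about escalates everything.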
The honest framing
Promising 100% automation in regulated workflows is either over-fitting to a demo or hiding the failure modes. The honest version of the AI promise is:
- A meaningful share of decisions handled without a human reviewer — and that share is tuned per workflow, per risk class.
- The rest routed to your existing reviewers with the agent’s work attached — so the reviewer’s time goes to judgement, not data extraction.
- The audit chain on every decision, touchless or escalated, dense enough to satisfy a regulator years later (a sketch of such a record follows).
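What ‘dense enough’ might mean in practice is one record per decision, along these lines; the fields are an illustrative assumption, not a compliance checklist:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionAuditRecord:
    """One record per decision, touchless or escalated. Fields are illustrative."""
    case_id: str
    workflow: str                   # e.g. "aml/structuring_alerts"
    model_version: str              # exact model and prompt version used
    recommendation: str
    confidence: float
    floor_at_decision_time: float   # the floor that applied when this case routed
    route: str                      # "auto_execute" or "escalate_to_reviewer"
    rationale: str
    citations: list[str] = field(default_factory=list)
    reviewer_id: str | None = None        # populated only on escalated cases
    reviewer_outcome: str | None = None
    decided_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```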
That’s the system that scales. The 100% claim is the one that gets pulled from production after the first board-level question.