How to run an AI pilot that does not turn into a demo

6 min read

The most common failure pattern in private AI is a pilot that ships, impresses leadership, and quietly never becomes a real system. Here is how to keep that from happening.

Demos answer the wrong question

A demo answers "can the model do this?" That question is uninteresting in 2026. The interesting question is "does this system meet criteria the business cares about, under conditions the security team can review, with an operating model the internal team can sustain?" The pilot has to be designed to answer that second question, not the first.

The most common failure pattern in private AI engagements is a pilot that produces an impressive ten-minute walkthrough but no acceptance criteria, no controls documentation, no operational runbook, and no internal owner. Three months after the launch celebration, the environment is forgotten. The team has moved on. Leadership remembers the demo and assumes the capability is in place. It is not.

What makes a pilot a real pilot

Four things distinguish a real pilot from a demo dressed up as one. First, acceptance criteria written in advance, agreed with the business owner, and tested against the actual workload rather than curated examples. Second, controls implemented and documented at the same level of rigor the production rollout will require, not deferred because "this is just a pilot." Third, an internal owner named at the start of the engagement, present for design decisions, and responsible for validation and adoption. Fourth, an honest evaluation at the end, including the cases where the system did not perform well.

The acceptance criteria are the most important of the four. They should be specific enough that a reasonable observer can tell whether the system passed or failed without arguing about it. "The assistant answers compliance policy questions correctly" is not an acceptance criterion. "The assistant answers 90% of a sampled set of 50 compliance questions with citations that match the source document, evaluated by a named reviewer, with a documented disagreement process for the cases that fail" is one.
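A criterion written this way can be checked mechanically once the named reviewer has recorded a verdict for each sampled question. The sketch below is one minimal way to do that, not a prescribed tool: the `ReviewedAnswer` record, its field names, and the `evaluate_acceptance` function are hypothetical, the sample data is invented for illustration, and the 90% threshold and 50-question sample size simply mirror the example criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedAnswer:
    """One sampled question with the reviewer's verdict (hypothetical shape)."""
    question_id: str
    answer_correct: bool     # reviewer judged the answer correct
    citation_matches: bool   # cited passage matches the source document


def evaluate_acceptance(results, required_sample_size=50, threshold=0.90):
    """Return (passed, pass_rate, failed_ids) for a reviewed sample.

    An answer counts as a pass only if it is correct AND its citation matches.
    Failed IDs are returned so they can enter the disagreement process.
    """
    if len(results) != required_sample_size:
        raise ValueError(
            f"expected {required_sample_size} reviewed answers, got {len(results)}"
        )
    failed_ids = [
        r.question_id
        for r in results
        if not (r.answer_correct and r.citation_matches)
    ]
    pass_rate = (len(results) - len(failed_ids)) / len(results)
    return pass_rate >= threshold, pass_rate, failed_ids


# Illustrative data only: 46 passing answers, 4 with mismatched citations.
sample = [ReviewedAnswer(f"q{i}", True, i >= 4) for i in range(50)]
passed, rate, failed = evaluate_acceptance(sample)
```

Rejecting a sample of the wrong size is deliberate in this sketch: a pass rate computed over fewer questions than agreed is exactly the kind of result people end up arguing about.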

The internal owner is the most fragile of the four. Engagements that proceed without a named owner — or with an owner who is a placeholder — produce pilots that nobody adopts after launch. Naming the owner is not enough; the owner has to be present for design decisions, has to validate the build, and has to be in a position to make adoption happen across the user population.

What to ship from a pilot

A real pilot produces five things at the end: a working environment, written acceptance test results, controls documentation, an operating runbook, and a recommendation about whether to proceed to production rollout. The recommendation should be honest: not every pilot deserves to become a production system, and an engagement that ends with "this should not move forward" is still a successful engagement.

A pilot that produces only the working environment, with everything else deferred to the rollout phase, has not actually produced a pilot. It has produced a demo with a deployment URL.
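One way to keep that distinction visible at close-out is to treat the five deliverables as an explicit checklist. This is a minimal sketch under stated assumptions: the string labels and the `missing_deliverables` and `is_real_pilot` helpers are illustrative names, not part of any standard or tool.

```python
# The five deliverables named in this section, as hypothetical labels.
REQUIRED_DELIVERABLES = (
    "working_environment",
    "acceptance_test_results",
    "controls_documentation",
    "operating_runbook",
    "rollout_recommendation",
)


def missing_deliverables(produced):
    """Return the required deliverables not yet produced, in checklist order."""
    produced_set = set(produced)
    return [d for d in REQUIRED_DELIVERABLES if d not in produced_set]


def is_real_pilot(produced):
    """A pilot qualifies only when nothing on the checklist is missing."""
    return not missing_deliverables(produced)


# A "demo with a deployment URL": only the environment exists.
demo_only = ["working_environment"]
gaps = missing_deliverables(demo_only)
```

Here `gaps` lists the four items that were deferred, which is the conversation to have before the engagement is declared complete.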

The discipline that distinguishes a pilot from a demo costs more in the short term. It is also the reason the resulting system survives the second quarter. Engagements designed around demos are cheaper to scope and more expensive to outlive.