Piloting an AI support agent in 90 days
Three checkpoints. Each one gates the next phase. A pilot done this way produces a credible executive decision at week 12: production-ready, or killed cleanly.
Most pilots fail for one of three reasons: overscoping, anchoring on the wrong metric, or no executive cover for an honest verdict.
Pilot scope: tight, not broad
A successful 90-day pilot targets one product line, one document set, or one query category. Not all three. Pick the area where success would produce the strongest signal and where failure would be most diagnostic.
Three scoping patterns that produce a strong week-12 signal:
- Worst document set first. The one producing the highest ticket volume or the most installer frustration. If the pilot handles your hardest content, the rest of the rollout is a content-extension exercise.
- One high-pain query category. "Wiring questions on the X-series." Tight category, real volume, measurable resolution.
- One installer or service-tech audience. The persona under the most pressure or with the highest brand-defection risk. Their week-12 verdict is the most credible.
Overscoped: "Roll out across all product lines, all query types, all customer segments. Measure overall deflection rate." Two months in, every team reads it differently. No clean verdict.
Tight: "Deploy against X-series wiring docs for installers. Measure resolution on the top 20 query types from last quarter. Validate citation integrity on every response." One area, sharp criteria, clean verdict.
Mistakes that kill pilots
Anchoring on deflection from day one
You'll hit the number and miss the value. The primary metric is end-to-end resolution; deflection is downstream. (Full reframe: 2.2.)
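To see how the two metrics can diverge, here is a minimal sketch. It assumes working definitions that are not spelled out in this section: an interaction counts as deflected if it was never escalated to a human, and as resolved end to end only if the installer confirmed the problem was solved without escalation. The `Interaction` fields and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    escalated: bool  # assumption: the query was handed off to a human ticket
    resolved: bool   # assumption: the installer confirmed the problem was solved


def deflection_rate(interactions):
    """Share of interactions that never reached a human."""
    if not interactions:
        return None
    return sum(not i.escalated for i in interactions) / len(interactions)


def resolution_rate(interactions):
    """Share of interactions the agent resolved end to end, without escalation."""
    if not interactions:
        return None
    return sum(i.resolved and not i.escalated for i in interactions) / len(interactions)


# Hypothetical sample: three of four queries never reach a human,
# but only one of them was actually solved by the agent.
sample = [
    Interaction(escalated=False, resolved=True),
    Interaction(escalated=False, resolved=False),  # user gave up: deflected, not resolved
    Interaction(escalated=False, resolved=False),
    Interaction(escalated=True, resolved=True),    # resolved, but by a human
]

print(deflection_rate(sample))  # 0.75
print(resolution_rate(sample))  # 0.25
```

In this made-up sample the deflection rate looks healthy at 75% while only 25% of queries were resolved by the agent, which is the gap the warning above is pointing at.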
No executive cover for an honest verdict
Support-team-only ownership produces a recommendation only the support team trusts. Pilots that succeed have an executive sponsor from week 0 with a pre-committed week-12 review.
Treating the pilot as procurement
Pilots scoped as joint projects with the vendor succeed. Pilots scoped as evaluations that the vendor either passes or fails tend to stall.
Skipping content readiness
Pilots run against unaudited documentation frequently fail at week 4 for content reasons, not agent reasons. Run the readiness rubric (3.2) before kickoff.
Pre-commit the week-12 review on your executive sponsor's calendar before week 0 kickoff. Send the invite during the kickoff meeting. A fixed decision date forces a credible verdict on schedule.
What it takes from your team
For a managed-service pilot, the manufacturer-side time commitment is smaller than most CS directors anticipate.
| Your role | Time over 90 days | What they do |
|---|---|---|
| Executive sponsor | ~4 hours total | Kickoff, week-4 check-in, week-12 go/no-go |
| CS director (project lead) | ~3 hours / week | Weekly vendor check-in, metric review, blocker triage |
| Documentation lead | ~8 hours total | Content-readiness review, gap identification at weeks 4 and 8 |
| Pilot user cohort | ~30 min / week each | Use the agent on real queries, week-8 interview |
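As a rough way to turn the table into a total, the sketch below adds up the per-role estimates. It assumes the 90 days are treated as 12 working weeks (matching the week-12 review) and takes the cohort size as a parameter, since the table gives only a per-person figure. The function name and dictionary keys are hypothetical.

```python
PILOT_WEEKS = 12  # assumption: 90 days treated as 12 working weeks


def team_hours(cohort_size, weeks=PILOT_WEEKS):
    """Estimate manufacturer-side hours over the pilot from the table's per-role figures."""
    hours = {
        "executive_sponsor": 4.0,                  # ~4 hours total
        "cs_director": 3.0 * weeks,                # ~3 hours / week
        "documentation_lead": 8.0,                 # ~8 hours total
        "pilot_cohort": 0.5 * weeks * cohort_size, # ~30 min / week each
    }
    hours["total"] = sum(hours.values())
    return hours


print(team_hours(cohort_size=10))
```

For a hypothetical 10-person cohort this comes to about 108 hours across the whole team, 60 of which are the cohort's own usage time and 36 the CS director's weekly check-ins.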
A tight structure is what makes the verdict credible. A pilot that drifts past 90 days without a verdict gets cut at the next budget cycle, regardless of how the metrics look.
A pilot structured to deliver that verdict:
- Has an executive sponsor with the week-12 review pre-committed
- Scopes to one document set or query category
- Uses the Layer 1 metric set, not deflection as a north star
- Runs a content-readiness audit before week 0
- Produces a four-way verdict at week 12
- Treats the pilot as a joint vendor project