03.1 How to deploy and operate it

Piloting an AI support agent in 90 days

Three checkpoints. Each one gates the next phase. A pilot done this way produces a credible executive decision at week 12: production-ready, or killed cleanly.

Chapter 3.1 · 7 min read · Deploy and operate

Most pilots fail for one of three reasons: overscoping, anchoring on the wrong metric, or lacking executive cover for an honest verdict.

THE 90-DAY PILOT
Three checkpoints. Each gates the next phase.

WEEK 4 · DOES IT WORK ON OUR CONTENT?
Fast-kill checkpoint. If no, you save eight weeks.
  • Answer rate on 20 real symptom-to-fix queries
  • Source-pinned rate (each answer cites a page)
  • Visual retrieval on 3 diagram-based queries
  • Revision correctness on a paired battery

WEEK 8 · DOES IT WORK FOR OUR USERS?
Behavior validation in front of real installers and techs.
  • Layer 1 micro-CSAT (thumbs on every answer)
  • Query reformulation / bounce-after-failure
  • Escalation quality (handoff lands with context)
  • Qualitative interviews with three pilot users

WEEK 12 · SHOULD WE GO TO FULL ROLLOUT?
Executive go/no-go. A four-way verdict with all three metric layers reporting.
  • FULL ROLLOUT: metrics support production
  • EXTEND: one dimension needs more time
  • RESCOPE: agent works, scope was wrong
  • KILL: trust-tier failures or no fit

Week 4 fast-kill. Week 8 user validation. Week 12 executive go/no-go with a four-way verdict.
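
One way to make the four-way verdict mechanical is to write the decision rule down before week 0. The sketch below is illustrative only: the layer names, the `trust_failure` and `scope_was_right` inputs, and the rule that exactly one failing layer means EXTEND are assumptions made for this example, not definitions from this guide.

```python
from enum import Enum


class Verdict(Enum):
    FULL_ROLLOUT = "full rollout"
    EXTEND = "extend"
    RESCOPE = "rescope"
    KILL = "kill"


def week12_verdict(
    layers_passing: dict[str, bool],
    trust_failure: bool,
    scope_was_right: bool,
) -> Verdict:
    """Map week-12 evidence to one of the four verdicts.

    All inputs are hypothetical simplifications:
    - layers_passing: one boolean per metric layer (names are placeholders)
    - trust_failure: True if any trust-tier failure was observed
    - scope_was_right: True if the piloted scope was the right one
    """
    # Trust-tier failures end the pilot regardless of other metrics.
    if trust_failure:
        return Verdict.KILL

    failing = [name for name, ok in layers_passing.items() if not ok]

    # Every layer supports production: roll out, or rescope if the
    # agent works but the pilot targeted the wrong area.
    if not failing:
        return Verdict.FULL_ROLLOUT if scope_was_right else Verdict.RESCOPE

    # Assumed rule: a single lagging dimension earns more time.
    if len(failing) == 1:
        return Verdict.EXTEND

    # More than one layer failing is treated here as "no fit".
    return Verdict.KILL
```

Agreeing on a rule like this with the executive sponsor at kickoff makes the week-12 review a reading of evidence rather than a negotiation.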
01

Pilot scope: tight, not broad

A successful 90-day pilot targets one product line, one document set, or one query category. Not all three. Pick the area where success would produce the strongest signal and where failure would be most diagnostic.

Three scoping patterns that produce a strong week-12 signal:

  • Worst document set first. The one producing the highest ticket volume or installer frustration. If the pilot handles your hardest content, rollout is a content-extension exercise.
  • One high-pain query category. "Wiring questions on the X-series." Tight category, real volume, measurable resolution.
  • One installer or service-tech audience. The persona under the most pressure or with the highest brand-defection risk. Their week-12 verdict is the most credible.
Scope that kills the project

"Roll out across all product lines, all query types, all customer segments. Measure overall deflection rate." Two months in, every team reads it differently. No clean verdict.

Scope that produces a decision

"Deploy against X-series wiring docs for installers. Measure resolution on the top 20 query types from last quarter. Validate citation integrity on every response." One area, sharp criteria, clean verdict.

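The two measurements in the tightly scoped example, resolution per query type and citation integrity on every response, can be computed from a simple log of pilot responses. This is a minimal sketch under stated assumptions: the `Response` record, its field names, and the definition of citation integrity used here (at least one citation, and every cited page exists in the piloted document set) are hypothetical and should be replaced with your own definitions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Response:
    # Hypothetical log record for one agent answer during the pilot.
    query_type: str
    resolved: bool
    cited_pages: list[str] = field(default_factory=list)


def resolution_by_query_type(responses: list[Response]) -> dict[str, float]:
    """Return the share of resolved responses for each query type."""
    totals: dict[str, int] = defaultdict(int)
    resolved: dict[str, int] = defaultdict(int)
    for r in responses:
        totals[r.query_type] += 1
        if r.resolved:
            resolved[r.query_type] += 1
    return {qt: resolved[qt] / totals[qt] for qt in totals}


def citation_integrity(responses: list[Response], known_pages: set[str]) -> float:
    """Share of responses whose citations are present and all valid.

    Assumed definition: a response passes only if it cites at least one
    page and every cited page is in the piloted document set.
    """
    if not responses:
        return 0.0
    passing = sum(
        1
        for r in responses
        if r.cited_pages and all(p in known_pages for p in r.cited_pages)
    )
    return passing / len(responses)
```

Reporting resolution per query type, rather than one blended number, keeps the week-12 discussion tied to the specific categories the pilot was scoped around.
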
02

Mistakes that kill pilots

1

Anchoring on deflection from day one

You'll hit the number and miss the value. The primary metric is end-to-end resolution; deflection is downstream. (Full reframe: 2.2.)

2

No executive cover for an honest verdict

Support-team-only ownership produces a recommendation only the support team trusts. Pilots that succeed have an executive sponsor from week 0 with a pre-committed week-12 review.

3

Treating the pilot as procurement

Pilots scoped as joint projects with the vendor succeed. Pilots scoped as evaluations that the vendor either passes or fails tend to stall.

4

Skipping content readiness

Pilots run against unaudited documentation frequently fail at week 4 for content reasons, not agent reasons. Run the readiness rubric (3.2) before kickoff.

Pilot setup move

Pre-commit the week-12 review on your executive sponsor's calendar before week 0 kickoff. Send the invite during the kickoff meeting. A fixed decision date forces a credible verdict on schedule.

03

What it takes from your team

For a managed-service pilot, the manufacturer-side time commitment is smaller than most CS directors anticipate.

Your role | Time over 90 days | What they do
Executive sponsor | ~4 hours total | Kickoff, week-4 check-in, week-12 go/no-go
CS director (project lead) | ~3 hours / week | Weekly vendor check-in, metric review, blocker triage
Documentation lead | ~8 hours total | Content-readiness review, gap identification at weeks 4 and 8
Pilot user cohort | ~30 min / week each | Use the agent on real queries, week-8 interview
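
To turn the table into a single budget figure, the per-role estimates can be summed. The sketch below assumes the pilot runs 12 weeks (matching the week-12 review) and takes the cohort size as an input, since the guide does not specify one. The function name and dictionary keys are illustrative.

```python
PILOT_WEEKS = 12  # assumption: 90 days treated as 12 working weeks


def pilot_hours(cohort_size: int, weeks: int = PILOT_WEEKS) -> dict[str, float]:
    """Estimate manufacturer-side hours per role, using the table's figures."""
    hours = {
        "executive_sponsor": 4.0,                         # ~4 hours total
        "cs_director": 3.0 * weeks,                       # ~3 hours / week
        "documentation_lead": 8.0,                        # ~8 hours total
        "pilot_user_cohort": 0.5 * weeks * cohort_size,   # ~30 min / week each
    }
    hours["total"] = sum(hours.values())
    return hours


if __name__ == "__main__":
    for role, h in pilot_hours(cohort_size=5).items():
        print(f"{role}: {h:.0f} h")
```

Under these assumptions the fixed roles come to 48 hours, and each pilot user adds about 6 hours, so a hypothetical cohort of five brings the total to roughly 78 hours over the pilot.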

The verdict is credible because the pilot is tightly structured. A pilot that drifts past 90 days without a verdict gets cut at the next budget cycle, regardless of how the metrics look.

What good looks like
A 90-day pilot scoped to produce a real decision:
  • Has an executive sponsor with the week-12 review pre-committed
  • Scopes to one document set or query category
  • Uses the Layer 1 metric set, not deflection as a north star
  • Runs a content-readiness audit before week 0
  • Produces a four-way verdict at week 12
  • Treats the pilot as a joint vendor project
Next · Chapter 3.2
AI-readable content for the agent-majority web