Piloting an AI support agent in 90 days

Most pilots fail for one of three reasons: overscoping, anchoring on the wrong metric, or no executive cover for an honest verdict.

Week 4 fast-kill. Week 8 user validation. Week 12 executive go/no-go with a four-way verdict.

Pilot scope: tight, not broad

A successful 90-day pilot targets one product line, one document set, or one query category. Not all three. Pick the area where success would produce the strongest signal and where failure would be most diagnostic.

Three scoping patterns that produce a strong week-12 signal:

Worst document set first. The one producing the highest ticket volume or installer frustration. If the pilot handles your hardest content, rollout is a content-extension exercise.
One high-pain query category. "Wiring questions on the X-series." Tight category, real volume, measurable resolution.
One installer or service-tech audience. The persona under the most pressure or with the highest brand-defection risk. Their week-12 verdict is the most credible.

Scope that kills the project

"Roll out across all product lines, all query types, all customer segments. Measure overall deflection rate." Two months in, every team reads it differently. No clean verdict.

Scope that produces a decision

"Deploy against X-series wiring docs for installers. Measure resolution on the top 20 query types from last quarter. Validate citation integrity on every response." One area, sharp criteria, clean verdict.

Mistakes that kill pilots

Anchoring on deflection from day one

You'll hit the number and miss the value. The primary metric is end-to-end resolution; deflection is downstream. (Full reframe: 2.2.)

No executive cover for an honest verdict

Support-team-only ownership produces a recommendation only the support team trusts. Pilots that succeed have an executive sponsor from week 0 with a pre-committed week-12 review.

Treating the pilot as procurement

Pilots scoped as joint projects with the vendor succeed. Pilots scoped as evaluations the vendor passes or fails stall.

Skipping content readiness

Pilots run against unaudited documentation frequently fail week 4 for content reasons, not agent reasons. Run the readiness rubric (3.2) before kickoff.

Pilot setup move

Pre-commit the week-12 review on your executive sponsor's calendar before week 0 kickoff. Send the invite during the kickoff meeting. A fixed decision date forces a credible verdict on schedule.

What it takes from your team

For a managed-service pilot, the manufacturer-side time commitment is smaller than most CS directors anticipate.

Your role	Time over 90 days	What they do
Executive sponsor	~4 hours total	Kickoff, week-4 check-in, week-12 go/no-go
CS director (project lead)	~3 hours / week	Weekly vendor check-in, metric review, blocker triage
Documentation lead	~8 hours total	Content-readiness review, gap identification at weeks 4 and 8
Pilot user cohort	~30 min / week each	Use the agent on real queries, week-8 interview

How tightly the pilot is structured is what makes the verdict credible. A pilot that drifts past 90 days without a verdict gets cut at the next budget cycle, regardless of how the metrics look.

What good looks like

A 90-day pilot scoped to produce a real decision:

Has an executive sponsor with the week-12 review pre-committed
Scopes to one document set or query category
Uses the Layer 1 metric set, not deflection as a north star
Runs a content-readiness audit before week 0
Produces a four-way verdict at week 12
Treats the pilot as a joint vendor project

Next · Chapter 3.2

AI-readable content for the agent-majority web