The COO's Pocket Card: Five Decisions Before Your First AI Pilot Launches — And Three Signs It Is Stalling

Brandon Sneider · March 2026

The CIO owns the tool. The COO owns the workflow the tool is supposed to improve.

Executive Summary

Every AI pilot requires five structural decisions before launch. Organizations that make these decisions explicitly report a 54% success rate, versus 12% for those that skip them: a 4.5x difference associated with pre-launch discipline rather than technology selection (Pertama Partners, n=2,400+ initiatives, 2025-2026).
The COO is the natural pilot owner at a mid-market company. Not the CIO (who manages infrastructure), not the CEO (who sets direction), and not a committee. The COO owns workflows, owns headcount, and owns the operational metrics that determine whether the pilot produced a business result or a technology demonstration.
Eaton Corporation, Cox Automotive, and Cisco all converged on the same structural insight in early 2026: “twenty pilots do not equal one transformation” (Fortune, March 2026). Cox Automotive has 20 AI solutions in production delivering measurable value — not because they piloted more, but because they structured each pilot with a production path before launch.
Three warning signs predict pilot failure with enough lead time to intervene. Each is detectable within 45 days. The organizations in the top 5% do not avoid stalling — they recognize the stall signal and respond within 6 weeks instead of 5 months (Pertama Partners, 2026).

Why the COO — Not the CIO — Owns the Pilot

The CIO owns the tool. The COO owns the workflow the tool is supposed to improve. That distinction determines whether the pilot produces a technology metric (“80% adoption”) or a business metric (“invoice processing cost dropped 40%”).

Deloitte’s State of AI in the Enterprise (n=3,235, August-September 2025) finds 37% of organizations use AI at a surface level with no process changes. McKinsey (n=1,993, July 2025) finds high performers are 2.8x more likely to have fundamentally redesigned workflows — not just deployed tools into existing processes. The workflow redesign is the COO’s job. The tool deployment is the CIO’s.

Fortune’s analysis of companies escaping “pilot purgatory” (March 2026) confirms the pattern: at Travelers, EVP/CTO Mojgan Lefebvre structures pilots with “cross-functional ownership from day one.” At Liberty Mutual, Global CIO Monica Caldas emphasizes “disciplined, business-led execution.” The keyword across every success story is business-led — and at a company with 200-2,000 employees, the business-led executive is the COO.

The CIO is a critical partner. The CIO handles vendor evaluation, security review, license procurement, and technical configuration. But the pilot charter, the success metric, and the kill decision belong to the person who owns the process being changed.

The Five Decisions

Every pilot needs five decisions made explicitly — on paper, with names attached — before the first license is distributed. These are not best practices. They are the structural preconditions that separate the 54% success rate from the 12% (Pertama Partners, n=2,400+, 2025-2026).

Decision 1: One Workflow, Spelled Out

The question: Which single business process will this pilot change?

Not “accounts payable.” Not “customer service.” One specific process with countable inputs, outputs, steps, and handoffs, such as “processing vendor invoices from receipt through approval” or “routing inbound support tickets to the correct department within 4 hours.”

The discipline is specificity. Cisco’s Working with AI program reviewed 24 workflows and found an average of 30% of activities within each workflow could be augmented by AI (Fortune, March 2026). They did not pilot “AI across the organization.” They identified which 30% of which workflow.

Selection criteria that predict success:

Criterion	Strong Pilot Workflow	Weak Pilot Workflow
Volume	50+ instances per week	2-3 per month
Output	Countable unit (invoices, tickets, documents)	Subjective (“better decisions”)
Data	Digital inputs already exist	Tribal knowledge, no records
Stakes	Internal process, errors are fixable	Client-facing, errors are visible
Complexity	5-15 steps, 1-2 handoffs	30+ steps, multiple departments
Baseline	Current cost/time is known or measurable	Nobody tracks it today

If current cost and time are not measurable, the pilot cannot prove value. Establish the baseline before procuring the tool — not after.
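
The selection criteria above can be applied as a simple screening checklist. The sketch below is illustrative only: the `WorkflowCandidate` fields, the function names, and the sample numbers are hypothetical, and it treats the "Strong Pilot Workflow" column as hard thresholds, which the table itself does not require.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCandidate:
    """Hypothetical record describing one candidate pilot workflow."""
    name: str
    instances_per_week: int
    output_is_countable: bool
    inputs_are_digital: bool
    is_internal_process: bool
    steps: int
    handoffs: int
    baseline_is_measurable: bool


def screen_workflow(w: WorkflowCandidate) -> dict:
    """Check a candidate against the 'Strong Pilot Workflow' column.

    Each value is True when the candidate meets that criterion.
    """
    return {
        "volume": w.instances_per_week >= 50,
        "output": w.output_is_countable,
        "data": w.inputs_are_digital,
        "stakes": w.is_internal_process,
        "complexity": 5 <= w.steps <= 15 and 1 <= w.handoffs <= 2,
        "baseline": w.baseline_is_measurable,
    }


def failed_criteria(w: WorkflowCandidate) -> list:
    """List the criteria a candidate does not meet."""
    return [name for name, ok in screen_workflow(w).items() if not ok]


# Illustrative numbers only; they are not from the cited research.
invoices = WorkflowCandidate(
    name="Vendor invoices, receipt through approval",
    instances_per_week=230,
    output_is_countable=True,
    inputs_are_digital=True,
    is_internal_process=True,
    steps=9,
    handoffs=2,
    baseline_is_measurable=False,
)

print(failed_criteria(invoices))  # ['baseline']
```

In this example the workflow passes every criterion except the baseline, which is the case where the baseline should be established before a tool is procured.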

Decision 2: One Metric That Connects to the P&L

The question: What specific number will this pilot move, and how does that number connect to revenue, cost, or margin?

“Increase adoption to 80%” is not a success metric. It is an activity metric. “Reduce average invoice processing cost from $18 to $11, saving $84,000 annually across 12,000 invoices” is a success metric.

McKinsey (n=1,600+, November 2025) identifies KPI tracking as the single strongest predictor of bottom-line impact from AI, yet fewer than 20% of organizations track KPIs for their AI tools. The more than 80% flying blind are not failing because AI does not work. They are failing because nobody defined what “working” means.

The metric must pass one test: can the CFO put this number on a slide? If the answer is no, refine until it is yes.

Three metrics that work for first pilots:

Metric	What It Tells You	When to Measure
Cost per outcome	Is AI cheaper than the current process?	Baseline + monthly
Cycle time	Is AI faster end-to-end (not just at one step)?	Baseline + weekly
Error/rework rate	Is AI more accurate (net of review time)?	Baseline + bi-weekly
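
The arithmetic behind a P&L-connected metric is simple enough to check in a few lines. The sketch below reproduces the invoice example from this section ($18 to $11 across 12,000 invoices); the function names are hypothetical, and what counts as "total cost" (labor, licenses, review time) is an assumption each organization has to settle for itself.

```python
def annual_savings(baseline_cost: float, target_cost: float, annual_volume: int) -> float:
    """Annual savings if cost per unit falls from baseline to target."""
    return (baseline_cost - target_cost) * annual_volume


def percent_reduction(baseline_cost: float, target_cost: float) -> float:
    """Fractional reduction in cost per unit relative to baseline."""
    return (baseline_cost - target_cost) / baseline_cost


def cost_per_outcome(total_cost: float, units_completed: int) -> float:
    """Total cost for a period divided by the units completed in that period.

    Assumption: total_cost includes labor, tool licenses, and review time.
    """
    if units_completed <= 0:
        raise ValueError("units_completed must be positive")
    return total_cost / units_completed


savings = annual_savings(baseline_cost=18.0, target_cost=11.0, annual_volume=12_000)
print(f"${savings:,.0f} per year")  # $84,000 per year
print(f"{percent_reduction(18.0, 11.0):.1%} lower cost per invoice")  # 38.9% lower cost per invoice
```

Running `cost_per_outcome` on the same inputs at baseline and then monthly gives the trend line that the first row of the table asks for.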

Decision 3: A Named Owner With a Calendar Hold

The question: Who wakes up every morning thinking about whether this pilot is working?

Projects with sustained executive sponsorship achieve 68% success versus 11% when sponsorship lapses (Pertama Partners, 2026). The median time to sponsorship loss is six months — but the drift begins much earlier. Gallup (n=19,043, May 2025) finds clear leadership communication produces 4.7x more comfort with AI among employees. When the sponsor goes quiet, the organization interprets silence as permission to disengage.

The pilot owner is not the same as the executive sponsor. The executive sponsor is the COO or VP of Operations who provides air cover and clears cross-functional obstacles. The pilot owner is a senior individual contributor or manager who runs the day-to-day: weekly check-ins with the pilot team, friction log maintenance, metric tracking, and escalation to the sponsor when something stalls.

Both roles require calendar commitments, not job descriptions. The sponsor commits to 2 hours per week. The pilot owner commits to 4-6 hours per week. If neither person can identify where those hours come from, the pilot does not have real sponsorship — it has nominal approval.

Decision 4: A Production Path — Not Just a Pilot Plan

The question: If the pilot succeeds, what happens on Day 91?

This is the decision most organizations skip — and it is the most expensive omission. MIT Sloan finds 73% of successful AI pilots never reach production (2024). The average organization scraps 46% of proofs-of-concept before production (S&P Global, n=1,006, 2025). For every 33 pilots launched, roughly 4 graduate to production — a 12% conversion rate (IDC, 2025).

The production path does not need to be a detailed architecture document. It needs to answer four questions before the pilot launches:

Security and compliance: Has the CISO or IT security lead reviewed the tool for data handling, access controls, and regulatory compliance? If not, schedule the review for Week 2 — not Week 12.
Integration: What systems does this tool need to connect to at production scale? If the pilot runs on exported CSV files but production requires an ERP integration, the pilot is not testing production viability.
Cost model: What does production cost, as opposed to pilot cost? The pilot budget is 15-20% of Year 1 total cost (Pertama Partners, 2026, documents a 2.8-3.8x multiplier from pilot to production). If the CFO has only approved the pilot budget, the production conversation must happen before the pilot ends, not after.
Kill criteria: What specific outcomes at Day 90 would trigger termination? Define these before the pilot starts, when judgment is not clouded by sunk costs or political investment.
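
For the cost-model question, a rough range can be projected from the pilot budget. This is a minimal sketch, not a costing method: it applies only the 2.8-3.8x pilot-to-production multiplier cited above, assumes the multiplier applies to the whole pilot budget, and uses a hypothetical $50,000 pilot as sample input.

```python
def production_cost_range(pilot_budget: float,
                          low_multiplier: float = 2.8,
                          high_multiplier: float = 3.8) -> tuple:
    """Project a (low, high) production cost range from the pilot budget.

    Assumption: the cited multiplier is applied to the full pilot budget.
    """
    if pilot_budget <= 0:
        raise ValueError("pilot_budget must be positive")
    return pilot_budget * low_multiplier, pilot_budget * high_multiplier


def funding_gap(pilot_budget: float, approved_budget: float) -> tuple:
    """Return the (low, high) shortfall between projected cost and approved funds."""
    low, high = production_cost_range(pilot_budget)
    return max(0.0, low - approved_budget), max(0.0, high - approved_budget)


# Hypothetical example: only the pilot itself has been approved so far.
pilot = 50_000.0
low, high = production_cost_range(pilot)
gap_low, gap_high = funding_gap(pilot, approved_budget=pilot)

print(f"Projected production cost: ${low:,.0f} to ${high:,.0f}")
print(f"Unfunded if only the pilot is approved: ${gap_low:,.0f} to ${gap_high:,.0f}")
```

The point of the exercise is the gap, not the precision: it gives the CFO a number to react to before the pilot ends.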

Johnson & Johnson’s portfolio discipline (Fortune, March 2026) starts with the business problem, not the technology — and the top 10-15% of initiatives generate roughly 80% of the impact. The production path is what separates a pilot that earns the right to scale from a pilot that consumes budget while proving nothing about operational viability.

Decision 5: A 90-Day Calendar With Three Gates

The question: When do you stop, check, and decide?

The 90-day prove-and-scale model that Fortune documents across Eaton, Cisco, and Cox Automotive structures pilots into three phases:

Phase	Timeline	Activity	Gate Decision
Prove	Days 1-30	Controlled deployment with 5-15 users on one workflow. Baseline established. Daily check-ins for first 2 weeks.	Day 30 gate: Is adoption above 40%? Is the metric moving? Continue or adjust.
Scale	Days 31-60	Expand to full pilot team (15-50 users). Workflow adjustments based on Month 1 friction log. Weekly check-ins replace daily.	Day 60 gate: Is cost-per-outcome declining from baseline toward the target? Is adoption stabilizing above 50%? Continue, restructure, or prepare to kill.
Integrate	Days 61-90	Full workflow integration. Production readiness review with CIO/CISO. Training materials for broader rollout drafted.	Day 90 gate: Scale to production, restructure and re-pilot, or terminate.

Cisco piloted its Working with AI program in 4 weeks with 5 cross-functional teams, then scaled across the broader organization within 6 weeks (Fortune, March 2026). The speed was possible because the gate structure was defined before launch.

At each gate, ask three questions:

Is the P&L-connected metric improving?
Is adoption holding or growing?
Is the cost-per-outcome trajectory declining?

Three trends in the right direction: proceed to next phase. One metric lagging: diagnose and adjust the lagging dimension (usually workflow design or training — not the tool). All three flat or moving the wrong direction at Day 60: terminate at Day 90 unless the root cause is identified and fixable within 30 days.
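
The gate rule above can be written down as a small decision function so that every gate review applies it the same way. This is a sketch under stated assumptions: the names are hypothetical, and the text does not say what to do when exactly two signals lag, or when all three lag at a gate other than Day 60, so the code returns a "restructure" placeholder for those cases.

```python
from enum import Enum


class GateDecision(Enum):
    PROCEED = "proceed to next phase"
    ADJUST = "diagnose and adjust the lagging dimension"
    RESTRUCTURE = "restructure (assumed; not specified in the text)"
    TERMINATE_AT_DAY_90 = "terminate at Day 90"


def gate_decision(metric_improving: bool,
                  adoption_holding: bool,
                  cost_trend_declining: bool,
                  gate_day: int,
                  root_cause_fixable_in_30_days: bool = False) -> GateDecision:
    """Apply the three gate questions and return a recommended decision."""
    signals = [metric_improving, adoption_holding, cost_trend_declining]
    lagging = signals.count(False)

    if lagging == 0:
        return GateDecision.PROCEED
    if lagging == 1:
        return GateDecision.ADJUST
    if lagging == 3 and gate_day == 60 and not root_cause_fixable_in_30_days:
        return GateDecision.TERMINATE_AT_DAY_90
    # Assumption: every remaining case is treated as a restructure.
    return GateDecision.RESTRUCTURE


print(gate_decision(True, True, True, gate_day=30).value)
print(gate_decision(True, False, True, gate_day=60).value)
print(gate_decision(False, False, False, gate_day=60).value)
```

Writing the rule down before launch serves the same purpose as the kill criteria: the decision is made when judgment is not yet clouded by sunk costs.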

Three Warning Signs the Pilot Is Stalling

The six failure archetypes documented in the research corpus (sponsorship fade, data mirage, workflow bypass, pilot trap, culture collision, measurement vacuum) produce predictable early-warning signals. Three are detectable early enough to intervene.

Warning Sign 1: The Sponsor Has Not Used the Tool (Detectable by Day 30)

If the executive sponsor (the COO or VP of Operations) has not personally used the AI tool in the last 30 days, the pilot is running on borrowed time. BCG (n=10,635, June 2025) isolates leadership support as a 3.7x multiplier on employee AI sentiment. Employees watch what leaders do, not what leaders say.

The test: Can the sponsor describe, from personal experience, one task the tool does well and one it does poorly? If the answer is no, the sponsor is endorsing a tool they do not understand. The team knows it.

The intervention: The sponsor spends 30 minutes using the tool on one real task (not a demo, not a report about the tool), then shares the experience with the pilot team. This single action produces more adoption momentum than any training program.

Warning Sign 2: High Adoption, No Outcome Change (Detectable by Day 45)

Usage dashboards show 70% adoption. The pilot team reports the tool is “helpful.” But the target metric — cost per invoice, cycle time, error rate — has not moved. This is the Workflow Bypass pattern: AI accelerated one step, but the bottleneck shifted downstream. Individual speed increased. Organizational throughput did not.

Faros AI documented this exact pattern across 10,000+ developers: 21% more tasks completed per person, 98% more pull requests generated, zero improvement in organizational delivery speed (July 2025). The speed went into longer review queues, not faster outcomes.

The test: Compare the pilot metric (cost per outcome, cycle time) against baseline. If adoption is above 50% but the metric is flat after 45 days, the workflow was not redesigned — the tool was inserted into a broken process.

The intervention: Map where the output of the AI-assisted step goes next. Find the new bottleneck. Redesign that step — or accept that this particular workflow does not benefit from AI at the system level, regardless of how fast one step became.
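
The Day 45 test for this warning sign can be expressed as a simple check. The sketch below is hypothetical in its names and in one threshold: the text says the metric is "flat" but does not define flat, so the code assumes less than a 5% improvement over baseline counts as flat. It also assumes a lower-is-better metric such as cost per outcome, cycle time, or error rate.

```python
def is_workflow_bypass(adoption_rate: float,
                       baseline_metric: float,
                       current_metric: float,
                       days_elapsed: int,
                       flat_tolerance: float = 0.05) -> bool:
    """Flag high adoption with no meaningful movement in the target metric.

    Assumptions: the metric is lower-is-better, and an improvement smaller
    than flat_tolerance (5% by default) is treated as flat.
    """
    if baseline_metric <= 0:
        raise ValueError("baseline_metric must be positive")
    if days_elapsed < 45:
        return False  # too early to apply the Day 45 test
    improvement = (baseline_metric - current_metric) / baseline_metric
    return adoption_rate > 0.50 and improvement < flat_tolerance


# Illustrative values: 70% adoption, cost per invoice barely moved.
print(is_workflow_bypass(0.70, baseline_metric=18.0, current_metric=17.8, days_elapsed=45))  # True
# Same adoption, but cost per invoice fell by a third.
print(is_workflow_bypass(0.70, baseline_metric=18.0, current_metric=12.0, days_elapsed=45))  # False
```

A `True` result is a prompt to map the downstream bottleneck, not a verdict on the tool.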

Warning Sign 3: The Pilot Team Cannot State the Success Metric (Detectable by Day 14)

Ask three members of the pilot team: “What number does this pilot need to hit by Day 90 for the company to invest more?” If they give three different answers — or no answer — the measurement vacuum is already active.

Pertama Partners finds 73% of failed AI projects lack clear executive alignment on success metrics. The problem is not that the metric was never defined. It is that the metric was defined in a charter document that the pilot team never read, or was defined in terms (“improve efficiency”) that mean different things to different people.

The test: The pilot owner, the executive sponsor, and at least one pilot team member can independently state the same metric, the same target, and the same timeline. If they cannot, alignment does not exist — regardless of what the charter says.

The intervention: A 15-minute meeting where the sponsor states the metric, the target, and the timeline, then asks the team to repeat it back. Post it on the wall. Include it at the top of every weekly check-in. The metric becomes real when it is visible, not when it is written in a document nobody opens.

Key Data Points

Metric	Finding	Source
Success rate with pre-defined metrics vs. without	54% vs. 12% (4.5x)	Pertama Partners, n=2,400+, 2025-2026
Success rate with sustained executive sponsorship	68% vs. 11% without	Pertama Partners, n=2,400+, 2025-2026
Organizations capturing substantial AI value	5%	BCG, n=10,600+, 2025
Successful pilots reaching production	27% (73% never reach production)	MIT Sloan, 2024
Proofs-of-concept scrapped before production	46%	S&P Global, n=1,006, 2025
Pilot-to-production conversion rate	12% (4 of 33)	IDC, 2025
High performers that redesigned workflows	2.8x more likely	McKinsey, n=1,993, July 2025
Organizations using AI at surface level (no process change)	37%	Deloitte, n=3,235, August-September 2025
Leadership support impact on employee AI sentiment	3.7x multiplier	BCG, n=10,635, June 2025
Individual task gains vs. organizational improvement	21% more tasks, 0% org throughput	Faros AI, n=10,000+, July 2025
Pilot-to-production cost multiplier	2.8-3.8x	Pertama Partners, 2026
Organizations tracking AI KPIs	Fewer than 20%	McKinsey, n=1,600+, November 2025
Cisco: workflows reviewed, % activities augmented	24 workflows, avg. 30% augmented	Fortune, March 2026
Cox Automotive: AI solutions in production	20 delivering measurable value	Fortune, March 2026

What This Means for Your Organization

The five decisions on this card are not a framework. They are a checklist. Print it, bring it to the meeting where someone proposes the first AI pilot, and do not approve the pilot until every line has an answer written next to it.

The structural advantage for a 200-2,000 person company is real. Cisco piloted in 4 weeks and scaled in 6. Mid-market companies move from pilot to production in 90 days where enterprises take 9 months or longer (MIT, 2024). But that speed advantage only materializes when the pilot is structured for production from day one — not when it is structured as an experiment with a vague hope of scaling later.

The three warning signs are early enough to act on. A sponsor who has not used the tool by Day 30 can start using it on Day 31. A workflow bypass detected at Day 45 can be redesigned by Day 60. A measurement vacuum visible at Day 14 can be closed in a single 15-minute meeting. None of these interventions are expensive. All of them require someone paying attention.

If this card raised questions about which workflow to select for the first pilot, how to design the production path, or how to structure the 90-day gates around the specific operations in your organization — that is the conversation worth having before the first dollar is spent. brandon@brandonsneider.com

Sources

Pertama Partners — “AI Project Failure Statistics 2026.” n=2,400+ enterprise AI initiatives, 2025-2026. Source for 54% vs. 12% metric success rate, 68% vs. 11% sponsorship impact, 2.8-3.8x cost multiplier, 73% lacking aligned metrics. Independent consulting analysis aggregating RAND, MIT Sloan, McKinsey, and Deloitte data. High credibility. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
Fortune — “From Pilot Mania to Portfolio Discipline: How the Best Companies Are Escaping AI Purgatory.” March 19, 2026. Named case studies from Eaton, Cox Automotive, Cisco, Johnson & Johnson, Liberty Mutual, Travelers. Source for 90-day prove-and-scale model, Cisco’s 4-week pilot, Cox Automotive’s 20 production solutions, J&J’s 80/20 impact concentration. Independent business journalism. High credibility. https://fortune.com/2026/03/19/from-pilot-mania-to-portfolio-discipline-ai-purgatory/
McKinsey — “The State of AI in 2025.” n=1,993 respondents across 105 countries, June-July 2025. Source for 2.8x workflow redesign rate among high performers, <20% KPI tracking rate. Independent survey. High credibility. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
Deloitte — “State of AI in the Enterprise 2026.” n=3,235 senior leaders across 24 countries, August-September 2025. Source for 37% surface-level AI use. Independent survey. High credibility. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html
BCG — “AI at Work 2025.” n=10,600+ workers, 11 countries. Only 5% of organizations achieving substantial AI returns. Independent survey. High credibility. https://www.bcg.com/publications/2025/ai-at-work-momentum-builds-but-gaps-remain
MIT Sloan — Pilot-to-production conversion research. 2024. Source for 73% of successful pilots never reaching production. Academic institution. Very high credibility.
S&P Global 451 Research — Voice of the Enterprise: AI & Machine Learning, Use Cases 2025. n=1,006, March 2025. Source for 46% proof-of-concept scrapping rate. Independent analyst. High credibility. https://www.spglobal.com/market-intelligence/en/news-insights/research/ai-experiences-rapid-adoption-but-with-mixed-outcomes-highlights-from-vote-ai-machine-learning
IDC — AI pilot-to-production conversion rate (4 of 33). 2025. Source for 12% production conversion rate. Independent analyst. High credibility.
BCG — “AI at Work: Momentum Builds, but Gaps Remain.” n=10,635 across 11 countries, June 2025. Source for 3.7x leadership support multiplier. Independent survey. High credibility. https://www.bcg.com/publications/2025/ai-at-work-momentum-builds-but-gaps-remain
Faros AI — “The AI Productivity Paradox.” n=10,000+ developers, 1,255 teams, July 2025. Source for 21% individual task gains with zero organizational throughput improvement. Vendor but observational telemetry data. High credibility. https://www.faros.ai/blog/ai-software-engineering
Gallup — “State of the Global Workplace 2025.” n=19,043, May 2025. Source for 4.7x leadership communication impact on employee AI comfort. Independent survey. Very high credibility. https://www.gallup.com/workplace/349484/state-of-the-global-workplace.aspx
Entrepreneur — “Why So Many AI Pilots Stall — and How Winners Break Through.” March 2026. Source for three executive mistakes (no metrics, avoiding understanding, treating AI as shortcut). Independent journalism. Moderate credibility. https://www.entrepreneur.com/science-technology/why-so-many-ai-pilots-stall-and-how-winners-break/502325
