Executive Summary
- The #2 reason AI pilots fail, after selecting the wrong task, is launching without success criteria. Organizations that define metrics before deployment achieve a 54% success rate, versus 12% for those that do not: a 4.5x difference associated with measurement discipline (Pertama Partners, n=2,400+ initiatives, 2025-2026).
- Fewer than 20% of organizations track KPIs for their AI tools — yet McKinsey identifies KPI tracking as the single strongest predictor of bottom-line impact (n=1,600+, November 2025). The 80% flying blind are not failing because AI does not work. They are failing because they cannot tell whether it does.
- This card gives every executive in the room three numbers to track from Day 1. Not a dashboard of 15 KPIs. Not a measurement framework that requires a data team. Three numbers, one spreadsheet, 15 minutes per week. The same three numbers that separate the 5% capturing value from the 95% running pilots that quietly die.
Why Three Numbers — Not Fifteen
Measurement frameworks with a dozen KPIs produce two outcomes: paralysis or abandonment. A 200-person company does not have a dedicated AI analytics team. The CIO already manages 14 vendor dashboards. Adding a complex AI measurement suite is the fastest way to ensure nobody measures anything.
The Pertama Partners analysis of 2,400+ enterprise AI initiatives (2025-2026) reveals that the discriminating factor is not sophistication of measurement — it is existence of measurement. Pre-defined success criteria of any kind produce 4.5x better outcomes. The three metrics below are deliberately simple because simple gets tracked, and tracked gets managed.
Each metric answers one question a CEO will ask within 90 days:
| Metric | The CEO Question | When It Matters |
|---|---|---|
| Adoption rate | “Are people actually using this?” | Week 1 onward |
| Time saved per task | “Is it working?” | Week 3 onward |
| Cost-per-outcome trajectory | “Is it worth what we’re paying?” | Month 2 onward |
Metric 1: Adoption Rate — “Are People Using It?”
What to measure: The percentage of licensed users who performed a meaningful AI interaction in the past seven days. Not logins. Not “opened the tool.” Meaningful use: generated a draft, ran an analysis, completed a task with AI assistance.
How to measure it: Most enterprise AI tools report weekly active users in their admin console. For M365 Copilot, check the Microsoft 365 admin center usage reports. For standalone tools, pull OAuth session data from the identity provider. If neither is available, run a 3-question weekly Slack or email survey: “Did you use [tool name] this week? For what task? Did it save time?”
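For teams that can export raw usage events rather than a pre-computed weekly active count, the calculation can be sketched as below. This is a minimal sketch, not any vendor's API: the event log shape, the event type names, and the user IDs are all hypothetical, and the definition of "meaningful" should be adapted to the pilot workflow.

```python
from datetime import date, timedelta

# Hypothetical event types that count as meaningful use (assumption, not a vendor schema).
MEANINGFUL_EVENTS = {"draft_generated", "analysis_run", "task_completed"}

def adoption_rate(events, licensed_users, as_of, window_days=7):
    """Percent of licensed users with at least one meaningful event in the window."""
    if not licensed_users:
        return 0.0
    window_start = as_of - timedelta(days=window_days)
    active = {
        user
        for user, event_type, day in events
        if event_type in MEANINGFUL_EVENTS
        and window_start < day <= as_of
        and user in licensed_users
    }
    return 100.0 * len(active) / len(licensed_users)

# Hypothetical log of (user_id, event_type, event_date) tuples.
licensed = {"ana", "ben", "cho", "dev", "eli"}
log = [
    ("ana", "draft_generated", date(2026, 3, 2)),
    ("ben", "login", date(2026, 3, 3)),           # login only: not counted
    ("cho", "analysis_run", date(2026, 2, 20)),   # outside the 7-day window
    ("dev", "task_completed", date(2026, 3, 5)),
]
rate = adoption_rate(log, licensed, as_of=date(2026, 3, 6))
print(f"{rate:.0f}%")  # 2 of 5 licensed users -> 40%
```

The denominator is licensed users, not users who logged in, so unused seats pull the rate down as they should.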
Benchmarks at each gate:
| Timeframe | Healthy | Warning | Critical |
|---|---|---|---|
| Day 30 | 40-60% of licensed users active | 20-40% | Below 20% |
| Day 60 | 50-70% with 15-25% in daily use | 30-50% | Below 30% |
| Day 90 | 60-75% with shift from light to heavy usage | 40-60% | Below 40% |
These thresholds are drawn from Nebuly’s enterprise AI benchmarks (2025) and Worklytics adoption data across industries (2025). The 40-60% Day 30 range represents the minimum viable activation threshold — below it, the pilot is already stalling.
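The gate table can be turned into a simple lookup. The sketch below makes two assumptions the table leaves open: a value sitting exactly on a shared boundary (for example 40% at Day 30) is placed in the better band, and anything above the top of the Healthy range is still Healthy. It checks only the headline percentage, not the daily-use or light-to-heavy usage conditions.

```python
# Lower bounds from the gate table, in percent: day -> (healthy_from, warning_from).
GATES = {30: (40, 20), 60: (50, 30), 90: (60, 40)}

def adoption_status(day, active_pct):
    """Return 'healthy', 'warning', or 'critical' for a Day 30/60/90 gate."""
    if day not in GATES:
        raise ValueError("gate must be Day 30, 60, or 90")
    healthy_from, warning_from = GATES[day]
    if active_pct >= healthy_from:
        return "healthy"
    if active_pct >= warning_from:
        return "warning"
    return "critical"

print(adoption_status(30, 45))  # healthy
print(adoption_status(60, 35))  # warning
print(adoption_status(90, 38))  # critical
```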
What adoption rate does not tell you: That the tool is creating value. High adoption with no productivity gain is the pattern Faros AI documented across 10,000+ developers: 21% more tasks completed per person, zero improvement in organizational throughput (Faros AI, n=10,000+, July 2025). Adoption confirms people are using the tool. It does not confirm the tool is working. That is what Metric 2 answers.
The manager adoption signal: Worklytics benchmarks show that manager adoption should run 1.2-1.5x the individual contributor rate. When managers adopt AI tools, their teams follow. When managers do not use the tools they asked their teams to adopt, the signal is unmistakable — and adoption stalls within 60 days.
Metric 2: Time Saved Per Task — “Is It Working?”
What to measure: The net minutes saved per completed task, after subtracting review and correction time. Not gross AI speed. Net human time savings.
This distinction is the difference between a real metric and a vendor demo. Workday’s 2026 data shows that 37-40% of AI time savings are consumed by reviewing, correcting, and verifying AI output. A process that takes 60 minutes manually and 15 minutes with AI but 25 minutes to review produces a net savings of 20 minutes — not 45.
How to measure it: Select 3-5 high-frequency tasks from the pilot workflow. For each:
| Step | Action | Tool |
|---|---|---|
| 1 | Document the pre-AI baseline time (from the baseline sprint before deployment) | Stopwatch, time-tracking tool, or manager estimate |
| 2 | Log AI-assisted completion time for 10 instances | Same method |
| 3 | Log review/correction time for those same 10 instances | Same method |
| 4 | Calculate: Baseline - (AI time + Review time) = Net savings | Spreadsheet |
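The four steps above reduce to one line of arithmetic per task. The sketch below averages the 10 logged instances and applies the Step 4 formula; the logged times are hypothetical and chosen to match the 60/15/25-minute example earlier in this section.

```python
from statistics import mean

def net_savings_per_task(baseline_min, ai_min, review_min):
    """Baseline - (AI time + review time), using averages over the logged instances."""
    if not ai_min or len(ai_min) != len(review_min):
        raise ValueError("log one AI time and one review time per instance")
    return baseline_min - (mean(ai_min) + mean(review_min))

# Hypothetical log of 10 instances of one task, in minutes.
ai_times = [15, 14, 16, 15, 17, 13, 15, 16, 14, 15]
review_times = [25, 27, 24, 26, 25, 23, 25, 26, 24, 25]

gross = 60 - mean(ai_times)                           # what a vendor demo shows
net = net_savings_per_task(60, ai_times, review_times)  # what the pilot delivers
print(f"gross: {gross:.0f} min, net: {net:.0f} min")  # gross: 45 min, net: 20 min
```

Reporting gross and net side by side makes the review tax visible instead of leaving it buried in the average.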
Benchmarks:
| Task Category | Expected Net Savings (90 days) | Source |
|---|---|---|
| Meeting summaries | 20-26 min/day | UK CDDO trial, n=20,000, 2025 |
| Invoice processing | 60-80% cost reduction per unit | APQC benchmarks, 2025 |
| Tier-1 customer tickets | 14% more resolved per hour | Brynjolfsson/Stanford RCT, n=5,179, 2025 |
| Boilerplate code | 25-35% speed improvement | DORA, ~5,000 developers, 2025 |
| Contract clause review | 70-80% time reduction | Concord/industry, 2025 |
| Email drafting | 30-50% time reduction | Microsoft/Forrester, 2025 |
The 11-minute threshold: Microsoft’s research identifies 11 minutes of daily time savings as the point where users begin perceiving real productivity benefit. Below this threshold, the tool feels like overhead rather than help. If Metric 2 shows less than 11 minutes per day in net savings, investigate whether the AI is being applied to the right tasks — not whether the tool is broken.
What time saved does not tell you: That the savings are worth the cost. A tool that saves 15 minutes per person per day across 50 users saves about 275 hours per month, assuming 22 working days (15 min × 50 users × 22 days ÷ 60). If the tool costs $50,000 per month and those hours are not redirected to revenue-generating or cost-reducing work, the math does not close. That is what Metric 3 answers.
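The monthly total depends heavily on how many working days are assumed, so it helps to make that input explicit. The sketch below is illustrative only: the 22-day month is an assumption, and the break-even figure simply divides tool cost by hours saved without modeling how much of that time is actually redirected to productive work.

```python
def monthly_hours_saved(minutes_per_day, users, working_days):
    """Total hours saved per month across all users."""
    return minutes_per_day * users * working_days / 60

def break_even_hourly_value(monthly_tool_cost, hours_saved):
    """Dollar value each saved hour must generate for the tool to pay for itself."""
    if hours_saved <= 0:
        raise ValueError("hours_saved must be positive")
    return monthly_tool_cost / hours_saved

# 15 minutes per person per day, 50 users, assumed 22 working days per month.
hours = monthly_hours_saved(minutes_per_day=15, users=50, working_days=22)
value_needed = break_even_hourly_value(50_000, hours)
print(f"{hours:.0f} hours/month, break-even at ${value_needed:.2f} per saved hour")
# 275 hours/month, break-even at $181.82 per saved hour
```

If the required value per saved hour exceeds what the affected roles plausibly produce, the time savings alone will not justify the spend.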
Metric 3: Cost-Per-Outcome Trajectory — “Is It Worth What We’re Paying?”
What to measure: The fully loaded cost to produce one unit of output in the target workflow, tracked monthly, compared against the pre-AI baseline.
This is the metric that connects to the P&L. It is also the metric most organizations skip — which is why 72% of CIOs report breaking even or losing money on AI investments (Gartner, n=506 CIOs, May 2025) and only 29% of executives say they can measure ROI confidently (IBM, 2025-2026).
How to calculate it:
Step 1: Define the “outcome.” This must be a countable unit of completed work, not activity.
| Workflow | Outcome Unit | Not This (Activity) |
|---|---|---|
| Accounts payable | Processed invoice | “Used the AI tool” |
| Customer service | Resolved ticket | “AI-assisted interaction” |
| Contract review | Reviewed contract | “Documents scanned” |
| Content creation | Published piece | “Drafts generated” |
| Software development | Merged pull request | “Lines of code” |
Step 2: Calculate the fully loaded cost per outcome.
Cost per outcome = (Labor cost + AI tool cost + Review/correction cost + Management overhead) ÷ Number of completed outcomes
The denominator is outcomes, not interactions. An AI that generates 50 drafts but produces 10 usable pieces costs 5x what the per-interaction metrics suggest.
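The formula and the drafts-versus-usable-pieces point can be checked with a few lines. This is a sketch with hypothetical dollar amounts; the function and parameter names are illustrative, and each cost input should cover the same period as the outcome count.

```python
def cost_per_outcome(labor, ai_tool, review, overhead, completed_outcomes):
    """Fully loaded cost divided by completed outcomes (not interactions)."""
    if completed_outcomes <= 0:
        raise ValueError("completed_outcomes must be positive")
    return (labor + ai_tool + review + overhead) / completed_outcomes

# Hypothetical month for a content workflow: $1,000 fully loaded cost.
costs = dict(labor=600, ai_tool=150, review=200, overhead=50)

per_draft = cost_per_outcome(**costs, completed_outcomes=50)  # 50 drafts generated
per_piece = cost_per_outcome(**costs, completed_outcomes=10)  # 10 usable pieces

print(f"per draft: ${per_draft:.2f}, per usable piece: ${per_piece:.2f}")
# per draft: $20.00, per usable piece: $100.00
print(f"understatement factor: {per_piece / per_draft:.0f}x")  # 5x
```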
Step 3: Compare to baseline and track the trajectory.
| Month | Pre-AI Cost/Unit | AI-Assisted Cost/Unit | Trajectory |
|---|---|---|---|
| Baseline | $18.00 | — | — |
| Month 1 | — | $22.00 | ↑ Higher (expected: learning curve) |
| Month 2 | — | $14.50 | ↓ Declining (on track) |
| Month 3 | — | $11.00 | ↓ Declining (healthy) |
Why trajectory matters more than the number: Month 1 will almost always show costs rising. Training time, workflow disruption, the review tax, and simple learning curves push costs up before they come down. This is normal. The Pertama Partners data shows a 2-4 week productivity dip is expected during AI adoption. The critical signal is the slope from Month 1 to Month 3. A declining trajectory means the organization is learning. A flat or rising trajectory at Month 3 means something is structurally wrong — the task selection, the workflow design, or the training investment.
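The slope check can be sketched as a comparison of the first and last tracked months. The 2% tolerance for calling a trajectory "flat" is an assumption introduced here for illustration, not a figure from the cited research; the monthly values are the ones in the example table above.

```python
def trajectory(monthly_costs, tolerance=0.02):
    """Classify the change from the first to the last tracked month."""
    if len(monthly_costs) < 2:
        raise ValueError("need at least two months of data")
    first, last = monthly_costs[0], monthly_costs[-1]
    change = (last - first) / first
    if change < -tolerance:
        return "declining"
    if change > tolerance:
        return "rising"
    return "flat"

def pct_vs_baseline(current_cost, baseline_cost):
    """Percent difference from the pre-AI baseline (negative means cheaper)."""
    return 100.0 * (current_cost - baseline_cost) / baseline_cost

ai_costs = [22.00, 14.50, 11.00]   # Month 1 to Month 3 from the example table
print(trajectory(ai_costs))                           # declining
print(f"{pct_vs_baseline(ai_costs[-1], 18.00):.1f}%")  # -38.9%
```

Tracking both numbers matters: a declining slope that never crosses below baseline is progress, but not yet a return.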
The true cost multiplier: Remember that license fees represent 10-20% of actual AI deployment cost (DX Research/Atlan, 2025). Year 1 total cost typically runs 2.5x the license fee. The cost-per-outcome calculation must include the full cost, not just the line item on the software invoice. If the CFO is tracking only license spend, the metric is reporting a fiction.
What cost-per-outcome tells the CEO: Whether to scale, restructure, or kill the pilot at Day 90. A declining cost-per-outcome trajectory with healthy adoption is the green light for production investment. A flat trajectory with high adoption means the tool works but the workflow was not redesigned — the most common fixable failure mode. A rising trajectory with declining adoption is the signal to terminate.
The One-Page 90-Day Tracking Sheet
Copy this to a spreadsheet. Update weekly. Bring it to every pilot review meeting.
| Week | Adoption Rate (% active) | Time Saved/Task (net min) | Cost/Outcome ($) | Notes |
|---|---|---|---|---|
| Baseline | — | — | $_____ | Pre-deployment measurement |
| Week 1 | ___% | ___min | $_____ | |
| Week 2 | ___% | ___min | $_____ | |
| Week 3 | ___% | ___min | $_____ | |
| Week 4 | ___% | ___min | $_____ | Day 30 gate review |
| Week 5 | ___% | ___min | $_____ | |
| Week 6 | ___% | ___min | $_____ | |
| Week 7 | ___% | ___min | $_____ | |
| Week 8 | ___% | ___min | $_____ | Day 60 gate review |
| Week 9 | ___% | ___min | $_____ | |
| Week 10 | ___% | ___min | $_____ | |
| Week 11 | ___% | ___min | $_____ | |
| Week 12 | ___% | ___min | $_____ | Day 90 decision: scale / restructure / kill |
At each gate, ask three questions:
- Is adoption rising, flat, or falling?
- Is net time saved per task increasing or decreasing?
- Is cost per outcome declining toward or below baseline?
All three metrics trending in the right direction (adoption and net time saved rising, cost per outcome declining): scale. Mixed signals with one metric lagging: restructure the lagging dimension (usually workflow design or training). All three flat or moving in the wrong direction: terminate and reallocate budget.
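The gate rule can be written down as a small decision function so that every review applies it the same way. This is a sketch under one stated assumption: any mix of good and bad signals, including two lagging metrics, is treated as "restructure", since the rule above only spells out the one-lagging case.

```python
def gate_decision(adoption, time_saved, cost_per_outcome):
    """Each argument is a trend label: 'rising', 'flat', or 'falling'."""
    on_track = [
        adoption == "rising",           # more licensed users active
        time_saved == "rising",         # net minutes saved per task growing
        cost_per_outcome == "falling",  # cost per unit declining
    ]
    if all(on_track):
        return "scale"
    if not any(on_track):
        return "terminate"
    return "restructure"  # assumption: any mixed signal lands here

print(gate_decision("rising", "rising", "falling"))  # scale
print(gate_decision("rising", "flat", "falling"))    # restructure
print(gate_decision("flat", "falling", "rising"))    # terminate
```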
Key Data Points
| Metric | Finding | Source |
|---|---|---|
| Success rate with pre-defined metrics vs. without | 54% vs. 12% (4.5x) | Pertama Partners, n=2,400+, 2025-2026 |
| Organizations tracking AI KPIs | Fewer than 20% | McKinsey, n=1,600+, November 2025 |
| CIOs breaking even or losing money on AI | 72% | Gartner, n=506, May 2025 |
| Executives who can measure AI ROI confidently | 29% | IBM, 2025-2026 |
| AI time savings consumed by review/correction | 37-40% | Workday, 2026 |
| Daily time savings threshold for perceived benefit | 11 minutes | Microsoft, 2025 |
| True cost vs. license fee | 2.5x Year 1 (license = 10-20% of total) | DX Research/Atlan, 2025 |
| Individual task gains vs. organizational improvement | 21% more tasks, 0% org throughput | Faros AI, n=10,000+, 2025 |
| Employees saving 1-7 hours/week with AI | 85% | Workday, February 2026 |
| Organizations reporting more than 5% EBIT impact from AI | Only 5.5% | McKinsey, n=1,600+, November 2025 |
What This Means for Your Organization
The pattern across 2,400+ enterprise AI initiatives is unambiguous: the organizations that define success before they deploy AI are 4.5x more likely to achieve it. Not because the metrics themselves are magic — but because the act of defining them forces three decisions most companies skip. Which workflow are you targeting? What does that workflow cost today? What would success look like in 90 days? Companies that answer these questions before writing a purchase order make fundamentally different deployment decisions than companies that answer them at the quarterly review.
The three metrics on this card are not the only things worth measuring. At six months, you should track capacity reallocation — what the organization did with the time saved. At twelve months, you should measure P&L impact at the process level. But in the first 90 days, three numbers are enough. More metrics at this stage create the illusion of rigor while diluting focus. Track adoption, time saved, and cost per outcome. If all three trend in the right direction, you have earned the right to invest in a more sophisticated measurement infrastructure.
If this card raised questions about which workflow to baseline, how to calculate your cost per outcome, or how to design the 90-day gate reviews around these metrics — that is exactly the conversation worth having early. brandon@brandonsneider.com.
Sources
- Pertama Partners: “AI Project Failure Statistics 2026.” n=2,400+ enterprise AI initiatives, 2025-2026. Source for 54% vs. 12% metric success rate, overall failure patterns. Independent consulting analysis aggregating RAND, MIT Sloan, McKinsey, and Deloitte data. High credibility. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
- McKinsey: “The State of AI in 2025.” n=1,600+ respondents, November 2025. Source for <20% KPI tracking rate, KPI tracking as strongest predictor of bottom-line impact, 5.5% EBIT threshold. Independent survey with consistent annual methodology. High credibility. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
- Gartner: “5 AI Metrics That Actually Prove ROI to Your Board.” n=506 CIOs, May 2025. Source for 72% of CIOs breaking even or losing money on AI investments. Independent analyst firm. High credibility. https://www.gartner.com/en/articles/ai-value-metrics
- IBM: “How to Maximize AI ROI in 2026.” 2025-2026. Source for 29% confident ROI measurement rate. Vendor-published but citing independent research. Moderate credibility. https://www.ibm.com/think/insights/ai-roi
- Workday: “Measure the ROI of AI With This One Weird Trick.” February 2026. Source for 37-40% review tax on AI time savings, 85% of employees saving 1-7 hours/week. Vendor-published platform data. Moderate-high credibility. https://www.workday.com/en-us/perspectives/finance/2026/02/measure-roi-of-ai.html
- Faros AI: “The AI Productivity Paradox.” n=10,000+ developers, 1,255 teams, July 2025. Source for 21% individual task gains with zero organizational throughput improvement. Vendor but observational telemetry data. High credibility. https://www.faros.ai/blog/ai-software-engineering
- Nebuly: “Defining Adoption Benchmarks for Enterprise AI.” 2025. Source for 40-60% Day 30 activation threshold, 15-25% daily active use benchmark. AI platform vendor benchmarks. Moderate credibility. https://www.nebuly.com/blog/defining-adoption-benchmarks-for-enterprise-ai-what-good-looks-like-at-30-60-and-90-days
- Worklytics: “2025 Benchmarks: What Percentage of Employees Use AI Tools Weekly.” 2025. Source for industry-specific adoption rate benchmarks and manager adoption multiplier. Analytics vendor aggregated data. Moderate credibility. https://www.worklytics.co/resources/2025-ai-adoption-benchmarks-employee-usage-statistics
- Microsoft: 11-minute daily time savings perception threshold. 2025. Vendor research. Moderate-high credibility. https://www.microsoft.com/en-us/worklab/
- DX Research/Atlan: Year 1 TCO analysis. License fees = 10-20% of total. 2.5x Year 1 multiplier. 2025. Independent. High credibility.
- UK Central Digital and Data Office: Microsoft 365 Copilot trial. n=20,000 civil servants, 2025. Source for 26 min/day meeting summary savings. Government study. High credibility. https://www.geekwire.com/2025/microsoft-ai-tools-saved-british-government-workers-26-minutes-a-day-new-study-shows/
- Brynjolfsson, Li, Raymond: “Generative AI at Work.” n=5,179 agents. Quarterly Journal of Economics, 2025. Source for 14% customer service productivity gain. Academic RCT. Very high credibility. https://www.nber.org/papers/w31161
- APQC: Accounts payable automation benchmarks. 2025. Source for 60-80% invoice processing cost reduction. Independent. High credibility. https://www.apqc.org/resources/benchmarking/
- DORA: State of DevOps 2025. ~5,000 developers. Source for 25-35% boilerplate coding speed gains. Google-affiliated but peer-reviewed. High credibility. https://dora.dev/research/2025/dora-report/
Brandon Sneider | brandon@brandonsneider.com | March 2026