Three Numbers to Track in Your First 90 Days: The Success Metrics Card That Prevents the #2 AI Failure Mode

Brandon Sneider · March 2026

Executive Summary

The #2 reason AI pilots fail, after selecting the wrong task, is launching without success criteria. Organizations that define metrics before deployment achieve a 54% success rate versus 12% for those that do not, a 4.5x difference associated with measurement discipline alone (Pertama Partners, n=2,400+ initiatives, 2025-2026).
Fewer than 20% of organizations track KPIs for their AI tools — yet McKinsey identifies KPI tracking as the single strongest predictor of bottom-line impact (n=1,600+, November 2025). The 80% flying blind are not failing because AI does not work. They are failing because they cannot tell whether it does.
This card gives every executive in the room three numbers to track from Day 1. Not a dashboard of 15 KPIs. Not a measurement framework that requires a data team. Three numbers, one spreadsheet, 15 minutes per week. The same three numbers that separate the 5% capturing value from the 95% running pilots that quietly die.

Why Three Numbers — Not Fifteen

Measurement frameworks with a dozen KPIs produce two outcomes: paralysis or abandonment. A 200-person company does not have a dedicated AI analytics team. The CIO already manages 14 vendor dashboards. Adding a complex AI measurement suite is the fastest way to ensure nobody measures anything.

The Pertama Partners analysis of 2,400+ enterprise AI initiatives (2025-2026) reveals that the discriminating factor is not sophistication of measurement — it is existence of measurement. Pre-defined success criteria of any kind produce 4.5x better outcomes. The three metrics below are deliberately simple because simple gets tracked, and tracked gets managed.

Each metric answers one question a CEO will ask within 90 days:

Metric	The CEO Question	When It Matters
Adoption rate	“Are people actually using this?”	Week 1 onward
Time saved per task	“Is it working?”	Week 3 onward
Cost-per-outcome trajectory	“Is it worth what we’re paying?”	Month 2 onward

Metric 1: Adoption Rate — “Are People Using It?”

What to measure: The percentage of licensed users who performed a meaningful AI interaction in the past seven days. Not logins. Not “opened the tool.” Meaningful use: generated a draft, ran an analysis, completed a task with AI assistance.

How to measure it: Most enterprise AI tools report weekly active users in their admin console. For M365 Copilot, check the Microsoft 365 admin center usage reports. For standalone tools, pull OAuth session data from the identity provider, treating it as a proxy because sign-ins alone do not prove meaningful use. If neither is available, run a 3-question weekly Slack or email survey: “Did you use [tool name] this week? For what task? Did it save time?”

Benchmarks at each gate:

Timeframe	Healthy	Warning	Critical
Day 30	40-60% of licensed users active	20-40%	Below 20%
Day 60	50-70% with 15-25% in daily use	30-50%	Below 30%
Day 90	60-75% with shift from light to heavy usage	40-60%	Below 40%

These thresholds are drawn from Nebuly’s enterprise AI benchmarks (2025) and Worklytics adoption data across industries (2025). The 40-60% Day 30 range represents the minimum viable activation threshold — below it, the pilot is already stalling.
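The adoption calculation and the gate check can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names and the user counts are hypothetical, and two boundary choices are assumptions (a rate exactly on a shared boundary such as 40% at Day 30 is counted as healthy, and anything above the healthy range is also counted as healthy).

```python
# Gate thresholds (percent of licensed users active) from the benchmark table above.
# Each entry is (healthy_floor, critical_ceiling); values in between are "warning".
# Assumption: a rate exactly on the healthy floor counts as healthy.
GATES = {30: (40.0, 20.0), 60: (50.0, 30.0), 90: (60.0, 40.0)}


def adoption_rate(active_users: int, licensed_users: int) -> float:
    """Percent of licensed users with a meaningful AI interaction in the past 7 days."""
    if licensed_users <= 0:
        raise ValueError("licensed_users must be positive")
    return 100.0 * active_users / licensed_users


def gate_status(day: int, rate_pct: float) -> str:
    """Classify an adoption rate against the Day 30, 60, or 90 benchmark."""
    healthy_floor, critical_ceiling = GATES[day]
    if rate_pct >= healthy_floor:
        return "healthy"
    if rate_pct < critical_ceiling:
        return "critical"
    return "warning"


# Hypothetical counts for illustration only.
rate = adoption_rate(active_users=46, licensed_users=100)
print(f"Day 30: {rate:.0f}% active -> {gate_status(30, rate)}")
```

The same two functions work for the Day 60 and Day 90 gates by changing the `day` argument.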

What adoption rate does not tell you: That the tool is creating value. High adoption with no productivity gain is the pattern Faros AI documented across 10,000+ developers: 21% more tasks completed per person, zero improvement in organizational throughput (Faros AI, n=10,000+, July 2025). Adoption confirms people are using the tool. It does not confirm the tool is working. That is what Metric 2 answers.

The manager adoption signal: Worklytics benchmarks show that manager adoption should run 1.2-1.5x the individual contributor rate. When managers adopt AI tools, their teams follow. When managers do not use the tools they asked their teams to adopt, the signal is unmistakable — and adoption stalls within 60 days.
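A quick way to check the manager signal is to divide the manager adoption rate by the individual contributor rate and compare the result with the 1.2-1.5x band. The sketch below assumes both rates are already measured as percentages; the function names and the example rates are hypothetical.

```python
def manager_ratio(manager_rate_pct: float, ic_rate_pct: float) -> float:
    """Manager adoption rate divided by individual contributor (IC) adoption rate."""
    if ic_rate_pct <= 0:
        raise ValueError("ic_rate_pct must be positive")
    return manager_rate_pct / ic_rate_pct


def manager_signal(ratio: float, low: float = 1.2, high: float = 1.5) -> str:
    """Compare the ratio with the 1.2-1.5x benchmark band cited above."""
    if ratio < low:
        return "below benchmark"
    if ratio > high:
        return "above benchmark"
    return "within benchmark"


# Hypothetical rates for illustration only.
ratio = manager_ratio(manager_rate_pct=60.0, ic_rate_pct=48.0)
print(f"{ratio:.2f}x -> {manager_signal(ratio)}")
```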

Metric 2: Time Saved Per Task — “Is It Working?”

What to measure: The net minutes saved per completed task, after subtracting review and correction time. Not gross AI speed. Net human time savings.

This distinction is the difference between a real metric and a vendor demo. Workday’s 2026 data shows that 37-40% of AI time savings are consumed by reviewing, correcting, and verifying AI output. A process that takes 60 minutes manually and 15 minutes with AI but 25 minutes to review produces a net savings of 20 minutes — not 45.

How to measure it: Select 3-5 high-frequency tasks from the pilot workflow. For each:

Step	Action	Tool
1	Document the pre-AI baseline time (from the baseline sprint before deployment)	Stopwatch, time-tracking tool, or manager estimate
2	Log AI-assisted completion time for 10 instances	Same method
3	Log review/correction time for those same 10 instances	Same method
4	Calculate: Baseline - (AI time + Review time) = Net savings	Spreadsheet
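The four steps above reduce to one subtraction once the timings are logged. The following sketch averages the logged instances and applies the step 4 formula; the function name is hypothetical, and averaging across the 10 instances is an assumption about how the logs are combined.

```python
from statistics import fmean


def net_minutes_saved(baseline_min: float, ai_min: list[float], review_min: list[float]) -> float:
    """Baseline - (mean AI time + mean review time), following step 4 above."""
    if not ai_min or len(ai_min) != len(review_min):
        raise ValueError("log AI time and review time for the same instances")
    return baseline_min - (fmean(ai_min) + fmean(review_min))


# The worked example from the text: 60 min manual, 15 min with AI, 25 min of review,
# repeated here for 10 logged instances.
print(net_minutes_saved(60, [15] * 10, [25] * 10))
```

A negative result means review and correction are consuming more time than the AI saves on that task.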

Benchmarks:

Task Category	Expected Net Improvement (90 days)	Source
Meeting summaries	20-26 min/day	UK CDDO trial, n=20,000, 2025
Invoice processing	60-80% cost reduction per unit	APQC benchmarks, 2025
Tier-1 customer tickets	14% more resolved per hour	Brynjolfsson/Stanford field study, n=5,179, 2025
Boilerplate code	25-35% speed improvement	DORA, ~5,000 developers, 2025
Contract clause review	70-80% time reduction	Concord/industry, 2025
Email drafting	30-50% time reduction	Microsoft/Forrester, 2025

The 11-minute threshold: Microsoft’s research identifies 11 minutes of daily time savings as the point where users begin perceiving real productivity benefit. Below this threshold, the tool feels like overhead rather than help. If Metric 2 shows less than 11 minutes per day in net savings, investigate whether the AI is being applied to the right tasks — not whether the tool is broken.

What time saved does not tell you: That the savings are worth the cost. A tool that saves 15 minutes per person per day across 50 users saves 12.5 hours per day, or roughly 275 hours per month at 22 working days. If the tool costs $50,000 per month and those hours are not redirected to revenue-generating or cost-reducing work, the math does not close. That is what Metric 3 answers.
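To see why the math may not close, it can help to convert the daily savings into monthly hours and then ask what each saved hour would need to be worth. The sketch below uses the paragraph's inputs (15 minutes, 50 users, $50,000 per month); the 22 working days per month and both function names are assumptions introduced for illustration.

```python
def monthly_hours_saved(min_per_person_per_day: float, users: int, working_days: int = 22) -> float:
    """Total hours saved per month. working_days=22 is an assumption, not a source figure."""
    return min_per_person_per_day * users * working_days / 60.0


def break_even_hourly_value(tool_cost_per_month: float, hours_saved: float) -> float:
    """Dollar value each saved hour must generate for the tool to pay for itself."""
    if hours_saved <= 0:
        raise ValueError("hours_saved must be positive")
    return tool_cost_per_month / hours_saved


hours = monthly_hours_saved(15, 50)
print(f"{hours:.0f} hours saved per month")
print(f"${break_even_hourly_value(50_000, hours):.2f} per saved hour to break even")
```

If the break-even value per hour is well above the loaded hourly cost of the people using the tool, time savings alone will not justify the spend.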

Metric 3: Cost-Per-Outcome Trajectory — “Is It Worth What We’re Paying?”

What to measure: The fully loaded cost to produce one unit of output in the target workflow, tracked monthly, compared against the pre-AI baseline.

This is the metric that connects to the P&L. It is also the metric most organizations skip — which is why 72% of CIOs report breaking even or losing money on AI investments (Gartner, n=506 CIOs, May 2025) and only 29% of executives say they can measure ROI confidently (IBM, 2025-2026).

How to calculate it:

Step 1: Define the “outcome.” This must be a countable unit of completed work, not activity.

Workflow	Outcome Unit	Not This (Activity)
Accounts payable	Processed invoice	“Used the AI tool”
Customer service	Resolved ticket	“AI-assisted interaction”
Contract review	Reviewed contract	“Documents scanned”
Content creation	Published piece	“Drafts generated”
Software development	Merged pull request	“Lines of code”

Step 2: Calculate the fully loaded cost per outcome.

Cost per outcome = (Labor cost + AI tool cost + Review/correction cost + Management overhead) ÷ Number of completed outcomes

The denominator is outcomes, not interactions. An AI that generates 50 drafts but produces 10 usable pieces costs 5x what the per-interaction metrics suggest.
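A small sketch of the formula shows how much the choice of denominator matters. The class and function names are hypothetical, and every dollar figure is invented for illustration; only the 50 drafts versus 10 usable pieces comes from the text.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """The four cost components in the formula above, in dollars per month."""
    labor: float
    ai_tool: float
    review: float
    management: float

    def total(self) -> float:
        return self.labor + self.ai_tool + self.review + self.management


def cost_per_unit(costs: MonthlyCosts, units: int) -> float:
    """Fully loaded cost divided by a count of units (outcomes or interactions)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return costs.total() / units


# Hypothetical dollar figures for illustration only.
costs = MonthlyCosts(labor=6_000, ai_tool=1_500, review=1_200, management=800)
per_draft = cost_per_unit(costs, units=50)    # per-interaction view: every draft counted
per_outcome = cost_per_unit(costs, units=10)  # per-outcome view: only usable pieces counted
print(f"per draft: ${per_draft:.2f}, per outcome: ${per_outcome:.2f}")
```

With 50 drafts and 10 usable pieces, the per-outcome figure is five times the per-draft figure, which is the gap the paragraph above describes.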

Step 3: Compare to baseline and track the trajectory.

Month	Pre-AI Cost/Unit	AI-Assisted Cost/Unit	Trajectory
Baseline	$18.00	—	—
Month 1	—	$22.00	↑ Higher (expected: learning curve)
Month 2	—	$14.50	↓ Declining (on track)
Month 3	—	$11.00	↓ Declining (healthy)

Why trajectory matters more than the number: Month 1 will almost always show costs rising. Training time, workflow disruption, the review tax, and simple learning curves push costs up before they come down. This is normal. The Pertama Partners data shows a 2-4 week productivity dip is expected during AI adoption. The critical signal is the slope from Month 1 to Month 3. A declining trajectory means the organization is learning. A flat or rising trajectory at Month 3 means something is structurally wrong — the task selection, the workflow design, or the training investment.
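The slope check can be made mechanical. The sketch below labels each month-over-month move using the figures from the example table; the function name and the 2% tolerance for calling a change "flat" are assumptions, since the text does not define how flat is measured.

```python
def trajectory(prev_cost: float, curr_cost: float, tolerance: float = 0.02) -> str:
    """Label a month-over-month move. Changes within +/-2% count as flat (assumed tolerance)."""
    change = (curr_cost - prev_cost) / prev_cost
    if change < -tolerance:
        return "declining"
    if change > tolerance:
        return "rising"
    return "flat"


baseline = 18.00
monthly = [22.00, 14.50, 11.00]  # Months 1-3 from the example table above

print("Month 1 vs baseline:", trajectory(baseline, monthly[0]))
for month, (prev, curr) in enumerate(zip(monthly, monthly[1:]), start=2):
    position = "below" if curr < baseline else "at or above"
    print(f"Month {month} vs Month {month - 1}: {trajectory(prev, curr)}, {position} baseline")
```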

The true cost multiplier: Remember that license fees represent 10-20% of actual AI deployment cost (DX Research/Atlan, 2025). Year 1 total cost typically runs 2.5x the license fee. The cost-per-outcome calculation must include the full cost, not just the line item on the software invoice. If the CFO is tracking only license spend, the metric is reporting a fiction.

What cost-per-outcome tells the CEO: Whether to scale, restructure, or kill the pilot at Day 90. A declining cost-per-outcome trajectory with healthy adoption is the green light for production investment. A flat trajectory with high adoption means the tool works but the workflow was not redesigned — the most common fixable failure mode. A rising trajectory with declining adoption is the signal to terminate.

The One-Page 90-Day Tracking Sheet

Copy this to a spreadsheet. Update weekly. Bring it to every pilot review meeting.

Week	Adoption Rate (% active)	Time Saved/Task (net min)	Cost/Outcome ($)	Notes
Baseline	—	—	$_____	Pre-deployment measurement
Week 1	___%	___min	$_____
Week 2	___%	___min	$_____
Week 3	___%	___min	$_____
Week 4	___%	___min	$_____	Day 30 gate review
Week 5	___%	___min	$_____
Week 6	___%	___min	$_____
Week 7	___%	___min	$_____
Week 8	___%	___min	$_____	Day 60 gate review
Week 9	___%	___min	$_____
Week 10	___%	___min	$_____
Week 11	___%	___min	$_____
Week 12	___%	___min	$_____	Day 90 decision: scale / restructure / kill
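For teams that prefer to generate the sheet rather than copy it by hand, the sketch below writes the same layout as CSV text that any spreadsheet can open. It uses only the Python standard library; the function names are hypothetical, and the blank cells are left empty for weekly entry.

```python
import csv
import io

HEADER = ["Week", "Adoption Rate (% active)", "Time Saved/Task (net min)",
          "Cost/Outcome ($)", "Notes"]
# Gate reviews fall on the weeks shown in the tracking sheet above.
GATE_NOTES = {4: "Day 30 gate review", 8: "Day 60 gate review",
              12: "Day 90 decision: scale / restructure / kill"}


def tracking_sheet_rows(weeks: int = 12) -> list[list[str]]:
    """Build the header, the baseline row, and one empty row per week."""
    rows = [HEADER, ["Baseline", "", "", "", "Pre-deployment measurement"]]
    for week in range(1, weeks + 1):
        rows.append([f"Week {week}", "", "", "", GATE_NOTES.get(week, "")])
    return rows


def tracking_sheet_csv(weeks: int = 12) -> str:
    """Render the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(tracking_sheet_rows(weeks))
    return buffer.getvalue()


print(tracking_sheet_csv())
```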

At each gate, ask three questions:

Is adoption rising, flat, or falling?
Is net time saved per task increasing or decreasing?
Is cost per outcome declining toward or below baseline?

All three trends moving in the right direction (adoption and net time saved rising, cost per outcome declining): scale. Mixed signals with one metric lagging: restructure the lagging dimension (usually workflow design or training). All three flat or moving in the wrong direction: terminate and reallocate budget.
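The gate rule can be written as a short function, which makes the decision harder to argue around in a review meeting. This is a sketch under one stated assumption: the text covers the case of one lagging metric, and the code treats two lagging metrics the same way (restructure). The function name is hypothetical.

```python
def gate_decision(adoption_rising: bool, time_saved_rising: bool, cost_declining: bool) -> str:
    """Map the three trend checks to scale, restructure, or terminate."""
    favorable = sum([adoption_rising, time_saved_rising, cost_declining])
    if favorable == 3:
        return "scale"
    if favorable == 0:
        return "terminate"
    # Assumption: any mixed result, whether one or two metrics lag, means restructure.
    return "restructure"


print(gate_decision(adoption_rising=True, time_saved_rising=True, cost_declining=False))
```

A flat trend counts as unfavorable here, matching the rule that all three flat or moving the wrong way means terminate.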

Key Data Points

Metric	Finding	Source
Success rate with pre-defined metrics vs. without	54% vs. 12% (4.5x)	Pertama Partners, n=2,400+, 2025-2026
Organizations tracking AI KPIs	Fewer than 20%	McKinsey, n=1,600+, November 2025
CIOs breaking even or losing money on AI	72%	Gartner, n=506, May 2025
Executives who can measure AI ROI confidently	29%	IBM, 2025-2026
AI time savings consumed by review/correction	37-40%	Workday, 2026
Daily time savings threshold for perceived benefit	11 minutes	Microsoft, 2025
True cost vs. license fee	2.5x Year 1 (license = 10-20% of total)	DX Research/Atlan, 2025
Individual task gains vs. organizational improvement	21% more tasks, 0% org throughput	Faros AI, n=10,000+, 2025
Employees saving 1-7 hours/week with AI	85%	Workday, February 2026
Organizations attributing more than 5% of EBIT to AI	5.5%	McKinsey, n=1,600+, November 2025

What This Means for Your Organization

The pattern across 2,400+ enterprise AI initiatives is unambiguous: the organizations that define success before they deploy AI are 4.5x more likely to achieve it. Not because the metrics themselves are magic — but because the act of defining them forces three decisions most companies skip. Which workflow are you targeting? What does that workflow cost today? What would success look like in 90 days? Companies that answer these questions before writing a purchase order make fundamentally different deployment decisions than companies that answer them at the quarterly review.

The three metrics on this card are not the only things worth measuring. At six months, you should track capacity reallocation — what the organization did with the time saved. At twelve months, you should measure P&L impact at the process level. But in the first 90 days, three numbers are enough. More metrics at this stage create the illusion of rigor while diluting focus. Track adoption, time saved, and cost per outcome. If all three trend in the right direction, you have earned the right to invest in a more sophisticated measurement infrastructure.

If this card raised questions about which workflow to baseline, how to calculate your cost per outcome, or how to design the 90-day gate reviews around these metrics, that is exactly the conversation worth having early. Contact: brandon@brandonsneider.com.

Sources

Pertama Partners — “AI Project Failure Statistics 2026.” n=2,400+ enterprise AI initiatives, 2025-2026. Source for 54% vs. 12% metric success rate, overall failure patterns. Independent consulting analysis aggregating RAND, MIT Sloan, McKinsey, and Deloitte data. High credibility. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
McKinsey — “The State of AI in 2025.” n=1,600+ respondents, November 2025. Source for <20% KPI tracking rate as strongest predictor of bottom-line impact, 5.5% EBIT threshold. Independent survey with consistent annual methodology. High credibility. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
Gartner — “5 AI Metrics That Actually Prove ROI to Your Board.” n=506 CIOs, May 2025. Source for 72% of CIOs breaking even or losing money on AI investments. Independent analyst firm. High credibility. https://www.gartner.com/en/articles/ai-value-metrics
IBM — “How to Maximize AI ROI in 2026.” 2025-2026. Source for 29% confident ROI measurement rate. Vendor-published but citing independent research. Moderate credibility. https://www.ibm.com/think/insights/ai-roi
Workday — “Measure the ROI of AI With This One Weird Trick.” February 2026. Source for 37-40% review tax on AI time savings, 85% of employees saving 1-7 hours/week. Vendor-published platform data. Moderate-high credibility. https://www.workday.com/en-us/perspectives/finance/2026/02/measure-roi-of-ai.html
Faros AI — “The AI Productivity Paradox.” n=10,000+ developers, 1,255 teams, July 2025. Source for 21% individual task gains with zero organizational throughput improvement. Vendor but observational telemetry data. High credibility. https://www.faros.ai/blog/ai-software-engineering
Nebuly — “Defining Adoption Benchmarks for Enterprise AI.” 2025. Source for 40-60% Day 30 activation threshold, 15-25% daily active use benchmark. AI platform vendor benchmarks. Moderate credibility. https://www.nebuly.com/blog/defining-adoption-benchmarks-for-enterprise-ai-what-good-looks-like-at-30-60-and-90-days
Worklytics — “2025 Benchmarks: What Percentage of Employees Use AI Tools Weekly.” 2025. Source for industry-specific adoption rate benchmarks and manager adoption multiplier. Analytics vendor aggregated data. Moderate credibility. https://www.worklytics.co/resources/2025-ai-adoption-benchmarks-employee-usage-statistics
Microsoft — 11-minute daily time savings perception threshold. 2025. Vendor research. Moderate-high credibility. https://www.microsoft.com/en-us/worklab/
DX Research/Atlan — Year 1 TCO analysis. License fees = 10-20% of total. 2.5x Year 1 multiplier. 2025. Independent. High credibility.
UK Central Digital and Data Office — Microsoft 365 Copilot trial. n=20,000 civil servants, 2025. Source for 26 min/day meeting summary savings. Government study. High credibility. https://www.geekwire.com/2025/microsoft-ai-tools-saved-british-government-workers-26-minutes-a-day-new-study-shows/
Brynjolfsson, Li, Raymond — “Generative AI at Work.” n=5,179 agents. Quarterly Journal of Economics, 2025. Source for 14% customer service productivity gain. Academic field study of a staggered rollout. Very high credibility. https://www.nber.org/papers/w31161
APQC — Accounts payable automation benchmarks. 2025. Source for 60-80% invoice processing cost reduction. Independent. High credibility. https://www.apqc.org/resources/benchmarking/
DORA — State of DevOps 2025. ~5,000 developers. Source for 25-35% boilerplate coding speed gains. Google-affiliated but peer-reviewed. High credibility. https://dora.dev/research/2025/dora-report/
