Measuring AI Success at 90 Days, 6 Months, and 12 Months: A Mid-Market Measurement Framework
Executive Summary
- The single strongest predictor of AI bottom-line impact is tracking well-defined KPIs — yet fewer than 20% of organizations do it. McKinsey’s State of AI survey (n=1,600+, November 2025) identifies KPI tracking as the practice most correlated with financial returns from AI. The more than 80% that skip measurement are flying blind.
- Companies that establish pre-deployment baselines are 3x more likely to achieve positive AI ROI. The most common mistake in enterprise AI is deploying tools without documenting current cost-per-transaction, hours-per-process, and error rates. Without a baseline, improvement is unmeasurable.
- Projects with clear pre-approval success metrics achieve a 54% success rate versus 12% without. Analysis of 2,400+ enterprise initiatives tracked through 2025-2026 (Pertama Partners, drawing on RAND and other research) shows that defining “what does success look like?” before writing the first check is what separates the winners from the roughly 80% of projects that fail.
- Measurement cadence matters as much as metrics. The 5% of organizations capturing real AI value run weekly adoption dashboards, monthly enablement reviews, and quarterly business impact assessments — not annual strategy reviews.
- The right measurement framework evolves with maturity. At 90 days, you measure adoption and baselines. At 6 months, you measure workflow integration and efficiency gains. At 12 months, you measure P&L impact and competitive position.
The Measurement Gap: Why 80% of Companies Cannot Prove AI Works
The data is stark. An NBER working paper (Yotzov & Barrero, February 2026) surveyed 5,956 executives across the US, UK, Germany, and Australia and found that 89% report zero measurable AI impact on labor productivity. McKinsey’s November 2025 survey finds that only 5.5% of organizations report AI contributing more than 5% of EBIT. Deloitte’s State of AI in the Enterprise (n=3,235, August-September 2025) shows that while 66% of organizations report productivity gains, only 20% can demonstrate measurable revenue impact.
These numbers do not mean AI fails. They mean most companies cannot measure whether it works — because they never established what “working” looks like before deployment.
The Pertama Partners analysis of 2,400+ enterprise AI initiatives (2025-2026) makes this concrete: projects with clear pre-approval metrics achieve a 54% success rate; projects without defined metrics achieve 12%. That is a 4.5x difference, driven largely by the discipline of measurement.
Before You Deploy Anything: The Baseline Sprint
The single most common mistake in AI deployment — more common than picking the wrong tool or underinvesting in training — is failing to document current performance before introducing AI.
The 30-Day Baseline Protocol:
For every workflow you plan to augment with AI, document three numbers (a worked sketch appears at the end of this section):
- Cost per transaction, fully loaded. Include labor (hourly rate × time), tools, error correction, rework, and management overhead. A manual invoice costs $15-$22 to process (APQC benchmarks). A customer service interaction costs $12-$18 via email, $25-$35 via phone (industry benchmarks, 2025). If you do not know your cost per transaction, you cannot calculate savings.
- Hours per process cycle. How long does the task take end-to-end, including handoffs, approvals, and rework? Log time for four weeks across the team, not just one performer. Variation across people reveals where AI might help most.
- Error/rework rate. What percentage of outputs require correction, revision, or escalation? This becomes the quality baseline. AI that speeds up a process but increases errors from 5% to 15% has destroyed value, not created it.
Additionally, document:
- Volume metrics. How many transactions, tickets, reports, or deliverables per week?
- Cycle time. From request to completion, what is the elapsed time?
- Customer/stakeholder satisfaction scores if available.
This baseline sprint is not optional. Anthropic’s own research (November 2025) found that AI reduces task completion time by approximately 80% on targeted tasks — but without a pre-deployment benchmark, that number is a vendor claim rather than your measured reality.
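To make the baseline math concrete, here is a minimal sketch for a single workflow, assuming a four-week timing log; every rate, time, and dollar figure is an illustrative placeholder, not a benchmark.

```python
# Baseline sketch for one workflow. All values below are illustrative placeholders;
# replace them with figures from your own four-week timing log.

HOURLY_RATE = 42.0          # fully loaded labor cost per hour (placeholder)
TOOL_COST_PER_TXN = 0.35    # software/license cost allocated per transaction (placeholder)
MGMT_OVERHEAD_PCT = 0.10    # management/review overhead as a share of labor (placeholder)

# Hypothetical log entries: (minutes spent end-to-end, required rework?)
log = [
    (38, False), (45, True), (41, False), (52, True), (36, False),
    (44, False), (49, False), (40, False), (47, False), (43, False),
]

transactions = len(log)
avg_minutes = sum(minutes for minutes, _ in log) / transactions
rework_count = sum(1 for _, reworked in log if reworked)

hours_per_cycle = avg_minutes / 60
labor_cost = hours_per_cycle * HOURLY_RATE * (1 + MGMT_OVERHEAD_PCT)
cost_per_transaction = labor_cost + TOOL_COST_PER_TXN
error_rate = rework_count / transactions

print(f"Hours per cycle:      {hours_per_cycle:.2f}")
print(f"Cost per transaction: ${cost_per_transaction:.2f}")
print(f"Error/rework rate:    {error_rate:.0%}")
```

Computed for each workflow you plan to augment, these three numbers become the denominators for every later ROI claim.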
The Three-Phase Measurement Framework
Phase 1: 90 Days — Adoption and Early Signal (Are People Using It?)
At 90 days, you are measuring whether the tools landed, not whether they delivered ROI. Expecting P&L impact in the first quarter is unrealistic and will produce misleading conclusions.
Leading Indicators (check weekly):
| Metric | Target at 90 Days | Source |
|---|---|---|
| Active user rate | 40-60% of licensed seats | Worklytics benchmark, 2025 |
| Time-to-proficiency | 7-14 days from first use to consistent usage | Worklytics benchmark, 2025 |
| Engagement depth | 10+ prompts per active user per day (moving toward 15-25) | Worklytics benchmark, 2025 |
| Manager adoption rate | 1.2-1.5x individual contributor rate | Worklytics benchmark, 2025 |
| Department adoption spread | At least 3 departments with >30% adoption | Industry practice |
| Training completion rate | 80%+ of target users | Industry practice |
| Support ticket volume for AI tools | Declining week over week | Industry practice |
What to Watch For:
- The “11-minute threshold.” Microsoft’s research finds that users who save at least 11 minutes per day begin to perceive real productivity benefit. If your weekly survey data shows users estimating less than this, investigate whether the tool is being applied to the wrong tasks.
- Light vs. heavy user distribution. Worklytics defines light users as 1-5 prompts per week and heavy users as 20+ prompts per week. At 90 days, you want to see the distribution shifting from light to heavy. A bimodal distribution — power users and non-users — signals a training or workflow design problem, not a tool problem.
- Cross-tool overlap. If 30-50% of AI users are working across multiple AI tools (Copilot for code, ChatGPT for writing, specialized tools for domain tasks), you have organic adoption. If usage is confined to one tool in one department, you have a pilot, not a rollout.
What NOT to Measure at 90 Days:
Do not attempt to measure ROI, revenue impact, or P&L contribution at 90 days. IBM’s research states plainly that AI ROI “might not materialize in the short term.” Premature ROI measurement kills promising initiatives by applying the wrong evaluation criteria at the wrong stage. The 56% executive sponsorship dropout rate within six months (Pertama Partners, 2025) is partly caused by unrealistic early-stage expectations.
90-Day Dashboard: Three Cards
Build a simple weekly dashboard with three metrics (a minimal computation sketch follows this list):
- Adoption rate — % of licensed users active in the past 7 days
- Hours redirected — self-reported time saved per user per week (survey-based)
- Satisfaction score — user sentiment (simple 1-5 scale)
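A minimal sketch of how those three cards might be computed each week, assuming you can export per-user activity and a short pulse survey; the field names and records below are hypothetical.

```python
# Weekly three-card dashboard sketch. All records and field names are hypothetical.

licensed_seats = 120  # placeholder seat count

# One record per surveyed user for the past 7 days (illustrative data).
users = [
    {"active": True,  "hours_saved": 1.5, "satisfaction": 4},
    {"active": True,  "hours_saved": 3.0, "satisfaction": 5},
    {"active": False, "hours_saved": 0.0, "satisfaction": None},
    # ...one entry per licensed seat in a real export
]

active = [u for u in users if u["active"]]
adoption_rate = len(active) / licensed_seats
avg_hours_redirected = sum(u["hours_saved"] for u in active) / max(len(active), 1)
scores = [u["satisfaction"] for u in active if u["satisfaction"] is not None]
satisfaction = sum(scores) / len(scores) if scores else None

print(f"Adoption rate:    {adoption_rate:.0%} of licensed seats active in past 7 days")
print(f"Hours redirected: {avg_hours_redirected:.1f} self-reported hours saved per active user")
print(f"Satisfaction:     {satisfaction:.1f} / 5" if satisfaction is not None else "Satisfaction: n/a")
```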
Phase 2: 6 Months — Integration and Efficiency (Is It Changing How Work Gets Done?)
At six months, the question shifts from “are people using it?” to “is it embedded in workflows, and can we see efficiency gains?”
Core Metrics (check monthly):
| Metric | Target at 6 Months | Source/Benchmark |
|---|---|---|
| Active user rate | 60-75% of licensed seats | Worklytics maturity model |
| Cost per transaction (vs. baseline) | 15-30% reduction | Industry data, 2025-2026 |
| Cycle time (vs. baseline) | 20-40% reduction on targeted processes | BCG/McKinsey survey data |
| Error/rework rate (vs. baseline) | Flat or declining | Quality baseline |
| AI-assisted task completion rate | 25-40% of target workflow tasks | Worklytics benchmark |
| Process straight-through rate | Measurable increase in automated completion | Gartner value framework |
| Capacity unlocked | Documented reallocation of freed time | Industry practice |
The Capacity Unlock Test:
The most important 6-month metric is not time saved — it is what the organization did with the time saved. One manufacturing CFO reported that a 5-person finance team freed 80 hours monthly through AI automation, then redirected that capacity to variance analysis that identified $1.2M in cost savings (ChatFin, 2026).
If your teams are saving time but no one can point to what they are doing with it, you have an efficiency gain that will evaporate at the next budget cycle. Document the reallocation.
The Review Tax:
Workday’s 2026 data reveals that 37-40% of AI time savings are consumed by reviewing, correcting, and verifying AI output. Your measurement framework must account for this. If a process takes 60 minutes manually and an AI draft takes 15 minutes, but review and correction take 25 minutes, the net savings are 20 minutes (33%), not 45 minutes (75%). Honest measurement of net savings, not gross AI speed, separates credible programs from vendor demos.
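Written out as a small sketch, using the illustrative numbers from the paragraph above:

```python
# Net-savings math for one process, using the example figures from the text.

manual_minutes = 60    # baseline: the full manual process
ai_draft_minutes = 15  # time to produce the AI-assisted draft
review_minutes = 25    # the "review tax": checking and correcting the draft

net_minutes_saved = manual_minutes - (ai_draft_minutes + review_minutes)
gross_pct = (manual_minutes - ai_draft_minutes) / manual_minutes
net_pct = net_minutes_saved / manual_minutes

print(f"Gross saving (vendor-demo math): {gross_pct:.0%}")  # 75%
print(f"Net saving (honest math):        {net_pct:.0%}")    # 33%
```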
Pilot-to-Production Conversion:
By month six, at least one pilot should have graduated to production use. MIT Sloan’s research (2025) shows that 95% of generative AI pilots fail to reach production. If your pilots are still pilots at six months, investigate whether the issue is measurement (unclear success criteria), governance (no approval pathway), or technical (integration barriers).
6-Month Dashboard: Four Cards
- Adoption depth — heavy user percentage (20+ prompts/week)
- Net time savings — hours saved minus review/correction time, per process
- Cost per transaction trend — tracked against pre-deployment baseline
- Capacity reallocation — documented evidence of freed time redirected to higher-value work
Phase 3: 12 Months — Business Impact (Can You See It in the P&L?)
At twelve months, you should be able to answer: “What did we get for the money?” If you cannot, the measurement framework failed at phase one (no baseline) or phase two (no process integration).
Impact Metrics (check quarterly):
| Metric | Target at 12 Months | Source/Benchmark |
|---|---|---|
| Active user rate | 75%+ of licensed seats | Worklytics optimization stage |
| Measurable productivity improvement | 15-30% on targeted processes | Worklytics benchmark |
| Cost savings documented | Specific dollar amounts tied to baselines | Pre-deployment baseline |
| Revenue impact (if applicable) | Measurable — even if modest | Deloitte: only 20% achieve this |
| Employee satisfaction with AI tools | Net positive sentiment | Internal survey |
| AI tool utilization vs. license cost | >60% utilization (vs. 54% SaaS average) | Zylo SaaS benchmark |
| Processes with AI embedded in standard workflow | 5+ production processes | Internal tracking |
The P&L Connection:
McKinsey’s data shows that only 39% of organizations can link any EBIT impact to AI, and for most, the impact is below 5%. The organizations that can make this connection do three things differently:
- They established baselines. They know the before number, so the after number means something.
- They track at the process level, not the tool level. “Copilot improved productivity” is unmeasurable. “Contract review cycle time dropped from 4.2 days to 2.1 days after deploying AI-assisted review” is measurable and ties to billable hours or client satisfaction.
- They isolate AI impact from other variables. They use control groups (teams with and without AI), before/after analysis with normalization, or phased rollouts that create natural comparisons (see the sketch below).
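A minimal sketch of that third practice, assuming a simple before/after comparison between teams with and without the tool; the control-group figures are hypothetical placeholders.

```python
# Difference-in-differences sketch for isolating AI impact (hypothetical data).
# "Treated" teams received the AI tool; "control" teams are comparable teams that did not.

# Average contract-review cycle time in days, before and after the rollout.
treated_before, treated_after = 4.2, 2.1   # figure from the text
control_before, control_after = 4.0, 3.8   # placeholder control-group figures

treated_change = treated_after - treated_before   # -2.1 days
control_change = control_after - control_before   # -0.2 days of background drift

# The AI-attributable effect is the treated change net of what would have changed anyway.
ai_effect = treated_change - control_change       # -1.9 days

print(f"Treated teams changed by {treated_change:+.1f} days")
print(f"Control teams changed by {control_change:+.1f} days")
print(f"Estimated AI-attributable change: {ai_effect:+.1f} days per cycle")
```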
The Budget Allocation Check:
BCG’s 10-20-70 framework recommends that 70% of AI investment go to people and processes, 20% to technology and data, and 10% to algorithms. At 12 months, compare your actual spend allocation to this benchmark. Successful projects invest 47% of budget in foundations (data, governance, change management) versus 18% in failed projects (Pertama Partners, 2025). If your spending skewed toward licenses and away from training and process redesign, your results will reflect it.
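One way to run the check is to total actual spend by bucket and compare each share against the 10-20-70 split; the spend figures in the sketch below are placeholders.

```python
# Budget-allocation check against the 10-20-70 guidance. Spend figures are hypothetical.

actual_spend = {                      # dollars spent over the first 12 months
    "algorithms_and_models": 180_000,
    "technology_and_data": 140_000,
    "people_and_process": 80_000,     # training, change management, process redesign
}
target_share = {
    "algorithms_and_models": 0.10,
    "technology_and_data": 0.20,
    "people_and_process": 0.70,
}

total = sum(actual_spend.values())
for bucket, spend in actual_spend.items():
    print(f"{bucket:22s} actual {spend / total:5.0%} vs. target {target_share[bucket]:.0%}")
```

In this hypothetical, spending skews heavily toward models and licenses (45% vs. a 10% target) and away from people and process (20% vs. 70%), exactly the pattern the failed-project data warns about.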
Key Data Points
- <20% of organizations track KPIs for AI solutions — yet this is the strongest predictor of bottom-line impact (McKinsey State of AI, n=1,600+, November 2025)
- 54% success rate with pre-defined metrics vs. 12% without (RAND/Pertama Partners, 2,400+ enterprise initiatives, 2025-2026)
- 3x higher likelihood of positive ROI when pre-deployment baselines are established (industry analysis, 2025-2026)
- 89% of executives report zero AI productivity impact (NBER, n=5,956, February 2026) — primarily a measurement problem, not an AI problem
- 37-40% of AI time savings consumed by review and correction (Workday, 2026) — the “review tax” most ROI models ignore
- 56% of executive sponsors disengage within 6 months (Pertama Partners, 2025) — premature ROI expectations are a contributing factor
- 95% of GenAI pilots fail to reach production (MIT Sloan, 2025) — undefined success criteria are a primary cause
- 68% success rate with sustained CEO involvement vs. 11% without (Pertama Partners, 2025)
- 11 minutes of daily time savings is the threshold where users perceive real productivity benefit (Microsoft, 2025)
- Only 5.5% of organizations report AI contributing >5% of EBIT (McKinsey, November 2025)
What This Means for Your Organization
The measurement gap is the single largest solvable problem in enterprise AI. Most organizations buy tools, deploy them, and then ask whether they worked — in that order. The 5% that capture real value reverse the sequence: they define what “working” looks like, document current performance, deploy tools, and then measure against their own baselines.
For a 200-500 person company, this does not require a data science team or an expensive analytics platform. It requires discipline. Four weeks of documenting current process costs and cycle times before deployment. A simple weekly dashboard with three to four metrics. Monthly reviews that ask “what are we doing with the time we saved?” rather than “are people using the tool?”
The practical implication is uncomfortable but liberating: you probably cannot demonstrate AI ROI right now because you do not know what your processes cost today. The first step is not buying another AI tool. The first step is a stopwatch and a spreadsheet — documenting the cost per transaction, hours per cycle, and error rate for the five processes you plan to augment. That baseline is worth more than any vendor demo, because it turns “AI improved productivity” from a hope into a testable hypothesis.
If you are already past deployment without baselines, it is not too late. Establish baselines now for new processes, and use before/after estimation for existing ones. Imperfect measurement is infinitely better than no measurement — which is where 80% of organizations are today.
Sources
- McKinsey, “The State of AI: Agents, Innovation, and Transformation” (November 2025, n=1,600+ respondents) — KPI tracking as strongest predictor of bottom-line impact; fewer than 20% tracking KPIs. Source credibility: High — large-sample annual survey, independent methodology. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
- NBER Working Paper 34836, “Firm Data on AI” by Yotzov & Barrero (February 2026, n=5,956 executives across US, UK, Germany, Australia) — 89% report zero AI productivity impact. Source credibility: High — independent academic research, large stratified sample. https://www.nber.org/papers/w34836
- Pertama Partners, “AI Project Failure Statistics 2026” (2025-2026, 2,400+ enterprise initiatives tracked) — 54% success rate with pre-defined metrics vs. 12% without; 80.3% overall failure rate; budget allocation patterns. Source credibility: Medium-high — consulting firm aggregating multiple research sources including RAND, MIT Sloan, Deloitte. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
- MIT Sloan Management Review (August 2025) — 95% of GenAI pilots fail to reach production. Source credibility: High — independent academic institution. https://sloanreview.mit.edu/
- Deloitte, “State of AI in the Enterprise 2026” (n=3,235, August-September 2025) — 66% report productivity gains but only 20% can demonstrate measurable revenue impact. Source credibility: High — large global survey, established methodology. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html
- Worklytics, “Top 10 KPIs Every AI Adoption Dashboard Must Track” (2025) — adoption benchmarks, engagement depth targets, maturity stage definitions. Source credibility: Medium — vendor-published but based on aggregated customer data; specific and measurable. https://www.worklytics.co/resources/top-10-kpis-ai-adoption-dashboard-2025-dax-formulas
- BCG, “The Widening AI Value Gap” (September 2025) — 10-20-70 resource allocation framework. Source credibility: High — major consulting firm with large enterprise client base. https://media-publications.bcg.com/The-Widening-AI-Value-Gap-Sept-2025.pdf
- Gartner, “5 AI Metrics That Actually Prove ROI to Your Board” (2025-2026) — value framework linking leading and lagging indicators; 72% of CIOs report breaking even or losing money on AI investments. Source credibility: High — leading analyst firm, survey of 506 CIOs. https://www.gartner.com/en/articles/ai-value-metrics
- IBM, “How to Maximize AI ROI in 2026” — 29% of executives measure ROI confidently; technical debt paydown improves AI ROI by up to 29%. Source credibility: Medium — vendor-published but citing independent research. https://www.ibm.com/think/insights/ai-roi
- Workday AI time savings data (2026) — 37-40% of AI time savings consumed by review and correction. Source credibility: Medium-high — based on platform usage data across enterprise customers; referenced in multiple analyst reports.
- Gartner CIO Survey (May 2025, n=506 CIOs) — 72% of CIOs report breaking even or losing money on AI investments. Source credibility: High — established analyst firm, specific survey methodology. https://www.gartner.com/en/newsroom/press-releases/2025-10-20-gartner-survey-finds-all-it-work-will-involve-ai-by-2030-organizations-must-navigate-ai-readiness-and-human-readiness-to-find-capture-and-sustain-value
Created by Brandon Sneider | brandon@brandonsneider.com | March 2026