Measuring AI Success at 90 Days, 6 Months, and 12 Months: A Mid-Market Measurement Framework
Executive Summary
- The single strongest predictor of AI bottom-line impact is tracking well-defined KPIs — yet fewer than 20% of organizations do it. McKinsey’s State of AI survey (n=1,600+, November 2025) identifies KPI tracking as the practice most correlated with financial returns from AI. The more than 80% that skip measurement are flying blind.
- Companies that establish pre-deployment baselines are 3x more likely to achieve positive AI ROI. The most common mistake in enterprise AI is deploying tools without documenting current cost-per-transaction, hours-per-process, and error rates. Without a baseline, improvement is unmeasurable.
- Projects with clear pre-approval success metrics achieve a 54% success rate versus 12% without. Analysis of 2,400+ enterprise initiatives tracked through 2025-2026 (Pertama Partners, drawing on RAND and other research) shows that defining “what does success look like?” before writing the first check is what separates the winners from the roughly 80% of projects that fail.
- Measurement cadence matters as much as metrics. The 5% of organizations capturing real AI value run weekly adoption dashboards, monthly enablement reviews, and quarterly business impact assessments — not annual strategy reviews.
- The right measurement framework evolves with maturity. At 90 days, you measure adoption and baselines. At 6 months, you measure workflow integration and efficiency gains. At 12 months, you measure P&L impact and competitive position.
The Measurement Gap: Why 80% of Companies Cannot Prove AI Works
The data is stark. An NBER working paper (Yotzov & Barrero, February 2026) surveyed 5,956 executives across the US, UK, Germany, and Australia and found that 89% report zero measurable AI impact on labor productivity. McKinsey’s November 2025 survey finds that only 5.5% of organizations report AI contributing more than 5% of EBIT. Deloitte’s State of AI in the Enterprise (n=3,235, August-September 2025) shows that while 66% of organizations report productivity gains, only 20% can demonstrate measurable revenue impact.
These numbers do not mean AI fails. They mean most companies cannot measure whether it works — because they never established what “working” looks like before deployment.
The Pertama Partners analysis of 2,400+ enterprise AI initiatives (2025-2026) makes this concrete: projects with clear pre-approval metrics achieve a 54% success rate; projects without defined metrics achieve 12%. That is a 4.5x difference, driven largely by the discipline of measurement.
Before You Deploy Anything: The Baseline Sprint
The single most common mistake in AI deployment — more common than picking the wrong tool or underinvesting in training — is failing to document current performance before introducing AI.
The 30-Day Baseline Protocol:
For every workflow you plan to augment with AI, document three numbers (a worked sketch appears at the end of this section):
- Cost per transaction, fully loaded. Include labor (hourly rate × time), tools, error correction, rework, and management overhead. A manual invoice costs $15-$22 to process (APQC benchmarks). A customer service interaction costs $12-$18 via email, $25-$35 via phone (industry benchmarks, 2025). If you do not know your cost per transaction, you cannot calculate savings.
- Hours per process cycle. How long does the task take end-to-end, including handoffs, approvals, and rework? Log time for four weeks across the team, not just one performer. Variation across people reveals where AI might help most.
- Error/rework rate. What percentage of outputs require correction, revision, or escalation? This becomes the quality baseline. AI that speeds up a process but increases errors from 5% to 15% has destroyed value, not created it.
Additionally, document:
- Volume metrics. How many transactions, tickets, reports, or deliverables per week?
- Cycle time. From request to completion, what is the elapsed time?
- Customer/stakeholder satisfaction scores if available.
This baseline sprint is not optional. Anthropic’s own research (November 2025) found that AI reduces task completion time by approximately 80% on targeted tasks — but without a pre-deployment benchmark, that number is a vendor claim rather than your measured reality.
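To make the baseline math concrete, here is a minimal sketch for a single workflow, assuming a four-week timing log; every rate, time, and dollar figure is an illustrative placeholder, not a benchmark.

```python
# Baseline sketch for one workflow. All values below are illustrative placeholders;
# replace them with figures from your own four-week timing log.

HOURLY_RATE = 42.0          # fully loaded labor cost per hour (placeholder)
TOOL_COST_PER_TXN = 0.35    # software/license cost allocated per transaction (placeholder)
MGMT_OVERHEAD_PCT = 0.10    # management/review overhead as a share of labor (placeholder)

# Hypothetical log entries: (minutes spent end-to-end, required rework?)
log = [
    (38, False), (45, True), (41, False), (52, True), (36, False),
    (44, False), (49, False), (40, False), (47, False), (43, False),
]

transactions = len(log)
avg_minutes = sum(minutes for minutes, _ in log) / transactions
rework_count = sum(1 for _, reworked in log if reworked)

hours_per_cycle = avg_minutes / 60
labor_cost = hours_per_cycle * HOURLY_RATE * (1 + MGMT_OVERHEAD_PCT)
cost_per_transaction = labor_cost + TOOL_COST_PER_TXN
error_rate = rework_count / transactions

print(f"Hours per cycle:      {hours_per_cycle:.2f}")
print(f"Cost per transaction: ${cost_per_transaction:.2f}")
print(f"Error/rework rate:    {error_rate:.0%}")
```

Computed for each workflow you plan to augment, these three numbers become the denominators for every later ROI claim.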
The Three-Phase Measurement Framework
Phase 1: 90 Days — Adoption and Early Signal (Are People Using It?)
At 90 days, you are measuring whether the tools landed, not whether they delivered ROI. Expecting P&L impact in the first quarter is unrealistic and will produce misleading conclusions.
Leading Indicators (check weekly):
| Metric | Target at 90 Days | Source |
|---|---|---|
| Active user rate | 40-60% of licensed seats | Worklytics benchmark, 2025 |
| Time-to-proficiency | 7-14 days from first use to consistent usage | Worklytics benchmark, 2025 |
| Engagement depth | 10+ prompts per active user per day (moving toward 15-25) | Worklytics benchmark, 2025 |
| Manager adoption rate | 1.2-1.5x individual contributor rate | Worklytics benchmark, 2025 |
| Department adoption spread | At least 3 departments with >30% adoption | Industry practice |
| Training completion rate | 80%+ of target users | Industry practice |
| Support ticket volume for AI tools | Declining week over week | Industry practice |
What to Watch For:
- The “11-minute threshold.” Microsoft’s research finds that users who save at least 11 minutes per day begin to perceive real productivity benefit. If your weekly survey data shows users estimating less than this, investigate whether the tool is being applied to the wrong tasks.
- Light vs. heavy user distribution. Worklytics defines light users as 1-5 prompts per week and heavy users as 20+ prompts per week. At 90 days, you want to see the distribution shifting from light to heavy. A bimodal distribution — power users and non-users — signals a training or workflow design problem, not a tool problem.
- Cross-tool overlap. If 30-50% of AI users are working across multiple AI tools (Copilot for code, ChatGPT for writing, specialized tools for domain tasks), you have organic adoption. If usage is confined to one tool in one department, you have a pilot, not a rollout.
What NOT to Measure at 90 Days:
Do not attempt to measure ROI, revenue impact, or P&L contribution at 90 days. IBM’s research states plainly that AI ROI “might not materialize in the short term.” Premature ROI measurement kills promising initiatives by applying the wrong evaluation criteria at the wrong stage. The 56% executive sponsorship dropout rate within six months (Pertama Partners, 2025) is partly caused by unrealistic early-stage expectations.
90-Day Dashboard: Three Cards
Build a simple weekly dashboard with three metrics (a minimal computation sketch follows this list):
- Adoption rate — % of licensed users active in the past 7 days
- Hours redirected — self-reported time saved per user per week (survey-based)
- Satisfaction score — user sentiment (simple 1-5 scale)
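A minimal sketch of how those three cards might be computed each week, assuming you can export per-user activity and a short pulse survey; the field names and records below are hypothetical.

```python
# Weekly three-card dashboard sketch. All records and field names are hypothetical.

licensed_seats = 120  # placeholder seat count

# One record per surveyed user for the past 7 days (illustrative data).
users = [
    {"active": True,  "hours_saved": 1.5, "satisfaction": 4},
    {"active": True,  "hours_saved": 3.0, "satisfaction": 5},
    {"active": False, "hours_saved": 0.0, "satisfaction": None},
    # ...one entry per licensed seat in a real export
]

active = [u for u in users if u["active"]]
adoption_rate = len(active) / licensed_seats
avg_hours_redirected = sum(u["hours_saved"] for u in active) / max(len(active), 1)
scores = [u["satisfaction"] for u in active if u["satisfaction"] is not None]
satisfaction = sum(scores) / len(scores) if scores else None

print(f"Adoption rate:    {adoption_rate:.0%} of licensed seats active in past 7 days")
print(f"Hours redirected: {avg_hours_redirected:.1f} self-reported hours saved per active user")
print(f"Satisfaction:     {satisfaction:.1f} / 5" if satisfaction is not None else "Satisfaction: n/a")
```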
Phase 2: 6 Months — Integration and Efficiency (Is It Changing How Work Gets Done?)
At six months, the question shifts from “are people using it?” to “is it embedded in workflows, and can we see efficiency gains?”
Core Metrics (check monthly):
| Metric | Target at 6 Months | Source/Benchmark |
|---|---|---|
| Active user rate | 60-75% of licensed seats | Worklytics maturity model |
| Cost per transaction (vs. baseline) | 15-30% reduction | Industry data, 2025-2026 |
| Cycle time (vs. baseline) | 20-40% reduction on targeted processes | BCG/McKinsey survey data |
| Error/rework rate (vs. baseline) | Flat or declining | Quality baseline |
| AI-assisted task completion rate | 25-40% of target workflow tasks | Worklytics benchmark |
| Process straight-through rate | Measurable increase in automated completion | Gartner value framework |
| Capacity unlocked | Documented reallocation of freed time | Industry practice |
The Capacity Unlock Test:
The most important 6-month metric is not time saved — it is what the organization did with the time saved. One manufacturing CFO reported that a 5-person finance team freed 80 hours monthly through AI automation, then redirected that capacity to variance analysis that identified $1.2M in cost savings (ChatFin, 2026).
If your teams are saving time but no one can point to what they are doing with it, you have an efficiency gain that will evaporate at the next budget cycle. Document the reallocation.
The Review Tax:
Workday’s 2026 data reveals that 37-40% of AI time savings are consumed by reviewing, correcting, and verifying AI output. Your measurement framework must account for this. If a process takes 60 minutes manually and an AI draft takes 15 minutes, but review and correction take 25 minutes, the net savings are 20 minutes (33%), not 45 minutes (75%). Honest measurement of net savings, not gross AI speed, separates credible programs from vendor demos.
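Written out as a small sketch, using the illustrative numbers from the paragraph above:

```python
# Net-savings math for one process, using the example figures from the text.

manual_minutes = 60    # baseline: the full manual process
ai_draft_minutes = 15  # time to produce the AI-assisted draft
review_minutes = 25    # the "review tax": checking and correcting the draft

net_minutes_saved = manual_minutes - (ai_draft_minutes + review_minutes)
gross_pct = (manual_minutes - ai_draft_minutes) / manual_minutes
net_pct = net_minutes_saved / manual_minutes

print(f"Gross saving (vendor-demo math): {gross_pct:.0%}")  # 75%
print(f"Net saving (honest math):        {net_pct:.0%}")    # 33%
```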
Pilot-to-Production Conversion:
By month six, at least one pilot should have graduated to production use. MIT Sloan’s research (2025) shows that 95% of generative AI pilots fail to reach production. If your pilots are still pilots at six months, investigate whether the issue is measurement (unclear success criteria), governance (no approval pathway), or technical (integration barriers).
6-Month Dashboard: Four Cards
- Adoption depth — heavy user percentage (20+ prompts/week)
- Net time savings — hours saved minus review/correction time, per process
- Cost per transaction trend — tracked against pre-deployment baseline
- Capacity reallocation — documented evidence of freed time redirected to higher-value work
Phase 3: 12 Months — Business Impact (Can You See It in the P&L?)
At twelve months, you should be able to answer: “What did we get for the money?” If you cannot, the measurement framework failed at phase one (no baseline) or phase two (no process integration).
Impact Metrics (check quarterly):
| Metric | Target at 12 Months | Source/Benchmark |
|---|---|---|
| Active user rate | 75%+ of licensed seats | Worklytics optimization stage |
| Measurable productivity improvement | 15-30% on targeted processes | Worklytics benchmark |
| Cost savings documented | Specific dollar amounts tied to baselines | Pre-deployment baseline |
| Revenue impact (if applicable) | Measurable — even if modest | Deloitte: only 20% achieve this |
| Employee satisfaction with AI tools | Net positive sentiment | Internal survey |
| AI tool utilization vs. license cost | >60% utilization (vs. 54% SaaS average) | Zylo SaaS benchmark |
| Processes with AI embedded in standard workflow | 5+ production processes | Internal tracking |
The P&L Connection:
McKinsey’s data shows that only 39% of organizations can link any EBIT impact to AI, and for most, the impact is below 5%. The organizations that can make this connection do three things differently:
- They established baselines. They know the before number, so the after number means something.
- They track at the process level, not the tool level. “Copilot improved productivity” is unmeasurable. “Contract review cycle time dropped from 4.2 days to 2.1 days after deploying AI-assisted review” is measurable and ties to billable hours or client satisfaction.
- They isolate AI impact from other variables. They use control groups (teams with and without AI), before/after analysis with normalization, or phased rollouts that create natural comparisons (see the sketch below).
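A minimal sketch of that third practice, assuming a simple before/after comparison between teams with and without the tool; the control-group figures are hypothetical placeholders.

```python
# Difference-in-differences sketch for isolating AI impact (hypothetical data).
# "Treated" teams received the AI tool; "control" teams are comparable teams that did not.

# Average contract-review cycle time in days, before and after the rollout.
treated_before, treated_after = 4.2, 2.1   # figure from the text
control_before, control_after = 4.0, 3.8   # placeholder control-group figures

treated_change = treated_after - treated_before   # -2.1 days
control_change = control_after - control_before   # -0.2 days of background drift

# The AI-attributable effect is the treated change net of what would have changed anyway.
ai_effect = treated_change - control_change       # -1.9 days

print(f"Treated teams changed by {treated_change:+.1f} days")
print(f"Control teams changed by {control_change:+.1f} days")
print(f"Estimated AI-attributable change: {ai_effect:+.1f} days per cycle")
```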
The Budget Allocation Check:
BCG’s 10-20-70 framework recommends that 70% of AI investment go to people and processes, 20% to technology and data, and 10% to algorithms. At 12 months, compare your actual spend allocation to this benchmark. Successful projects invest 47% of budget in foundations (data, governance, change management) versus 18% in failed projects (Pertama Partners, 2025). If your spending skewed toward licenses and away from training and process redesign, your results will reflect it.
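One way to run the check is to total actual spend by bucket and compare each share against the 10-20-70 split; the spend figures in the sketch below are placeholders.

```python
# Budget-allocation check against the 10-20-70 guidance. Spend figures are hypothetical.

actual_spend = {                      # dollars spent over the first 12 months
    "algorithms_and_models": 180_000,
    "technology_and_data": 140_000,
    "people_and_process": 80_000,     # training, change management, process redesign
}
target_share = {
    "algorithms_and_models": 0.10,
    "technology_and_data": 0.20,
    "people_and_process": 0.70,
}

total = sum(actual_spend.values())
for bucket, spend in actual_spend.items():
    print(f"{bucket:22s} actual {spend / total:5.0%} vs. target {target_share[bucket]:.0%}")
```

In this hypothetical, spending skews heavily toward models and licenses (45% vs. a 10% target) and away from people and process (20% vs. 70%), exactly the pattern the failed-project data warns about.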
Key Data Points
- <20% of organizations track KPIs for AI solutions — yet this is the strongest predictor of bottom-line impact (McKinsey State of AI, n=1,600+, November 2025)
- 54% success rate with pre-defined metrics vs. 12% without (RAND/Pertama Partners, 2,400+ enterprise initiatives, 2025-2026)
- 3x higher likelihood of positive ROI when pre-deployment baselines are established (industry analysis, 2025-2026)
- 89% of executives report zero AI productivity impact (NBER, n=5,956, February 2026) — primarily a measurement problem, not an AI problem
- 37-40% of AI time savings consumed by review and correction (Workday, 2026) — the “review tax” most ROI models ignore
- 56% of executive sponsors disengage within 6 months (Pertama Partners, 2025) — premature ROI expectations are a contributing factor
- 95% of GenAI pilots fail to reach production (MIT Sloan, 2025) — undefined success criteria are a primary cause
- 68% success rate with sustained CEO involvement vs. 11% without (Pertama Partners, 2025)
- 11 minutes of daily time savings is the threshold where users perceive real productivity benefit (Microsoft, 2025)
- Only 5.5% of organizations report AI contributing >5% of EBIT (McKinsey, November 2025)
What This Means for Your Organization
The measurement gap is the single largest solvable problem in enterprise AI. Most organizations buy tools, deploy them, and then ask whether they worked — in that order. The 5% that capture real value reverse the sequence: they define what “working” looks like, document current performance, deploy tools, and then measure against their own baselines.
For a 200-500 person company, this does not require a data science team or an expensive analytics platform. It requires discipline. Four weeks of documenting current process costs and cycle times before deployment. A simple weekly dashboard with three to four metrics. Monthly reviews that ask “what are we doing with the time we saved?” rather than “are people using the tool?”
The practical implication is uncomfortable but liberating: you probably cannot demonstrate AI ROI right now because you do not know what your processes cost today. The first step is not buying another AI tool. The first step is a stopwatch and a spreadsheet — documenting the cost per transaction, hours per cycle, and error rate for the five processes you plan to augment. That baseline is worth more than any vendor demo, because it turns “AI improved productivity” from a hope into a testable hypothesis.
If you are already past deployment without baselines, it is not too late. Establish baselines now for new processes, and use before/after estimation for existing ones. Imperfect measurement is infinitely better than no measurement — which is where 80% of organizations are today.
Sources
- McKinsey, “The State of AI: Agents, Innovation, and Transformation” (November 2025, n=1,600+ respondents) — KPI tracking as strongest predictor of bottom-line impact; fewer than 20% tracking KPIs. Source credibility: High — large-sample annual survey, independent methodology. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
- NBER Working Paper 34836, “Firm Data on AI” by Yotzov & Barrero (February 2026, n=5,956 executives across US, UK, Germany, Australia) — 89% report zero AI productivity impact. Source credibility: High — independent academic research, large stratified sample. https://www.nber.org/papers/w34836
- Pertama Partners, “AI Project Failure Statistics 2026” (2025-2026, 2,400+ enterprise initiatives tracked) — 54% success rate with pre-defined metrics vs. 12% without; 80.3% overall failure rate; budget allocation patterns. Source credibility: Medium-high — consulting firm aggregating multiple research sources including RAND, MIT Sloan, Deloitte. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026
- MIT Sloan Management Review (August 2025) — 95% of GenAI pilots fail to reach production. Source credibility: High — independent academic institution. https://sloanreview.mit.edu/
- Deloitte, “State of AI in the Enterprise 2026” (n=3,235, August-September 2025) — 66% report productivity gains but only 20% can demonstrate measurable revenue impact. Source credibility: High — large global survey, established methodology. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html
- Worklytics, “Top 10 KPIs Every AI Adoption Dashboard Must Track” (2025) — adoption benchmarks, engagement depth targets, maturity stage definitions. Source credibility: Medium — vendor-published but based on aggregated customer data; specific and measurable. https://www.worklytics.co/resources/top-10-kpis-ai-adoption-dashboard-2025-dax-formulas
- BCG, “The Widening AI Value Gap” (September 2025) — 10-20-70 resource allocation framework. Source credibility: High — major consulting firm with large enterprise client base. https://media-publications.bcg.com/The-Widening-AI-Value-Gap-Sept-2025.pdf
- Gartner, “5 AI Metrics That Actually Prove ROI to Your Board” (2025-2026) — value framework linking leading and lagging indicators; 72% of CIOs report breaking even or losing money on AI investments. Source credibility: High — leading analyst firm, survey of 506 CIOs. https://www.gartner.com/en/articles/ai-value-metrics
- IBM, “How to Maximize AI ROI in 2026” — 29% of executives measure ROI confidently; technical debt paydown improves AI ROI by up to 29%. Source credibility: Medium — vendor-published but citing independent research. https://www.ibm.com/think/insights/ai-roi
- Workday AI time savings data (2026) — 37-40% of AI time savings consumed by review and correction. Source credibility: Medium-high — based on platform usage data across enterprise customers; referenced in multiple analyst reports.
- Gartner CIO Survey (May 2025, n=506 CIOs) — 72% of CIOs report breaking even or losing money on AI investments. Source credibility: High — established analyst firm, specific survey methodology. https://www.gartner.com/en/newsroom/press-releases/2025-10-20-gartner-survey-finds-all-it-work-will-involve-ai-by-2030-organizations-must-navigate-ai-readiness-and-human-readiness-to-find-capture-and-sustain-value
Created by Brandon Sneider | brandon@brandonsneider.com | March 2026