Real ROI by Business Function: Where Enterprises Are Actually Seeing Returns from AI
Executive Summary
- Only 5-6% of enterprises qualify as “AI high performers” with measurable EBIT impact of 5%+ (McKinsey 2025)
- 88% of companies use AI in at least one function, but meaningful bottom-line impact remains rare
- The highest proven ROI comes from the most boring applications: customer service routing, invoice processing, fraud detection
- Academic studies with control groups show mixed results: 14-40% productivity gains in some tasks, 19% slower in others
- Most enterprise AI ROI comes from configuring SaaS tools, not custom development – the “build your own” era of 2024 largely failed
- The functions where AI delivers the most value are NOT the ones getting the most press
The Evidence Hierarchy: What We Actually Know
Before ranking functions, it matters enormously where the data comes from. Here is the honest landscape of AI ROI evidence, ordered by reliability.
Tier 1: Randomized Controlled Trials (Highest Reliability)
These are rare but they exist, and they paint a more nuanced picture than vendor marketing.
Brynjolfsson, Li, & Raymond (Stanford/MIT, 2023-2024) – Customer Support
- 5,179 customer support agents, staggered rollout, real workplace setting
- AI assistance increased productivity by 14% on average (issues resolved per hour)
- Novice/low-skilled workers: 34% improvement; experienced workers: minimal gains
- Requests to speak to a manager declined by 25%
- Published in the Quarterly Journal of Economics (2025)
- Verdict: Real, measured, peer-reviewed. The strongest evidence we have for any business function.
Dell’Acqua et al. (Harvard/BCG, 2023) – “The Jagged Frontier”
- 758 BCG consultants (7% of individual contributor workforce), pre-registered experiment
- For tasks within AI capabilities: 25% faster, 40% higher quality, 12% more tasks completed
- For tasks outside AI capabilities: 19 percentage points worse performance
- Introduced the critical concept of AI’s “jagged technological frontier”
- Verdict: Peer-reviewed, rigorous, but reveals AI is a double-edged sword. Context determines everything.
Noy & Zhang (MIT, 2023) – Professional Writing Tasks
- 453 college-educated professionals, randomized ChatGPT access
- Time to complete writing tasks decreased by 40%, output quality rose by 18%
- Benefits concentrated among lower-ability workers (inequality compression)
- Published in Science
- Verdict: Strong methodology, but tasks were 20-30 minute writing exercises, not full workflows.
METR (2025) – Experienced Software Developers
- 16 experienced open-source developers, 246 tasks, randomized AI access
- Developers using AI were 19% SLOWER, not faster
- Developers believed they were 20% faster (massive perception gap)
- Developers averaged 5 years of experience on their specific repositories
- Tools used: Cursor Pro with Claude 3.5/3.7 Sonnet (frontier models at the time)
- Verdict: Small sample but rigorous methodology. Critically, shows experienced developers on familiar codebases may not benefit. Contradicts industry narratives.
Tier 2: Large Consulting Surveys (Useful but Self-Reported)
McKinsey Global Survey – “The State of AI” (2025)
- 88% of companies use AI in at least one function
- Only 39% report any EBIT impact at enterprise level; most say it is less than 5%
- Only ~6% qualify as “high performers” (5%+ EBIT impact from AI)
- Revenue increases most commonly reported in: marketing & sales, strategy & corporate finance, product development
- Cost reductions of 10-20% reported in: software engineering, manufacturing, IT
- High performers invest 3x more in process redesign than in software itself
- High performers are 3x more likely to have fundamentally redesigned workflows
- Caveat: Self-reported survey data from executives. Selection bias is real.
BCG – “From Potential to Profit” (January 2026)
- 1,400+ C-suite executives surveyed
- Big gaps emerging between “winners” and “observers”
- Focus on strategic priorities, transforming processes, and preparing workforces
- Caveat: C-suite self-reporting; details behind paywall.
Deloitte – “State of AI in the Enterprise” (2026)
- 3,235 leaders surveyed (Aug-Sep 2025)
- 74% say projects meet or exceed ROI expectations
- But: satisfactory ROI typically takes 2-4 years (3-4x longer than conventional tech)
- 44% of cybersecurity respondents report ROI surpassing expectations (highest of any function)
- Only 20% of organizations achieve revenue growth through AI (vs. 74% who hope to)
- Caveat: “Meeting expectations” is not the same as “delivering strong ROI.” Expectations may have been lowered.
Tier 3: Vendor-Funded Studies (Use with Extreme Caution)
Forrester TEI (Total Economic Impact) Studies
- All TEI studies are commissioned and paid for by the vendor being studied
- Recent AI-related TEIs include: Microsoft 365 Copilot, Microsoft Foundry, Five9, boost.ai, PolyAI, Writer
- Methodology: Forrester interviews select customers, builds financial model with benefits/costs/risks
- Forrester claims editorial independence: “Clients cannot purchase favorable opinions or results”
- However: vendors select which customers Forrester interviews, positive results are the norm, and these studies cannot be used in Forrester’s syndicated (independent) research
- Microsoft Copilot TEI claims: up to 353% ROI for SMBs
- Verdict: Treat as marketing collateral with a research veneer. The methodology is sound in theory but the selection bias in customer interviews is structural. These are NOT independent research.
Tier 4: Vendor Claims (Marketing, Not Evidence)
- GitHub Copilot: “55% faster task completion” (internal study, not independently verified)
- Google Cloud: “74% of executives achieve ROI in first year with AI agents”
- Salesforce: Various AI-driven revenue claims
- Verdict: Useful for understanding product capabilities, not for ROI planning.
Business Functions Ranked by Proven ROI
Based on the evidence hierarchy above, here is an honest ranking from strongest proven ROI to weakest.
Tier A: Strong Evidence of Positive ROI
1. Customer Service AI (Chatbots, Routing, Agent Assistance)
Evidence strength: HIGH – multiple RCTs plus extensive enterprise deployment data
| Metric | Result | Source |
|---|---|---|
| Productivity increase | 14% (issues resolved/hour) | Brynjolfsson et al. (Stanford/MIT) |
| Novice worker improvement | 34% | Brynjolfsson et al. (Stanford/MIT) |
| Cost per chat reduction | 70% | Vodafone case study |
| Routine inquiry handling | 80% of inquiries automated | IBM |
| Operational cost reduction | 23-30% | Multiple enterprise reports |
| Projected labor cost savings | $80B by 2026 | Gartner |
| ROI ratio | $3.50 returned per $1 spent | Industry composite |
| Time to positive ROI | 8-14 months | Multiple reports |
Why it works so well:
- High volume, repetitive, well-structured interactions
- Easy to measure (resolution time, cost per contact, customer satisfaction)
- Primarily SaaS configuration, not custom development
- AI handles Tier 1 queries, routes complex ones to humans
- Works best for: FAQ deflection, initial triage, agent assistance, sentiment routing
The honest caveat: The 34% gains for novices vs. minimal gains for experienced agents (Brynjolfsson) suggest AI acts more as a “knowledge equalizer” than a universal productivity booster. Organizations staffed entirely by senior agents may see smaller gains.
Configuration vs. custom: ~80-90% of value comes from configuring off-the-shelf platforms (Zendesk AI, Intercom Fin, Freshworks Freddy, Five9). Custom development adds value only for industry-specific knowledge bases or complex multi-system integrations.
2. Finance AI (Fraud Detection, Invoice Processing, Forecasting)
Evidence strength: HIGH for fraud detection and invoice processing; MODERATE for forecasting
| Metric | Result | Source |
|---|---|---|
| Fraud detection accuracy | >90% | Industry composite |
| JP Morgan fraud/ops savings | $50M/year | JP Morgan case study |
| Visa fraud prevention | $40B in prevented fraud (2023) | Visa |
| Invoice processing cost reduction | 80%+ (from $15-40 to under $5/invoice) | Industry benchmarks 2025 |
| Forecast accuracy improvement | 25-40% | Multiple enterprise reports |
| AP automation ROI | 420% annual (breakeven in 2 months) | Case study composite |
| Finance team productivity boost | 44-54% | McKinsey |
| Manual data work reduction | 20-30% (top-tier firms) | McKinsey |
Why it works so well:
- Fraud detection has immediate, measurable dollar impact
- Invoice processing is high-volume, rules-based, perfect for AI
- Financial data is structured, clean, and abundant
- ROI is directly quantifiable in dollar terms
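As the bullets note, fraud detection pays off because financial data is structured and the dollar impact is direct. A toy sketch of the core idea follows: a statistical outlier score combined with a simple rule. The transaction feed, features, and thresholds are entirely hypothetical; production systems (Featurespace, Feedzai, bank-native platforms) use far richer models.

```python
# Minimal sketch of statistical fraud flagging on a hypothetical
# transaction feed. Illustrative only -- real systems use far more
# features and learned models, not a hand-tuned threshold.
import numpy as np

rng = np.random.default_rng(42)

# Baseline: a customer's historical purchase amounts (USD)
history = rng.normal(60, 20, 1_000)
mu, sigma = history.mean(), history.std()

def fraud_score(amount: float, hour: int) -> float:
    """Combine an outlier score with a simple time-of-day rule."""
    z = abs(amount - mu) / sigma              # how unusual is the amount?
    night_penalty = 2.0 if hour < 5 else 0.0  # 3am purchases are riskier
    return z + night_penalty

# New transactions: (amount, hour_of_day)
txns = [(45.0, 14), (72.0, 19), (950.0, 3), (1200.0, 2)]
flagged = [t for t in txns if fraud_score(*t) > 4.0]
print(flagged)  # the two large nighttime transactions get flagged
```

The measurability the text highlights falls out directly: every flagged transaction has a dollar amount attached, so prevented-loss ROI is a sum, not an estimate.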
The honest caveat: The $50M and $40B numbers from JP Morgan and Visa reflect massive scale enterprises. Mid-market companies will see proportionally smaller but still meaningful returns. The “44-54% productivity boost” from McKinsey is self-reported survey data, not experimentally verified.
Configuration vs. custom: Fraud detection is overwhelmingly SaaS (Featurespace, Feedzai, built-in bank platform features). Invoice processing is now largely AI-OCR SaaS (90%+ value from configuration). Financial forecasting is where custom development starts to matter – integrating company-specific data sources, custom models for unusual business patterns.
3. IT Operations AI (AIOps, Incident Response, Monitoring)
Evidence strength: MODERATE – strong operational metrics but limited independent studies
| Metric | Result | Source |
|---|---|---|
| Event noise reduction | ~99.2% | Enterprise benchmarks |
| MTTR reduction | 50-66% | Multiple case studies |
| Outage reduction | ~70% | Mature enterprise deployments |
| Hours saved per incident | 4.87 hours | SolarWinds 2025 |
| Monthly hours reclaimed | Up to 9,500 (global deployments) | Enterprise benchmarks |
| Cybersecurity ROI exceeding expectations | 44% of respondents | Deloitte 2026 |
| ROI payback period | 3-6 months | Industry estimates |
Why it works so well:
- Downtime costs are enormous ($10K+/hour), so even small improvements have massive ROI
- Alert fatigue is a real, measurable problem that AI directly addresses
- Pattern recognition across millions of log entries is genuinely superhuman
- Clear before/after metrics (MTTD, MTTR, uptime SLA compliance)
Configuration vs. custom: ~70% SaaS configuration (Datadog, PagerDuty, ServiceNow AIOps, Splunk). Custom development matters for correlating proprietary system data and building organization-specific runbooks.
Tier B: Moderate Evidence of Positive ROI
4. Supply Chain AI (Demand Planning, Inventory Optimization)
Evidence strength: MODERATE – strong theoretical basis, real results at scale, but high failure rate
| Metric | Result | Source |
|---|---|---|
| Forecast accuracy improvement | 20-50% | Multiple enterprise reports |
| Forecast error reduction | From 28.76% to 16.43% | Academic/ML benchmarks |
| Cost savings | 26-31% | McKinsey |
| Executives reporting 12-month ROI | 77% | Industry survey |
| GenAI initiatives struggling with sustained ROI | Up to 95% | Industry analysis |
Why it works so well (when it does):
- Demand forecasting is a mathematical problem with clear right/wrong answers
- Integrating external signals (weather, events, economic data) is genuinely valuable
- Small improvements in forecast accuracy translate to large inventory savings
- Established ML techniques (not just GenAI) with decades of refinement
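One way to see why small forecast-accuracy gains translate to large inventory savings: safety stock is commonly sized proportionally to forecast-error standard deviation, so cutting the error rate cuts the buffer, and its holding cost, linearly. The sketch below plugs in the error figures from the table above; demand, lead time, service level, and cost inputs are hypothetical placeholders.

```python
# Back-of-envelope sketch: safety stock sized as z * sigma * sqrt(L),
# where sigma is daily forecast-error std dev and L is lead time in
# days. All inputs except the error rates are hypothetical.
import math

z = 1.65                  # ~95% service level
lead_time_days = 14
daily_demand_units = 1_000
unit_cost = 25.0
holding_rate = 0.25       # annual holding cost as fraction of unit cost

def safety_stock_cost(error_pct: float) -> float:
    sigma = daily_demand_units * error_pct           # error in units/day
    stock = z * sigma * math.sqrt(lead_time_days)    # buffer units held
    return stock * unit_cost * holding_rate          # annual holding cost

before = safety_stock_cost(0.2876)   # error rate before AI (from table)
after = safety_stock_cost(0.1643)    # error rate after AI (from table)
print(f"annual holding cost: ${before:,.0f} -> ${after:,.0f} "
      f"({1 - after / before:.0%} lower)")
```

Because the relationship is linear, the ~43% drop in forecast error shown in the table maps directly to a ~43% drop in safety-stock holding cost under these assumptions.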
The honest caveat: The 95% failure rate for sustained ROI is the critical number here. Success requires clean data, clear governance, and deliberate risk management. Most companies have none of these. The 77% reporting 12-month ROI may reflect survivorship bias – companies that failed abandoned the effort.
Configuration vs. custom: This is one area where custom development genuinely matters (~40-50% of value). Off-the-shelf tools (Blue Yonder, Kinaxis, o9 Solutions) provide the platform, but integrating company-specific data sources, supplier relationships, and demand signals requires significant custom work.
5. Sales AI (Forecasting, Lead Scoring, Pipeline Management)
Evidence strength: MODERATE – strong adoption data, ROI numbers mostly self-reported
| Metric | Result | Source |
|---|---|---|
| Revenue increase | 13-15% | Industry surveys |
| Sales cycle reduction | Up to 68% shorter | Industry reports |
| Forecast accuracy improvement | From 60-75% to 90-98% | Platform vendor data |
| Quota attainment improvement | Up to 30% | Vendor reports |
| ROI over 3 years | 299% average | Unified platform vendor data |
| Teams with AI seeing revenue growth | 83% vs 66% without | Industry survey |
| Time to positive ROI | 12-18 months | Industry consensus |
Why it works so well (when it does):
- CRM data is abundant and structured
- Lead scoring is a classic ML classification problem with clear feedback loops
- Forecast accuracy directly impacts planning, hiring, and cash management
- Sales teams are highly measurable by nature
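Lead scoring really is a textbook classification problem, as the bullets say. Below is a minimal sketch with synthetic data; the feature names and coefficients are hypothetical, and CRM-native tools (Salesforce Einstein, HubSpot) wrap essentially this idea with much richer data and automated retraining.

```python
# Minimal lead-scoring sketch: supervised classification on
# hypothetical CRM features, with closed-won outcomes as labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Features per lead: [pages_viewed, email_opens, is_target_industry]
X = np.column_stack([
    rng.poisson(5, n),
    rng.poisson(3, n),
    rng.integers(0, 2, n),
])
# Synthetic ground truth ("closed-won") correlates with engagement --
# the feedback loop the CRM records automatically.
logit = 0.4 * X[:, 0] + 0.5 * X[:, 1] + 1.5 * X[:, 2] - 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)

# Score two new leads: an engaged in-industry lead vs. a cold one
hot, cold = model.predict_proba([[12, 8, 1], [1, 0, 0]])[:, 1]
print(f"hot lead: {hot:.2f}, cold lead: {cold:.2f}")
```

The "clear feedback loop" in the bullet list is what makes this tractable: every scored lead eventually resolves to won or lost, so the model can be evaluated and retrained against ground truth.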
The honest caveat: The “90-98% accuracy” claims come from platform vendors. Only 7% of sales organizations actually achieve 90%+ forecast accuracy. The gap between vendor claims and enterprise reality is enormous. Also, 69% of sales ops leaders say forecasting is getting harder, suggesting AI has not yet solved the core problem for most.
Configuration vs. custom: ~85% SaaS configuration (Salesforce Einstein, Gong, Clari, HubSpot). Custom development mainly for: proprietary data integration, industry-specific scoring models, and multi-system orchestration.
6. Marketing AI (Content, Personalization, Campaign Optimization)
Evidence strength: MODERATE – strong adoption, ROI data mostly from ecommerce/digital
| Metric | Result | Source |
|---|---|---|
| Average ROI | ~300% (marketing teams using AI in 2025) | Industry surveys |
| Campaign launch speed | 75% faster | Multiple reports |
| Ad spend ROI improvement | 30% higher | Industry benchmarks |
| Personalized email transaction rates | 6x higher | Ecommerce data |
| Revenue increase from personalization | Up to 41% | Ecommerce studies |
| Time savings per employee | ~11.4 hours/week | McKinsey |
| Executives who can measure ROI confidently | Only 29% | Industry survey |
Why it works so well (in specific contexts):
- Content generation is a natural fit for LLMs
- A/B testing provides clear feedback loops
- Personalization at scale was impossible without AI
- Digital marketing has excellent attribution data
The honest caveat: The 300% average ROI figure is self-reported and likely inflated by selection bias. The most telling statistic: only 29% of executives say they can confidently measure AI ROI in marketing. If you cannot measure it, you cannot claim it. Much of marketing AI ROI may be phantom gains from attribution modeling changes rather than actual incremental value.
Configuration vs. custom: ~90% SaaS configuration (HubSpot, Marketo, Jasper, Copy.ai, Persado). Custom development rarely justified unless dealing with highly regulated industries or proprietary data advantages.
Tier C: Limited or Mixed Evidence of ROI
7. HR AI (Recruiting, Screening, Workforce Planning)
Evidence strength: LOW-MODERATE – strong efficiency metrics, significant bias/legal risks
| Metric | Result | Source |
|---|---|---|
| AI adoption in recruiting | 43% (up from 26% YoY) | Industry survey |
| Cost-per-hire reduction | 20-40% | Multiple reports |
| Time-to-hire reduction | 30-50% | Multiple reports |
| Enterprise annual savings | Average $2.3M | Industry estimate |
| ROI timeline | 8-18 months | Industry consensus |
| Bias reduction (when properly implemented) | 56-61% | Research studies |
| Organizations with ongoing bias challenges | 67% | Industry survey |
| Enterprises planning AI-run hiring by 2026 | 33% | HR Dive |
Why gains are real but risky:
- Resume screening is high-volume and repetitive (good for AI)
- Scheduling automation eliminates enormous coordination overhead
- Candidate sourcing across platforms benefits from AI aggregation
The honest caveat: HR AI carries the highest regulatory and reputational risk of any function. New York City already requires annual bias audits for automated hiring tools. California finalized AI hiring regulations in October 2025. Colorado AI Act takes effect June 2026. People mirror AI hiring biases ~90% of the time even when aware of them (University of Washington, 2025). The cost savings are real but the legal liability is growing faster than the savings.
Configuration vs. custom: ~85% SaaS (HireVue, Eightfold, Phenom, Greenhouse AI features). Custom development needed mainly for bias testing/compliance frameworks specific to organizational demographics.
8. Software Engineering AI (Coding Assistants, Code Review)
Evidence strength: CONTRADICTORY – strong vendor claims vs. sobering independent studies
| Metric | Result | Source |
|---|---|---|
| Task speed-up (vendor claim) | Up to 55% | GitHub internal study |
| Average time saved per week | 3.6 hours | Industry analytics (135K developers) |
| Code suggestion acceptance rate | ~30% | Industry data |
| AI-generated code quality issues | 1.7x more issues than human code | CodeRabbit analysis |
| Security weaknesses in Python output | 29.1% contain vulnerabilities | Independent analysis |
| Developer trust in AI output | Only 33% trust it | Industry survey |
| METR RCT: experienced dev speed change | 19% SLOWER | METR 2025 |
| Developer perception vs. reality gap | Believed 20% faster; actually 19% slower | METR 2025 |
| Enterprises achieving profitability from AI tools | Less than 47% | Industry survey |
The contradiction explained:
- AI coding tools help with unfamiliar codebases and boilerplate (aligned with BCG “jagged frontier” – tasks within AI capabilities see big gains)
- AI coding tools hurt experienced developers on familiar codebases (METR study)
- The 55% speed-up is real for certain task types but does not translate to 55% overall productivity gain
- 70% of AI suggestions are rejected, suggesting the “productivity” framing overstates actual impact
- Quality concerns (1.7x more issues, 29% security weaknesses) create downstream costs not captured in speed metrics
Configuration vs. custom: Nearly 100% SaaS/configuration (GitHub Copilot, Cursor, Windsurf, Tabnine). Custom fine-tuning on proprietary codebases is emerging but adds cost with uncertain returns.
9. Legal AI (Contract Review, Legal Research, Document Analysis)
Evidence strength: LOW-MODERATE – growing adoption, limited rigorous ROI data
| Metric | Result | Source |
|---|---|---|
| Time savings | 6-20% per week for 62% of professionals | Wolters Kluwer |
| Revenue gains attributed to AI | 6-20% for ~50% of professionals | Industry survey |
| Contract review speed | 60-80% faster | Vendor platforms |
| Time saved per contract draft | 1-2 hours | Top US law firm case study |
| Documents processed increase | 4x per week | Top US law firm case study |
| Organizations with better CLM integration ROI | 40-60% better ROI | Industry analysis |
| Executives reporting EBITDA lift from AI | Only 15% | Forrester 2026 |
Why it is slow to prove ROI:
- Legal work requires high accuracy – errors have serious consequences
- Attorney time is expensive, so even modest time savings are meaningful in dollar terms
- But: the adversarial nature of legal work means AI errors are actively exploited by opposing counsel
- Regulatory caution limits aggressive deployment
The honest caveat: The 60-80% faster contract review numbers come from vendor platforms measuring specific tasks, not end-to-end legal processes. Total time savings of 6-20% per week is more realistic. Legal AI remains largely in “augmentation” mode – speeding up research and first drafts, not replacing judgment calls.
Configuration vs. custom: ~70% SaaS (Harvey, Casetext/CoCounsel, Spellbook, Luminance). Custom development matters for: firm-specific clause libraries, jurisdiction-specific training, integration with proprietary document management systems.
Configuration vs. Custom Development: The Honest Breakdown
The Big Picture
The “build your own AI” wave of 2024 has largely failed. As Gartner notes, enterprises tried to build their own AI, ran proofs of concept, hired ML engineers, experimented with custom models – and most of it failed. In 2025-2026, CIOs are shifting decisively to commercial off-the-shelf solutions.
Where Does the ROI Actually Come From?
| Approach | Est. Share of Enterprise AI Spending | Est. Share of Enterprise AI ROI |
|---|---|---|
| Configuring existing SaaS with AI features | ~45-55% | ~60-70% |
| Integrating/connecting AI-enabled tools | ~25-30% | ~15-20% |
| Custom AI model development | ~15-25% | ~10-15% |
Note: These are estimated ranges synthesized from Gartner, Deloitte, and McKinsey data. No single source provides this exact breakdown. The directional message is clear: most ROI comes from configuration, not custom development.
Where Custom Development Actually Matters
Custom AI development delivers outsized ROI in a narrow set of conditions:
- Proprietary data advantage: When your organization has data no one else has (e.g., JP Morgan’s transaction graph for fraud detection)
- Regulatory moat: When industry-specific compliance requires models trained on domain-specific data
- Supply chain specificity: When demand signals are unique to your business (e.g., a retailer combining weather, local events, and store-level inventory data)
- Integration orchestration: When the value comes from connecting 5+ systems in ways no single vendor supports
- Competitive differentiation: When AI IS the product, not a tool supporting the product
Where Custom Development Is a Waste of Money
- Building a chatbot from scratch when Zendesk/Intercom/Freshworks can be configured in weeks
- Training custom LLMs when prompt engineering with commercial APIs achieves 90% of the result
- Building proprietary document processing when AI-OCR SaaS has reached 98-99% accuracy
- Custom lead scoring models when CRM-native AI features exist
- Any project where the primary goal is “we want to own our AI” rather than solving a specific business problem
The 39% Tax
Gartner data shows that 39% of developer time is spent designing, building, and testing custom integrations. For AI projects specifically, 95% of IT leaders cite integration as a challenge to seamless AI implementation. This “integration tax” is the hidden cost that destroys ROI projections for custom AI development.
The AI That Nobody Talks About
The highest-ROI AI applications are almost universally the most boring ones. Here is what actually saves money.
Invoice and Accounts Payable Processing
- Per-invoice cost reduced from $15-40 to under $5 (80%+ reduction)
- AI-OCR accuracy: 98-99%
- Processing speed: 90% faster
- Invoice processing per FTE: up to 400% increase
- Market growing from $2.8B (2024) to projected $47.1B (2034)
- Breakeven: 2 months. Annual ROI: 420%.
- This is possibly the single highest-ROI AI application in enterprise, and nobody writes breathless blog posts about it.
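The breakeven arithmetic is simple enough to write down. The figures below (volume, per-invoice costs, platform and implementation costs) are hypothetical placeholders within the ranges cited above, not data from any vendor; the point is that the payback math is a one-liner, not a model.

```python
# AP automation payback sketch. All inputs are hypothetical
# placeholders in the ranges cited in the text.
monthly_invoices = 8_000
savings_per_invoice = 25.0 - 5.0   # manual cost minus AI-OCR cost
implementation_cost = 290_000      # one-time setup (hypothetical)
platform_cost_monthly = 15_000     # recurring SaaS + ops (hypothetical)

gross_monthly = monthly_invoices * savings_per_invoice
net_monthly = gross_monthly - platform_cost_monthly
breakeven_months = implementation_cost / net_monthly
first_year_roi = (12 * net_monthly - implementation_cost) / (
    implementation_cost + 12 * platform_cost_monthly)

print(f"breakeven: {breakeven_months:.1f} months, "
      f"first-year ROI: {first_year_roi:.0%}")
```

With these placeholder inputs the sketch lands in the same ballpark as the reported figures: roughly two months to breakeven and triple-digit first-year ROI.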
Email Routing and Ticket Classification
- Automatically categorizes, prioritizes, and routes incoming communications
- Reduces misrouting by 60-80%
- Saves 5-15 minutes per ticket in manual triage time
- At scale (10,000+ tickets/month), this saves 800-2,500 hours/month
- Entirely SaaS configuration – zero custom development needed
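The hours-saved claim above is straight unit conversion from the per-ticket figures, worth making explicit:

```python
# Triage-savings arithmetic from the figures above. Ticket volume and
# per-ticket minutes are the ranges quoted in the text.
tickets_per_month = 10_000
minutes_saved_low, minutes_saved_high = 5, 15

hours_low = tickets_per_month * minutes_saved_low / 60
hours_high = tickets_per_month * minutes_saved_high / 60
print(f"{hours_low:,.0f}-{hours_high:,.0f} hours/month reclaimed")

# At ~160 working hours/month, that is roughly 5-16 full-time
# equivalents of pure triage labor.
ftes_low, ftes_high = hours_low / 160, hours_high / 160
```

That 5-16 FTE range is why a zero-development configuration project can clear its cost in the first billing cycle.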
Data Entry and Form Processing
- AI reads forms, extracts structured data, populates systems
- 99.5% field-level accuracy (better than human data entry)
- Eliminates double-entry across systems
- Typical savings: 10-15 minutes per form × thousands of forms/month
RPA (Robotic Process Automation) + AI
- 100-200% ROI in first 12 months
- Operational cost reduction: 30-80%
- Global work hours saved: estimated 2.2 billion annually
- Cargill: automated 70% of order processing, saving $15M/year
- NHS Newcastle: 7,000 hours/year saved automating HR workflows
- ROI payback: 6-9 months
- Market growing from $3.79B (2024) to projected $30.85B (2030)
Alert Deduplication and Noise Reduction
- IT operations teams drown in alerts – AIOps reduces noise by 99.2%
- This is not glamorous. It is a goldmine.
- Every false alert that wakes up an on-call engineer at 3am costs morale, productivity, and retention
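Deduplication itself is conceptually simple: collapse alerts that share a fingerprint into a single page. The sketch below shows the core mechanic with hypothetical alert fields; platforms like Datadog and PagerDuty layer ML-based correlation on top of this idea.

```python
# Minimal alert-deduplication sketch: identical (host, check) pairs
# within a window collapse to one page. Alert fields are hypothetical.
from collections import Counter

alerts = [
    {"host": "web-01", "check": "cpu", "severity": "warn"},
    {"host": "web-01", "check": "cpu", "severity": "warn"},   # repeat
    {"host": "web-02", "check": "disk", "severity": "crit"},
] + [{"host": "web-01", "check": "cpu", "severity": "warn"}] * 97

def fingerprint(alert: dict) -> tuple:
    """Key that decides which alerts are 'the same incident'."""
    return (alert["host"], alert["check"])

groups = Counter(fingerprint(a) for a in alerts)
pages = len(groups)  # humans see one page per group, not per raw alert
print(f"{len(alerts)} raw alerts -> {pages} pages "
      f"({1 - pages / len(alerts):.1%} noise reduction)")
```

Even this naive fingerprinting achieves most of the noise reduction; the ML layer earns its keep on the harder cases where related alerts do not share obvious keys.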
Document Classification and Compliance Tagging
- Automatically tags documents for retention policies
- Routes compliance-sensitive materials to appropriate reviewers
- Reduces compliance audit prep time by 40-60%
- Zero controversy because it replaces work nobody wanted to do
The Common Thread
These applications share characteristics that explain their success:
- High volume, low complexity: Thousands of identical decisions per day
- Clear right/wrong answers: Easy to measure accuracy
- Structured or semi-structured data: AI does not need to “reason,” just classify
- Zero emotional attachment: Nobody’s identity is tied to routing emails
- Immediate, measurable savings: Dollar-for-dollar comparison with human labor cost
- Low risk of catastrophic failure: A misrouted email is annoying, not lawsuit-inducing
Key Academic Studies Reference Table
| Study | Year | N | Design | Domain | Key Finding | Published In |
|---|---|---|---|---|---|---|
| Brynjolfsson, Li, Raymond | 2023-24 | 5,179 agents | Staggered RCT | Customer support | +14% productivity; +34% for novices | QJE (2025) |
| Dell’Acqua et al. (BCG/Harvard) | 2023 | 758 consultants | Pre-registered RCT | Consulting | +25% speed, +40% quality inside frontier; -19pp outside | Organization Science (2025) |
| Noy & Zhang (MIT) | 2023 | 453 professionals | Randomized experiment | Writing tasks | -40% time, +18% quality | Science (2023) |
| METR | 2025 | 16 developers, 246 tasks | Randomized crossover | Software development | 19% slower with AI (contradicts beliefs) | arXiv preprint |
| Microsoft New Future of Work | 2025 | Meta-analysis | Literature review | Multiple | Mixed results across domains | Microsoft Research |
What This Means for Enterprise Decision-Makers
The Three Rules of AI ROI
Rule 1: Start with the boring stuff. Invoice processing, email routing, ticket classification, and data entry will deliver faster ROI with lower risk than any “transformative AI strategy.” These are the foundation.
Rule 2: Configure before you build. For every dollar spent on custom AI development, ask: “Can we get 80% of this value by configuring an existing SaaS tool?” The answer is usually yes. The Gartner data is clear: most custom AI projects from 2024 failed. Buy, configure, integrate.
Rule 3: Measure function-by-function, not enterprise-wide. “What is our AI ROI?” is the wrong question. “What is our AI ROI for invoice processing?” is the right question. The McKinsey data shows that even high performers see wildly different returns across functions. Aggregate ROI metrics are meaningless.
The Uncomfortable Truths
- Most AI ROI claims are marketing. Vendor-funded Forrester TEI studies, internal GitHub studies, and platform vendor ROI calculators are not independent evidence. Plan accordingly.
- The METR study should worry everyone selling AI coding tools. Experienced developers on familiar codebases got 19% slower – and believed they were 20% faster. If perception and reality can diverge that much, how many other “productivity gains” are phantoms?
- High performers invest 3x more in process redesign than software. The McKinsey data is unambiguous: buying AI tools without redesigning workflows produces minimal returns. The tool is 25% of the value; the workflow redesign is 75%.
- HR AI is a legal time bomb. The efficiency gains are real but the regulatory landscape is moving faster than the technology. New York, California, Colorado, and the EU are all tightening rules. A bias lawsuit can erase years of hiring cost savings in a single settlement.
- 2-4 years to ROI is the honest timeline. Deloitte reports that most organizations achieve satisfactory AI ROI in 2-4 years, 3-4x longer than conventional technology deployments. Anyone promising 90-day transformation is selling something.
Sources
Academic / Peer-Reviewed
- Brynjolfsson, Li, Raymond. “Generative AI at Work.” Quarterly Journal of Economics (2025).
- Dell’Acqua et al. “Navigating the Jagged Technological Frontier.” Organization Science (2025).
- Noy & Zhang. “Experimental evidence on the productivity effects of generative artificial intelligence.” Science (2023).
- METR. “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” (2025).
Consulting Firm Reports
- McKinsey. “The State of AI in 2025.”
- BCG. “From Potential to Profit: Closing the AI Impact Gap.” (January 2026).
- Deloitte. “State of AI in the Enterprise 2026.”
Analyst Firms
- Gartner. “Worldwide AI Spending Will Total $2.5 Trillion in 2026.”
- Gartner. “40% of Enterprise Apps Will Feature AI Agents by 2026.”
- Forrester. TEI Methodology.
- Forrester. TEI of Microsoft 365 Copilot (commissioned by Microsoft).
Industry Reports and Data
- Menlo Ventures. “2025: The State of Generative AI in the Enterprise.”
- Wolters Kluwer. “Legal AI Adoption: Time Savings, Contract Review, Revenue Growth.”
- Master of Code. “AI ROI: Why Only 5% of Enterprises See Real Returns in 2026.”
- PwC. “2026 AI Business Predictions.”
Created by Brandon Sneider | brandon@brandonsneider.com | March 2026