Executive Summary

  • The corpus (300+ documents) converges on a single finding: AI tool access and AI value creation are two different problems. 78-88% of organizations have deployed AI somewhere; 5-6% are generating substantial financial returns. The gap is not a technology problem.
  • The “5-6% figure” has two independent sources with different methodologies that happen to converge: BCG’s 41-dimension maturity scoring against Capital IQ TSR data (5% “future-built”), and McKinsey’s self-reported >5% EBIT impact threshold (6%). The convergence across methodologies strengthens the claim.
  • In finance, AI is generating documented returns in fraud detection, compliance monitoring, and wealth management research synthesis — not in trading, where the infrastructure complexity and regulatory environment are prohibitive for most firms.
  • The failure pathways are consistent and predictable: pilot theater (proving demos rather than redesigning workflows), tool deployment without process redesign, and inadequate data foundation. All three appear in the corpus repeatedly across independent studies.
  • Team collaboration at AI speed is unsolved at scale. The NBER Copilot RCT (n=7,137) found zero change in meeting attendance, document output, or task composition despite email time savings — meaning AI is making individuals faster at the wrong tasks while the coordination bottleneck persists.
  • Legacy data solutions are available (RAG, vector search, synthetic data augmentation, Glean for enterprise search) but require an honest data audit first. Most companies skip the audit and discover the problem after the pilot fails.
  • Employee anxiety follows a four-archetype pattern (Visionaries, Disruptors, Endangered, Complacent), and the evidence-based interventions work in sequence: training before access, manager certification, peer ambassador networks, outcome tracking. Forced adoption without this sequence triggers the sabotage patterns the corpus flags.

Q1: What Are the Core Enterprise AI Adoption Insights From the Corpus?

Three findings cut across every study in the knowledge base:

The adoption/value gap is not closing. 78% organizational AI usage (Stanford HAI 2025, McKinsey n=1,491) versus 5-6% generating substantial financial returns (BCG n=1,250; McKinsey n=1,993, November 2025). BCG’s latest AI Radar (n=1,803 C-suite, Jan 2025) put it starkly: 75% rank AI top-3 strategic priority; only 25% see significant value from AI initiatives. The gap has existed for three years and is not narrowing.

The constraint is workflow redesign, not technology. The NBER Copilot RCT (66 firms, n=7,137, 2025) is the clearest data point: Microsoft Copilot saves 1.4-2 hours per week on email. Zero change in meeting attendance, document output, or task composition. The tool worked. The work didn’t change. Organizations that redesign workflows around AI outputs — not just give people access — are the ones capturing value.

Future-built firms are pulling away. BCG’s 41-dimension maturity analysis (n=1,250, Capital IQ TSR validation, September 2025) found that 5% of firms qualify as “future-built.” Relative to laggards, they generate 1.7x more revenue growth, 3.6x higher 3-year TSR, and 2.7x better ROIC. They spend 26% more on IT, direct 64% more of that IT budget to AI, and reinvest AI returns into stronger capabilities. The gap is compounding.


Q2: Where Does the 5-6% Figure Come From?

Two independent sources with different methodologies converge:

BCG “Build for the Future” / “Widening AI Value Gap” (September 2025, n=1,250 senior executives):

  • Methodology: 41-dimension maturity scoring across technology, talent, operating model, and strategy dimensions
  • Financial validation: BCG mapped company scores against Capital IQ TSR data — external validation, not self-reported
  • Result: 5% qualify as “future-built” based on combined score threshold
  • These firms generate 3.6x TSR vs. laggards over 3 years
  • BCG conflict: BCG sells AI consulting. Flag this when citing. The Capital IQ cross-validation partially mitigates it.

McKinsey “State of AI 2025” (November 2025, n=1,993 respondents, 105 nations):

  • Methodology: Self-reported survey. Companies self-classified as generating >5% EBIT impact from AI
  • Result: 6% qualify as high performers by this threshold
  • McKinsey conflict: Same advisory conflict as BCG. Self-reported EBIT impact is the weakest measurement methodology.
  • The 6% figure rests on Tier 2 evidence (results from a Q3 2025 survey window). State the survey window, and the model generation it reflects, whenever the figure is cited.

Why the convergence matters: BCG used external financial data and a multi-dimensional maturity score. McKinsey used self-reported financials. Different methodologies, different samples (1,250 vs. 1,993), different dates. Both landed at 5-6%. This convergence across independent surveys is meaningful — it does not appear to be noise.

The honest credibility note: Both firms have advisory conflicts. Neither study is an RCT. The BCG Capital IQ validation is the strongest piece of evidence in the corpus for this specific claim. Use both, flag both.


Q3: In Finance, Where Is AI Making the Most Difference?

The corpus has documented returns in four areas of financial services. The evidence quality varies significantly:

| Application | Documented Case | Measured ROI | Evidence Quality |
|---|---|---|---|
| Fraud detection | JPMorgan | $1.5B in fraud prevented annually | Tier 2: company reported, not independent |
| Compliance monitoring | JPMorgan, financial services sector broadly | 30-50% compliance violation reduction cited | Medium: industry survey, not RCT |
| Wealth management research | Morgan Stanley (OpenAI partnership) | Advisor research time reduction; specific figures not independently verified | Low: vendor case study |
| Document/contract review | Balyasny Asset Management (n=600 employees) | Internal efficiency gains; specifics not disclosed | Low: vendor case study |

Where AI is NOT generating documented returns in finance:

  • Algorithmic trading at mid-market scale: Infrastructure requirements (millisecond-level data feeds, co-location, regulatory approval) are prohibitive outside of hedge funds and major banks. No mid-market applicability.
  • Robo-advisory: Concentrated at retail scale (Betterment, Wealthfront). Not a mid-market B2B opportunity.
  • Credit underwriting: Regulatory scrutiny (ECOA, FCRA, state fair lending) makes AI-assisted credit decisions legally complex for companies without dedicated legal/compliance resources.

The gap in the corpus: No RCT evidence on AI in financial services. The best evidence is vendor case studies (Morgan Stanley/OpenAI) and industry surveys. This is a research gap — add to queue.


Q4: What Are the Known Failure Pathways?

The corpus identifies five consistent failure modes across studies. Three are dominant:

Failure Mode 1: Pilot Theater

BCG finds that only 5% of organizations capture substantial financial returns from AI (n=1,250, 2025). The common mechanism behind the gap: pilots are designed to demonstrate technology capability rather than test business process redesign. A successful demo of document summarization is not evidence of workflow value; it is evidence that the model can summarize documents. The pilot succeeds; the business outcome never materializes.

Failure Mode 2: Tool Without Process

The NBER Copilot RCT is the definitive evidence: tool access without workflow redesign produces zero change in organizational output. Email time savings do not automatically transfer to more productive work; they transfer to more email. The BCG 10-20-70 budget rule (10% algorithms, 20% technology, 70% people and process) captures the right investment ratio. Most companies invert this.
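
As a rough illustration of the 10-20-70 rule, the sketch below splits a program budget by the three shares. The shares come from the rule as cited; the function name and the $1M program size are assumptions chosen only for the example.

```python
# Illustrative only: the 10/20/70 shares come from the BCG rule cited above.
# The function name and the $1M program size are hypothetical.
BUDGET_SHARES_PCT = {"algorithms": 10, "technology": 20, "people_and_process": 70}


def split_ai_budget(total_usd: float) -> dict[str, float]:
    """Allocate a total AI program budget according to the 10-20-70 rule."""
    return {name: total_usd * pct / 100 for name, pct in BUDGET_SHARES_PCT.items()}


allocation = split_ai_budget(1_000_000)
for name, amount in allocation.items():
    print(f"{name}: ${amount:,.0f}")
```

A program that puts most of its spend into licenses and infrastructure would show the 70% line at the top of this breakdown instead of the bottom, which is the inversion described above.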

Failure Mode 3: Data Foundation Skipped

The corpus repeatedly cites a “Year Zero” investment (data readiness, process mapping, security baseline) that companies skip to get to production faster. The result: a 2-4 week productivity dip per team during transition, integration costs running 2.4x initial estimates, and governance costs running 2x estimates (AlterSquare, 20+ client projects, 2026). The $500K difference between a well-executed and poorly-executed mid-market AI program (3-year, 500-person company) is attributable entirely to Year Zero planning.
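
To make the overrun multipliers concrete, here is a minimal sketch that applies them to a pair of initial estimates. The 2.4x and 2x multipliers are the AlterSquare figures quoted above; the dollar estimates and the function name are hypothetical.

```python
# Multipliers are the AlterSquare figures cited above; the estimates are hypothetical.
INTEGRATION_OVERRUN = 2.4
GOVERNANCE_OVERRUN = 2.0


def project_actual_costs(integration_estimate_usd: float,
                         governance_estimate_usd: float) -> dict[str, float]:
    """Project actual costs and the unbudgeted gap when Year Zero work is skipped."""
    integration = round(integration_estimate_usd * INTEGRATION_OVERRUN, 2)
    governance = round(governance_estimate_usd * GOVERNANCE_OVERRUN, 2)
    budgeted = integration_estimate_usd + governance_estimate_usd
    return {
        "integration": integration,
        "governance": governance,
        "unbudgeted": round(integration + governance - budgeted, 2),
    }


projection = project_actual_costs(100_000, 50_000)
print(projection)
```

With these example inputs, a $150K plan turns into $340K of actual spend, so the unbudgeted gap is larger than the original budget.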

Failure Mode 4: Autonomous Agent Deployment Without Guardrails

The corpus has multiple documented cases of runaway agent costs, including $2,400 overnight API bills from agent loops. METR’s RCT (n=16 experienced developers, July 2025) found that developers using AI agents on complex open-ended tasks were 19% slower than without AI. The model generation tested matters (Claude 3.5/3.7 and GPT-4o). Merge rates for agent-generated code range from 34% to 67%; with variance that high, production deployment without careful measurement is reckless.
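
One simple guardrail against runaway agent loops is a hard cap on spend and step count. The sketch below is a minimal illustration, not any vendor's API: the class names, the $20 cap, and the $0.50 per-call cost are all assumptions.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run crosses its spend or step limit."""


class AgentBudget:
    """Hypothetical guard that tracks cumulative spend and call count for one run."""

    def __init__(self, max_usd: float, max_steps: int) -> None:
        self.max_usd = max_usd
        self.max_steps = max_steps
        self.spent_usd = 0.0
        self.steps = 0

    def charge(self, cost_usd: float) -> None:
        """Record one model call, then stop the run if either limit is crossed."""
        self.steps += 1
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_usd or self.steps > self.max_steps:
            raise BudgetExceeded(
                f"stopped after {self.steps} calls and ${self.spent_usd:.2f}"
            )


def run_stuck_agent(budget: AgentBudget, cost_per_call_usd: float) -> int:
    """Simulate an agent caught in a loop; return the calls made before the guard fires."""
    try:
        while True:
            budget.charge(cost_per_call_usd)
    except BudgetExceeded:
        return budget.steps


budget = AgentBudget(max_usd=20.0, max_steps=1_000)
calls_made = run_stuck_agent(budget, cost_per_call_usd=0.50)
print(calls_made, budget.spent_usd)
```

Because the guard checks after recording each call, the run can overshoot the cap by at most one call. A production version would also need alerting and per-team limits, which are out of scope here.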

Failure Mode 5: Code Complexity Accumulation

Carnegie Mellon’s study found a 40.7% code complexity increase in AI-assisted development (measured by cyclomatic complexity). This is a leading indicator: code review backlogs grow, deployment velocity slows, technical debt accumulates. The Faros data (GitHub Copilot enterprise deployments) found 98% more PRs but zero improvement in delivery metrics. The bottleneck moved from coding to review, and organizations that didn’t redesign the review process got no net benefit.


Q5: Has Anyone Solved Cross-Team Collaboration When Everyone Is AI-Powered?

Short answer: no. The corpus has partial evidence but no solved case.

What the evidence shows:

The NBER Copilot RCT found zero change in meeting attendance despite 1.4-2 hours/week of email time savings. Individuals got faster at email; meeting behavior (the primary coordination mechanism) did not change. The tool did not touch the coordination layer.

The Faros data showed a coordination problem emerging from AI productivity: individual developers shipped 98% more code, review queues expanded, and delivery throughput did not improve. The bottleneck moved from writing to reviewing. Companies that redesigned the review process (Monday.com: 800+ issues/month caught in CI/CD) captured the benefit. Companies that did not saw the speed evaporate.
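
The bottleneck shift can be sketched as simple flow arithmetic: delivered work is capped by the slower stage. The 98% increase is the Faros figure cited above; the baseline of 100 PRs per week and the fixed review capacity are hypothetical values chosen for illustration.

```python
def weekly_flow(prs_opened: int, review_capacity: int) -> dict[str, int]:
    """Merged work is capped by the slower stage; the remainder joins the backlog."""
    merged = min(prs_opened, review_capacity)
    return {
        "opened": prs_opened,
        "merged": merged,
        "backlog_growth": prs_opened - merged,
    }


BASELINE_PRS = 100      # hypothetical PRs opened per week before AI tooling
REVIEW_CAPACITY = 100   # hypothetical PRs the team can review per week
PR_INCREASE_PCT = 98    # Faros figure cited above

before = weekly_flow(BASELINE_PRS, REVIEW_CAPACITY)
after = weekly_flow(BASELINE_PRS * (100 + PR_INCREASE_PCT) // 100, REVIEW_CAPACITY)
print(before)
print(after)
```

Under these assumptions, merged output stays flat while the review backlog grows every week, which is why raising review capacity (for example through automated checks in CI/CD) is where the gain is captured.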

What’s missing in the corpus:

  • No RCT evidence on AI-assisted project management tools (Microsoft Copilot for Teams, Asana AI, etc.) at organizational scale
  • No longitudinal data past 12 months — we do not know if cross-team coordination improves after teams develop new AI-native working norms
  • The BCG/MIT Sloan agentic AI study (n=2,102, 21 industries) found 47% deploying agentic AI without coordination strategy — this is a leading indicator of future coordination failure, not a solved case

The honest answer for a C-suite audience: AI is solving individual productivity. No one has published evidence of it solving team coordination at scale. The companies that are capturing value are doing so by identifying where the bottleneck moved and redesigning around it — not by letting AI coordination emerge naturally.


Q6: Who Is Solving Legacy Data Gaps, and How?

The corpus covers four approaches, with very different cost profiles:

Approach 1: RAG (Retrieval-Augmented Generation)

  • Connects existing documents/databases to a language model at query time without moving or restructuring data
  • Cost: Implementation $100K-$300K at mid-market scale; does not require data cleaning first
  • Limitation: Quality of answers is directly limited by quality of underlying documents. Messy legacy data produces messy RAG outputs.
  • Best for: Knowledge management, policy lookup, internal search on existing document stores
  • pgvector (open source, runs on PostgreSQL) is the default recommendation for companies already on Postgres; Glean is the enterprise-grade alternative for unstructured document search
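
The query-time retrieval step in RAG can be sketched in a few lines. This is a toy stand-in, not a production design: it ranks documents by bag-of-words cosine similarity using only the standard library, where a real deployment would use learned embeddings and a vector store such as pgvector. The sample documents, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Assemble the retrieved context and the question into a prompt for a language model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Expense reports must be filed within 30 days of travel.",
    "Remote employees receive a quarterly equipment stipend.",
    "Vendor contracts above $50K require legal review.",
]
question = "When are expense reports due?"
top = retrieve(question, docs)
print(build_prompt(question, docs))
```

The sketch also shows the limitation noted above: the model only sees what retrieval returns, so an outdated or contradictory policy document in the store flows straight into the answer.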

Approach 2: Synthetic Data Augmentation

  • Generates synthetic training examples to fill gaps in sparse datasets
  • Relevant when: ML models need training data that doesn’t exist or is protected (PII, HIPAA)
  • Cost: Moderate; requires data science resources
  • Limitation: Synthetic data amplifies existing biases if not carefully audited

Approach 3: Data Contracts + API Abstraction Layer

  • Defines formal schemas for data exchange between legacy systems and AI applications; isolates AI from legacy volatility
  • Cost: $50K-$150K for initial implementation; ongoing governance overhead
  • This is the “right” architectural answer but requires buy-in from IT and engineering leadership
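
A data contract can be as small as a typed schema plus a validating parser at the boundary. The sketch below assumes a hypothetical customer record with three fields; the field names and validation rules are illustrative, not drawn from the corpus.

```python
from dataclasses import dataclass
from datetime import date

REQUIRED_FIELDS = {"customer_id", "signup_date", "annual_revenue_usd"}


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical contract: the only shape downstream AI applications ever see."""
    customer_id: str
    signup_date: date
    annual_revenue_usd: float


def parse_legacy_row(row: dict) -> CustomerRecord:
    """Validate a raw legacy row against the contract; raise ValueError on violations."""
    missing = REQUIRED_FIELDS - row.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    revenue = float(row["annual_revenue_usd"])
    if revenue < 0:
        raise ValueError("annual_revenue_usd must be non-negative")
    return CustomerRecord(
        customer_id=str(row["customer_id"]).strip(),
        signup_date=date.fromisoformat(row["signup_date"]),  # expects YYYY-MM-DD
        annual_revenue_usd=revenue,
    )


record = parse_legacy_row(
    {"customer_id": " C-1001 ", "signup_date": "2019-03-14", "annual_revenue_usd": "125000"}
)
print(record)
```

Rows that fail validation are rejected at the boundary rather than passed to the model, which is the isolation from legacy volatility described above.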

Approach 4: Vendor-Managed Migration (Salesforce, ServiceNow, SAP AI integrations)

  • Enterprise software vendors embed AI directly in existing business applications; data stays in vendor ecosystem
  • Cost: Bundled into existing enterprise contracts, but consumption pricing surprises are common
  • Limitation: Creates vendor lock-in; does not solve the underlying data quality problem

The honest gap: No mid-market case studies in the corpus for companies that started with genuinely poor legacy data (paper-based records, inconsistent schemas across acquired companies) and successfully operationalized AI within 18 months. This is a real research gap.


Q7: How Are the Best Companies Managing Employee Anxiety and Preventing Adoption Sabotage?

The corpus has the most complete picture here, primarily from the change management research:

The Four Employee Archetypes (framework for targeting interventions):

| Archetype | Size | Profile | Risk | Intervention |
|---|---|---|---|---|
| Visionaries | ~15% | Excited, already experimenting | Low | Give them formal ambassador role; channel enthusiasm |
| Disruptors | ~20% | Skilled, resistant, testing limits | Medium-High | Direct engagement; address specific concerns with data |
| Endangered | ~25% | Genuinely threatened (task overlap with AI) | High | Proactive reskilling; role redesign; concrete path forward |
| Complacent | ~40% | Waiting to see; minimal engagement | Low-Medium | Manager-led peer normalization; low-pressure onboarding |
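
For planning purposes, the archetype shares above can be converted into rough headcounts. The shares are the approximate figures from the framework; the 500-person company and the function name are assumptions, and real distributions will vary by organization.

```python
# Approximate shares from the archetype framework above; actual mixes will differ.
ARCHETYPE_SHARES_PCT = {
    "Visionaries": 15,
    "Disruptors": 20,
    "Endangered": 25,
    "Complacent": 40,
}


def estimate_headcounts(total_employees: int) -> dict[str, int]:
    """Estimate how many employees fall into each archetype (rounded per group)."""
    return {
        name: round(total_employees * pct / 100)
        for name, pct in ARCHETYPE_SHARES_PCT.items()
    }


headcounts = estimate_headcounts(500)  # hypothetical mid-market company
for name, count in headcounts.items():
    print(f"{name}: {count}")
```

Because each group is rounded independently, the totals may be off by a person or two for headcounts that do not divide evenly.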

Proven Interventions (with evidence):

  • Training before access (Colgate model): Build AI literacy before giving tool access. Reduces “I can’t use this” anxiety; increases first-use confidence. The corpus does not have an RCT on this — it appears as best practice in multiple practitioner accounts.
  • Manager certification (mandatory): Managers must complete AI training before their teams get access. Removes the “my boss doesn’t use it, why should I” rationalization. Citi’s rollout (70% adoption, 182K employees) relied on this.
  • Peer ambassador networks: Citi deployed peer champions rather than top-down mandates. The corpus cites this as the primary mechanism for their adoption rate. No control group; cannot attribute causality.
  • PURE framework (DBS Bank): Purposeful, Unsurprising, Respectable, Explainable — four criteria for every AI deployment decision. Applied to employee-facing AI decisions specifically to reduce “why is AI watching me” anxiety. DBS is a large bank; mid-market applicability is partial.
  • IKEA reskilling case: Redeployed 8,500 customer service workers from order-taking to design advice and complaint resolution after AI handled routine queries. Resulted in EUR 1.3B additional revenue from the new service model. This is the strongest “reskilling creates value, not just preservation” case in the corpus — but it is a $40B+ revenue company, not mid-market.

What the corpus does not have:

  • RCT evidence on any of the above interventions — these are practitioner accounts and case studies, not controlled studies
  • Longitudinal data past 12 months — we don’t know if adoption sticks or backslides after the initial change management investment
  • Mid-market case studies for 200-500 person companies — the IKEA and Citi cases involve structural resources (dedicated training teams, large L&D budgets) not available at mid-market scale
  • Data on IgniteTech (mentioned as a case in the field) — not in corpus; strong research queue candidate given its potential as a software-industry forcing function

The sabotage pattern: The corpus notes passive resistance (minimal tool use, workarounds) more than active sabotage. The mechanism is typically: employee believes AI threatens their role → uses tool minimally to demonstrate compliance → reports no productivity gains → uses reported failures as evidence to stop the program. The intervention that works is addressing the threat perception directly — role redesign conversations before deployment, not after resistance appears.


Key Research Gaps Identified

| Gap | Why It Matters | Audience |
|---|---|---|
| RCT evidence on change management interventions | All current evidence is practitioner accounts; cannot attribute causality | CHRO, COO |
| Longitudinal AI adoption studies (18+ months) | Don’t know if gains persist; don’t know backslide rate | CEO, CFO |
| Mid-market case studies (200-500 employees) | IKEA, Citi, JPMorgan cases not applicable at this scale | All |
| Finance AI applications beyond fraud/compliance | Board finance committees want sector-specific evidence | CFO |
| IgniteTech / software-industry forcing function data | Potential mid-market forcing mechanism | CEO |
| Cross-team coordination RCT | No evidence that AI solves team coordination, only individual productivity | COO, CHRO |

What This Means for Your Organization

The seven questions above map to three decision points every mid-market executive eventually reaches.

First: whether the 5-6% figure applies to them. It probably does — but the question is which side of it they are on. The difference between future-built and laggard is not industry, size, or technology budget. It is whether the leadership team treats AI deployment as a workflow redesign project (requiring people, process, and data investment) rather than a software procurement project (requiring a license purchase and a training webinar).

Second: whether their failure mode is predictable. It usually is. The pilot theater pattern, the tool-without-process pattern, and the skipped Year Zero pattern appear in studies of independent populations across three years of research. They are not bad luck — they are predictable given specific setup conditions. A pre-mortem that asks “which of these five failure modes are we currently on track for?” is a more useful board conversation than a status update on AI adoption rates.

Third: whether their employees are positioned to adopt or positioned to resist. The archetype framework is useful not as a classification exercise but as a resource allocation guide: the roughly 25% whose current roles are genuinely endangered need concrete role-redesign conversations before deployment, not after resistance appears. The cost of addressing this proactively is lower than the cost of a stalled deployment.

If these questions are live in your organization and you want to map them against what the evidence actually shows at your scale and sector, contact brandon@brandonsneider.com.


Sources

This analysis synthesizes findings from the following corpus documents:

  • BCG “Build for the Future / Widening AI Value Gap” (n=1,250, September 2025)
  • BCG AI Radar (n=1,803 C-suite, January 2025)
  • McKinsey “State of AI 2025” (n=1,993, November 2025)
  • NBER Dillon et al. “Shifting Work Patterns” RCT (66 firms, n=7,137, 2025)
  • NBER Jiang et al. “AI and the Extended Workday” (ATUS n=123,603, 2025)
  • METR RCT (n=16 experienced developers, July 2025)
  • Carnegie Mellon code complexity study (2025)
  • Faros GitHub Copilot enterprise deployment analysis (2025)
  • AlterSquare true cost analysis (20+ client projects, 2026)
  • Citi AI adoption case (182K employees, 2025)
  • IKEA reskilling case (8,500 workers, EUR 1.3B revenue)
  • DBS Bank PURE framework (2025)
  • Stanford HAI AI Index 2025 (April 2025)

Brandon Sneider | brandon@brandonsneider.com | April 2026