Executive Summary
- The corpus proves workflow selection matters more than tool selection, but provides almost no direct guidance on how to do it. McKinsey’s regression of 25 attributes (n=1,993) shows workflow redesign is the single strongest predictor of EBIT impact from AI — yet only 21% of organizations have done it. The constraint is not knowledge that it matters. It is methodology for deciding where to start.
- A five-criterion diagnostic — process standardization, handoff complexity, data coherence, decision-point density, and workflow debt load — predicts whether AI overlaid on a workflow will produce incremental gains (score 0–5) or automate existing dysfunction (score 6–15). Organizations applying this diagnostic before deployment classify 2–4 workflows as rebuild candidates and 6–10 as overlay candidates in Year 1. Getting this classification right before signing vendor contracts is the highest-leverage decision the AI program makes.
- ROI definition before deployment is equally neglected and equally decisive. AI projects with pre-approval financial success metrics achieve a 54% success rate versus 12% without. The metric shift matters too: the 2026 measurement standard is direct P&L impact, not productivity. Futurum’s 1H 2026 survey (n=830) documents financial ROI (revenue + profitability) as the primary metric nearly doubling year-over-year to 21.7%, while productivity fell 5.8 points as the leading indicator.
- The value concentration data tells you where to look first. BCG’s 2025 analysis (n=1,250) finds 70% of total enterprise AI value concentrates in core business functions: R&D/innovation (15%), digital marketing and customer journey (17%), manufacturing (9%), sales (7%), maintenance (6%), digital supply chain (6%), pricing (5%), and core customer service (5%). Support functions (IT, customer support, HR, finance, legal, procurement) account for the remaining 30%. The first workflow candidates are in core, not support.
- Three pre-deployment ROI parameters define a defensible business case: baseline measurement (what does this process cost today, per unit, per cycle), counterfactual definition (what would have happened without AI over the same period), and time horizon (what is the latest acceptable date for first P&L impact, with a defined kill point if it is not reached). Without all three, the business case is not a business case — it is a forecast dressed up as one.
Why This Question Is the Hardest One in Enterprise AI
Every credible AI study in the 2025–2026 corpus lands at the same place: the value is in workflow redesign, not in tool deployment. BCG puts the split at 10/20/70 — 10% of value from algorithms, 20% from technology and data, 70% from process change. McKinsey’s regression confirms it empirically: across 25 organizational attributes, workflow redesign has the largest effect on EBIT impact, larger than governance, talent, technology, or any other factor tested.
The frustrating reality is that every study stops there. They prove the thesis. They do not build the method.
The question Brandon’s workshop audiences ask most consistently is not “should we redesign workflows?” — they have read the McKinsey deck. The question is “which workflows, in what order, defined how?” This document answers that.
Part 1: Workflow Selection — Where to Look and How to Score
Where the Value Is Concentrated
BCG’s industry-workflow analysis (n=~900 for workflows, September 2025) provides the first empirical answer to the “where” question. Across industries, 70% of total AI value concentrates in core business functions, not support functions.
Value distribution by function (BCG, 2025):
| Function | Share of total AI value potential | Movement vs. 2024 |
|---|---|---|
| R&D and innovation | 15% | Stable |
| Digital marketing | 9% | +8pp (sales/marketing category shift) |
| Consumer journey | 8% | — |
| Pricing | 5% | — |
| Sales | 7% | — |
| Core customer service | 5% | — |
| Manufacturing | 9% | -7pp (operations category shift) |
| Maintenance | 6% | — |
| Digital supply chain | 6% | — |
| Total core | 70% | +8pp |
| IT | 13% | +6pp |
| Customer support | 4% | — |
| HR, Finance, Legal, Procurement | 13% | -8pp combined |
The implication for workflow selection is direct: the first candidates are in revenue-generating and production functions, not in the administrative functions where AI pilots typically start because they are low-risk. Starting with legal document review or HR scheduling is not wrong — but it competes for the 30% of potential value. Starting with demand forecasting, marketing campaign orchestration, or production quality detection competes for the 70%.
Industry-level empirics: BCG’s Exhibit 4 documents specific workflow adoption rates and average self-reported impact by industry. These are the most actionable numbers in the corpus for initial workflow prioritization:
| Industry | Workflow | Scaled or Deployed (%) | Average Impact | KPI |
|---|---|---|---|---|
| Insurance | Claims validation / fraud detection | 50% | 25–32% | Cost savings |
| Energy | Infrastructure monitoring / predictive maintenance | 45% | 23–30% | Cost savings |
| Insurance | Underwriting optimization | 39% | 25–41% | Revenue |
| Travel/infrastructure | Dynamic pricing optimization | 31% | 17–29% | Revenue |
| Industrial goods | Robotics in production | 29% | 18–25% | Cost savings |
| Consumer | Demand forecasting / inventory optimization | 23% | 16–23% | Cost savings |
| Technology/Telecom | Product ideation / development insights | 20% | 20–33% | Revenue |
Source credibility: MEDIUM. Self-reported impact by executives at BCG-surveyed companies. No independent verification. Directionally useful; treat specific percentages as order-of-magnitude.
The pattern: highest-adoption, highest-impact workflows are data-intensive, high-frequency, and have a clear cost or revenue KPI. Lowest-adoption workflows tend to be judgment-heavy with diffuse outcomes.
The Five-Criterion Diagnostic
Once the candidate function is identified (core first, support second), the next question is which specific workflows within that function should be tackled first. The overlay-vs-rebuild framework in the companion research (ai-workflow-redesign-overlay-vs-rebuild.md) provides the operative scoring tool.
The five criteria are drawn from Bain’s workflow debt analysis (2025), Deloitte’s human-AI interaction design framework (2026), and HBS/Microsoft Frontier Firm research on structural AI adoption failures (March 2026):
| Criterion | Overlay signal (score 0–1) | Rebuild signal (score 2–3) |
|---|---|---|
| Process standardization | Runs the same way every time; written procedures match actual practice | “It depends who does it.” Significant variation between individuals, shifts, or locations |
| Handoff complexity | 1–2 handoffs; clear ownership; information transfers cleanly | 4+ handoffs; information degrades between steps; multiple systems require manual re-entry |
| Data coherence | Inputs arrive in consistent formats from reliable sources; single system of record | Multiple sources of truth; Excel supplements the ERP; tribal knowledge supplements documentation |
| Decision-point density | Few judgment calls per cycle; clear criteria; rare exceptions | Multiple experience-dependent judgment calls; high exception rate; “just ask Sarah” steps |
| Workflow debt load | Designed intentionally; steps exist for current reasons; cycle time near theoretical minimum | Accumulated organically; approval steps for problems solved in 2019; workarounds calcified into standard practice |
Scoring interpretation:
- 0–5: Overlay candidate — add AI tools to existing workflow without process redesign
- 6–10: Rebuild candidate — clean-sheet redesign required before AI deployment
- 11–15: Rebuild required — overlaying AI will automate the dysfunction
This is directional, not a formula. A workflow that scores 6 or 7 because of one extreme criterion (e.g., severe handoff complexity scored 3, with every other criterion at 0–1) may need targeted redesign of only that layer, not a full clean-sheet rebuild.
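The scoring bands above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the cited research: the function names, criterion keys, and the example workflow are hypothetical, and it assumes each of the five criteria is scored as an integer from 0 to 3, as in the table.

```python
CRITERIA = (
    "process_standardization",
    "handoff_complexity",
    "data_coherence",
    "decision_point_density",
    "workflow_debt_load",
)


def classify_workflow(scores: dict) -> tuple:
    """Sum five 0-3 criterion scores and map the total to a scoring band."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing criterion: {name}")
        if scores[name] not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3")
    total = sum(scores[name] for name in CRITERIA)
    if total <= 5:
        band = "overlay candidate"
    elif total <= 10:
        band = "rebuild candidate"
    else:
        band = "rebuild required"
    return total, band


def extreme_criteria(scores: dict) -> list:
    """List criteria at the maximum score, which may point to targeted redesign."""
    return [name for name in CRITERIA if scores[name] == 3]


# Hypothetical workflow: clean except for severe handoff complexity.
claims_intake = {
    "process_standardization": 1,
    "handoff_complexity": 3,
    "data_coherence": 1,
    "decision_point_density": 1,
    "workflow_debt_load": 1,
}
total, band = classify_workflow(claims_intake)
print(total, band, extreme_criteria(claims_intake))
# prints: 7 rebuild candidate ['handoff_complexity']
```

Reporting the extreme criteria alongside the band keeps the "directional, not a formula" caveat visible: a single criterion at 3 is a prompt to ask whether only that layer needs redesign.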
Practical application: A 200–500 person company can complete this scoring for 8–15 workflows in a single day by having the process owner and one frontline worker fill out the criteria. The output is a workflow portfolio map — overlay, rebuild, or targeted rebuild — that grounds the CEO/department head prioritization conversation in evidence rather than vendor demos.
The Sequencing Principle
MIT SMR’s Compound Benefits research (Kiron/Schrage, April 2026) adds a non-obvious selection criterion: start where your people have the deepest expertise. This is counterintuitive — it suggests not starting with the workflow that is easiest to automate, but with the one where your team is best equipped to recognize when the AI output is wrong.
The reasoning is operational: evaluation quality degrades when validators cannot distinguish a good AI output from a confident-but-wrong one. The MIT Sloan “persuasion bombing” finding (Feb 2026) shows LLMs overwhelm human reviewers through confident framing and volume. In a domain where humans lack expertise to push back, the AI wins the argument by default — including when it is wrong.
Selection criteria summary — synthesized from BCG, McKinsey, MIT SMR:
- Core business function (not support) — competing for 70% of value potential
- Data-intensive, high-frequency, clear KPI — matches the BCG Exhibit 4 empirical winners
- Overlay-vs-rebuild diagnostic score — determines what the workflow needs before AI can work
- Domain expertise available for evaluation — ensures human review is substantive, not performative
- Measurability in advance — can you define what “better” looks like in numbers before you start?
Part 2: ROI Definition — The Three Parameters That Make a Business Case Defensible
The Measurement Problem
The failure mode documented across every major 2026 survey is not that AI fails to produce results — it is that organizations cannot prove whether it did or did not.
Futurum Group (n=830 IT decision-makers, February 2026) documents that direct financial ROI is now the primary measurement standard: financial metrics (revenue + profitability) collectively surpassed productivity as the leading indicator, nearly doubling year-over-year to 21.7% of respondents. Productivity fell 5.8 points as the primary metric. The implication for business case construction: a proposal built on “hours saved” will not survive a 2026 CFO review.
Pertama Partners’ analysis of 2,400+ enterprise AI initiatives (2025–2026) provides the hardest ROI evidence in the corpus on this question: AI projects with pre-defined financial success metrics achieve a 54% success rate versus 12% without. That 42-point gap comes entirely from defining in advance what success looks like — not from the AI technology, the vendor, or the implementation team.
The Pertama data also documents what happens without kill criteria: the median failed AI project costs $4.2M and takes 11 months to die. Projects that run their full course without defined kill criteria cost $6.8M on average and deliver $1.9M in value — a negative 72% ROI. The kill-point definition is not pessimism; it is the difference between an $800,000 controlled exit and a $6.8M write-off.
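The negative 72% figure follows from the two averages quoted above. A minimal check, assuming the plain definition of ROI as (value delivered minus cost) divided by cost; the function name is illustrative:

```python
def simple_roi(value_delivered: float, total_cost: float) -> float:
    """Return ROI as a fraction: (value - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (value_delivered - total_cost) / total_cost


# Averages quoted above for projects run without kill criteria.
roi = simple_roi(value_delivered=1_900_000, total_cost=6_800_000)
print(f"{roi:.0%}")  # prints: -72%
```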
The Three Pre-Deployment ROI Parameters
A defensible ROI definition requires all three of the following, established before any vendor is engaged:
Parameter 1: Baseline measurement
What does this process cost today — per unit, per cycle, per transaction — measured in real dollars and time? The baseline must be empirical, not estimated. Best practice: measure the current state for 30–60 days before the pilot starts, tracking cost per cycle, error rate, cycle time, and headcount engaged. This is not overhead — it is the denominator of the ROI calculation.
Without a measured baseline, the post-deployment comparison inevitably becomes rationalization. Organizations that skip baseline measurement almost universally report “significant productivity gains” that cannot be traced to the P&L.
Relevant benchmark: fewer than 1 in 5 organizations track well-defined KPIs for gen AI solutions before deployment (McKinsey State of AI, n=1,491, 2025). This is the highest-EBIT-correlation practice, and it is the least commonly implemented one.
Parameter 2: Counterfactual definition
What would have happened over the measurement period without the AI deployment? This sounds academic but is operationally critical. Three variables affect it:
- Volume growth: If transaction volume increases 20% during the pilot, did productivity improve or did the team just get lucky that volume fit existing capacity?
- Seasonal adjustment: A Q4 pilot that shows efficiency gains against Q3 baseline is measuring seasonality, not AI impact.
- Parallel initiatives: If training, software upgrades, or headcount changes happen during the same period, attributing outcomes to AI requires explicit controls.
The simplest approach: run the AI pilot in one comparable business unit while a second unit serves as a control. The HBS/Microsoft Frontier Firm Initiative (14-organization cohort, March 2026) documents that organizations with explicit control conditions achieve far cleaner ROI attribution than those running company-wide deployments with no baseline.
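One common way to use a control unit is a difference-in-differences comparison: subtract the control unit's change from the pilot unit's change over the same period. The source does not name this technique, so treat the sketch below as one possible reading of "control condition". All figures are hypothetical, and the method assumes the two units would have moved in parallel without the AI deployment.

```python
def cost_per_transaction(total_cost: float, volume: int) -> float:
    """Unit cost for a measurement period."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return total_cost / volume


def diff_in_diff(pilot_pre: float, pilot_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the pilot unit minus the change in the control unit."""
    return (pilot_post - pilot_pre) - (control_post - control_pre)


# Hypothetical data: volume grows 20% in both units during the pilot.
pilot_pre = cost_per_transaction(120_000, 10_000)     # 12.00 per transaction
pilot_post = cost_per_transaction(115_200, 12_000)    # 9.60 per transaction
control_pre = cost_per_transaction(96_000, 8_000)     # 12.00 per transaction
control_post = cost_per_transaction(109_440, 9_600)   # 11.40 per transaction

effect = diff_in_diff(pilot_pre, pilot_post, control_pre, control_post)
print(f"Estimated AI effect on unit cost: {effect:+.2f}")
# prints: Estimated AI effect on unit cost: -1.80
```

In this example a naive before/after comparison would credit the AI with the full 2.40 drop in the pilot unit. The control unit shows that 0.60 of it came from volume growth alone, which leaves 1.80 attributable to the deployment.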
Parameter 3: Time horizon with defined kill point
Define three decision gates in advance:
- Gate 1 (90 days): Adoption rate above 25%. If fewer than 1 in 4 intended users is engaging with the tool, the deployment is not underway; it is theoretical. This threshold is the Pertama kill-trigger criterion.
- Gate 2 (6 months): Cost-per-transaction trending down, not flat or rising. If the unit economics have not moved after 6 months, the workflow selection or implementation has a problem that more time will not fix.
- Gate 3 (12 months): Documentable P&L impact. Not “we believe the time savings translate to value” — a line item. Revenue, cost avoidance, headcount reallocation documented at the department level. If this gate is not passed, the project moves to pivot or exit — not extension.
These are not aspirational milestones. They are kill criteria. The asymmetry is intentional: passing them continues the investment; missing them triggers a structured review, not a “let’s give it more time” conversation.
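The three gates can be written down as explicit checks before the pilot starts. The sketch below is illustrative only: the class and field names are hypothetical, and it simplifies "trending down" at Gate 2 to "below the measured baseline", which a real review would replace with a proper trend over several periods.

```python
from dataclasses import dataclass


@dataclass
class PilotSnapshot:
    months_elapsed: int
    adoption_rate: float          # share of intended users engaging, 0.0 to 1.0
    baseline_cost_per_txn: float  # measured before the pilot
    current_cost_per_txn: float   # measured at this checkpoint
    documented_pnl_impact: float  # dollars traceable to a line item


def failed_gates(s: PilotSnapshot) -> list:
    """Return the gates that are due at this point and have not been passed."""
    failed = []
    if s.months_elapsed >= 3 and s.adoption_rate <= 0.25:
        failed.append("gate 1: adoption not above 25%")
    # Assumption: "trending down" is simplified to "below the measured baseline".
    if s.months_elapsed >= 6 and s.current_cost_per_txn >= s.baseline_cost_per_txn:
        failed.append("gate 2: cost per transaction flat or rising")
    if s.months_elapsed >= 12 and s.documented_pnl_impact <= 0:
        failed.append("gate 3: no documentable P&L impact")
    return failed


# Hypothetical six-month checkpoint: healthy adoption, unit cost has not moved.
checkpoint = PilotSnapshot(6, 0.40, 12.00, 12.30, 0.0)
print(failed_gates(checkpoint))
# prints: ['gate 2: cost per transaction flat or rising']
```

Any non-empty result triggers the structured review described above rather than an automatic extension.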
The ROI Calculation Architecture
Revenue-impact model (preferred for core business workflows):
The 2026 measurement shift means the strongest business cases connect AI deployment directly to revenue. The calculation:
Revenue impact = (Volume uplift × Average transaction value) + (Conversion rate improvement × Pipeline value) + (Retention improvement × Average customer LTV)
The BCG beauty company case (virtual assistant deployed across 20 markets) demonstrates the approach: target was $100 million in incremental revenue, defined pre-deployment as a function of conversion rate improvement and customer engagement metrics. That is a pre-deployment ROI definition. The target was specific, measurable, and independently verifiable.
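As a worked example of the revenue-impact formula, here is a minimal sketch. The formula does not specify units, so the code assumes volume uplift is a count of additional transactions, conversion improvement is a fractional change applied to pipeline value, and retention improvement is a count of additional customers retained. All input figures are hypothetical.

```python
def revenue_impact(volume_uplift: float, avg_transaction_value: float,
                   conversion_improvement: float, pipeline_value: float,
                   customers_retained: float, avg_customer_ltv: float) -> float:
    """Sum the three terms of the revenue-impact formula."""
    volume_term = volume_uplift * avg_transaction_value
    conversion_term = conversion_improvement * pipeline_value
    retention_term = customers_retained * avg_customer_ltv
    return volume_term + conversion_term + retention_term


# Hypothetical pre-deployment targets for one workflow.
target = revenue_impact(
    volume_uplift=2_000,           # additional transactions per year
    avg_transaction_value=150.0,   # dollars
    conversion_improvement=0.02,   # two percentage points
    pipeline_value=5_000_000.0,    # dollars
    customers_retained=40,         # additional customers kept per year
    avg_customer_ltv=6_000.0,      # dollars
)
print(f"${target:,.0f}")  # prints: $640,000
```

Writing each term separately before deployment makes it possible to check afterwards which of the three actually moved.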
Cost-avoidance model (appropriate for support and compliance workflows):
Cost impact = (Current cost per cycle × Cycle volume) × Efficiency gain % − (Total implementation cost ÷ amortization period)
The BCG energy workflow data provides calibration: infrastructure monitoring / predictive maintenance achieves 23–30% cost impact among companies that have scaled the workflow. The pre-deployment business case for a mid-market manufacturer would anchor on the 23% lower bound, document current maintenance cost per unit, and project forward with a 3-year amortization of the $15,000–$50,000 rebuild investment.
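The cost-avoidance formula can be exercised the same way. The sketch below uses the 23% lower bound and a 3-year amortization from the paragraph above, with the upper end of the quoted investment range. The cost per cycle and annual cycle volume are hypothetical placeholders, and the result is read as an annual figure on the assumption that cycle volume is annual.

```python
def annual_cost_impact(cost_per_cycle: float, cycles_per_year: float,
                       efficiency_gain: float, implementation_cost: float,
                       amortization_years: float) -> float:
    """(cost per cycle x volume) x gain, minus amortized implementation cost."""
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    gross_saving = cost_per_cycle * cycles_per_year * efficiency_gain
    amortized_cost = implementation_cost / amortization_years
    return gross_saving - amortized_cost


impact = annual_cost_impact(
    cost_per_cycle=400.0,          # hypothetical measured baseline, dollars
    cycles_per_year=1_500,         # hypothetical annual volume
    efficiency_gain=0.23,          # lower bound of the 23-30% range
    implementation_cost=50_000.0,  # upper end of the quoted investment range
    amortization_years=3,
)
print(f"${impact:,.0f} per year")  # prints: $121,333 per year
```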
Headcount reallocation model (appropriate for high-volume, high-repetition workflows):
This is the most common model and the most frequently gamed. The discipline: headcount reallocation counts as a P&L benefit only if it produces one of three outcomes:
- Documented headcount reduction (people exit the organization or move to documented open requisitions)
- Documented reallocation to measurably higher-value work (the receiving function must have a quantifiable output target)
- Capacity to absorb documented growth without additional headcount (the alternative-cost calculation — what it would have cost to hire for the volume increase)
“Time savings that can be redeployed” is not a P&L line item. It is an aspiration. Every Pertama case of a failed AI project involves this class of benefit in the original business case — genuine time savings that never materialized as documented business outcomes.
The Total Cost of Ownership Requirement
No ROI calculation is defensible without a complete cost model. The 2.5x rule from the CFO AI Decision Framework: multiply the Year 1 software license by 2.5 to estimate total Year 1 cost of ownership.
The seven cost layers (200–500 person deployment):
| Cost layer | Typical range | When |
|---|---|---|
| Software licensing | $15–$80/seat/month | Month 1 |
| Integration and configuration | $25,000–$200,000 | Months 1–3 |
| Security review and compliance | $25,000–$150,000 | Months 1–3 |
| Training and change management | $500–$2,000/employee | Months 1–6 |
| Productivity dip during adoption | 5–15% output reduction, 4–8 weeks | Months 1–3 |
| Ongoing support and optimization | 15–20% of license cost annually | Ongoing |
| Data preparation and governance | $30,000–$150,000 | Months 1–6 |
Layer 5 — the productivity dip — is the most consistently omitted. For a 300-person deployment, even a 5% output reduction over 6 weeks represents $150,000–$300,000 in absorbed labor cost. It does not appear on a vendor ROI calculator. It appears in Q3 margins.
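Both the 2.5x rule and the productivity-dip estimate reduce to simple arithmetic. In the sketch below the seat prices come from the table above, while the loaded weekly labor cost is an assumption: back-solving the $150,000–$300,000 range for 300 people, 5%, and 6 weeks implies roughly $1,667 to $3,333 per person-week, which is an inference from the arithmetic rather than a figure the source states.

```python
def year1_tco_estimate(seats: int, price_per_seat_month: float,
                       multiplier: float = 2.5) -> float:
    """Apply the 2.5x rule to the Year 1 software license cost."""
    annual_license = seats * price_per_seat_month * 12
    return annual_license * multiplier


def productivity_dip_cost(headcount: int, output_reduction: float,
                          weeks: float, loaded_weekly_cost: float) -> float:
    """Labor cost absorbed while output is reduced during adoption."""
    return headcount * output_reduction * weeks * loaded_weekly_cost


low = year1_tco_estimate(300, 15.0)
high = year1_tco_estimate(300, 80.0)
print(f"Year 1 TCO estimate: ${low:,.0f} to ${high:,.0f}")
# prints: Year 1 TCO estimate: $135,000 to $720,000

# Assumed loaded cost of $2,000 per person-week (hypothetical).
dip = productivity_dip_cost(300, 0.05, 6, 2_000.0)
print(f"Productivity dip: ${dip:,.0f}")
# prints: Productivity dip: $180,000
```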
Part 3: Connecting Selection to ROI — The Pre-Deployment Checklist
The five selection criteria and three ROI parameters combine into a single pre-deployment checklist. A CIO or COO should be able to answer all eight questions before a vendor engagement begins.
Workflow selection verification:
1. Is this workflow in a core business function (70% value pool) rather than a support function (30% value pool)?
2. Does the workflow score 5 or below on the five-criterion diagnostic (overlay candidate) or 6 or above (rebuild needed before AI deployment)?
3. Is there measurable data volume and a clear KPI (cost per transaction, revenue per cycle, error rate) that defines “better”?
4. Does the team that will use and review AI outputs have deep enough expertise to recognize wrong answers?

ROI definition verification:
5. Has the baseline been measured empirically (30–60 days of current-state data), or is it estimated?
6. Is there a control condition (a comparable unit not receiving the AI deployment) to isolate AI impact from volume, seasonal, and concurrent-initiative effects?
7. Are there three defined kill gates (90 days, 6 months, 12 months) with specific numeric triggers?
8. Does the business case express P&L impact in one of three documentable forms: revenue uplift, cost reduction, or headcount reallocation with a verifiable receiving function?
A “no” on any of these eight is not a reason to cancel the project. It is a reason to pause before signing the vendor contract, fix the gap, and then proceed. The gap that cannot be fixed — almost always question 5 (no baseline) or question 8 (no documentable P&L form) — is the gap that produces the $4.2M failed project 11 months later.
What This Means for Your Organization
Two patterns dominate mid-market AI failures that reach Brandon’s desk. The first: a company picks a workflow because it seemed manageable — low risk, easy to pilot — rather than because it had the highest value potential. Twelve months later, the pilot succeeded technically and produced nothing on the P&L. The second: a company builds a business case based on productivity savings, gets board approval, and cannot explain 18 months later why the productivity gains never appeared in the financials.
Both failures are preventable. The workflow selection criteria and the ROI definition framework above are the prevention. Neither requires a consultant to implement — they require discipline.
The highest-leverage action a mid-market CIO or COO can take before the next AI vendor meeting: run the five-criterion diagnostic on the three workflows you are considering, and write down the numeric baseline for each one. If you cannot measure the current state before deployment, you cannot prove the AI worked after. That is not a measurement problem — it is a governance problem.
If this raised questions specific to your portfolio or your organization’s current AI program, I’d welcome the conversation — brandon@brandonsneider.com.
Key Data Points
| Finding | Source | Date | Credibility |
|---|---|---|---|
| Workflow redesign is #1 EBIT predictor out of 25 attributes tested across n=1,993 organizations | McKinsey State of AI (n=1,993, Jul 2025 survey) | Nov 2025 | HIGH — independent survey, Johnson’s Relative Weights regression |
| Only 21% of organizations have fundamentally redesigned any workflows when deploying gen AI | McKinsey State of AI (n=1,491, Jul 2024 survey) | Mar 2025 | HIGH — consistent with Nov 2025 edition |
| 70% of total AI value concentrates in core business functions (R&D, marketing, manufacturing, supply chain) | BCG Build for the Future 2025 (n=1,250) | Sep 2025 | MEDIUM — consulting firm, self-reported, proprietary methodology |
| 5% of companies achieve substantial AI value; 60% are laggards despite active use | BCG Build for the Future 2025 (n=1,250) | Sep 2025 | MEDIUM — consistent with McKinsey (6%) and MIT CISR independent research |
| AI projects with pre-defined financial success metrics achieve 54% success rate vs. 12% without | Pertama Partners, 2,400+ enterprise AI initiatives | 2025–2026 | MEDIUM — proprietary dataset, not independently audited |
| Median failed AI project: $4.2M cost, 11 months to die | Pertama Partners (2025); RAND (2025) | 2025 | MEDIUM — proprietary; directionally consistent with McKinsey project-outcome data |
| Financial ROI (revenue + profitability) as primary AI metric nearly doubled YoY to 21.7%; productivity fell 5.8 points | Futurum Group 1H 2026 (n=830 IT decision-makers) | Feb 2026 | MEDIUM-HIGH — independent analyst firm, not vendor-funded |
| Future-built companies achieve 76% higher match between where AI is deployed and where it delivers impact | BCG Build for the Future 2025 (n=1,250) | Sep 2025 | MEDIUM — BCG-defined metric, same survey |
| Future-built companies deploy 62% of AI initiatives vs. 12% at laggards; 9–12 months to full deployment vs. 12–18 months | BCG Build for the Future 2025 (n=1,250) | Sep 2025 | MEDIUM — same survey |
| 5% productivity gain (AI overlay, no redesign) vs. 30% (same tech, workflow redesign) at a European telecom | Deloitte Global Human Capital Trends 2026 (n=9,000+) | Mar 2026 | MEDIUM (single unnamed case; consistent with Bain banking case) |
| UK bank: 60–100 day process → 1 day; 40 people, 10 handoffs → 4–5 people, zero handoffs, after clean-sheet redesign | Bain “Unsticking Your AI Transformation” (2025) | 2025 | HIGH — specific metrics, named methodology |
| Organizations with systematic AI feedback loops 6x more likely to achieve substantial financial benefits | MIT SMR, Kiron/Schrage (Apr 2026) | Apr 2026 | HIGH — MIT Sloan, independent academic source |
| Fewer than 1 in 5 organizations track well-defined KPIs for gen AI before deployment — the practice with highest EBIT correlation | McKinsey State of AI (n=1,491, 2025) | Mar 2025 | HIGH — independent survey |
Sources
- BCG “The Widening AI Value Gap: Build for the Future 2025.” Apotheker, Beauchene, de Bellefonds et al. September 2025. n=1,250 senior executives, 9 industries, 41 AI capability dimensions. https://media-publications.bcg.com/The-Widening-AI-Value-Gap-Sept-2025.pdf. Credibility: MEDIUM. BCG consulting commercial interest in AI transformation; financial metrics are cross-sectional correlations, not RCT; directionally corroborated by MIT CISR (n=721, independent academic research).
- McKinsey “State of AI,” March 2025 edition. n=1,491, 101 countries, July 2024 fieldwork. Published March 12, 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai. Credibility: MEDIUM-HIGH. Consistent with November 2025 follow-on (n=1,993); workflow redesign #1 EBIT predictor is the highest-granularity finding in the McKinsey AI series.
- MIT SMR “How to Reap Compound Benefits from Generative AI.” Kiron/Schrage. April 6, 2026. MIT Sloan Management Review. https://sloanreview.mit.edu/article/how-to-reap-compound-benefits-from-generative-ai/. Credibility: HIGH. Independent academic publication; 6x and 73% figures draw from survey data with published methodology.
- Futurum Group “Enterprise AI ROI Shifts as Agentic Priorities Surge.” n=830 global IT decision-makers. February 17, 2026. https://futurumgroup.com/press-release/enterprise-ai-roi-shifts-as-agentic-priorities-surge/. Credibility: MEDIUM-HIGH. Independent analyst firm; press release summary, not full methodology; survey described as “global IT decision-makers,” geography and industry breakdown not disclosed in press release.
- Pertama Partners / Enterprise AI Initiative Analysis. 2,400+ enterprise AI initiatives, 2025–2026. Referenced in CFO AI Decision Framework corpus file (research/07-adoption-challenges/cfo-ai-decision-framework.md). Credibility: MEDIUM. Proprietary dataset; not independently audited; directionally consistent with RAND Corporation and McKinsey project-outcome data.
- Deloitte “Global Human Capital Trends 2026.” Poynton, Flynn, Scoble-Williams et al. March 4, 2026. n=9,000+ business and HR leaders, 89 countries, Oxford Economics fieldwork. https://www.deloitte.com/us/en/insights/topics/talent/human-capital-trends.html. Credibility: HIGH for survey findings; Deloitte consulting commercial interest in workforce transformation engagements.
- Bain “Unsticking Your AI Transformation” (2025). UK bank workflow redesign case study: 60–100 day → 1 day process. Cited in companion research ai-workflow-redesign-overlay-vs-rebuild.md. Credibility: HIGH. Specific metrics, named methodology, directionally consistent with independent evidence.
- HBS/Microsoft Frontier Firm Initiative. March 2026. 14-organization cohort studying structural AI adoption barriers including productivity reabsorption. Cited in companion research. Credibility: HIGH. Harvard/Microsoft research collaboration.
Brandon Sneider | brandon@brandonsneider.com April 2026