Executive Summary
- Across hundreds of enterprise AI deployments tracked in this corpus, four organizational pre-conditions predict failure with more reliability than any technology choice: no workflow redesign mandate, no named governance owner, unassessed data, and no production path in the pilot design.
- The 18 red flags below are drawn from failure evidence — METR’s RCT, FinCo’s governance failure, OutSystems’s forced-adoption collapse, Writer’s C-suite candor study, McKinsey’s 1,993-organization benchmark, and Rewired’s five transformation sins (Introduction). They are not theoretical.
- The checklist is calibrated for a 90-minute pre-deployment executive review. If three or more flags are present, stop and remediate before investing further. The corpus does not support a precise threshold, but aggregate failure-rate data puts production success below 15% when the four primary pre-conditions (flags 1, 5, 9, 13) are all present.
How to Use This Checklist
Score each flag: Present (1) / Not Present (0). Flags marked ⚠️ are high-severity: a single one of these warrants a pause-and-fix review regardless of other scores.
If 3+ flags present: stop and fix before proceeding. The estimate that production success falls below 15% is inferred from cross-corpus failure-rate data (BCG 60% failure rate, McKinsey 6% high-performer share, Grant Thornton 15% vs. 58% revenue-growth divide, Pertama Partners 2,400+ initiative analysis), not from a longitudinal measurement of this checklist.
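The scoring rule above can be sketched in a few lines. This is a minimal illustration, not part of the checklist itself: the names `Flag`, `build_flags`, and `review_outcome` are hypothetical, and the high-severity set is assumed from the ⚠️ markers on the flag headings below.

```python
from dataclasses import dataclass

# Assumption: flags carrying the high-severity marker in the headings below.
HIGH_SEVERITY = {1, 4, 5, 9, 13}


@dataclass(frozen=True)
class Flag:
    number: int
    high_severity: bool
    present: bool


def build_flags(present_numbers):
    """Build all 18 flags, marking the given flag numbers as present."""
    return [
        Flag(n, n in HIGH_SEVERITY, n in present_numbers)
        for n in range(1, 19)
    ]


def review_outcome(flags):
    """Apply the checklist rule: 3+ flags stops; any high-severity flag pauses."""
    score = sum(1 for f in flags if f.present)  # Present = 1, Not Present = 0
    high_present = [f.number for f in flags if f.present and f.high_severity]
    if score >= 3:
        return f"STOP: {score} flags present, remediate before investing further"
    if high_present:
        return f"PAUSE: high-severity flag(s) {high_present} warrant a pause-and-fix review"
    return f"PROCEED: {score} flag(s) present, none high-severity"


print(review_outcome(build_flags({2, 9})))
# PAUSE: high-severity flag(s) [9] warrant a pause-and-fix review
```

The stop rule is checked first so that a review with three or more flags always reports the stop outcome, even when a high-severity flag is also present.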
The 18 Red Flags
Category A — Strategy and Mandate
1. ⚠️ No explicit definition of what activity the AI will eliminate (not just accelerate)
If the workflow design does not specify what disappears when AI is deployed, the bottleneck will move downstream, not away. ActivTrak’s behavioral study (n=163,638 workers, 443 million hours, 2025) measured the result: after AI deployment with no redesign mandate, email volume increased 104%, chat messages 145%, and deep focus sessions dropped 9%. The tool added speed; the surrounding work expanded to fill it.
Source: research/07-adoption-challenges/ai-deployment-failure-modes-synthesis.md; research/07-adoption-challenges/ai-failure-pattern-library.md
2. AI initiative scoped entirely to IT without cross-functional authority
IT can deploy a tool. IT cannot redesign the business workflow that generates the value. McKinsey’s 1,993-organization survey finds the single most predictive behavioral gap between high and low performers is workflow redesign: 55% of high performers fundamentally redesigned workflows vs. 18% of others — a 3x gap across 25 tested attributes. Workflow redesign requires business-unit authority. It cannot be delegated to IT alone.
Source: research/04-consulting-firms/mckinsey-ai-transformation-manifesto-2026.md; Lamarre et al., 2024, Ch. 4 p.63 — “have business leaders lead the reimagination”
3. Pilot is designed to answer “does AI work?” with no production budget attached
S&P Global (n=1,006, 2025) finds 46% of AI proofs-of-concept are scrapped before production. McKinsey’s November 2025 data shows two-thirds of firms remain stuck in pilot mode. The mechanism: pilot budgets clear; the 3–5x larger production budget requires a separate business case, which requires quantified pilot outcomes, which require pre-defined metrics — a chain that most pilots are not designed to complete. A pilot without a named production path and a named owner for the production business case is a research project, not a deployment.
Source: research/07-adoption-challenges/ai-deployment-failure-modes-synthesis.md; research/09-ai-adoption-cycle/first-90-days-ai-sponsor-executive-playbook.md
4. ⚠️ Use-case portfolio spans more than 10 initiatives with no domain concentration
Rewired names the “use-case implementation race” as the primary anti-pattern in AI strategy: “high activity, no P&L impact” (Introduction, pp.20–21). BCG’s AI Workforce Transformation 2026 research corroborates: spreading AI efforts across “dozens or hundreds of use cases” rather than concentrating on 3–4 central priorities produces the 60% failure rate. McKinsey’s Transformation Manifesto (20 leading companies) found that concentrating on 1–3 business domains with full workflow reinvention delivered 20% EBITDA uplift vs. incremental use-case accumulation.
Source: research/04-consulting-firms/mckinsey-rewired-2nd-edition-synthesis.md (Capability 1); research/04-consulting-firms/mckinsey-ai-transformation-manifesto-2026.md
Category B — Governance and Ownership
5. ⚠️ No single named individual whose performance review includes AI governance outcomes
McKinsey’s Responsible AI maturity benchmarking (n≈500, 2025–2026) quantifies the ownership gap directly: organizations with a clearly accountable function for RAI score 2.6/4.0 on the maturity scale; those without score 1.8. That 0.8-point gap maps directly to the financial performance divide. A committee is not a named owner. A shared model is not a named owner.
Source: research/04-consulting-firms/mckinsey-ai-trust-maturity-2026.md; research/07-adoption-challenges/ai-deployment-failure-modes-synthesis.md
6. Governance framework designed without an expiration or review cycle
The MIT CISR FinCo case is the corpus anchor for this flag. FinCo built comprehensive governance infrastructure — board-sponsored policy, AI Review Committees, tiered decision rights, “FinGPT” with full logging and PII masking. The policy consumed close to a year and hundreds of stakeholders. Result: more shadow AI than before, because the governance had no mechanism for adapting as the technology moved. A low-risk agent prototype stalled six months in review. Employees reverted to unsanctioned tools. The failure was not bad governance design — it was governance without a living review cycle matched to the technology’s pace.
Source: research/04-consulting-firms/mckinsey-methodology-critique.md; research/07-adoption-challenges/ai-deployment-failure-modes-synthesis.md
7. Department-level AI initiatives operating without formal approval or oversight (the 52% signal)
EY Technology Pulse Poll (n=500 US tech leaders, January–February 2026) finds 52% of department-level AI initiatives operate without formal approval or oversight. At the enterprise level, the relevant diagnostic question is: does leadership know what AI is actually running in each business unit? If the answer is “we think so,” the governance gap is active and shadow AI is accumulating.
Source: research/04-consulting-firms/ey-autonomous-ai-tech-pulse-2026.md
8. No kill criteria defined before the pilot launches
Pilots without pre-defined kill criteria run until the budget expires rather than until the evidence says stop. The Pertama Partners 2,400+ initiative analysis finds that 54% of AI initiatives with pre-defined financial success metrics succeed, vs. 12% without. Three criteria are required, each a threshold the pilot must clear to continue: (1) adoption above 25% at 90 days, (2) unit cost declining at 6 months, (3) documentable P&L impact at 12 months. A miss at any checkpoint is the signal to stop. Absence of all three criteria is the flag; absence of even one materially reduces success probability.
Source: research/07-adoption-challenges/ai-workflow-selection-roi-definition-framework.md
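The three kill criteria in flag 8 can be expressed as an explicit checkpoint test. The sketch below is illustrative only: the function name and parameters are hypothetical, "documentable P&L impact" is assumed to mean a positive measured figure, and a value of `None` is assumed to mean the checkpoint has not been reached yet.

```python
from typing import List, Optional


def missed_kill_criteria(
    adoption_90d: Optional[float],      # share of target users active at day 90
    unit_cost_start: Optional[float],   # unit cost at pilot launch
    unit_cost_6m: Optional[float],      # unit cost at month 6
    pnl_impact_12m: Optional[float],    # measured P&L impact at month 12
) -> List[str]:
    """Return the kill criteria the pilot has missed so far.

    Checkpoints that have not been measured yet (None) are skipped.
    """
    missed = []
    if adoption_90d is not None and adoption_90d <= 0.25:
        missed.append("adoption not above 25% at 90 days")
    if (
        unit_cost_start is not None
        and unit_cost_6m is not None
        and unit_cost_6m >= unit_cost_start
    ):
        missed.append("unit cost not declining at 6 months")
    if pnl_impact_12m is not None and pnl_impact_12m <= 0:
        missed.append("no documentable P&L impact at 12 months")
    return missed


# Month-6 review: adoption cleared the bar, but unit cost rose.
print(missed_kill_criteria(0.31, 4.20, 4.60, None))
# ['unit cost not declining at 6 months']
```

Writing the thresholds down before launch is the point of the flag; the code only makes the stop decision mechanical once they exist.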
Category C — Data Readiness
9. ⚠️ Pilot ran on curated or hand-cleaned sample data, not live production feed
The Data Mirage failure pattern (Pertama Partners, n=2,400+, 2025–2026): AI pilots succeed on clean, curated sample data; production deployment exposes the real data landscape. 60–70% of project time shifts to data preparation after approval. The business case collapses. Data quality issues appear in 71% of AI project failures; 38% of formally abandoned projects cite “insurmountable data quality” as primary reason.
Source: research/07-adoption-challenges/ai-failure-pattern-library.md; research/07-adoption-challenges/ai-deployment-failure-modes-synthesis.md
10. No formal data readiness assessment completed before pilot approval
Organizations that conduct formal data readiness assessments before AI project approval achieve a 47% production success rate vs. 14% without, roughly a 3.4x improvement (Pertama Partners, 2025–2026). Only 7% of enterprises say their data is completely ready for AI (Cloudera/HBR Analytic Services, n=230, March 2026). The assessment needs to cover only the specific workflow’s input data, production sources, and access path, not the whole enterprise.
Source: wiki/data-readiness.md; research/07-adoption-challenges/ai-failure-pattern-library.md
11. Workflow crosses 3+ data domains with no semantic integration layer
The Nebraska Medicine vs. SlickDeals split is the clearest data-domain diagnostic in the corpus. Single-domain workflows (one source system, one ownership function) proceed with infrastructure modernization. Multi-domain workflows (patient + procedure + payer + physician; order + inventory + supplier + finance) require a semantic integration layer — Ontology, canonical data model, or equivalent — before AI can operate reliably. Deploying without this layer produces the Data Mirage failure pattern (above) at production scale.
Source: research/07-adoption-challenges/ai-data-reset-decision-framework.md; research/01-ai-native-landscape/palantir-aipcon-enterprise-agentic-deployment-2026.md
12. No tech-debt line item in the AI business case
IBM IBV “The Tech Debt Reckoning” (n=1,300, November 2025) quantifies the blind-spot cost: 18–29% of total AI implementation cost through 2027 will be absorbed by tech-debt remediation; 15–22% schedule extension (a 30-month program becomes a 36-month program). Business cases that omit these line items project +39% ROI; post-mortem reality is −14%. The $1-$10-$100 rule for data errors applies: $1 to prevent at entry, $10 to remediate downstream, $100 when the error reaches a decision or customer.
Source: research/04-consulting-firms/ibm-ibv-tech-debt-reckoning-2026.md; wiki/data-readiness.md
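The tech-debt ranges in flag 12 translate into a simple adjustment to a business case. The sketch below is a rough illustration under stated assumptions: the function name is hypothetical, the base budget is assumed to exclude tech-debt remediation, and the 18–29% figure is read as a share of the total cost (so total = base / (1 − share)). A different reading of the IBM figures would change the cost formula.

```python
def tech_debt_adjusted(base_cost, base_months,
                       debt_share=(0.18, 0.29),
                       schedule_ext=(0.15, 0.22)):
    """Return ((low_cost, high_cost), (low_months, high_months)).

    base_cost:    budget that omits tech-debt remediation
    base_months:  planned schedule in months
    debt_share:   share of TOTAL cost absorbed by remediation (assumed reading)
    schedule_ext: fractional schedule extension
    """
    cost_range = tuple(round(base_cost / (1 - s), 2) for s in debt_share)
    month_range = tuple(round(base_months * (1 + e), 1) for e in schedule_ext)
    return cost_range, month_range


# A $10M, 30-month program with no tech-debt line item.
costs, months = tech_debt_adjusted(10.0, 30)
print(costs)   # total cost range in $M
print(months)  # schedule range in months
```

Under these assumptions the 30-month example lands between roughly 34.5 and 36.6 months, which brackets the 36-month figure quoted above.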
Category D — Adoption and Workflow Design
13. ⚠️ AI tool deployment preceded workflow redesign
The corpus’s single most replicated finding: Deloitte (n=3,235) finds 37% of AI implementations involve surface-level use with minimal process change — the benchmark for this failure mode at scale. McKinsey (n=1,993) finds only 21% of organizations using generative AI have fundamentally redesigned any workflows. The MIT CISR 4-stage model finds Stage 1 organizations (tools deployed without workflow redesign) post −12.6 percentage points vs. industry-average growth; Stage 3 (with redesign) post +11.3 pp. The technology is not the variable.
Source: wiki/workflow-redesign.md; Lamarre et al., 2024, Ch. 30 p.457 (adoption must be engineered) and Ch. 5 p.92 (right tool for the job)
14. Top-down mandate without employee buy-in mechanism
Dr. Sam Zolfagharian (AI For the C-Suite, April 2026): “One of the quotes from their employees was, ‘You’re bringing these AI tools to replace me. So I’m not going to do anything,’ because it’s coming from the top level.” This is the practitioner-level explanation for the Writer/Workplace Intelligence finding that only 29% of organizations with >$1M AI investment see significant GenAI ROI. The investment is present; adoption is not. The OutSystems EY deployment failure follows the same pattern: forced adoption produces resistance, not usage.
Source: research/13-multimodal-sources/ai-for-the-c-suite/2026-04-14-dr-sam-zolfagharian-how-leaders-should-actually-approach-ai-.md; research/07-adoption-challenges/writer-enterprise-ai-adoption-2026.md
15. No training program for employees who will use the tool
BCG AI at Work 2025 (n=10,600): employees who receive 5+ hours of AI training become regular users at a 79% rate vs. 67% with less training. Workday/Hanover Research (n=3,200, January 2026): 40% of AI time savings are consumed by rework among users who did not receive structured training; only 14% of employees consistently achieve clear positive net outcomes from AI. Rewired frames talent development as a non-negotiable prerequisite (Capability 2), not a Phase 3 deployment activity.
Source: research/07-adoption-challenges/workday-beyond-productivity-ai-rework-2026.md; research/04-consulting-firms/mckinsey-rewired-2nd-edition-synthesis.md (Capability 2)
16. HITL (Human-in-the-Loop) review designed as a rubber-stamp approval click
Thomson Reuters (CIO.com, 2025) monitors validation gates for rubber-stamping and flags reviews completed in under two seconds. The Dietvorst et al. (Management Science, 2016) mechanism requires retained agency to activate: humans accept AI outputs more readily when they can modify them, but only when the review carries real time and real authority. A cosmetic approval click does not activate the adoption mechanism. It creates the appearance of oversight without the substance; the FinCo governance failure is the regulatory version of this pattern at institutional scale.
Source: wiki/hitl-deployment-pattern.md; research/07-adoption-challenges/hitl-as-adoption-architecture.md
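The under-two-seconds signal in flag 16 is straightforward to monitor if review events carry timestamps. The sketch below assumes a hypothetical log of (opened_at, decided_at) pairs; it does not describe how Thomson Reuters implements its monitoring, only the general idea of flagging reviews that finish too quickly to be real.

```python
from datetime import datetime, timedelta


def rubber_stamp_rate(reviews, threshold=timedelta(seconds=2)):
    """Share of reviews decided faster than the threshold.

    reviews: list of (opened_at, decided_at) datetime pairs (hypothetical log format).
    Returns 0.0 for an empty log.
    """
    if not reviews:
        return 0.0
    fast = sum(1 for opened, decided in reviews if decided - opened < threshold)
    return fast / len(reviews)


t0 = datetime(2026, 4, 1, 9, 0, 0)
log = [
    (t0, t0 + timedelta(seconds=1)),    # approved in one second
    (t0, t0 + timedelta(seconds=45)),
    (t0, t0 + timedelta(minutes=3)),
    (t0, t0 + timedelta(seconds=20)),
]
print(rubber_stamp_rate(log))  # 0.25
```

A rising rate is a prompt to look at review workload and reviewer authority, not proof of negligence on its own.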
Category E — Measurement and Scaling
17. Success metric is “hours saved” rather than a financial outcome
Futurum Group (n=830, February 2026): direct financial ROI (revenue + profitability) nearly doubled YoY to 21.7% as the primary AI success metric; productivity gains (hours saved) fell 5.8 points. Business cases built on hours-saved measures will not survive a 2026 CFO review. METR’s RCT (n=16 experienced developers, July 2025) demonstrated the risk concretely: developers believed they were 20% more productive — the self-reported hours-saved measure — while measured task completion showed they were 19% slower.
Source: research/07-adoption-challenges/ai-workflow-selection-roi-definition-framework.md; research/01-ai-native-landscape/metr-ai-coding-rct-2025.md
18. No reuse architecture: AI deployment is designed for one workflow only
Rewired’s “best use case is the reuse case” principle (Ch. 5, p.93; Ch. 31, p.469) names the economic mechanism: data products and workflow patterns built for one AI deployment amortize across subsequent deployments. Organizations that design AI deployments as point solutions — one workflow, one integration, one model — rebuild the same infrastructure each time. Palantir’s 139% net dollar retention (Q4 2025 audited financials) is the supply-side expression of compounding returns: after the foundational Ontology, each new workflow costs a fraction of the first.
Source: research/04-consulting-firms/mckinsey-rewired-2nd-edition-synthesis.md (Capability 5); research/01-ai-native-landscape/palantir-aipcon-enterprise-agentic-deployment-2026.md
Severity Summary
| Flag | Category | Severity |
|---|---|---|
| 1. No activity-elimination mandate | Strategy | ⚠️ High |
| 2. IT-only scope | Strategy | Medium |
| 3. No production path | Strategy | Medium |
| 4. >10 use cases, no domain focus | Strategy | ⚠️ High |
| 5. No named governance owner | Governance | ⚠️ High |
| 6. No governance review cycle | Governance | Medium |
| 7. Unapproved department AI | Governance | Medium |
| 8. No kill criteria | Governance | Medium |
| 9. Curated pilot data | Data | ⚠️ High |
| 10. No data readiness assessment | Data | Medium |
| 11. 3+ data domains, no integration layer | Data | Medium |
| 12. No tech-debt line item | Data | Medium |
| 13. AI before workflow redesign | Adoption | ⚠️ High |
| 14. Top-down mandate, no buy-in | Adoption | Medium |
| 15. No training program | Adoption | Medium |
| 16. Rubber-stamp HITL | Adoption | Medium |
| 17. Hours-saved success metric | Measurement | Medium |
| 18. No reuse architecture | Scaling | Medium |
If 3+ Flags Present: Stop and Fix Before Proceeding
The corpus does not have sufficient longitudinal failure data to claim a precise threshold. What the data does support:
- Organizations with all four primary pre-conditions present (flags 1, 5, 9, 13) have less than 15% production success probability based on the BCG/McKinsey/Gartner aggregate failure-rate data.
- Every ⚠️ High-severity flag warrants individual review regardless of overall count.
- Remediating flags 1 (workflow mandate), 5 (named owner), 9 (production data), and 13 (workflow redesign) before deployment is associated with a 3–4x improvement in production success rates across the corpus.
If the discussion about this checklist in your organization produces defensiveness rather than analysis, that defensiveness is a 19th flag.
Related Wiki Articles
- wiki/workflow-redesign.md — flags 1, 2, 13, 14
- wiki/data-readiness.md — flags 9, 10, 11, 12
- wiki/hitl-deployment-pattern.md — flag 16
- wiki/ai-maturity-models.md — framework context for flags 3, 4, 17, 18
- wiki/agentic-ai-governance.md — flags 5, 6, 7, 8
- wiki/ai-deployment-failure-modes.md — pattern library underpinning this checklist
Brandon Sneider | brandon@brandonsneider.com | April 2026