The 90-Day AI Health Check: Seven Questions and Five Thresholds That Tell a CEO Whether to Kill, Pivot, or Scale

Brandon Sneider | March 2026


Executive Summary

  • Most AI initiatives that ultimately fail are diagnosable by Day 30 — yet 88% of organizations lack the checkpoint infrastructure to detect failure signals before month nine. Projects with pre-defined success metrics achieve a 54% success rate versus 12% without them, a 4.5x difference created entirely by the discipline of measurement before deployment (Pertama Partners, n=2,400+ enterprise AI initiatives, 2025-2026).
  • The median abandoned AI project consumes 11 months and $4.2 million before termination. Only 12% of failures are detected within the first 90 days, but the organizations that do catch them at that stage reduce sunk costs by 60%. The health check is the instrument that makes early detection systematic rather than accidental.
  • Gartner predicts 30% of generative AI projects will be abandoned after proof of concept, with 60% failing due to lack of AI-ready data. The 5% capturing real value run structured gate reviews at 30, 60, and 90 days — each with specific questions, quantitative thresholds, and a forcing function for the kill/pivot/scale decision.
  • The health check below is calibrated for a 200-2,000 person company spending $50K-$500K on its first or second AI initiative. It requires no specialized tooling — a CEO, the executive sponsor, and the internal AI owner can complete it in 90 minutes at each gate.

Why the Health Check Exists

Fortune reported in December 2025 that 2026 is the year CEOs must prove AI is powering growth — not cost cutting alone. ServiceNow’s CFO stated it plainly: “AI will be judged less on promise and more on proof.” Wharton’s third annual AI adoption study (n=800+ enterprise leaders, October 2025) found 72% of organizations now measure AI ROI, up from less than half in 2023, signaling a shift from experimentation to what Wharton calls “accountable acceleration.”

The problem is not that companies stopped measuring. The problem is that they measure the wrong things at the wrong time.

Deloitte’s State of AI in the Enterprise (n=3,235 senior leaders, 24 countries, August-September 2025) finds 66% of organizations report productivity gains — but only 20% can demonstrate measurable revenue impact. Thirty-seven percent use AI at a surface level with no process changes. The gap between “people are using it” and “it changed how work gets done” is where most mid-market AI investments quietly die.

The health check closes that gap by asking the right questions at each 30-day gate — and by defining, in advance, what the answers must look like to justify continued investment.

The Three-Gate Architecture

The instrument operates at three gates. Each gate has a specific purpose, a defined set of questions, quantitative thresholds where available, and three possible outcomes: proceed, pivot, or terminate.

The gate structure reflects a fundamental insight from Pertama Partners’ analysis: 50% of all AI project failures occur between months three and nine. By Day 90, the trajectory is set. Organizations that intervene before the trajectory calcifies reduce sunk costs by 60%. Organizations that wait for the annual review absorb the full loss.

Gate 1: Day 30 — Foundation Audit

Purpose: Confirm that the prerequisites for success exist before committing operational resources.

The Day 30 gate does not ask “Is the AI working?” It asks “Can the AI possibly work given what we have in place?” This distinction matters. Pertama Partners finds that failed AI projects invest only 18% of budget in foundations (data readiness, workflow design, change management) versus 47% in successful ones. The Day 30 gate catches this imbalance before the remaining 82% is spent.

Seven questions at Day 30:

1. Is there a named executive sponsor who has spent time on this initiative in the past two weeks?
   Tests: Sponsorship viability. Red flag: The sponsor has not reviewed progress since approval. Pertama Partners finds projects losing active sponsorship within six months fail at 89% — a 4.1x worse outcome than sustained sponsorship.
2. Are pre-deployment baselines documented for the target workflow?
   Tests: Measurement readiness. Red flag: No cost-per-transaction, hours-per-cycle, or error-rate baseline exists. Without a baseline, improvement is unmeasurable — the single most common mistake in enterprise AI deployment.
3. Has the target workflow been mapped end-to-end, including handoffs to adjacent functions?
   Tests: Workflow understanding. Red flag: The team is deploying AI into a workflow no one has documented. McKinsey (n=1,993, July 2025) identifies workflow redesign as the strongest predictor of EBIT impact from AI — 3.6x more likely in high performers.
4. Do at least 40% of intended users have tool access and have logged a first meaningful interaction?
   Tests: Activation threshold. Red flag: Fewer than 40% activated. Nebuly’s enterprise AI benchmarks (2025) identify 40-60% first-interaction activation as the minimum viable threshold at Day 30. Below this, the pilot is already stalling.
5. Has the team defined 2-3 specific success metrics tied to revenue, margin, or a strategic operational outcome?
   Tests: Metric specificity. Red flag: Metrics are vague (“improved productivity,” “better efficiency”) rather than specific (“reduce invoice processing cost from $18 to $9 per unit”). Gartner warns that vague metrics are the leading cause of post-POC abandonment.
6. Is data flowing cleanly into the AI tool, or are manual workarounds required?
   Tests: Data readiness. Red flag: The team is manually formatting, cleaning, or transferring data to make the tool function. Gartner predicts 60% of AI projects fail due to data problems — and the workaround pattern at Day 30 is the leading indicator.
7. Has anyone said “no” to a scope expansion request in the past 30 days?
   Tests: Scope discipline. Red flag: The pilot scope has expanded since approval without a formal change process. Scope creep is a structural marker of Pattern 2 (The Scope Spiral) in the AI failure pattern library — the second most common failure archetype.

Day 30 decision framework:

  • 5-7 questions answered satisfactorily: Proceed to Gate 2. The foundation supports the initiative.
  • 3-4 questions answered satisfactorily: Pivot. Pause deployment, remediate the gaps (typically 2-4 weeks for data or workflow documentation), then restart the 30-day clock.
  • 0-2 questions answered satisfactorily: Terminate or completely restructure. The initiative lacks the prerequisites that distinguish the 54% success rate from the 12%. Continuing is investing in a known failure pattern.
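
For teams that track gate results in a spreadsheet or a lightweight script, the rubric reduces to a count-and-threshold rule. The sketch below is a minimal illustration, not part of the health check itself: the thresholds come from the decision frameworks in this article, the question wording is abbreviated from the list above, and the function and variable names are invented for the example. The same function covers the Day 60 and Day 90 gates by swapping in their question lists and thresholds.

    # Minimal sketch of the gate rubric as a count-and-threshold rule.
    # Question wording is abbreviated from the Day 30 list above; the thresholds
    # (5-7 proceed, 3-4 pivot, 0-2 terminate) come from the decision framework.
    # All names here are illustrative, not part of the health check.

    DAY_30_QUESTIONS = [
        "Named executive sponsor active in the past two weeks",
        "Pre-deployment baselines documented",
        "Target workflow mapped end-to-end",
        "At least 40% of intended users activated",
        "2-3 specific success metrics defined",
        "Data flowing without manual workarounds",
        "At least one scope-expansion request declined",
    ]

    def gate_decision(answers, proceed_min=5, pivot_min=3):
        """Map the count of satisfactory answers to proceed / pivot / terminate.

        Defaults are the Day 30 thresholds; the Day 60 and Day 90 gates use
        proceed_min=4 and pivot_min=2 against their five questions.
        """
        satisfactory = sum(answers)
        if satisfactory >= proceed_min:
            return "proceed"
        if satisfactory >= pivot_min:
            return "pivot"
        return "terminate"

    # Example: five of the seven Day 30 questions answered satisfactorily.
    answers = [True, True, False, True, True, False, True]
    for question, ok in zip(DAY_30_QUESTIONS, answers):
        print(f"{'PASS' if ok else 'GAP '}  {question}")
    print(gate_decision(answers))  # -> proceed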

Gate 2: Day 60 — Adoption and Integration Audit

Purpose: Determine whether AI is changing how work gets done, not just whether people are logging in.

The Day 60 gate tests for the gap that Deloitte found in 37% of organizations: surface-level usage with no process change. Usage dashboards answer “who opened the tool?” The Day 60 audit answers “who changed their work?”

Five questions at Day 60:

1. Are 15-25% of intended users in active daily use — not just weekly login?
   Tests: Routine integration. Red flag: Daily active usage below 15%. Nebuly’s 30-day benchmark identifies 15-25% daily active use as the threshold where AI has moved from novelty to routine. Below this at Day 60, the tool is becoming shelfware.
2. Has the target workflow been modified in at least one documented way because of AI?
   Tests: Workflow change. Red flag: The team is using AI within the old workflow without modification. This is the performative adoption pattern — the tool is overlay, not integration.
3. Can the team demonstrate one specific output that is measurably better, faster, or cheaper than the pre-AI baseline?
   Tests: Value evidence. Red flag: No measurable improvement against the Day 30 baseline. At Day 60, a team should be able to point to at least one concrete before/after comparison. Absence of any measurable gain at this stage is a strong negative signal.
4. Are early adopters expanding their usage from simple queries to end-to-end task completion?
   Tests: Depth of adoption. Red flag: Users are still performing the same simple tasks they started with. Nebuly identifies progression from basic queries to complex workflows as the Day 60 marker that distinguishes genuine adoption from compliance. Retention of 70-80% of early adopters signals genuine perceived value.
5. Has the team encountered and resolved at least one unexpected challenge — and documented the resolution?
   Tests: Learning infrastructure. Red flag: No challenges documented. A pilot that reports no problems is a pilot that is not being used seriously. The learning journal is the artifact that converts experiment into institutional knowledge.

Day 60 decision framework:

  • 4-5 questions answered satisfactorily: Proceed to Gate 3. The initiative is generating evidence of value.
  • 2-3 questions answered satisfactorily: Pivot. The most common Day 60 pivot: add workflow redesign, manager coaching, or training investment to address adoption gaps. Do not add more tools. BCG (n=1,488, 2025) documents that productivity collapses at 4+ AI tools per employee.
  • 0-1 questions answered satisfactorily: Terminate unless the team can identify a specific, remediable root cause. Absence of value evidence at Day 60 predicts absence at Day 180 with high reliability.

Gate 3: Day 90 — Value and Scale-Readiness Audit

Purpose: Make the go/no-go decision for production investment with data, not hope.

The Day 90 gate is the decision point. Pertama Partners documents that the median time from AI project approval to failure is 13.7 months, and that 38% of failures occur between months three and nine. The Day 90 audit catches the initiative at the inflection point — the last moment where intervention is materially cheaper than continuation.

Five questions at Day 90:

1. Can the sponsor articulate the dollar impact — actual or projected — in one sentence?
   Tests: Executive fluency. Red flag: The sponsor cannot state the value proposition without referencing a slide deck. If the champion cannot articulate value at Day 90, the initiative will not survive the next budget cycle.
2. Does the ROI math work at production scale — not just pilot scale?
   Tests: Production economics. Red flag: Pilot ROI is positive, but production costs (integration, training, ongoing operations, governance) have not been modeled. Gartner documents a 280% average cost overrun from pilot to production. The Day 90 audit must apply the 2.5-4x scaling multiplier to the pilot budget (see the budget sketch after this list).
3. Is the improvement measurable against the pre-deployment baseline and attributable to AI (not to the Hawthorne effect or concurrent changes)?
   Tests: Attribution discipline. Red flag: The team reports improvement but cannot isolate AI’s contribution from other changes. This is where control comparisons matter — the team that kept a manual comparison group can answer this question; the team that did not cannot.
4. Would the target workflow regress if the AI tool were removed tomorrow?
   Tests: Dependency test. Red flag: The team could revert without measurable loss. If the process would not regress, AI has not created value — it has created activity. This is the acid test that separates genuine integration from performative adoption.
5. Is the organization prepared for the production requirements — governance, training, support model, vendor management — that the pilot did not test?
   Tests: Scale readiness. Red flag: The team assumes production is “the pilot, but for everyone.” Production requires governance documentation, a training program for non-pilot users, a support escalation path, and vendor contract terms reviewed for production-scale pricing and data usage. If none of these exist at Day 90, the initiative is not ready to scale.
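
To make question 2 concrete, here is a minimal sketch of how the 2.5-4x pilot-to-production multiplier reshapes the ROI math. Only the multiplier range and the 280% overrun statistic come from this article; the dollar figures, the assumed pilot benefit, and the projected production benefit are invented for illustration and would be replaced by the initiative's own numbers.

    # Illustrative only: apply the 2.5-4x pilot-to-production multiplier cited
    # above to a hypothetical pilot budget. All dollar figures are invented.

    def production_budget_range(pilot_cost, low=2.5, high=4.0):
        """Expected production cost range implied by the scaling multiplier."""
        return pilot_cost * low, pilot_cost * high

    def annual_roi(annual_benefit, annual_cost):
        """Simple ROI: net annual benefit divided by annual cost."""
        return (annual_benefit - annual_cost) / annual_cost

    pilot_cost = 120_000       # hypothetical pilot spend
    pilot_benefit = 180_000    # hypothetical annualized benefit measured in the pilot

    low, high = production_budget_range(pilot_cost)
    print(f"Pilot ROI: {annual_roi(pilot_benefit, pilot_cost):.0%}")      # 50%
    print(f"Production budget range: ${low:,.0f}-${high:,.0f}")           # $300,000-$480,000

    # The Day 90 question is whether projected production benefits clear
    # production-scale costs. If the benefit is modeled at, say, $350K while
    # costs land at the high end, the production ROI turns negative even
    # though the pilot looked healthy.
    projected_production_benefit = 350_000   # must be modeled, not assumed
    print(f"Production ROI at high-end cost: "
          f"{annual_roi(projected_production_benefit, high):.0%}")        # -27%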

Day 90 decision framework:

  • 4-5 questions answered satisfactorily: Scale. Proceed to production planning with a budget that applies the 2.5-4x pilot-to-production multiplier. Document the pilot’s evidence in the production business case.
  • 2-3 questions answered satisfactorily: Extend the pilot for 30-60 days with a specific remediation plan. Define what the answers must look like at the re-evaluation to justify production. This is not a delay — it is a disciplined investment to avoid the 280% cost overrun pattern.
  • 0-1 questions answered satisfactorily: Terminate. Conduct a 2-hour post-mortem using the failure pattern library to identify the root cause. Document institutional learning. Reallocate the production budget. The $50K-$150K invested in a terminated pilot is a fraction of the $4.2 million median sunk cost of a project that runs 11 months without intervention.

Key Data Points

  • Success rate with pre-defined metrics vs. without: 54% vs. 12% (4.5x). Source: Pertama Partners (n=2,400+, 2025-2026)
  • Median time to AI project abandonment: 11 months. Source: Pertama Partners (n=2,400+, 2025-2026)
  • Median sunk cost of an abandoned AI project: $4.2 million. Source: Pertama Partners (n=2,400+, 2025-2026)
  • Cost reduction from early-stage failure detection: 60%. Source: Pertama Partners (n=2,400+, 2025-2026)
  • GenAI pilot failure rate (no measurable P&L impact): 95%. Source: MIT NANDA (150 interviews, 350 surveys, 300 deployments, August 2025)
  • Organizations reporting productivity gains vs. able to demonstrate revenue impact: 66% vs. 20%. Source: Deloitte State of AI (n=3,235, August-September 2025)
  • Organizations using AI at a surface level with no process change: 37%. Source: Deloitte State of AI (n=3,235, August-September 2025)
  • Organizations that have moved 40%+ of pilots into production: 25%. Source: Deloitte State of AI (n=3,235, August-September 2025)
  • Average cost overrun from pilot to production: 280%. Source: Gartner (2024-2025)
  • Enterprises now measuring AI ROI: 72%. Source: Wharton/GBK (n=800+, October 2025)
  • Day 30 first-interaction activation threshold: 40-60% of intended users. Source: Nebuly enterprise benchmarks (2025)
  • Day 30 daily active use threshold: 15-25% of intended users. Source: Nebuly enterprise benchmarks (2025)
  • Day 60 early adopter retention threshold: 70-80%. Source: Nebuly enterprise benchmarks (2025)
  • Success rate with sustained executive sponsorship vs. after losing it: 68% vs. 11% (4.1x). Source: Pertama Partners (n=2,400+, 2025-2026)
  • Budget invested in foundations, successful vs. failed projects: 47% vs. 18%. Source: Pertama Partners (n=2,400+, 2025-2026)
  • GenAI projects predicted to be abandoned after POC: 30%. Source: Gartner (July 2024 prediction for end of 2025)

What This Means for Your Organization

The health check is not a bureaucratic exercise. It is the instrument that separates the 54% success rate from the 12%.

Most mid-market companies running AI pilots today are measuring the wrong signals. Usage dashboards tell the CEO how many people logged in. They do not tell the CEO whether the invoice processing workflow now costs $9 per unit instead of $18, whether the sales team’s proposal turnaround dropped from five days to two, or whether the customer service team’s first-response time improved by 40%. The health check forces these questions at the moment when the answers still matter — before the pilot’s trajectory becomes irreversible.

The three-gate structure costs a 200-500 person company approximately 4.5 hours of senior leadership time over 90 days (90 minutes per gate). The alternative — discovering at month nine that the initiative has consumed $200K-$400K without measurable impact — is the pattern that produces the 42% abandonment rate. The health check is cheap insurance against expensive drift.

One practical note: the Day 30 gate is the most important. It catches foundation failures — missing baselines, absent sponsorship, vague metrics — before operational resources are committed. If a CEO implements only one gate, it should be this one. The discipline of forcing the seven Day 30 questions before expanding investment would, by itself, change the trajectory of most mid-market AI programs.

If the health check raised questions specific to an initiative already underway in your organization, I would welcome the conversation — brandon@brandonsneider.com

Sources

  1. Pertama Partners, “AI Project Failure Statistics 2026,” February 2026 (updated February 21, 2026). Analysis of 2,400+ enterprise AI initiatives tracked through 2025-2026. Primary source for 54% vs. 12% metric success rates, $4.2M median sunk cost, 11-month median abandonment timeline, and 60% cost reduction from early detection. Aggregates RAND Corporation, MIT Sloan, McKinsey, Deloitte, and Gartner data. Credibility: consulting firm analysis; comprehensive but relies on aggregated secondary sources without publishing granular methodology. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026

  2. MIT NANDA, “The GenAI Divide: State of AI in Business 2025,” August 2025. Based on 150+ leadership interviews, 350-employee survey, and analysis of 300 public AI deployments. Source for 95% pilot failure rate and buy-vs-build success differential. Credibility: academic research institution; methodology disclosed; moderate sample size for interviews, small for surveys. https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/

  3. Deloitte, “State of AI in the Enterprise, 7th Edition,” 2026. Survey of 3,235 senior leaders across 24 countries and six industries, August-September 2025. Source for 37% surface-level adoption, 25% pilot-to-production conversion, 66% productivity gain reporting vs. 20% revenue impact demonstration. Credibility: independent consulting survey; large sample; global methodology. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html

  4. Wharton/GBK Collective, “Gen AI Fast-Tracks Into the Enterprise: Year Three,” October 2025. Survey of 800+ enterprise leaders. Source for 72% ROI measurement rate and “accountable acceleration” framing. Credibility: academic-industry partnership; third year of longitudinal tracking; moderate sample size. https://knowledge.wharton.upenn.edu/special-report/2025-ai-adoption-report/

  5. McKinsey, “The State of AI in 2025,” November 2025. Survey of 1,993+ respondents. Source for workflow redesign as strongest EBIT predictor and 5.5% of organizations reporting AI contributing more than 5% of EBIT. Credibility: independent consulting survey; large sample; consistent annual methodology. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

  6. Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk,” February 2025. Source for 60% project failure prediction due to data problems. Also: “30% of GenAI projects abandoned after POC by end of 2025” (July 2024) and 280% average cost overrun from pilot to production. Credibility: independent analyst firm; predictions based on proprietary research methodology. https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk

  7. Nebuly, “Defining Adoption Benchmarks for Enterprise AI: What Good Looks Like at 30, 60, and 90 Days,” 2025. Source for 40-60% activation threshold, 15-25% daily usage benchmark, and 70-80% retention markers. Credibility: AI platform vendor; benchmarks based on platform analytics data; useful directional guidance but vendor-sourced. https://www.nebuly.com/blog/defining-adoption-benchmarks-for-enterprise-ai-what-good-looks-like-at-30-60-and-90-days

  8. Fortune, “In 2026, CEOs Must Prove AI Is Powering Growth,” December 2025. Source for CEO accountability framing and ServiceNow CFO quote on AI proof requirements. Credibility: business journalism; editorial framing, not primary research. https://fortune.com/2025/12/09/ai-in-2026-roi-growth-ceos-cost-cutting-layoffs/

  9. HBR, “Most AI Initiatives Fail. This 5-Part Framework Can Help,” November 2025. Ayelet Israeli and Eva Ascarza (Harvard Business School). Source for 5Rs framework and case study of gen AI customer service scaling from <3% to 60% of interactions in 6 months. Credibility: academic authors in peer-reviewed practitioner journal; limited sample size for case studies. https://hbr.org/2025/11/most-ai-initiatives-fail-this-5-part-framework-can-help

  10. BCG, “AI at Work 2025: Momentum Builds, But Gaps Remain,” June 2025. Source for frontline adoption plateau data and 4+ tool productivity collapse finding. Credibility: independent consulting survey; large global sample; consistent annual methodology. https://www.bcg.com/publications/2025/ai-at-work-momentum-builds-but-gaps-remain


Brandon Sneider | brandon@brandonsneider.com | March 2026