The 60-Day AI Progress Check: Is Your Pilot Working, or Just Busy?

Brandon Sneider | March 2026


Executive Summary

  • Most AI pilots die quietly. They do not crash at launch — they drift for months in a gray zone where usage dashboards show green but the P&L shows nothing. The median failed AI project consumes 11 months and $4.2 million before termination (Pertama Partners, n=2,400+, 2025-2026). Day 60 is the last exit before that trajectory locks in.
  • The three metrics from the success metrics card (#11) — adoption rate, time saved per task, cost per outcome — combine into seven diagnostic scenarios at Day 60. One signals “double down.” Five signal “restructure a specific dimension.” One signals “prepare to terminate at Day 90.” This card maps each scenario to a decision.
  • The productivity J-curve is real: MIT Sloan documents an initial productivity dip during AI adoption, followed by recovery and outperformance among firms that push through with workflow redesign. The question at Day 60 is not whether the numbers are perfect. It is whether the trajectory is bending in the right direction — and if not, whether the root cause is fixable in 30 days.
  • High adoption with no outcome change is the single most dangerous Day 60 signal. Faros AI documented this across 10,000+ developers: 21% more tasks per person, zero organizational throughput improvement (July 2025). If adoption is above 50% and the target metric has not moved, the bottleneck shifted downstream. The tool is fast. The process is broken.

Why Day 60 — Not Day 30, Not Day 90

Day 30 is too early. The Pertama Partners data shows a 2-4 week productivity dip during initial AI adoption. At Day 30, the learning curve is still active. Costs are rising. Adoption is unstable. Judging the pilot now penalizes normal adjustment.

Day 90 is too late. By Day 90, sunk costs create political gravity. The sponsor has committed credibility. The team has invested effort. MIT Sloan finds 73% of successful pilots never reach production — and the decision to extend rather than kill happens most often when the evaluation occurs too late for honest assessment.

Day 60 is the diagnostic window. The learning curve should be flattening. Early adopters have established patterns. The workflow has had five full weeks of real use after the initial adjustment period. Enough data exists to see a trajectory — not just a snapshot. And critically, 30 days remain to restructure before the Day 90 kill-or-scale decision.

Fortune’s analysis of companies escaping pilot purgatory (March 2026) documents the pattern at Eaton, Cisco, and Cox Automotive: the 90-day prove-and-scale model structures a Day 60 gate review as the decisive checkpoint. Cisco piloted its Working with AI program in 4 weeks, then scaled within 6 — because the gate structure forced an honest assessment before momentum replaced judgment.


The Three Metrics at Day 60

These are the same three metrics from the success metrics card (#11). At Day 60, each has a benchmark and a trajectory expectation.

| Metric | Day 60 Benchmark | What “Healthy” Looks Like | What “Warning” Looks Like |
|---|---|---|---|
| Adoption rate | 50-70% of licensed users active weekly | Rising from Day 30 baseline, 15-25% in daily use | Flat or declining from Day 30, below 30% |
| Time saved per task | Net savings after review/correction tax | Positive net savings on 3+ target tasks, trend improving week-over-week | Savings consumed by review time (the 37-40% review tax documented by Workday, 2026) |
| Cost per outcome | Declining toward pre-AI baseline | Month 2 cost below Month 1, trajectory toward baseline | Flat or rising — costs have not begun to recover from the J-curve dip |

The benchmarks draw from Nebuly enterprise AI adoption thresholds (2025), Worklytics cross-industry data (2025), and the Pertama Partners initiative tracking database (n=2,400+, 2025-2026).
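
To make the three benchmarks concrete, here is a minimal sketch of the underlying arithmetic. The function names, inputs, and example figures are illustrative assumptions, not a prescribed measurement system:

```python
# Minimal sketch of the three Day 60 metrics. All names and figures are
# hypothetical; the point is the arithmetic, not the API.

def adoption_rate(weekly_active: int, licensed: int) -> float:
    """Share of licensed users active in a given week."""
    return weekly_active / licensed

def net_time_saved(gross_minutes_saved: float, review_minutes: float) -> float:
    """Time saved per task AFTER subtracting the review/correction tax."""
    return gross_minutes_saved - review_minutes

def cost_per_outcome(fully_loaded_monthly_cost: float, outcomes: int) -> float:
    """Fully loaded monthly cost divided by outcomes produced that month."""
    return fully_loaded_monthly_cost / outcomes

# 42 of 70 licensed users active this week: 60% adoption, inside the 50-70% band.
print(f"Adoption: {adoption_rate(42, 70):.0%}")
# 30 minutes saved drafting, 12 minutes spent reviewing: 18 minutes net per task.
print(f"Net time saved: {net_time_saved(30, 12):.0f} min/task")
```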


The Seven Diagnostic Scenarios

Read the three metrics together. The combination determines the decision.

Scenario 1: Adoption Rising, Time Saved Positive, Cost Declining — Confirm the Production Path

This is the 5% pattern. The pilot is working. The learning curve is flattening. The workflow redesign is producing measurable results.

Decision: Confirm the Day 90 production path. Begin the security/compliance review with the CISO if not already underway. Draft the broader rollout training plan. Notify the CFO that the production budget conversation is coming.

The risk at this stage: Overconfidence. BCG (n=1,250, September 2025) finds that even among the 5% generating substantial value, organizations that scale without addressing data quality, integration architecture, and governance create the Pilot Trap — successful experiments that cannot survive production conditions. Production cost runs 2.8-3.8x the pilot budget (Pertama Partners, 2026). Confirm the production path is real before celebrating the pilot metrics.

Scenario 2: Adoption High, Time Saved Positive, Cost Flat — Restructure the Cost Model

People are using it. It is saving time. But the total cost per outcome has not improved.

This means the time savings are real but are being offset by costs the pilot budget did not anticipate: review and correction time, management overhead, workflow coordination, or the tool cost itself relative to the volume of outcomes.

Decision: Audit the fully loaded cost. The license fee is typically only 4-17% of true AI deployment cost (AlterSquare, 2026). At the extreme of that range, the 23x multiplier means a $700/month/seat tool costs $16,100/month/seat once implementation, training, workflow redesign, review time, and management overhead are counted. If the cost-per-outcome calculation only tracks the license, the metric is reporting a fiction.

The intervention: Recalculate cost per outcome with full loading. If the number is declining when fully loaded, the metric was just undercounted — proceed. If flat even fully loaded, identify which cost component is absorbing the time savings. Usually it is review overhead or downstream coordination — both fixable with workflow adjustment in the remaining 30 days.
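
As a worked example of that recalculation, the gap between a license-only and a fully loaded cost per outcome looks like this. The seat count and outcome volume are invented for illustration; the 23x figure is the AlterSquare multiplier cited above:

```python
# Worked example of the fully loaded cost audit. The 23x multiplier is the
# AlterSquare figure cited above; seats and outcome volume are hypothetical.

LICENSE_PER_SEAT = 700          # $/month, the line item visible in the budget
FULLY_LOADED_MULTIPLIER = 23    # true deployment cost vs. license fee

fully_loaded_per_seat = LICENSE_PER_SEAT * FULLY_LOADED_MULTIPLIER  # $16,100

seats = 10
outcomes_per_month = 5_000

naive_cpo = (LICENSE_PER_SEAT * seats) / outcomes_per_month         # $1.40
loaded_cpo = (fully_loaded_per_seat * seats) / outcomes_per_month   # $32.20

print(f"License-only cost per outcome: ${naive_cpo:.2f}")
print(f"Fully loaded cost per outcome: ${loaded_cpo:.2f}")
```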

Scenario 3: Adoption High, Time Saved Flat, Cost Declining — Investigate Measurement

An unusual pattern. Costs declining without measurable time savings suggests the cost reduction is coming from somewhere other than the AI tool — headcount changes, process simplifications, or seasonal volume shifts. Alternatively, time savings are real but the measurement methodology is not capturing them.

Decision: Validate the data. Are the cost reductions attributable to the AI deployment, or are they coincidental? Is the time-saved measurement capturing net savings (after review time) or missing savings that accumulate in non-obvious ways?
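
One way to run that validation is a simple month-over-month cost decomposition, checking whether the decline sits in line items the AI deployment plausibly touched or somewhere else entirely. The categories and figures below are hypothetical placeholders:

```python
# Hypothetical attribution check: did the cost decline come from the AI
# deployment, or from an unrelated line item? All figures are invented.

month1 = {"labor": 80_000, "ai_tool": 7_000, "review": 9_000, "other": 24_000}
month2 = {"labor": 78_000, "ai_tool": 7_000, "review": 9_500, "other": 14_000}

deltas = {k: month2[k] - month1[k] for k in month1}
ai_attributable = deltas["labor"] + deltas["ai_tool"] + deltas["review"]

print(deltas)           # {'labor': -2000, 'ai_tool': 0, 'review': 500, 'other': -10000}
print(ai_attributable)  # -1500: most of the decline sits in "other" -- coincidental
```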

Scenario 4: Adoption High, Time Saved Flat, Cost Flat — The Workflow Bypass

This is the most common Day 60 failure pattern. People are using the tool. It is not producing measurable results. This is the Workflow Bypass, documented across the major adoption studies.

Faros AI’s data (n=10,000+, July 2025) shows the mechanism: AI accelerated individual tasks by 21%, but the bottleneck moved downstream. Pull requests increased 98%. Review queues grew. Organizational delivery speed: zero change. The tool made one step faster. The process absorbed the speed.

HBR’s research (n=2,000+, Fall 2025) adds a psychological dimension: employees with high AI anxiety use AI tools more than low-anxiety colleagues (65% of tasks vs. 42%) but score 4.6 on a 5-point resistance scale. High adoption can mask performative compliance — using the tool to be seen using it, without integrating it into the actual workflow.

Decision: Map the workflow end-to-end. Find where the output of the AI-assisted step goes next. That is the new bottleneck. Redesign that step — or accept that this workflow does not benefit from AI at the system level. McKinsey (n=1,993, July 2025) finds high performers are 2.8x more likely to have fundamentally redesigned workflows. The tool is not the project. The workflow redesign is.

The 30-day test: If the bottleneck can be identified and the downstream step can be restructured within 30 days, restructure and re-evaluate at Day 90. If the bottleneck is structural — cross-departmental handoffs, regulatory approval chains, client review cycles — the pilot targeted the wrong workflow. Terminate at Day 90 and redeploy to a workflow where end-to-end redesign is within the pilot owner’s authority.
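
The arithmetic behind the bypass is worth seeing directly. In the toy pipeline below (stage names and weekly capacities are invented), roughly doubling the speed of the AI-assisted step leaves end-to-end throughput unchanged, because delivery is capped by the slowest stage:

```python
# Toy model of the Workflow Bypass: end-to-end throughput is capped by the
# bottleneck stage, so accelerating one step only grows the queue behind it.
# Stage names and capacities (items/week) are hypothetical.

def pipeline_throughput(stage_capacity: dict[str, int]) -> int:
    """Items/week the whole pipeline can deliver: the bottleneck's capacity."""
    return min(stage_capacity.values())

before = {"author": 50, "review": 40, "deploy": 60}
after = {"author": 99, "review": 40, "deploy": 60}  # AI roughly doubles authoring

print(pipeline_throughput(before))  # 40: review is the bottleneck
print(pipeline_throughput(after))   # still 40: the review queue grows instead
```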

Scenario 5: Adoption Declining, Time Saved Positive, Cost Declining — The Champion Problem

The tool works for those who use it. Fewer people are using it.

This is the Sponsorship Fade or Culture Collision pattern manifesting at the adoption level. Early adopters captured real value. The broader team has not followed. Pertama Partners finds projects with sustained executive sponsorship achieve 68% success versus 11% without. BCG (n=10,635, June 2025) isolates leadership support as a 3.7x multiplier.

Decision: Check three things in this order:

  1. Has the sponsor used the tool this month? If no, the signal is clear. Employees watch what leaders do.
  2. Have the early adopters been visible? If the people saving time have not shared their experience, the broader team has no evidence the tool works.
  3. Is the training adequate? BCG finds employees with 5+ hours of hands-on AI training become regular users at 79% versus 67% with less. Deloitte confirms hands-on training produces 144% higher trust than passive instruction.

The 30-day test: Re-engage the sponsor (one real task, shared publicly). Run a 90-minute hands-on session for the lagging cohort focused on the 2-3 tasks where early adopters report the most time savings. If adoption does not recover within two weeks after these interventions, the resistance is structural — cultural, not informational.

Scenario 6: Adoption Declining, Time Saved Flat, Cost Rising — Prepare to Terminate

Fewer people using it. No measurable benefit for those who do. Costs increasing.

This is the clearest termination signal. The J-curve recovery has not materialized. The tool may be mismatched to the workflow, the workflow may not benefit from AI, or the organizational conditions for adoption are not present.

Decision: Do not extend. Use the remaining 30 days to document what was learned — which tasks the tool handled well, which it did not, what the adoption barriers were. This intelligence is more valuable than another month of declining metrics. Kill the pilot at Day 90. Reallocate the budget.

The discipline of termination is itself a signal of organizational maturity. Gartner (March 2026) advises treating AI as a portfolio of bets — routine productivity gains, targeted improvements, and selective transformational plays. Not every bet wins. The mature response is to cut losses early and redeploy, not to extend a failing experiment because the alternative is admitting the investment did not work.

Scenario 7: Adoption Flat, Metrics Mixed — The Plateau

Adoption stabilized in the 30-50% range. Time saved and cost metrics show marginal or inconsistent movement.

This is the most common overall pattern, and the most dangerous — because it produces neither a clear green light nor a clear red one. The pilot is not failing visibly. It is not succeeding measurably. It occupies the gray zone where projects consume 11 months through inertia.

Decision: Ask the pilot team the Day 60 diagnostic question (adapted from Pertama Partners): “If the next 30 days look exactly like the last 30, would the CFO approve production investment?”

If the answer is no, the remaining 30 days must change something specific — a workflow step, a training gap, a sponsorship action. Identify the single largest drag on the target metric and address it. If nothing changes in 30 days, terminate. The worst outcome is not a failed pilot. The worst outcome is a pilot that runs for nine months producing ambiguous data while consuming budget and organizational attention.
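
For readers who want the seven branches as a single artifact, here is a minimal sketch of the triage logic as a lookup, with trajectories simplified to coarse labels. It summarizes the scenarios above; it is not a substitute for the cost audits and workflow mapping they describe:

```python
# The seven Day 60 scenarios as a lookup. Trajectory labels are deliberately
# coarse ("up", "flat", "down"); real pilots need the audits described above.
# Keys: (adoption, net time saved, cost per outcome) trajectories.

DECISIONS = {
    ("up", "up", "down"): "Scenario 1: confirm the Day 90 production path",
    ("up", "up", "flat"): "Scenario 2: audit the fully loaded cost model",
    ("up", "flat", "down"): "Scenario 3: validate the measurement",
    ("up", "flat", "flat"): "Scenario 4: map the workflow, find the new bottleneck",
    ("down", "up", "down"): "Scenario 5: re-engage the sponsor and early adopters",
    ("down", "flat", "up"): "Scenario 6: prepare to terminate at Day 90",
}

def day60_decision(adoption: str, time_saved: str, cost: str) -> str:
    """Map the three metric trajectories to a Day 60 decision."""
    return DECISIONS.get(
        (adoption, time_saved, cost),
        "Scenario 7: plateau or mixed signals; apply the CFO question",
    )

print(day60_decision("up", "flat", "flat"))  # the Workflow Bypass
```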


The Five-Minute Day 60 Diagnostic

Print this. Bring it to the Day 60 gate review. Answer each question with data, not impressions.

| # | Question | Green | Yellow | Red |
|---|---|---|---|---|
| 1 | What is the adoption rate vs. Day 30? | Rising, above 50% | Flat, 30-50% | Declining or below 30% |
| 2 | What is the net time saved per task (after review time)? | Positive on 3+ tasks, improving | Positive on 1-2 tasks, flat | Zero or negative |
| 3 | Is cost per outcome declining from Month 1? | Yes, trending toward baseline | Flat | Rising |
| 4 | Can the pilot owner, sponsor, and one team member independently state the success metric? | All three align | Two of three align | No alignment |
| 5 | Has the executive sponsor used the tool in the last 30 days? | Yes, on a real task | Used it once at launch | Has not used it |

Scoring:

  • 5 green: Confirm the production path. Begin scaling preparation.
  • 3-4 green, 1-2 yellow: Restructure the lagging dimension. Identify the specific intervention. Set a 2-week check-in.
  • Any red: Address the red item immediately. If it is adoption or sponsorship, the fix is behavioral and can happen this week. If it is time-saved or cost, the fix is structural and may require workflow redesign.
  • 2+ red: Prepare to terminate at Day 90 unless a specific, actionable root cause is identified and addressable within 30 days.
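
The rubric compresses to a few lines of code. This is a sketch of the scoring rules above, assuming each of the five questions is answered "green," "yellow," or "red":

```python
# Sketch of the Day 60 scoring rubric. Input: five answers, each "green",
# "yellow", or "red". Red items take precedence, as in the rules above.

def day60_score(answers: list[str]) -> str:
    greens, reds = answers.count("green"), answers.count("red")
    if reds >= 2:
        return "Prepare to terminate at Day 90 unless a fixable root cause exists"
    if reds == 1:
        return "Address the red item immediately"
    if greens == 5:
        return "Confirm the production path; begin scaling preparation"
    return "Restructure the lagging dimension; set a 2-week check-in"

print(day60_score(["green", "green", "yellow", "green", "red"]))
# -> "Address the red item immediately"
```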

Key Data Points

| Metric | Finding | Source |
|---|---|---|
| Median cost and timeline of failed AI projects | $4.2 million, 11 months | Pertama Partners, n=2,400+, 2025-2026 |
| Success rate with pre-defined metrics vs. without | 54% vs. 12% (4.5x) | Pertama Partners, n=2,400+, 2025-2026 |
| Top performer response time to early warning vs. others | 6 weeks vs. 5 months | Pertama Partners, n=2,400+, 2025-2026 |
| Individual task gains vs. organizational throughput | 21% more tasks, 0% org improvement | Faros AI, n=10,000+, July 2025 |
| AI time savings consumed by review/correction | 37-40% | Workday, February 2026 |
| High-anxiety employees AI usage vs. resistance | 65% of tasks with AI, 4.6/5 resistance | HBR, n=2,000+, Fall 2025 |
| High performers that redesigned workflows | 2.8x more likely | McKinsey, n=1,993, July 2025 |
| Success rate with sustained executive sponsorship | 68% vs. 11% without | Pertama Partners, n=2,400+, 2025-2026 |
| Leadership support impact on employee AI sentiment | 3.7x multiplier | BCG, n=10,635, June 2025 |
| CIOs breaking even or losing money on AI | 72% | Gartner, n=506, May 2025 |
| Successful pilots that never reach production | 73% | MIT Sloan, 2024 |
| True cost vs. license fee multiplier | 23x | AlterSquare, 2026 |

What This Means for Your Organization

The 60-day progress check is not a performance review — it is a diagnostic. The pilot is not on trial. The question is not “is AI working?” The question is “what specific pattern is our pilot running, and what does the pattern tell us to do next?”

The organizations in the top 5% do not produce perfect pilots. They produce pilots with honest feedback loops. Pertama Partners documents a 6-week median response time from early-warning signal to intervention among the highest performers, versus 5 months for the rest. The difference is not that they avoid problems — it is that they diagnose problems while 30 days remain to act, rather than 30 days after the budget is spent.

The seven scenarios above are not exhaustive, but they cover the patterns that account for the vast majority of Day 60 outcomes. Each points to a specific action — not a vague recommendation to “monitor closely.” If the diagnostic raised questions about which scenario fits your pilot, which intervention to prioritize, or how to structure the remaining 30 days before the Day 90 decision — that conversation is worth having now, not at Day 89. Reach me at brandon@brandonsneider.com.

Sources

  1. Pertama Partners — “AI Project Failure Statistics 2026.” n=2,400+ enterprise AI initiatives, 2025-2026. Source for $4.2M median sunk cost, 11-month median timeline, 54% vs. 12% metric success rate, 68% vs. 11% sponsorship impact, 6-week vs. 5-month response gap. Independent consulting analysis aggregating RAND, MIT Sloan, McKinsey, and Deloitte data. High credibility. https://www.pertamapartners.com/insights/ai-project-failure-statistics-2026

  2. Faros AI — “The AI Productivity Paradox.” n=10,000+ developers, 1,255 teams, July 2025. Source for 21% individual task gains with zero organizational throughput improvement, 98% more PRs with no delivery speed change. Vendor but observational telemetry data. High credibility. https://www.faros.ai/blog/ai-software-engineering

  3. McKinsey — “The State of AI in 2025.” n=1,993 respondents across 105 countries, June-July 2025. Source for 2.8x workflow redesign rate among high performers. Independent survey with consistent annual methodology. High credibility. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

  4. Workday — “Measure the ROI of AI With This One Weird Trick.” February 2026. Source for 37-40% review tax on AI time savings. Vendor-published platform data. Moderate-high credibility. https://www.workday.com/en-us/perspectives/finance/2026/02/measure-roi-of-ai.html

  5. HBR — “Why AI Adoption Stalls, According to Industry Data.” n=2,000+ respondents, Fall 2025. Source for high-anxiety employees using AI more (65% vs. 42%) with higher resistance (4.6 vs. 2.1), employee profile segmentation. Independent academic publication. High credibility. https://hbr.org/2026/02/why-ai-adoption-stalls-according-to-industry-data

  6. BCG — “AI at Work: Momentum Builds, but Gaps Remain.” n=10,635 across 11 countries, June 2025. Source for 3.7x leadership support multiplier, 5-hour training threshold (79% vs. 67% regular use). Independent survey. High credibility. https://www.bcg.com/publications/2025/ai-at-work-momentum-builds-but-gaps-remain

  7. BCG — “The Widening AI Value Gap: Build for the Future.” n=1,250 firms, September 2025. Source for 60% generating no material value, 5% creating substantial value at scale. Independent survey. High credibility. https://www.bcg.com/publications/2025/are-you-generating-value-from-ai-the-widening-gap

  8. Fortune — “From Pilot Mania to Portfolio Discipline.” March 19, 2026. Source for Eaton, Cisco, Cox Automotive 90-day prove-and-scale model, Cisco 4-week pilot to 6-week scale. Independent journalism. High credibility. https://fortune.com/2026/03/19/from-pilot-mania-to-portfolio-discipline-ai-purgatory/

  9. Gartner — “CFOs Need to Rethink the ROI of AI Investments.” March 24, 2026. Source for portfolio approach to AI investment, treating AI as a portfolio of bets rather than a single ROI problem. Independent analyst firm. High credibility. https://www.gartner.com/en/newsroom/press-releases/2026-03-24-gartner-says-cfos-need-to-rethink-the-roi-of-ai-investments

  10. Gartner — “5 AI Metrics That Actually Prove ROI to Your Board.” n=506 CIOs, May 2025. Source for 72% of CIOs breaking even or losing money on AI. Independent analyst firm. High credibility. https://www.gartner.com/en/articles/ai-value-metrics

  11. MIT Sloan — Pilot-to-production conversion and productivity J-curve research. 2024-2026. Source for 73% of successful pilots never reaching production, J-curve productivity dip during AI adoption. Academic institution. Very high credibility. https://mitsloan.mit.edu/ideas-made-to-matter/productivity-paradox-ai-adoption-manufacturing-firms

  12. Deloitte — “State of AI in the Enterprise 2026.” n=3,235 senior leaders, August-September 2025. Source for 37% surface-level AI use and the 144% higher trust from hands-on versus passive training. Independent survey. High credibility. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html

  13. AlterSquare — True cost analysis across 20+ client projects. 2026. Source for 23x cost multiplier. Practitioner data. Moderate credibility.

  14. Nebuly — “Defining Adoption Benchmarks for Enterprise AI.” 2025. Source for Day 30 and Day 60 activation thresholds. AI platform vendor benchmarks. Moderate credibility. https://www.nebuly.com/blog/defining-adoption-benchmarks-for-enterprise-ai-what-good-looks-like-at-30-60-and-90-days

  15. Worklytics — “2025 Benchmarks: What Percentage of Employees Use AI Tools Weekly.” 2025. Source for cross-industry adoption rate benchmarks. Analytics vendor aggregated data. Moderate credibility. https://www.worklytics.co/resources/2025-ai-adoption-benchmarks-employee-usage-statistics

  16. Erik Brynjolfsson / Stanford Digital Economy Lab — AI productivity analysis, February 2026. Source for 2.7% U.S. productivity increase in 2025 (double prior decade average), J-curve transition from experimentation to structural utility. Academic researcher. Very high credibility. https://fortune.com/2026/02/15/ai-productivity-liftoff-doubling-2025-jobs-report-transition-harvest-phase-j-curve/


Brandon Sneider | brandon@brandonsneider.com | March 2026