Executive Summary
AI tools make individuals faster at solitary work. They do not, on their own, change how teams coordinate. The largest randomized field experiment on generative AI in real workplaces, covering 7,137 knowledge workers at firms across multiple industries over six months, found a 31% reduction in email time for regular Copilot users but no statistically significant change in time spent in meetings, no shift in meeting types, and no change in document production volume. The authors’ conclusion is blunt: workers readily changed behaviors they could change alone, and did not change behaviors that required coordinating with colleagues.
The same pattern appears elsewhere in the evidence base. The Faros Engineering Benchmark (2025) showed that engineering teams using AI coding assistants produced 98% more pull requests but delivered no more product: the bottleneck moved from writing code to reviewing it. Microsoft’s own field study notes that “larger shifts in responsibilities require time and broad institutional efforts, not just local team coordination.”
For an operating executive, this means: deploying an AI tool and expecting team throughput to rise is the wrong mental model. The productivity savings show up in individual inboxes. Whether those savings translate into faster delivery, better decisions, or shorter meetings depends on whether leadership redesigns the coordination work itself — who owns which handoffs, what meetings exist, what approvals are required. That is organizational change, not a software rollout.
Key Data Points
| Metric | Value | Source | Date | Credibility |
|---|---|---|---|---|
| Workers in randomized field experiment | 7,137 | NBER WP 33795 (Dillon, Jaffe, Immorlica, Stanton) | May 2025 | HIGH — cross-industry RCT, 6-month panel, Microsoft-funded but independent analysis by HBS co-author |
| Study duration | 6 months | Same | May 2025 | HIGH |
| Email time reduction, regular Copilot users (LATE, local average treatment effect) | -3.56 hrs/week (-31%) | Same, Table 2 | May 2025 | HIGH |
| Email time reduction, all treated workers (ITT, intent-to-treat) | -1.29 hrs/week | Same | May 2025 | HIGH |
| Meeting time change | Not significant (point estimate +0.19 hrs) | Same, Table 2 | May 2025 | HIGH |
| Recurring meeting time change | Not significant (point estimate +0.058 hrs) | Same | May 2025 | HIGH |
| Documents completed change | Not significant (point estimate +0.065) | Same | May 2025 | HIGH |
| Emails replied to change | Not significant (point estimate -0.24) | Same | May 2025 | HIGH |
| Reply speed change | No significant change | Same | May 2025 | HIGH |
| Collaborative document time-to-complete reduction | ~25% (but only in docs with a primary editor + secondary contributors) | Same | May 2025 | MEDIUM — significant at q<0.05, small subsample (n=1,910) |
| Firm-level Copilot usage range | 6.3% to 75% of weeks | Same | May 2025 | HIGH |
| Strongest predictor of adoption | Firm identity (not industry, not pre-experiment behavior, not coworker share) | Same, Table 1 | May 2025 | HIGH |
| Coworker share with Copilot — effect on own usage | Not significant after firm fixed effects | Same | May 2025 | HIGH |
| Engineering PRs produced with AI assistants | +98% | Faros Engineering Benchmark | 2025 | MEDIUM — vendor data, large sample, cross-validated by independent engineering leadership reports |
| Delivery throughput change with AI assistants | ~0 | Same | 2025 | MEDIUM |
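The two email rows measure different populations. The ITT estimate averages over everyone offered Copilot, including people who rarely used it; the LATE estimate scales that effect to regular users. Assuming the authors used the standard instrumental-variables (Wald) scaling, which the table itself does not confirm, the ratio of the two figures implies the share of treated workers who behaved as regular users:

$$
\text{LATE} = \frac{\text{ITT}}{\text{take-up}}
\quad\Longrightarrow\quad
\text{take-up} \approx \frac{1.29}{3.56} \approx 0.36
$$

On that reading, roughly a third of treated workers drove the measured saving. That is consistent with the wide firm-level usage range and is why per-user savings in a high-adoption firm can sit well above the 1.29-hour average.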
Temporal Weighting
Tier 1 evidence. The NBER paper was released in May 2025 and covers a study that ran through late 2024. Because firm-level usage ranged from 6.3% to 75% of weeks, the headline averages understate what happens in high-adoption environments. The pattern itself (individual savings, no coordination shift) held across firms regardless of adoption level.
One important caveat the authors raise: during the study period, very few colleagues of treated workers also had Copilot. The authors test whether workers with more Copilot-equipped close coworkers changed coordination behaviors more than those without, and find “few substantive differences.” This says little about what full-team coverage would do, but it does show that organic, bottom-up coordination change did not emerge in the six-month window, even where several team members had the tool.
What This Means for Your Organization
Three operating principles follow from this evidence.
1. Price the pilot correctly. If you are evaluating an enterprise AI rollout and the business case rests on team-level throughput (shorter project cycles, faster cross-functional decisions, fewer handoffs), the evidence does not support that case without a parallel investment in workflow redesign. The defensible business case is individual time recovery: roughly 1.3 hours per week averaged across everyone given access, rising to about 3.6 hours per week for regular users, concentrated in solitary tasks such as email, document drafting, and research (email is where the NBER study measured the saving). That is real. Scaling it to team-level outcomes requires separate work.
2. Identify your coordination bottlenecks before you deploy. If your engineering team has 8 engineers and 2 senior reviewers, and you deploy Copilot to the 8 engineers, they will produce more code while review proceeds at the same rate. The bottleneck moves to review. The same logic applies to legal (juniors draft, partners review), sales (account executives send emails, managers approve), and product (PMs write specs, engineering capacity stays fixed). The AI makes the upstream step cheaper; the downstream step now constrains everything.
3. Treat workflow redesign as a prerequisite, not a sequel. The NBER authors are explicit: behaviors requiring coordination did not change. That includes meeting length, meeting frequency, recurring meeting cadence, and work allocation across team members. These are decisions leaders make, not tools users pick up. A 300-person company getting real team-level leverage from AI in 2026 has leadership that redesigned meeting structures, approval chains, and role definitions before or during the rollout — not after.
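Principles 1 and 2 reduce to simple arithmetic, sketched below as a minimal illustration rather than a planning model. The 1.29 and 3.56 hours per week come from NBER WP 33795 and the +98% uplift from the Faros benchmark; the 300-seat headcount, the 46 working weeks per year, and the author and reviewer rates in the hypothetical `ReviewPipeline` class are assumptions chosen for illustration. Delivery is treated as capped by the slower of two stages, a deliberate simplification that ignores review quality, rework, and queueing delays.

```python
from dataclasses import dataclass

# Weekly email-time savings reported in NBER WP 33795.
ITT_HOURS_PER_WEEK = 1.29   # averaged over all workers offered Copilot
LATE_HOURS_PER_WEEK = 3.56  # among regular Copilot users


def annual_individual_hours(headcount: int, hours_per_week: float, working_weeks: int = 46) -> float:
    """Individual hours recovered per year; says nothing about team throughput.

    working_weeks=46 is an illustrative assumption, not a figure from the study.
    """
    return headcount * hours_per_week * working_weeks


@dataclass
class ReviewPipeline:
    """Hypothetical two-stage pipeline: authors produce work, reviewers approve it."""
    authors: int
    items_per_author_per_week: float
    reviewers: int
    reviews_per_reviewer_per_week: float

    def produced(self) -> float:
        return self.authors * self.items_per_author_per_week

    def review_capacity(self) -> float:
        return self.reviewers * self.reviews_per_reviewer_per_week

    def delivered(self) -> float:
        # Nothing ships without review, so the slower stage caps delivery.
        return min(self.produced(), self.review_capacity())

    def weekly_backlog_growth(self) -> float:
        # Work produced but not yet reviewed piles up in the review queue.
        return self.produced() - self.delivered()


if __name__ == "__main__":
    low = annual_individual_hours(300, ITT_HOURS_PER_WEEK)
    # Upper bound: assumes every seat becomes a regular user.
    high = annual_individual_hours(300, LATE_HOURS_PER_WEEK)
    print(f"300 seats: {low:,.0f} to {high:,.0f} individual hours per year")

    # Illustrative rates: 8 engineers, 2 reviewers, balanced before AI.
    before = ReviewPipeline(8, 2.5, 2, 10.0)
    # Apply the +98% uplift to authors only; review capacity is unchanged.
    after = ReviewPipeline(8, 2.5 * 1.98, 2, 10.0)
    for label, p in (("before", before), ("after", after)):
        print(f"{label}: produced {p.produced():.1f}, delivered {p.delivered():.1f}, "
              f"backlog +{p.weekly_backlog_growth():.1f}/week")
```

With these illustrative rates, authored output roughly doubles while delivered work stays pinned at review capacity, and the difference accumulates as review backlog every week. In this model, raising delivery requires changing the review stage (more reviewers, lighter approval rules), not the authoring stage.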
For mid-market companies running a structured AI adoption with these patterns in mind, the operating team’s question is which coordination points (specifically which meetings, approvals, and handoffs) to redesign first. That sequencing work is what separates the firms reporting substantial gains from the majority reporting measured individual time savings and little else. If you want help scoping that sequencing work for your organization, reach Brandon at brandon@brandonsneider.com.
Sources
- Dillon, E. W., Jaffe, S., Immorlica, N., & Stanton, C. T. (2025). Shifting Work Patterns with Generative AI. NBER Working Paper No. 33795. https://www.nber.org/papers/w33795 — 7,137 knowledge workers, 6-month RCT, cross-industry, May 2025. HIGH credibility (rigorous design, published identification, author mix includes independent HBS researcher).
- Faros AI. (2025). Engineering Benchmark Report. https://www.faros.ai — engineering team productivity data showing +98% PRs with no delivery throughput change. MEDIUM credibility (vendor-published, but large cross-firm sample, corroborated by METR RCT and CMU complexity study).
- MIT CISR. (2025–2026). Enterprise AI Maturity research series, Woerner et al. https://cisr.mit.edu/publication/2025_0801_EnterpriseAIMaturityUpdate_WoernerSebastianWeillKaganer — identifies cross-functional coordination structures as a condition for Stage 2→3 maturity gains. HIGH credibility.
Brandon Sneider | brandon@brandonsneider.com
April 2026