See also (wiki): productivity-rcts · workflow-redesign · assistive-to-agentic-shift
Vendor caveat: These findings are Anthropic-published and represent the vendor’s own economic methodology. They have not been independently replicated. Cross-reference against: METR RCT (experienced developers 19% slower, July 2025), Fed Atlanta CFO Survey (0.6% measured vs. 1.8% perceived productivity gain, n=748, March 2026), Stanford AI Index 2026.
Executive Summary
- Users who have worked with Claude for 6+ months show 3–4 percentage points higher task success rates after controlling for what tasks they’re doing — meaning the gain is not just that experienced users pick easier tasks. AI proficiency is an accumulating skill, not a switch that flips on deployment day.
- The concentration finding: Computer and Mathematical occupations account for 35% of Claude.ai usage and show the highest rate of choosing the most capable model (34% of software developer tasks use Opus, the most capable tier). Economic value is not evenly distributed; it concentrates in technically complex work done by already-high-earning workers.
- Model selection tracks task value: for every $10 increase in the hourly wage of the task being performed, Opus selection rises 1.5 percentage points on Claude.ai and 2.8 percentage points on the API. Organizations treating AI model selection as a uniform IT cost decision are mispricing their highest-value workflows.
- The automation surface is shifting: business sales outreach automation and automated trading workflows both more than doubled (November 2025 → February 2026), and coding has migrated from chat interfaces to API/agent architectures. Organizations measuring AI adoption by seat count on a chat UI are measuring the wrong thing.
- This appears to invert the Brynjolfsson finding, but only for a specific kind of work. Brynjolfsson’s GPT-3-era customer support study showed AI compresses the novice-to-expert gap (novices gain 34%, experts gain little). The Anthropic Learning Curves data shows that on open-ended AI usage the learning curve runs the other direction: experienced users keep accumulating advantage. The two findings are not contradictory. Bounded, rule-following tasks favor the novice curve; open-ended, iterative AI work favors the experience curve.
Background: Two Different Studies, Two Different Curves
Two distinct empirical findings bear on the question of who captures AI value as capability scales:
Brynjolfsson, Li, and Raymond (2023/2025): 5,172 customer support agents, GPT-3-era tool, quasi-experimental design exploiting the staggered rollout of the tool across agents. Novices (under two months tenure) gained 34% productivity; the most experienced agents gained almost nothing. Mechanism: the AI encoded expert knowledge and delivered it to novices in real time. Bounded, rule-following task type. Published in QJE February 2025. Source credibility: HIGH.
Anthropic Economic Index: Learning Curves (March 2026): 1 million Claude conversations, February 5–12, 2026. Observational analysis via CLIO (privacy-preserving classification). Open-ended, user-initiated task variety. Experienced users (6+ months) show 3–4pp higher task success after full controls. Source credibility: MEDIUM (vendor-primary, transparent methodology, no independent replication).
The curves run in different directions because the task types are fundamentally different. Customer support in the Brynjolfsson study was a bounded, rule-following task where the AI could substitute for missing knowledge. Open-ended knowledge work in 2026 (coding, analysis, business operations) requires skill at prompting, iteration, and knowing when to trust or override the model. That skill accumulates.
Finding 1: The Learning Curve Is Real and Survives Controls
Anthropic’s analysis compared users who signed up for Claude 6+ months ago against newer users, controlling progressively for what tasks they are performing:
| Analysis Level | Success Rate Advantage |
|---|---|
| Raw (no controls) | +10% |
| Bivariate (simple tenure vs. success) | +5 pp |
| Controlling for specific task type | +3 pp |
| Full controls model | +4 pp |
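The raw row is reported as a percentage while the other rows are in percentage points, so the rows are not directly comparable. The 3–4 percentage point residual that survives task and full controls means experienced users succeed more often even on the same kinds of tasks; task selection alone does not explain the advantage.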
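Why controlling for task type can shrink a raw gap is easiest to see on toy data. A minimal sketch with invented numbers (not the report's data): experienced users over-select an easier task, so a pooled comparison overstates their advantage, while comparing within each task type recovers the same-task gap. This stratified comparison is a simplified stand-in for the fixed-effects regression the report uses, not a reproduction of it; `CELLS`, `raw_gap_pp`, and `within_task_gap_pp` are hypothetical.

```python
from collections import defaultdict

# Hypothetical aggregated cells: (tenure_group, task_type, conversations, successes).
# Invented so that experienced users pick the easy task far more often and are
# exactly 4 pp better than new users inside each task type.
CELLS = [
    ("experienced", "easy", 300, 252),  # 84% success
    ("experienced", "hard", 100, 44),   # 44% success
    ("new", "easy", 100, 80),           # 80% success
    ("new", "hard", 300, 120),          # 40% success
]


def raw_gap_pp(cells):
    """Pooled success-rate gap (experienced minus new), ignoring task type, in pp."""
    conversations = defaultdict(int)
    successes = defaultdict(int)
    for group, _task, conv, succ in cells:
        conversations[group] += conv
        successes[group] += succ
    exp_rate = successes["experienced"] / conversations["experienced"]
    new_rate = successes["new"] / conversations["new"]
    return round((exp_rate - new_rate) * 100, 2)


def within_task_gap_pp(cells):
    """Gap computed inside each task type, averaged with task volume as the weight, in pp."""
    by_task = defaultdict(dict)
    for group, task, conv, succ in cells:
        by_task[task][group] = (conv, succ)
    weighted_sum = 0.0
    total_weight = 0
    for groups in by_task.values():
        n_exp, s_exp = groups["experienced"]
        n_new, s_new = groups["new"]
        weight = n_exp + n_new
        weighted_sum += (s_exp / n_exp - s_new / n_new) * weight
        total_weight += weight
    return round(weighted_sum / total_weight * 100, 2)


print(raw_gap_pp(CELLS))          # 24.0: inflated by task selection
print(within_task_gap_pp(CELLS))  # 4.0: the same-task advantage
```

Weighting each task by its total volume is one reasonable choice; a regression with task fixed effects weights cells differently, so real estimates will not match this toy calculation exactly.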
Additional markers of experience accumulation:
- Prompts are more sophisticated: 6% higher measured education level in inputs from high-tenure users
- Work focus increases: 7 percentage points more work-focused usage vs. personal
- Task diversity increases: high-tenure users’ top-10 tasks represent 20.7% of their usage vs. 22.2% for newer users — they are not locked into a narrow repertoire
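The task-diversity marker is a concentration ratio: the fraction of a cohort's usage that falls in its ten most frequent tasks, where a lower value means more diverse usage. A minimal sketch, assuming usage has already been tallied per task label (the labels, counts, and the `top_k_share` helper are hypothetical; the report classifies tasks against O*NET):

```python
from collections import Counter


def top_k_share(task_counts: Counter, k: int = 10) -> float:
    """Fraction (0 to 1) of all usage captured by the k most frequent tasks."""
    total = sum(task_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _task, count in task_counts.most_common(k))
    return top / total


# Hypothetical tallies for one cohort; k=2 keeps the example small.
cohort = Counter({"debug code": 50, "draft email": 30, "summarize doc": 20})
print(top_k_share(cohort, k=2))  # 0.8
```

Run once per cohort with the default k=10, a lower result for the high-tenure cohort corresponds to the 20.7% vs. 22.2% pattern described above.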
What this means for rollout design: Deployment is not the finish line. An organization that deployed in Q1 2025 and another that deploys in Q1 2026 may license the same tool, but they are not getting the same results from it. The early cohort has a compounding head start that does not close on its own. This reinforces BCG’s finding (n=21,000+, AI at Work 2025) that the 5-hour engagement threshold separates regular users (79% positive ROI) from occasional ones (67%), but Anthropic’s data suggests the curve extends well past that initial threshold.
Finding 2: Value Is Concentrated, and Model Selection Reveals Where
Not all occupations experience AI equally. From the February 2026 data:
- Computer and Mathematical occupations: 35% of Claude.ai usage. Use is broad, with nearly half of all job categories (49%) having at least one-quarter of their tasks performed with Claude, but Computer/Math is not “typical” usage; it is the dominant use case.
- Opus selection rate by task type: Software Developer tasks — 34% Opus. Tutor tasks — 12% Opus. The model tier selected is a real-time proxy for task complexity and value.
- The $10-per-hour wage-to-Opus rule: For every $10 increase in the hourly wage of the underlying task, Opus selection rises 1.5 percentage points on Claude.ai and 2.8 percentage points on the API. Extrapolated linearly, a $100/hour task would carry a 7.5 (Claude.ai) to 14 (API) percentage point Opus premium over a $50/hour task; linearity across the full wage range is an assumption, not something these slopes establish.
- API vs. Claude.ai correlation: API users show approximately 2x stronger correlation between task complexity and model selection. API users are systematically more intentional about matching model capability to task value.
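The wage-to-Opus arithmetic is a straight-line extrapolation from the two reported slopes. A minimal sketch, assuming the relationship stays linear between the two wages being compared (a larger wage gap makes that assumption riskier); `SLOPE_PP_PER_10_USD` and `opus_premium_pp` are hypothetical names:

```python
# Reported slopes: percentage points of additional Opus selection per $10/hour of task wage.
SLOPE_PP_PER_10_USD = {"claude_ai": 1.5, "api": 2.8}


def opus_premium_pp(wage_high: float, wage_low: float, surface: str) -> float:
    """Linearly extrapolated gap in Opus selection rate (pp) between two task wages."""
    slope = SLOPE_PP_PER_10_USD[surface]
    return round((wage_high - wage_low) / 10 * slope, 2)


for surface in ("claude_ai", "api"):
    print(surface, opus_premium_pp(100, 50, surface), opus_premium_pp(100, 0, surface))
```

These are percentage-point gaps in selection rate. Whether a given gap doubles the Opus rate depends on the baseline rate at the lower wage, which the slopes alone do not supply.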
The occupation-level concentration does not appear to be temporary. The November 2025 aggregate productivity paper (same research group) found software developers account for 19% of the projected aggregate US productivity gain, despite being a small share of the workforce. Concentration in high-value technical work looks like the structural pattern.
For CIO/CFO evaluation: A uniform enterprise license that treats all users identically (same model tier, same seat price) systematically under-resources high-complexity workflows and overspends on low-complexity ones. The data supports tiered procurement aligned to actual task value, not headcount.
Finding 3: The Automation Surface Is Shifting Faster Than Seat Counts Suggest
Between November 2025 and February 2026:
| Metric | Nov 2025 | Feb 2026 | Direction |
|---|---|---|---|
| Top 10 tasks share (Claude.ai) | 24% | 19% | Diversifying |
| Personal use share | 35% | 42% | Increasing |
| Coursework share | 19% | 12% | Decreasing |
| Business sales outreach automation | baseline | >2x growth | Accelerating |
| Automated trading/market operations | baseline | >2x growth | Accelerating |
| Coding on Claude.ai | declining | — | Migrated to API |
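The share rows in this table change by percentage points, which is not the same as a percent change: a 7-point drop from a 19% base is a much larger relative move than a 7-point rise from 35%. A short sketch of both calculations on the table's own figures (the helper names are illustrative):

```python
# (Nov 2025 %, Feb 2026 %) taken from the share rows of the table above.
SHARES = {
    "Top 10 tasks share (Claude.ai)": (24, 19),
    "Personal use share": (35, 42),
    "Coursework share": (19, 12),
}


def pp_change(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting share, in percent."""
    return (after_pct - before_pct) / before_pct * 100


for metric, (nov, feb) in SHARES.items():
    print(f"{metric}: {pp_change(nov, feb):+} pp, {relative_change_pct(nov, feb):+.1f}% relative")
```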
The shift that matters most: coding has migrated off the Claude.ai chat interface and into API/agent workflows. Claude Code is the canonical form. Business outreach and trading automation are growing at the API layer, not the chat layer.
This is consistent with the broader assistive-to-agentic transition visible across the corpus (McKinsey March 2025, Google Cloud ROI of AI 2025, MIT CISR Digital Colleagues 2026), but the Anthropic data points to a specific mechanism: as users become more experienced, they build automated pipelines instead of only having better conversations. The learning curve and the automation migration are plausibly the same phenomenon at different stages.
The measurement error: An organization with 500 Claude.ai seats and 20 API integrations is probably generating most of its AI value at the API layer, yet measuring adoption by the 500 seats. The 20 integrations are where the automation compounds. Any productivity accounting that ignores the API surface is likely missing most of what matters.
Finding 4: The Brynjolfsson Interaction — Two Curves, Not One
How does the learning curve interact with Brynjolfsson’s skill-gap finding? The answer is specific:
Brynjolfsson’s curve (GPT-3 era, bounded tasks): AI compresses the novice-to-expert gap. Novices gain the most; experts gain little. The mechanism is AI as a knowledge-dissemination tool — it encodes what experts know and delivers it to everyone.
Anthropic’s curve (Claude 3.x/4 era, open-ended tasks): AI expands the expert-to-novice gap over time. Users with more experience accumulate a compounding advantage in success rate, prompt sophistication, and task diversity.
These two findings are not contradictory. They describe different:
- Task types: Brynjolfsson — bounded, rule-following (customer support scripts). Anthropic — open-ended knowledge work (coding, analysis, business operations).
- Model generations: GPT-3 vs. Claude 3.x/4.
- Time horizon: Brynjolfsson measures the short-run adoption effect. Anthropic measures the 6-month+ accumulation effect.
The practical implication: for organizations deploying AI in bounded workflows (compliance review against defined rules, form extraction, scripted customer service), the Brynjolfsson curve likely applies — deploy broadly, the novice cohort benefits most. For organizations deploying AI in open-ended knowledge work (software development, strategic analysis, complex writing), the Anthropic curve applies — early cohorts compound advantages that late adopters cannot easily close.
Most mid-market companies operate both types of workflows. The deployment strategy should differ by workflow type.
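Mapping workflows to curves before choosing a deployment strategy can start as a simple lookup. A minimal sketch, seeded only with the example workflows named above; the `WORKFLOW_CURVE` table and `deployment_plan` helper are hypothetical, and a real classification would need review by people who know the work:

```python
from enum import Enum


class Curve(Enum):
    BOUNDED = "Brynjolfsson pattern: deploy broadly; novices benefit most"
    OPEN_ENDED = "Anthropic pattern: early cohorts compound advantages"


# Hypothetical lookup seeded with the example workflows named in the text.
WORKFLOW_CURVE = {
    "compliance review": Curve.BOUNDED,
    "form extraction": Curve.BOUNDED,
    "scripted customer service": Curve.BOUNDED,
    "software development": Curve.OPEN_ENDED,
    "strategic analysis": Curve.OPEN_ENDED,
    "complex writing": Curve.OPEN_ENDED,
}


def deployment_plan(workflows: list[str]) -> dict[str, list[str]]:
    """Group workflows by curve; anything not in the lookup is flagged for manual review."""
    plan: dict[str, list[str]] = {"bounded": [], "open_ended": [], "needs_review": []}
    for workflow in workflows:
        curve = WORKFLOW_CURVE.get(workflow.lower())
        if curve is Curve.BOUNDED:
            plan["bounded"].append(workflow)
        elif curve is Curve.OPEN_ENDED:
            plan["open_ended"].append(workflow)
        else:
            plan["needs_review"].append(workflow)
    return plan


print(deployment_plan(["Form extraction", "Software development", "Payroll reconciliation"]))
```

Routing unknown workflows to `needs_review` rather than guessing a curve keeps the default conservative.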
Key Data Points
| Finding | Value | Source | Date | Tier |
|---|---|---|---|---|
| High-tenure user success advantage (raw) | +10% | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| High-tenure user success advantage (after full controls) | +3–4 pp | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Computer/Math share of Claude.ai usage | 35% | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Software Developer Opus selection rate | 34% | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Opus selection increase per $10 wage (Claude.ai) | +1.5 pp | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Opus selection increase per $10 wage (API) | +2.8 pp | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Business outreach automation growth (3 months) | >2x | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| Prompt sophistication increase with tenure | +6% education level | Anthropic Economic Index, n=1M | Mar 2026 | TIER 1 (recent) |
| METR RCT: experienced developer speed | −19% (slower) | METR, n=16, 246 tasks | Jul 2025 | TIER 2 |
| Brynjolfsson: novice customer support gain | +34% | NBER/QJE, n=5,172 | 2023/2025 | TIER 4 (GPT-3 era) |
| Fed Atlanta: measured AI productivity gain | +0.6% revenue-based | Fed Atlanta/Richmond/Duke, n=748 | Mar 2026 | TIER 1 |
Source credibility: MEDIUM — Vendor-primary (Anthropic publishing data about its own product). Methodology is transparent (CLIO system, O*NET classification, fixed-effects regression, p<0.001). Data available on Hugging Face for independent analysis. No independent replication as of April 2026. The direction of findings (experience compounding, concentration in high-skill work, API migration) is consistent with independent evidence; the magnitudes should be treated as upper bounds.
What This Means for Your Organization
Three questions determine whether these findings are actionable for your AI strategy.
First, which curve are you on? Bounded workflows (extraction, classification, scripted responses) show the Brynjolfsson pattern — broad deployment benefits the novice cohort most. Open-ended knowledge work (development, analysis, writing) shows the Anthropic pattern — early cohorts compound advantages. Most mid-market organizations have both. Mapping your workflows to the appropriate curve before selecting a deployment strategy is more consequential than the AI tool selection itself.
Second, are you measuring the right surface? If your AI measurement stops at chat UI seat counts, you are likely missing the highest-value workflows. API integrations, agent pipelines, and automated workflows running at the API layer are likely to overtake chat UI usage in economic value faster than adoption metrics suggest; the Anthropic data indicates this is already happening at the platform level. A quarterly review of API vs. chat UI usage, broken out by workflow type, is a better adoption signal than seat count or license utilization.
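The quarterly API vs. chat UI review can be sketched as a small aggregation over a usage export. A minimal sketch, assuming a hypothetical internal record format (`UsageRecord` is not a vendor API schema) with a surface label of "api" or "chat" and a request count per row:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class UsageRecord(NamedTuple):
    """Hypothetical row from an internal usage export (not a vendor API schema)."""
    quarter: str        # e.g. "2026Q1"
    workflow_type: str  # e.g. "software development"
    surface: str        # "api" or "chat"
    requests: int


def api_share_by_workflow(records: Iterable[UsageRecord], quarter: str) -> dict[str, float]:
    """Fraction of requests served through the API surface, per workflow type, for one quarter."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"api": 0, "chat": 0})
    for record in records:
        if record.quarter == quarter:
            # An unexpected surface label raises KeyError, surfacing bad data early.
            totals[record.workflow_type][record.surface] += record.requests
    return {
        workflow: counts["api"] / (counts["api"] + counts["chat"])
        for workflow, counts in totals.items()
        if counts["api"] + counts["chat"] > 0
    }


rows = [
    UsageRecord("2026Q1", "software development", "api", 900),
    UsageRecord("2026Q1", "software development", "chat", 100),
    UsageRecord("2026Q1", "drafting", "chat", 300),
    UsageRecord("2025Q4", "drafting", "api", 50),
]
print(api_share_by_workflow(rows, "2026Q1"))
```

Request counts measure volume, not value; weighting each record by an estimated value per request would bring the metric closer to the economic signal the text argues for.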
Third, are you pricing model tiers correctly? The $10-per-hour Opus selection pattern shows users already reaching for the most capable tier as task value rises. That suggests high-complexity, high-value workflows belong on the most capable model tier available, and that the per-seat cost delta between tiers is often small relative to the difference in task value. CFOs benchmarking AI licensing costs on a flat per-seat basis are likely underinvesting in high-value workflows and overinvesting in low-value ones. A task-value-based tier mapping typically takes about a half-day with your AI vendor, and the ROI calculation that follows is usually straightforward.
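A task-value-based tier mapping can begin as a threshold table. A minimal sketch, assuming each workflow has an estimated hourly task value; the thresholds, tier labels, and helper names are hypothetical placeholders, not vendor pricing or figures from the report:

```python
from dataclasses import dataclass

# Hypothetical cut-offs in USD of task value per hour, checked from highest to lowest.
# The numbers are placeholders to be set per organization, not figures from the report.
TIER_THRESHOLDS = (
    (80.0, "most capable tier (e.g. Opus)"),
    (40.0, "mid tier"),
    (0.0, "lowest-cost tier"),
)


@dataclass(frozen=True)
class Workflow:
    name: str
    hourly_task_value: float  # estimated value of the work, USD per hour


def assign_tier(workflow: Workflow) -> str:
    """Return the first tier whose threshold the workflow's task value meets."""
    for threshold, tier in TIER_THRESHOLDS:
        if workflow.hourly_task_value >= threshold:
            return tier
    raise ValueError(f"{workflow.name}: hourly_task_value must be non-negative")


for wf in (Workflow("code review", 95), Workflow("policy Q&A", 55), Workflow("form intake", 25)):
    print(wf.name, "->", assign_tier(wf))
```

Checking thresholds from highest to lowest keeps the mapping unambiguous; the actual cut-offs would come from your own task-value estimates and tier pricing.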
If this raised questions specific to your organization’s deployment strategy or model tier procurement, a short conversation is often the fastest way to make the analysis concrete — brandon@brandonsneider.com.
Sources
- Anthropic Economic Index: Learning Curves. Massenkoff, Lyubich, McCrory, Appel, Heller. Published March 24, 2026. Data period: February 5–12, 2026. n=1,000,000 conversations from Claude.ai and first-party API. Methodology: CLIO privacy-preserving classification system, O*NET occupational taxonomy, fixed-effects regression. URL: https://www.anthropic.com/research/economic-index-march-2026-report. Dataset: Hugging Face Anthropic/EconomicIndex. Credibility: MEDIUM (vendor-primary, transparent methodology, no independent replication).
- METR AI Productivity RCT. METR. Published July 2025. n=16 experienced developers, 246 tasks, 2,100 hours. Result: tasks took 19% longer when AI tools were allowed (slower than unassisted). Models: Claude 3.5/3.7 Sonnet, GPT-4o. URL: https://metr.org/blog/2025-07-10-measuring-ai-productivity/. Credibility: HIGH (independent RCT, experienced population, real production tasks, pre-registered methodology). Cross-reference against perception gap: developers perceived themselves as +20% faster.
- Brynjolfsson, Li, and Raymond, “Generative AI at Work.” NBER Working Paper w31161 (April 2023). Published in The Quarterly Journal of Economics, Vol. 140, Issue 2 (February 2025). n=5,172 customer support agents. GPT-3-era tool. Quasi-experimental design based on staggered rollout. URL: https://doi.org/10.1093/qje/qjae043. Credibility: HIGH (for mechanism), TIER 4 (for magnitude): independent academic study, peer-reviewed, no vendor funding. Model generation limits magnitude applicability to current deployments.
- Fed Atlanta/Richmond/Duke Business Uncertainty Survey. March 2026. n=748 CFOs and senior finance executives. AI productivity gain: 1.8% perceived, 0.6% revenue-based measurement. URL: https://www.atlantafed.org/research/surveys/business-uncertainty. Credibility: TIER 1 (independent Federal Reserve data collection, no vendor affiliation).
- Stanford AI Index 2026. April 2026. URL: https://aiindex.stanford.edu/report/. Credibility: TIER 1 (independent academic institution, no vendor funding for core index).
Brandon Sneider | brandon@brandonsneider.com | April 2026