Google’s Gemini Code Assist: The Case Study That Isn’t
Executive Summary
- Google CEO Sundar Pichai claims 30% of Google’s new code is AI-generated (Q1 2025 earnings call, April 24, 2025) and that AI delivers a 10% engineering velocity increase — but the methodology behind both numbers is opaque. Google measures “increased engineering capacity in hours per week,” not output quality or organizational delivery.
- External customer case studies are thin. The strongest published result is ComplyAdvantage’s 37% reduction in development time — a single-company pilot with no disclosed sample size, control group, or duration. Delivery Hero deployed to 4,000 engineers with no published productivity metrics. HCLTech reports 25% faster development and 60% unit test productivity gain for one middleware team.
- Google’s own DORA 2025 research (n~5,000) directly contradicts the headline claims: a 25% increase in AI adoption correlates with a 1.5% reduction in delivery throughput and a 7.2% reduction in delivery stability. Individual task completion rises 21%, but organizational delivery does not improve.
- Independent reviews find Gemini Code Assist lags behind GitHub Copilot and Cursor in code completion quality, speed, and developer experience. Its 1M-token context window and SWE-bench scores (63.8% for Gemini 2.5 Pro; 78% for Gemini 3 Flash) are competitive on benchmarks but have not translated into market share dominance.
- The gap between Google’s earnings-call narrative and its published research tells a familiar story: vendor marketing runs ahead of evidence. For procurement decisions, treat the 30% and 10% claims as unverified.
The Earnings Call Claims
“30% of New Code Is AI-Generated”
Sundar Pichai first stated during Google’s Q3 2024 earnings call (October 29, 2024) that “more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers.” By the Q1 2025 call (April 24, 2025), the figure had risen to “more than 30%.”
What this actually means is unclear. “Generated by AI” could mean:
- Code completions accepted — a developer accepts an inline autocomplete suggestion. This is the most generous interpretation and the most likely one. GitHub Copilot reports similar acceptance rates (~30% of suggestions accepted).
- Full functions or modules generated — AI writes substantial blocks of logic. This would be a much stronger claim but is almost certainly not what’s being measured.
- AI-assisted code — code written by a human with AI suggestions influencing the output. The loosest definition.
Pichai has never defined the term publicly. No methodology paper exists. No sample size or confidence intervals have been disclosed. The claim was made during an earnings call — a context where executives highlight favorable metrics to investors.
Source credibility: Vendor marketing claim during investor presentation. No independent verification. Treat as unverified.
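The definitional ambiguity matters because the most generous reading can produce a headline-sized number from modest contributions. A minimal sketch, using entirely invented figures (nothing here is Google's data), of how a character-count definition behaves:

```python
# Hypothetical illustration: under a "characters from accepted completions"
# definition, many short autocomplete acceptances can add up to a ~30% share.
# All numbers are invented for illustration, not Google's methodology.

def ai_share(accepted_completions, avg_completion_chars, human_typed_chars):
    """Fraction of new code characters that came from accepted AI suggestions."""
    ai_chars = accepted_completions * avg_completion_chars
    return ai_chars / (ai_chars + human_typed_chars)

# A developer accepts 200 short completions (~40 chars each) in a week
# while typing ~18,000 characters by hand.
share = ai_share(accepted_completions=200, avg_completion_chars=40,
                 human_typed_chars=18_000)
print(f"{share:.0%}")  # prints 31%
```

Under these assumptions, short inline completions alone yield a "31% AI-generated" figure without the AI ever writing a full function.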
“10% Engineering Velocity Increase”
In mid-2025, Pichai stated that Google measures “how much engineering velocity has increased as a result of AI” and pegged the number at 10%. A Google spokesperson clarified the methodology: the company tracks “the increase in engineering capacity created, in hours per week, from the use of AI-powered tools.”
This is a more credible claim than the 30% headline for three reasons:
- It is modest — 10% is consistent with independent field studies that converge around 5-15% organizational productivity gains.
- It measures time savings, not raw output volume.
- Pichai explicitly called it “the most important” metric Google tracks internally, suggesting it receives serious measurement attention.
However, “engineering capacity in hours per week” still raises questions. Hours saved doing what? Are those hours reinvested productively? Does the metric account for the DORA-documented increase in code review burden? Google has not published the methodology or made the data available for independent review.
Source credibility: Vendor self-report with plausible methodology. Consistent with independent research. Directionally credible but unverified.
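The "hours per week" framing converts to a velocity percentage with simple arithmetic. A back-of-envelope sketch, with the hours-saved figure invented for illustration (Google has not disclosed it):

```python
# Hypothetical conversion from "engineering capacity in hours per week"
# to a velocity percentage. The 4-hour figure is an assumption chosen to
# show what would be consistent with the claimed 10%.

WORK_WEEK_HOURS = 40

def velocity_gain(hours_saved_per_week):
    """Capacity increase as a fraction of a standard work week."""
    return hours_saved_per_week / WORK_WEEK_HOURS

print(f"{velocity_gain(4):.0%}")  # prints 10%
```

The point: roughly four hours saved per engineer per week would account for the entire 10% claim, which says nothing about whether those hours are reinvested productively.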
Published Customer Case Studies
Google’s case study portfolio for Gemini Code Assist is notably thin compared to GitHub Copilot’s (which has its own problems — see our analysis of Copilot’s economic claims). Here is what exists as of March 2026:
ComplyAdvantage — 37% Development Time Reduction
The strongest published result. ComplyAdvantage, a financial crime prevention company, piloted Gemini Code Assist and reported a 42% reduction in development time for the pilot group, settling to a 37% median reduction across broader deployment.
What’s missing: No disclosed sample size (how many developers?), no control group, no duration, no information about what “development time” means (feature delivery? coding hours? cycle time?), no quality metrics, no long-term follow-up. A single company’s self-reported results after choosing to publicize them — survivorship bias is a concern.
Source credibility: Customer testimonial published on vendor platform. No independent verification. Positive results self-selected for publication.
Delivery Hero — 4,000 Engineers, No Productivity Numbers
Delivery Hero deployed Gemini Code Assist to over 4,000 software engineers and data scientists across 15,000+ GitHub repositories. They won a 2025 DORA Award for “improving the developer experience.” Developer experience surveys showed “a clear rise in developer satisfaction with the quality and speed of code reviews” after upgrading to the Gemini 2.5 model.
What’s missing: No quantified productivity metrics. No before/after delivery data. No bug rate comparison. “Developer satisfaction” is a perception metric — the METR RCT (n=16, 246 tasks, July 2025) demonstrated a 39-percentage-point gap between perceived and actual productivity with AI tools. Delivery Hero’s scale is impressive; its evidence is not.
Source credibility: Vendor-platform case study. Perception metrics only. No measured outcomes.
HCLTech — 25% Faster Development, 60% Test Productivity
HCLTech’s case study describes an enterprise middleware team that achieved 25% faster development when creating services from scratch and a 60% improvement in unit test coverage productivity.
What’s missing: Team size not disclosed. “Creating services from scratch” is a favorable condition for AI tools — greenfield development on well-understood patterns is exactly where code generation excels. The 60% test coverage gain is plausible (test generation is a strong suit for current AI tools) but no quality assessment of the generated tests is provided. No long-term maintenance data.
Source credibility: Customer testimonial via systems integrator. Limited scope. Favorable task selection.
Renault Ampere — Qualitative Only
Renault’s Ampere EV subsidiary deployed Gemini Code Assist Enterprise with RAG over private codebases for Android automotive development. The case study describes benefits in onboarding (“explaining functions, summarizing modules”) and Android development (“automating boilerplate code, suggesting APIs”).
What’s missing: No quantified results whatsoever. No metrics of any kind. This is a deployment announcement, not an outcomes study.
Source credibility: Vendor case study. No metrics. Deployment confirmation only.
Capgemini — “Early Results”
Capgemini is described as “using Code Assist to improve software engineering productivity, quality, security, and developer experience, with early results showing workload gains for coding and more stable code quality.” No numbers are attached to any of these claims.
Source credibility: Vendor reference. No metrics. Marketing language only.
What Google’s Own Research Says
The most credible data on AI coding tool effectiveness bearing Google’s name comes from DORA — the DevOps Research and Assessment team Google acquired in 2018. The DORA team maintains methodological independence, and their findings frequently contradict the marketing narrative.
DORA 2025: The Productivity Paradox
The 2025 State of AI-Assisted Software Development report (n~5,000 respondents, 78 in-depth interviews) finds:
- 90% of developers now use AI coding tools — adoption is nearly universal.
- Individual output increases: 21% more tasks completed, 98% more pull requests merged.
- Organizational delivery does not improve: A 25% increase in AI adoption correlates with a 1.5% reduction in delivery throughput and a 7.2% reduction in delivery stability. Both of these are negative correlations.
- Code review burden expands: 60.2% of organizations report lead time for changes exceeding one day. PR size grows 154%.
- Trust remains low: Only 4% of respondents trust AI output “a great deal.” 30% trust it “a little” or “not at all.”
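To see what the reported coefficients imply at organizational scale, here is a rough extrapolation. It assumes the relationship is linear, which DORA does not claim beyond the reported figures; illustration only:

```python
# DORA 2025's reported correlations: per 25-point increase in AI adoption,
# delivery throughput moves -1.5% and delivery stability -7.2%.
# Linear extrapolation is an assumption for illustration, not DORA's claim.

THROUGHPUT_PER_25PT = -0.015
STABILITY_PER_25PT = -0.072

def expected_delta(adoption_increase_pts):
    """Projected throughput and stability change for a given adoption increase."""
    steps = adoption_increase_pts / 25
    return steps * THROUGHPUT_PER_25PT, steps * STABILITY_PER_25PT

tp, st = expected_delta(50)  # an org that raises adoption by 50 points
print(f"throughput {tp:+.1%}, stability {st:+.1%}")
```

Under this (hypothetical) linear reading, doubling the adoption increase to 50 points would project a -3.0% throughput and -14.4% stability change: the stability cost compounds faster than the throughput cost.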
DORA AI Capabilities Model
DORA identified seven capabilities that determine whether AI adoption produces positive outcomes:
- Clear and communicated AI stance
- Healthy data ecosystems
- AI-accessible internal data
- Strong version control practices
- Working in small batches
- User-centric focus
- Quality internal platforms
The finding: “AI doesn’t fix a team; it amplifies what’s already there.” Strong teams benefit. Struggling teams get worse. This directly undermines the vendor pitch that AI tools are a universal productivity multiplier.
Source credibility: Independent research team within Google. Rigorous methodology (n~5,000). High credibility. Note the irony: Google’s most credible research contradicts Google’s marketing claims.
Independent Assessment
Benchmark Performance
Gemini’s underlying models show competitive benchmark scores:
| Model | SWE-bench Verified | Date |
|---|---|---|
| Gemini 3 Flash | 78.0% | 2026 |
| Claude Code (Sonnet 4) | 72.7% | 2025 |
| OpenAI Codex (o3) | 69.1% | 2025 |
| Gemini 2.5 Pro | 63.8% | March 2025 |
Gemini 3 Flash’s 78% score leads the field. But SWE-bench measures isolated task resolution, not the enterprise development lifecycle. The DORA data shows this distinction matters enormously.
Code Completion Quality
Independent reviews consistently rate Gemini Code Assist’s real-world performance below competitors:
- The New Stack found code completions where “both Copilot and Augment gave superior results” and noted Gemini was “slightly tardy” in response speed. During refactoring, Gemini “would suggest putting the deleted lines back in” — a significant workflow disruption.
- InfoWorld found Gemini “doesn’t seem to go off the rails as often as some of its competitors” but flagged it as “a little slower to respond in chat” and noted it “lacks multi-file and whole-repository code generation.”
- Developer surveys show GitHub Copilot at 49% professional developer usage with 72% satisfaction, while Gemini Code Assist has limited survey presence in the developer community.
Pricing Position
Gemini Code Assist pricing (as of early 2026):
- Standard: ~$19-22.80/user/month
- Enterprise: ~$45-54/user/month (with code customization on private repos)
Compared to GitHub Copilot Business ($19/user/month) and Copilot Enterprise ($39/user/month), Gemini’s Enterprise tier runs 15-38% more expensive. The Standard tier is priced competitively. Gemini’s advantage: 180,000 free code completions per month and a 1M-token context window (vs. Copilot’s 128K).
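The per-seat delta compounds at enterprise scale. A sketch using the list prices cited above; real enterprise pricing is negotiated, and the 500-seat figure is an assumption for illustration:

```python
# Annual seat-cost comparison at list price. Prices from the text above:
# Gemini Enterprise upper bound $54/user/month, Copilot Enterprise $39.
# Seat count is hypothetical.

def annual_cost(per_user_month, seats):
    """Yearly spend for a flat per-seat monthly price."""
    return per_user_month * seats * 12

seats = 500
gemini_ent = annual_cost(54, seats)
copilot_ent = annual_cost(39, seats)
print(gemini_ent - copilot_ent)  # prints 90000
```

At 500 seats, the list-price premium is $90,000/year, a gap that the published case studies give procurement teams no outcome data to justify.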
Key Data Points
| Metric | Claim | Source | Credibility |
|---|---|---|---|
| Google new code AI-generated | >30% | Pichai, Q1 2025 earnings call | Vendor claim, no methodology disclosed |
| Google engineering velocity increase | 10% | Pichai, mid-2025 | Vendor self-report, plausible methodology |
| ComplyAdvantage dev time reduction | 37% median | Google Cloud case study | Single company, no control group |
| Delivery Hero deployment scale | 4,000 engineers, 15,000 repos | Google Cloud case study | Scale confirmed, no outcomes |
| HCLTech development speed | 25% faster (greenfield) | HCLTech case study | Single team, favorable conditions |
| AI adoption impact on throughput | -1.5% per 25% adoption increase | DORA 2025 (n~5,000) | Independent research, high credibility |
| AI adoption impact on stability | -7.2% per 25% adoption increase | DORA 2025 (n~5,000) | Independent research, high credibility |
| Gemini 3 Flash SWE-bench | 78.0% | Google benchmark | Vendor benchmark, reproducible |
| Developer trust in AI (“a great deal”) | 4% | DORA 2025 (n~5,000) | Independent survey |
What This Means for Your Organization
Google’s Gemini Code Assist story follows the same pattern as GitHub Copilot’s: strong earnings-call claims, thin customer evidence, and the vendor’s own independent research contradicting the marketing narrative. The 30% code generation claim and the 10% velocity claim are not independently verifiable, and the case study portfolio is weaker than Copilot’s (which is itself problematic — see our Copilot economic impact analysis).
The most actionable data comes from DORA, and the message is uncomfortable for anyone selling AI coding tools: individual throughput rises, but organizational delivery metrics decline or stay flat. The culprit is the expanding review burden — more code generated means more code to review, test, debug, and maintain. Organizations that deploy AI coding tools without addressing the review bottleneck will generate more code without delivering more value.
If you are evaluating Gemini Code Assist specifically, the honest assessment: the underlying models are competitive (Gemini 3 Flash leads SWE-bench), the 1M-token context window is a genuine differentiator for large codebases, and the Google Cloud integration is natural for GCP-native shops. But the code completion experience trails Copilot and Cursor in day-to-day developer satisfaction, and the Enterprise tier pricing runs higher than Copilot Enterprise without demonstrably superior outcomes. The strongest reason to choose Gemini Code Assist is existing GCP investment, not proven productivity advantage.
The DORA AI Capabilities Model offers the real procurement insight: before arguing about which AI coding tool to buy, invest in the seven foundational capabilities that determine whether any tool will produce positive outcomes. If your version control practices are weak, your code review processes are already bottlenecked, or your teams are not working in small batches, no AI tool — Gemini, Copilot, or Cursor — will fix that. AI amplifies what’s already there.
Sources
- Sundar Pichai, Alphabet Q3 2024 Earnings Call, October 29, 2024 — “more than a quarter of all new code at Google is generated by AI.” Vendor earnings call. https://fortune.com/2024/10/30/googles-code-ai-sundar-pichai/
- Sundar Pichai, Alphabet Q1 2025 Earnings Call, April 24, 2025 — “more than 30% of new code is AI-generated.” Vendor earnings call. https://analyticsindiamag.com/ai-news-updates/sundar-pichai-says-over-30-of-code-at-google-now-ai-generated/
- Google spokesperson, June 2025 — Google measures “increased engineering capacity in hours per week from AI-powered tools”; 10% engineering velocity increase. Vendor self-report. https://tech.yahoo.com/ai/articles/sundar-pichai-says-ai-making-141602013.html
- DORA, “2025 State of AI-Assisted Software Development” (n~5,000, 78 interviews), 2025 — AI adoption correlates with -1.5% throughput, -7.2% stability. Independent research team (Google-owned). High credibility. https://dora.dev/research/2025/dora-report/
- DORA AI Capabilities Model, 2025 — Seven capabilities that determine AI adoption outcomes. Independent research. High credibility. https://cloud.google.com/blog/products/ai-machine-learning/introducing-doras-inaugural-ai-capabilities-model
- ComplyAdvantage/Google Cloud case study — 37% median development time reduction. Customer testimonial on vendor platform. No methodology disclosed. https://cloud.google.com/customers/complyadvantage
- Delivery Hero/Google Cloud case study — 4,000 engineers, 15,000+ repos, DORA Award. Vendor case study. No quantified outcomes. https://cloud.google.com/customers/delivery-hero-ai
- HCLTech case study — 25% faster development, 60% test productivity improvement. SI partner case study. Single team. https://www.hcltech.com/case-study/accelerating-middleware-innovation-with-gemini-code-assist
- Renault Ampere/Google Cloud — Gemini Code Assist Enterprise with RAG for automotive software. Vendor case study. No metrics. https://cloud.google.com/blog/products/application-development/renault-groups-software-defined-vehicles-built-on-google-cloud
- InfoWorld review, 2025 — “Good at coding” but “a little slower” and “lacks multi-file and whole-repository code generation.” Independent technical review. Moderate credibility. https://www.infoworld.com/article/3829347/review-gemini-code-assist-is-good-at-coding.html
- Google, “Company-Wide AI Coding Guidance,” June 30, 2025 — Internal guidance emphasizing “maintaining rigor across code review, security, and maintenance.” Vendor internal policy. https://9to5google.com/2025/06/30/google-engineers-ai-code/
- METR Randomized Controlled Trial (n=16, 246 tasks, July 2025) — 39-percentage-point gap between perceived and actual AI productivity. Independent RCT. Small sample, rigorous methodology. Referenced in DORA findings.
Created by Brandon Sneider | brandon@brandonsneider.com | March 2026