Enterprise IP Concerns with AI-Generated Code: The Ownership Gap Nobody Planned For

Executive Summary

  • AI-generated code sits in a legal no-man’s land. The U.S. Copyright Office confirmed in January 2025 that purely AI-generated material is not copyrightable. Prompt engineering alone does not establish authorship. Companies relying on AI-generated code without substantial human modification may own nothing protectable.
  • License contamination is at historic highs. Black Duck’s 2026 OSSRA report (n=947 commercial codebases) found 68% contain open-source license conflicts — the largest single-year jump ever recorded — driven partly by AI tools generating code snippets derived from copyleft sources without retaining license information.
  • Vendor indemnification is narrower than the marketing suggests. Microsoft and Google offer IP indemnification for their AI coding tools, but the conditions — enabled filters, unmodified suggestions, no intentional infringement — create exceptions large enough to swallow the protection for most real-world development scenarios.
  • The litigation landscape is accelerating. The $1.5 billion Bartz v. Anthropic settlement (August 2025) — the largest copyright settlement in U.S. history — signals that training-data IP claims carry real financial weight. Doe v. GitHub remains unresolved, with discovery ongoing into 2026.
  • Trade secrets, not copyright, are becoming the primary IP protection strategy for AI-assisted codebases. But AI tools themselves create new trade secret risks by potentially exposing proprietary code to model training pipelines.

The foundational problem: U.S. copyright law requires human authorship. The Copyright Office’s Part 2 report (January 29, 2025) made this explicit — copyright protection requires “human creative contribution,” and “material generated wholly by artificial intelligence is not eligible for copyright protection.”

The practical implications are stark. Register of Copyrights Shira Perlmutter stated: “Where that creativity is expressed through the use of AI systems, it continues to enjoy protection. Extending protection to material whose expressive elements are determined by a machine, however, would undermine rather than further the constitutional goals of copyright.”

Three categories emerge from the Copyright Office’s analysis:

Copyrightable: Code where a human developer uses AI as a tool, then substantially modifies, selects, arranges, or transforms the output. The human's creative decisions, not the AI's generation, are what receive protection. The Copyright Office analogized this to an artist who uses a specialized tool to create, provided the human controls the expressive elements.

Not copyrightable: Code generated entirely by AI from prompts, even detailed prompts. The Office found that “the gaps between prompts and resulting outputs demonstrate that the user lacks sufficient control over the conversion of their ideas into fixed expression.” A natural-language instruction like “create a modern dashboard with clean animations” conveys an unprotectable idea, not protectable expression.

Gray zone: Code generated by AI and partially edited by humans. The Office acknowledges this requires “case-by-case analysis” with no bright-line rule. This is where most enterprise AI-assisted development falls — and where the legal uncertainty is highest.

The D.C. Circuit Court of Appeals affirmed the human authorship requirement in March 2025, and denied rehearing en banc in May 2025. The legal principle is settled. The operational question — how much human involvement is enough — is not.

The Vibe Coding Problem

The rise of “vibe coding” — using high-level natural language to generate entire applications — makes this worse. A Vorys analysis (2025) argues that vibe-coded software may be functionally uncopyrightable because the human contribution consists primarily of conveying unprotectable ideas and high-level directives, not protectable expression. If a competitor reverse-engineers your vibe-coded application, you may have no legal recourse beyond trade secret claims.

This matters for competitive positioning. Companies that allow extensive AI code generation without documented human creative involvement are building assets they cannot legally protect.

The License Contamination Problem

AI coding tools introduce a novel IP risk: license laundering. Models trained on open-source code can generate snippets derived from copyleft-licensed sources (GPL, AGPL) without retaining the original license information. A developer accepts the suggestion, commits it, and the company now has undisclosed license obligations embedded in proprietary code.

Black Duck’s 2026 OSSRA report quantifies the problem:

  • 68% of audited codebases contain open-source license conflicts (up from 56% the prior year — a 12-percentage-point jump, the largest in OSSRA history)
  • One codebase contained 2,675 distinct license conflicts
  • Only 54% of organizations evaluate AI-generated code for IP and licensing risks
  • 76% of companies that prohibit AI coding tools acknowledge their developers use them anyway

The governance gap is specific: 76% of organizations check AI-generated code for security risks, but only 24% perform comprehensive IP, license, security, and quality evaluations. Companies are scanning for vulnerabilities while ignoring the license obligations baked into the same code.

Black Duck’s 2026 report also found that open-source component counts increased 30% year-over-year and files per codebase grew 74% — both trends accelerated by AI-assisted development generating more code faster, with less human review of what’s being introduced.

Vendor Indemnification: Read the Fine Print

Most major AI coding tool vendors now offer some form of IP indemnification. The coverage varies significantly.

Microsoft (GitHub Copilot, M365 Copilot): The broadest commitment. Microsoft’s Copilot Copyright Commitment (September 2023, still in effect) states the company will defend customers and pay damages for copyright claims arising from unmodified Copilot suggestions — provided the customer has content filters enabled. GitHub Copilot Business and Enterprise do not use customer code for model training, and include a duplicate detection filter to prevent suggestions matching public code.

The catch: Attorney Kate Downing’s analysis notes that the exceptions — requiring unmodified suggestions, enabled filters, and no knowledge of potential infringement — “are so large they threaten to swallow the indemnity whole.” Most enterprise developers modify AI suggestions before committing. Modified code falls outside the indemnification.

Google (Gemini Code Assist): Similar indemnification for outputs from paid Gemini products, with comparable conditions around enabled safety features and authorized use.

OpenAI (ChatGPT Enterprise): OpenAI’s “Copyright Shield” covers customers against copyright claims, but the company’s liability is capped at “the amount [a customer] paid for [an OpenAI] service that gave rise to [a] claim during the 12 months before the liability arose or $100.” For most enterprises, this is functionally zero protection relative to the potential exposure.

Anthropic (Claude Enterprise): Indemnifies enterprise customers for IP claims tied to authorized use of Claude, subject to exclusions for misuse or knowing infringement.

Open-source tools (Aider, Continue.dev, Cline): No indemnification. No vendor to indemnify. The IP risk falls entirely on the deploying organization.

The bottom line: vendor indemnification is a marketing feature, not a legal strategy. No vendor’s indemnification covers the full spectrum of how enterprises actually use AI-generated code — modified, combined with human code, integrated into proprietary systems.

The Litigation Landscape

Three active cases define the boundaries:

Bartz v. Anthropic (N.D. Cal.): Settled for $1.5 billion in August 2025 — the largest copyright settlement in U.S. history. Three authors sued Anthropic for training Claude on pirated books from shadow libraries (Library Genesis, Pirate Library Mirror). Judge Alsup ruled in June 2025 that training on copyrighted works acquired legally is fair use, but training on pirated copies is not. After class certification expanded the case to represent approximately 500,000 copyrighted works, Anthropic settled at roughly $3,000 per infringed work. Final approval hearing scheduled for April 23, 2026.

Doe v. GitHub (N.D. Cal.): Filed November 2022 against GitHub, Microsoft, and OpenAI. Plaintiffs allege Copilot was trained on their open-source code and strips copyright notices from outputs. As of January 2026, the court dismissed most claims — including DMCA Section 1202(b) violations and core copyright infringement — finding that Copilot’s code snippets are “not similar enough” to plaintiffs’ code. Surviving claims: breach of contract and open-source license violations. Discovery is ongoing.

Thomson Reuters v. ROSS Intelligence (D. Del.): In February 2025, the court ruled that ROSS’s use of Westlaw headnotes to train an AI legal research tool was not fair use, finding that the AI tool directly competed with the original product. Now on appeal to the Third Circuit. While not a coding case, the “competitive substitution” analysis has direct implications for AI tools trained on commercial codebases.

The pattern: Courts are drawing lines based on (a) how the training data was acquired (lawfully vs. pirated), (b) whether the AI output competes with the original work, and (c) whether outputs incorporate recognizable copyrighted expression. These distinctions matter for enterprise procurement: the training data provenance of your AI coding tools is now a material legal risk factor.

The Patent Dimension

Copyright is not the only IP concern. AI-generated code can independently create patent infringement risk. When an AI model generates code that implements a patented algorithm or method — drawn from training data containing patented implementations — the deploying company faces potential infringement liability without ever having intentionally copied the patent holder’s technology.

The opacity of AI systems compounds this. Traditional freedom-to-operate analyses require understanding what code does and where it came from. When an AI generates an implementation, neither the developer nor the organization can fully explain which training data influenced the output or whether it embodies a patented method.

No major patent infringement case involving AI-generated code has been decided as of early 2026. But rising patent allowance rates for software and AI technologies, combined with the explosion of AI-generated code entering production, make this collision inevitable.

The Trade Secret Pivot

With copyright protection uncertain and patent risk growing, trade secret law is becoming the primary IP protection strategy for AI-assisted codebases. Trade secrets require no registration, offer indefinite protection without disclosure, and protect both the code and the competitive advantage it creates.

But AI tools create new trade secret vulnerabilities:

Data leakage during training: Some AI coding tools use customer code to improve their models. If proprietary code enters a training pipeline, it may be reproducible by other customers. Enterprise tiers of major tools (Copilot Business/Enterprise, Claude Enterprise) explicitly commit to not training on customer data. Free or individual tiers often do not make this commitment.

Prompt injection and extraction: Generative AI systems can be manipulated into revealing information about their training data or prior interactions. If proprietary code patterns are embedded in a model, they may be extractable by adversaries.

Employee departure: Developers who have used AI tools to generate proprietary code may not understand that the prompts, configurations, and architectural decisions they carry in their heads constitute trade secrets. Traditional employment agreements may not cover AI-specific knowledge transfer scenarios.

To maintain trade secret protection, companies must: define protected AI assets precisely, restrict AI tool access to approved enterprise-tier products with data isolation guarantees, document competitive advantage stemming from proprietary code, and update NDAs and employment agreements to cover AI-specific scenarios.

Key Data Points

  • Codebases with license conflicts: 68%, a record high (Black Duck OSSRA 2026, n=947)
  • Year-over-year license conflict increase: 12 percentage points, from 56% to 68% (Black Duck OSSRA 2026)
  • Organizations evaluating AI-generated code for IP risks: 54% (Black Duck OSSRA 2026)
  • Organizations doing comprehensive IP/license/security/quality review: 24% (Black Duck OSSRA 2026)
  • Companies banning AI tools whose developers use them anyway: 76% (Black Duck OSSRA 2026)
  • Largest AI copyright settlement: $1.5 billion, Bartz v. Anthropic (federal court filing, August 2025)
  • Per-work settlement amount: ~$3,000 per copyrighted work (Bartz v. Anthropic settlement terms)
  • Copyright Office Part 2 publication: January 29, 2025 (U.S. Copyright Office)
  • D.C. Circuit human authorship affirmation: March 2025; rehearing denied May 2025 (D.C. Circuit Court of Appeals)
  • Open-source components per codebase: up 30% year-over-year (Black Duck OSSRA 2026)
  • Mean vulnerabilities per codebase: 581, a 107% increase (Black Duck OSSRA 2026)
  • Codebases with high-risk vulnerabilities: 78% (Black Duck OSSRA 2026)

What This Means for Your Organization

The IP risk from AI-generated code is not hypothetical and not distant. It is present in your codebase today if your developers use AI coding tools — and Black Duck’s data suggests 85% of organizations do, including 76% of those that officially prohibit it.

The protection gap is the urgent problem. If more than a trivial percentage of your codebase was generated primarily by AI without documented human creative involvement, you may be building competitive assets you cannot legally protect through copyright. This does not mean the code is worthless — trade secret protection still applies — but it means your enforcement options narrow significantly if a competitor reverse-engineers your work. The practical response is to establish a “creative audit trail” documenting human selection, arrangement, modification, and architectural decisions for code that carries competitive value.
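One lightweight way to start building such a creative audit trail is a commit-trailer convention enforced in CI: every commit touching competitively valuable code must document both whether AI was used and what the human contributed. The trailer names below (`AI-Assisted`, `Human-Contribution`) are illustrative assumptions, not an established standard. A minimal sketch:

```python
# Hypothetical sketch: enforce Git-style provenance trailers in commit
# messages so human creative involvement is documented per change.
# Trailer names are illustrative, not an established convention.
import re

REQUIRED_TRAILERS = ("AI-Assisted", "Human-Contribution")

def parse_trailers(commit_message: str) -> dict:
    """Extract 'Key: value' trailer lines from a commit message."""
    trailers = {}
    for line in commit_message.strip().splitlines():
        m = re.match(r"^([A-Za-z-]+):\s*(.+)$", line)
        if m:
            trailers[m.group(1)] = m.group(2)
    return trailers

def audit_trail_ok(commit_message: str) -> bool:
    """True if the commit documents both AI use and human creative input."""
    trailers = parse_trailers(commit_message)
    return all(key in trailers for key in REQUIRED_TRAILERS)

msg = """Refactor billing retry logic

AI-Assisted: yes (initial draft from coding assistant)
Human-Contribution: restructured retry state machine, rewrote error paths
"""
print(audit_trail_ok(msg))  # True: both provenance trailers present
```

A hook like this does not itself establish copyrightability, but it creates the contemporaneous record of human selection, arrangement, and modification that a case-by-case analysis would examine.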

License contamination requires active scanning, not passive policy. Writing an AI usage policy is necessary but insufficient. Black Duck’s data shows that most organizations check AI code for security vulnerabilities but not for license compliance — a 52-percentage-point gap between security scanning (76%) and comprehensive IP review (24%). Given that AI tools can introduce copyleft license obligations silently, every AI-generated code contribution should pass through the same license-scanning pipeline as third-party open-source dependencies. Tools like Black Duck, Threatrix, and Tabnine’s built-in license checking can automate this, but they must be configured and enforced.
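As a minimal illustration of gating AI contributions on license checks, the sketch below flags common copyleft markers in a snippet before it is committed. Note the limitation: string matching only catches retained license headers, while laundered snippets with stripped notices require similarity-based scanners such as the commercial tools named above. The marker list is illustrative and far from exhaustive.

```python
# Hypothetical pre-commit sketch: flag copyleft license markers in
# AI-generated snippets. Illustrative only; real pipelines should use
# a dedicated license scanner rather than this marker list.
COPYLEFT_MARKERS = (
    "GNU General Public License",
    "GNU Affero General Public License",
    "SPDX-License-Identifier: GPL",
    "SPDX-License-Identifier: AGPL",
)

def copyleft_hits(code: str) -> list:
    """Return the copyleft markers found in a code snippet."""
    return [marker for marker in COPYLEFT_MARKERS if marker in code]

snippet = '''
# SPDX-License-Identifier: GPL-3.0-or-later
def checksum(data):
    return sum(data) % 255
'''

hits = copyleft_hits(snippet)
if hits:
    # In a real hook, exit nonzero here to block the commit.
    print(f"BLOCK commit: copyleft markers found: {hits}")
```

The design point is where the check runs: at contribution time, in the same pipeline that vets third-party dependencies, rather than in a periodic audit after the obligations are already embedded.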

Vendor indemnification is not a substitute for IP governance. Read the terms. Understand the exceptions. Budget for the realistic possibility that your actual usage patterns fall outside the indemnification scope. The safest approach: treat vendor indemnification as a partial hedge, not a complete shield, and build your own IP compliance controls regardless of what your vendors promise.

The $1.5 billion Bartz settlement and ongoing Doe v. GitHub litigation signal that AI training-data provenance is a material risk factor in vendor selection. When evaluating AI coding tools, ask: What was the model trained on? How does the vendor handle copyrighted training data? What indemnification do they offer — and what are the exceptions?

Created by Brandon Sneider | brandon@brandonsneider.com | March 2026