See also (wiki): wiki/ai-vendor-contracts.md
Executive Summary
- Most “open-source” AI models are not open source. Llama, DeepSeek, and earlier Gemma/Qwen versions use custom licenses with field-of-use restrictions, MAU thresholds, and unilateral amendment rights that enterprise legal teams routinely fail to catch.
- Meta’s Llama license is a bilateral commercial contract under California law — not a copyright license. It aggregates monthly active users across affiliates, permits Meta to change acceptable-use terms unilaterally, and extends obligations to synthetic data generated from Llama outputs.
- The licensing landscape is bifurcating: Google (Gemma 4) and Alibaba (Qwen3+) have moved to genuine Apache 2.0. Meta and DeepSeek retain custom licenses with enterprise-hostile clauses. The Linux Foundation’s OpenMDW license offers a potential standard but lacks adoption.
- Enterprises that fine-tune, distill, or generate training data from restricted-license models create cascading contractual obligations that follow every downstream derivative — a liability chain most procurement teams never map.
The License Is Not What Marketing Says It Is
The term “open source” has become marketing language in the AI model ecosystem. The Open Source Initiative’s Open Source Definition bars licenses from restricting use of the software “in a specific field of endeavor.” By that standard, Llama, DeepSeek’s model license, and pre-2026 Gemma and Qwen versions are not open source. They are source-available models with commercial restrictions.
This matters for procurement because most enterprise approved-license lists include Apache 2.0, MIT, and BSD. A model marketed as “open” that uses a custom license triggers a full legal review — or should. TechCrunch’s March 2025 analysis found that custom AI licenses create a “chilling effect” on commercial adoption, with smaller firms avoiding them entirely because they lack counsel to evaluate bespoke terms.
The Llama Trap: A California Commercial Contract
Meta’s Llama Community License deserves the closest scrutiny because Llama models are the most widely deployed “open” models in enterprise settings.
What legal teams miss:
| Clause | What It Does | Enterprise Risk |
|---|---|---|
| 700M MAU threshold (Section 2) | Companies exceeding 700M monthly active users across all affiliates must obtain separate Meta permission | Acquisition of a Llama-dependent startup by a large platform company triggers immediate non-compliance |
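| Affiliate aggregation | MAU counts include entities with 50%+ ownership ties; the license sets no standard for deduplicating users across services | A conglomerate with four regional services at 200M MAU each reaches 800M in aggregate and exceeds the threshold, even though no single service comes close |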
| Output obligations (Section 1.b.i) | License terms extend to “any outputs or results of the Llama Materials” | Synthetic datasets created from Llama outputs carry forward licensing obligations to every model trained on them |
| Unilateral AUP amendment (Section 6) | Meta can modify the Acceptable Use Policy at any time; compliance is immediate | A policy change after deployment can retroactively restrict an existing use case |
| Attribution cascade | Derivative works require “Built with Llama” branding and “Llama” prefix in model names | White-label AI products built on Llama derivatives must carry Meta’s brand |
| California law (Section 7) | Disputes resolved exclusively in California courts | Non-U.S. enterprises lose home-jurisdiction protections |
| EU multimodal restriction (Llama 3.2 AUP) | Multimodal models explicitly unavailable to EU residents/companies | European deployments of Llama 3.2 vision models violate the license on download |
The synthetic-data liability chain is the least-understood risk. An enterprise that generates training data using Llama, then fine-tunes a different model on that data, has potentially extended Meta’s license terms to a model Meta did not build. No court has tested this interpretation, but the contract language supports it.
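The affiliate aggregation row in the table above reduces to simple arithmetic, and a minimal sketch makes the exposure easy to check. The `Entity` structure, the group members, and the `min_ownership` parameter are hypothetical illustrations; the 700M figure and the 50% ownership cutoff are taken from the table, and the absence of deduplication is modeled as a plain sum.

```python
from dataclasses import dataclass

# Threshold as described in the clause table above (Section 2).
MAU_THRESHOLD = 700_000_000


@dataclass(frozen=True)
class Entity:
    """One legal entity in a corporate group (hypothetical structure)."""
    name: str
    monthly_active_users: int
    ownership_share: float  # fraction held by the group parent, 0.0 to 1.0


def aggregated_mau(entities, min_ownership=0.5):
    """Sum MAU across entities that count as affiliates.

    No deduplication is applied: a person active on two services is
    counted twice, the conservative reading when no standard is given.
    """
    return sum(
        e.monthly_active_users
        for e in entities
        if e.ownership_share >= min_ownership
    )


def exceeds_threshold(entities, threshold=MAU_THRESHOLD):
    """True when the aggregate is strictly greater than the threshold."""
    return aggregated_mau(entities) > threshold


# Hypothetical conglomerate: four regional services at 200M MAU each.
group = [
    Entity("Region A", 200_000_000, 1.0),
    Entity("Region B", 200_000_000, 0.8),
    Entity("Region C", 200_000_000, 0.6),
    Entity("Region D", 200_000_000, 0.5),
]

print(aggregated_mau(group), exceeds_threshold(group))
```

Running the same check against the combined entity list of an acquirer and its target is one way to surface the acquisition scenario in the first row of the table before a deal closes.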
DeepSeek: MIT Code, Proprietary Model
DeepSeek’s dual-license structure creates a different trap. The source code ships under MIT — genuinely permissive. The model weights ship under a custom DeepSeek license that prohibits military applications, “false or harmful content” generation, and violations of personal rights. Black Duck’s license review flags the “false or harmful content” standard as inherently subjective — comparable to the JSON license’s notorious “good, not evil” clause that enterprise legal teams universally reject.
Derivative works must carry forward the same use-based restrictions. Some DeepSeek variants incorporate Meta’s Llama, stacking a second set of restrictions on top.
Beyond licensing, Ropes & Gray’s January 2025 alert flags data sovereignty as the primary enterprise concern: DeepSeek’s architecture routes through Chinese-jurisdiction infrastructure, triggering CFIUS, export control, and data residency questions that the MIT code license does nothing to address.
The Apache 2.0 Convergence
The clearest trend in 2026 is the migration of major model families toward genuine Apache 2.0 licensing:
| Model Family | Current License | MAU Threshold | Output Restrictions | Enterprise Approval |
|---|---|---|---|---|
| Gemma 4 (Google, 2026) | Apache 2.0 | None | None | Standard process |
| Qwen3/3.5 (Alibaba, 2025-2026) | Apache 2.0 | None | None | Standard process |
| Llama 3/4 (Meta, 2024-2026) | Custom community license | 700M MAU | Outputs restricted | Full legal review required |
| DeepSeek-R1 (DeepSeek, 2025) | MIT (code) + Custom (model) | None | Use-based restrictions | Full legal review + data sovereignty review |
| Mistral (various) | Apache 2.0 (most models) | None | None | Standard process |
Google’s shift with Gemma 4 is strategic. Earlier Gemma versions used a custom “Gemma Terms of Use” that prohibited uses harming minors or facilitating infrastructure attacks — reasonable in spirit but legally ambiguous enough that large-company legal teams could not sign off. Apache 2.0 eliminated that friction entirely.
Alibaba followed the same path. Qwen’s earlier license imposed a 100M MAU threshold and required commercial users to apply for permission. Qwen3 (April 2025) moved to Apache 2.0 and dropped both the threshold and the permission requirement.
The OpenMDW Standard: Early but Worth Watching
The Linux Foundation published the Open Model, Data and Weights (OpenMDW) License to address the fundamental mismatch between traditional software licenses and AI model artifacts. OpenMDW covers the full model stack — architecture, weights, training code, data, documentation — under a single attribution-based license with no copyleft requirements.
Two provisions matter for enterprise procurement:
- Output freedom: Outputs generated from OpenMDW-licensed models carry no licensing restrictions from the provider. This directly addresses the Llama output-obligation problem.
- Patent protection: A patent-litigation termination clause ends the license for any licensee that files an offensive patent suit over the licensed materials, the same defensive mechanism that made Apache 2.0 enterprise-friendly.
OpenMDW aligns with the EU AI Act’s references to “free and open-source licenses” and with the ten criteria of the OSI’s Open Source Definition. Adoption remains early, but procurement teams should add it to their approved-license lists now to avoid re-review later.
The Redline Pattern: What GCs Are Actually Doing
Enterprise general counsel are converging on a practical framework for AI model licensing review:
- Approved by default: Apache 2.0, MIT, BSD — no additional review needed
- Review required: Any custom AI license — Llama Community License, DeepSeek Model License, pre-2026 Gemma Terms — gets the same scrutiny as a SaaS vendor contract
- Derivative-chain audit: Before fine-tuning or distilling, legal maps the full license chain back to base model weights. If any link in the chain carries use-based restrictions, those restrictions propagate
- Synthetic data quarantine: Training data generated from restricted-license models is flagged and tracked separately. Models trained on that data inherit the restrictions
- Acquisition due diligence: AI model licenses now appear in M&A IP schedules. A target company’s dependence on Llama-licensed models is a contractual liability that survives closing
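The derivative-chain audit and synthetic data quarantine items in the list above amount to walking a lineage graph back to the base weights. The sketch below is one way to express that, under stated assumptions: the `Artifact` structure, the license identifiers, and the example artifacts are hypothetical, and "restricted" is modeled simply as "any license not on the approved-by-default list".

```python
from dataclasses import dataclass, field
from typing import Optional

# Approved-by-default licenses from the framework above
# (identifier spellings are an assumption).
APPROVED_BY_DEFAULT = {"Apache-2.0", "MIT", "BSD-3-Clause"}


@dataclass
class Artifact:
    """A model or dataset in the lineage graph (hypothetical structure)."""
    name: str
    license_id: Optional[str]  # None for artifacts produced in-house
    parents: list = field(default_factory=list)  # upstream Artifact objects


def inherited_restrictions(artifact):
    """Collect every non-approved license found in the artifact's ancestry.

    Walks parents depth-first, including the artifact itself, and skips
    nodes already visited so shared ancestors are checked once.
    """
    found, seen, stack = set(), set(), [artifact]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.license_id is not None and node.license_id not in APPROVED_BY_DEFAULT:
            found.add(node.license_id)
        stack.extend(node.parents)
    return found


# Hypothetical chain: synthetic data generated with a Llama model is used
# to fine-tune an Apache 2.0 base model.
llama = Artifact("llama-base", "Llama-Community")
qwen = Artifact("qwen3-base", "Apache-2.0")
synthetic = Artifact("synthetic-qa-set", None, [llama])
tuned = Artifact("support-bot", None, [qwen, synthetic])
clean = Artifact("clean-bot", None, [qwen])

print(inherited_restrictions(tuned))
print(inherited_restrictions(clean))
```

A non-empty result marks an artifact for the quarantine track; whether a given license actually propagates through a particular link remains a question for counsel, not something the graph walk decides.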
Key Data Points
| Data Point | Source | Date | Credibility |
|---|---|---|---|
| Llama 700M MAU threshold aggregates across affiliates with 50%+ ownership | Meta Llama Community License, Section 2 | 2024 | HIGH — primary source |
| Meta can amend Llama AUP unilaterally, compliance immediate | Meta Llama Community License, Section 6 | 2024 | HIGH — primary source |
| Llama output obligations extend to synthetic datasets and derivative models | Meta Llama Community License, Section 1.b.i | 2024 | HIGH — primary source |
| DeepSeek “false or harmful content” restriction comparable to JSON “good, not evil” clause | Black Duck license review | 2025 | HIGH — independent analysis |
| Gemma 4 shifted to genuine Apache 2.0 from custom terms | Google Gemma 4 release | 2026 | HIGH — primary source |
| Qwen3 dropped 100M MAU threshold, moved to Apache 2.0 | Alibaba Qwen3 release | April 2025 | HIGH — primary source |
| OpenMDW license grants output freedom — no provider restrictions on generated content | Linux Foundation OpenMDW draft | 2025 | HIGH — primary source |
| Custom AI licenses create “chilling effect”; smaller firms avoid entirely | TechCrunch analysis | March 2025 | MEDIUM — journalism, expert-sourced |
| Trump AI Action Plan describes open-weight models as having “geostrategic value” | U.S. AI Action Plan | July 2025 | HIGH — government source |
What This Means for Your Organization
The immediate action is an audit of every AI model in your stack — production, pilot, and shadow deployments — against the license comparison table above. Most enterprises adopted Llama or DeepSeek models during 2024-2025 when “open source” marketing obscured the contract terms. Those deployments may be operating under restrictions that procurement never reviewed.
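As a starting point for that audit, the comparison table can be encoded as a lookup and applied to a deployment inventory. This is a rough sketch, not a compliance tool: the review paths are copied from the table, while the inventory entries, the tuple layout, and the strict default for unlisted models are assumptions made for illustration.

```python
from collections import defaultdict

# Review path per model family, copied from the comparison table above.
# Mistral's entry applies only to its Apache 2.0 releases.
REVIEW_PATH = {
    "Gemma 4": "Standard process",
    "Qwen3/3.5": "Standard process",
    "Llama 3/4": "Full legal review required",
    "DeepSeek-R1": "Full legal review + data sovereignty review",
    "Mistral": "Standard process",
}

# Assumption: anything missing from the table gets a full review.
DEFAULT_PATH = "Full legal review required"


def audit(deployments):
    """Group (system, model_family, stage) records by review path."""
    grouped = defaultdict(list)
    for system, family, stage in deployments:
        grouped[REVIEW_PATH.get(family, DEFAULT_PATH)].append((system, stage))
    return dict(grouped)


# Hypothetical inventory covering production, pilot, and shadow use.
deployments = [
    ("support-chat", "Llama 3/4", "production"),
    ("doc-search", "Qwen3/3.5", "pilot"),
    ("sales-notes", "DeepSeek-R1", "shadow"),
    ("code-helper", "Unlisted model", "shadow"),
]

report = audit(deployments)
for path, systems in report.items():
    print(path, systems)
```

Defaulting unknown models to full review is deliberate: shadow deployments are the ones least likely to have a recorded license, so an empty lookup should raise scrutiny rather than lower it.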
The derivative-chain risk is the one that catches organizations off guard. If your data science team generated synthetic training data using Llama, and then used that data to fine-tune a different model, Meta’s license terms may extend to your proprietary model. Map the chain now, before an acquisition or audit surfaces the exposure.
For new model selection, the decision is straightforward: Apache 2.0-licensed models (Gemma 4, Qwen3.5, Mistral’s Apache-licensed releases) clear the standard approval process and add no model-specific restrictions beyond Apache 2.0’s ordinary attribution and notice terms. Custom-licensed models (Llama, DeepSeek) require the same contract review as any vendor agreement, because that is exactly what they are.
If this raised questions about how your current AI model licenses interact with your procurement standards, I’d welcome the conversation — brandon@brandonsneider.com
Sources
- TechCrunch, “‘Open’ AI model licenses often carry concerning restrictions,” March 14, 2025. https://techcrunch.com/2025/03/14/open-ai-model-licenses-often-carry-concerning-restrictions/
- Crowell & Moring, “Artificial Intelligence and Open Source Data and Software: Contrasting Perspectives, Legal Risks, and Observations,” 2025. https://www.crowell.com/en/insights/client-alerts/artificial-intelligence-and-open-source-data-and-software-contrasting-perspectives-legal-risks-and-observations
- Open Source Guy, “Significant Risks in Using AI Models Governed by the Llama License,” January 27, 2025. https://shujisado.org/2025/01/27/significant-risks-in-using-ai-models-governed-by-the-llama-license/
- Linux Foundation, “The Open Source Legacy and AI’s Licensing Challenge,” 2025. https://www.linuxfoundation.org/blog/the-open-source-legacy-and-ais-licensing-challenge
- Black Duck, “DeepSeek Model License Review,” 2025. https://www.blackduck.com/blog/deepseek-license.html
- Meta, Llama Community License Agreement. https://github.com/meta-llama/llama-models/blob/main/LICENSE
- Google, Gemma 4 Apache 2.0 License, 2026. https://www.mindstudio.ai/blog/what-is-gemma-4-apache-2-license-commercial-ai-deployment
- Alibaba, Qwen3 Apache 2.0 release, April 2025. https://en.wikipedia.org/wiki/Qwen
Brandon Sneider | brandon@brandonsneider.com | April 2026