Procurement Contracting

Open-Source AI Licensing: The Hidden Contract Risk in "Free" Models

See also (wiki): wiki/ai-vendor-contracts.md


Executive Summary

  • Most “open-source” AI models are not open source. Llama, DeepSeek, and earlier Gemma/Qwen versions use custom licenses with field-of-use restrictions, MAU thresholds, and unilateral amendment rights that enterprise legal teams routinely fail to catch.
  • Meta’s Llama license is a bilateral commercial contract under California law — not a copyright license. It aggregates monthly active users across affiliates, permits Meta to change acceptable-use terms unilaterally, and extends obligations to synthetic data generated from Llama outputs.
  • The licensing landscape is bifurcating: Google (Gemma 4) and Alibaba (Qwen3+) have moved to genuine Apache 2.0. Meta and DeepSeek retain custom licenses with enterprise-hostile clauses. The Linux Foundation’s OpenMDW license offers a potential standard but lacks adoption.
  • Enterprises that fine-tune, distill, or generate training data from restricted-license models create cascading contractual obligations that follow every downstream derivative — a liability chain most procurement teams never map.

The License Is Not What Marketing Says It Is

The term “open source” has become marketing language in the AI model ecosystem. The Open Source Initiative requires that licenses permit use “in any field of endeavor.” By that standard, Llama, DeepSeek’s model license, and pre-2026 Gemma and Qwen versions are not open source — they are source-available models with commercial restrictions.

This matters for procurement because most enterprise approved-license lists include Apache 2.0, MIT, and BSD. A model marketed as “open” that uses a custom license triggers a full legal review — or should. TechCrunch’s March 2025 analysis found that custom AI licenses create a “chilling effect” on commercial adoption, with smaller firms avoiding them entirely because they lack counsel to evaluate bespoke terms.

The Llama Trap: A California Commercial Contract

Meta’s Llama Community License deserves the closest scrutiny because Llama models are the most widely deployed “open” models in enterprise settings.

What legal teams miss:

| Clause | What It Does | Enterprise Risk |
| --- | --- | --- |
| 700M MAU threshold (Section 2) | Companies exceeding 700M monthly active users across all affiliates must obtain separate Meta permission | Acquisition of a Llama-dependent startup by a large platform company can put the combined entity out of compliance |
| Affiliate aggregation | MAU counts include entities with 50%+ ownership ties, no deduplication standard | A conglomerate with four regional services at 200M MAU each unknowingly exceeds the threshold |
| Output obligations (Section 1.b.i) | License terms extend to “any outputs or results of the Llama Materials” | Synthetic datasets created from Llama outputs carry forward licensing obligations to every model trained on them |
| Unilateral AUP amendment (AUP incorporated by reference, Section 1.b.iv) | Meta can modify the Acceptable Use Policy at any time; compliance is immediate | A policy change after deployment can retroactively restrict an existing use case |
| Attribution cascade | Derivative works require “Built with Llama” branding and “Llama” prefix in model names | White-label AI products built on Llama derivatives must carry Meta’s brand |
| California law (Section 7) | Disputes resolved exclusively in California courts | Non-U.S. enterprises lose home-jurisdiction protections |
| EU multimodal restriction (Llama 3.2 AUP) | Multimodal models explicitly unavailable to EU residents/companies | European deployments of Llama 3.2 vision models violate the license on download |
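
The affiliate-aggregation arithmetic in the table can be checked with a short script. This is a minimal sketch, not a legal test: the `Entity` class, the 0.5 ownership cutoff as a parameter, and the choice to sum without deduplicating shared users are all assumptions made for illustration, following the conservative reading described above.

```python
from dataclasses import dataclass

MAU_THRESHOLD = 700_000_000  # threshold cited from Section 2 of the license


@dataclass
class Entity:
    """Hypothetical record for one service in a corporate group."""
    name: str
    ownership_share: float      # parent's ownership fraction, 0.0 to 1.0
    monthly_active_users: int


def aggregate_mau(entities, affiliate_cutoff=0.5):
    """Sum MAU over entities at or above the ownership cutoff.

    Users shared between services are NOT deduplicated. That is a
    conservative assumption, since no deduplication standard is given.
    """
    return sum(
        e.monthly_active_users
        for e in entities
        if e.ownership_share >= affiliate_cutoff
    )


def exceeds_threshold(entities):
    """True when the aggregated MAU is above the 700M threshold."""
    return aggregate_mau(entities) > MAU_THRESHOLD


# The conglomerate example from the table: four regional services at 200M each.
group = [
    Entity("Region A", 1.0, 200_000_000),
    Entity("Region B", 1.0, 200_000_000),
    Entity("Region C", 0.8, 200_000_000),
    Entity("Region D", 0.6, 200_000_000),
]

print(aggregate_mau(group))      # 800000000
print(exceeds_threshold(group))  # True
```

No single service is near the threshold, yet the group total is 800M. The ownership shares shown are invented; only the four-by-200M figures come from the table.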

The synthetic-data liability chain is the least-understood risk. An enterprise that generates training data using Llama, then fine-tunes a different model on that data, has potentially extended Meta’s license terms to a model Meta did not build. No court has tested this interpretation, but the contract language supports it.

DeepSeek: MIT Code, Proprietary Model

DeepSeek’s dual-license structure creates a different trap. The source code ships under MIT — genuinely permissive. The model weights ship under a custom DeepSeek license that prohibits military applications, “false or harmful content” generation, and violations of personal rights. Black Duck’s license review flags the “false or harmful content” standard as inherently subjective — comparable to the JSON license’s notorious “good, not evil” clause that enterprise legal teams universally reject.

Derivative works must carry forward the same use-based restrictions. Some DeepSeek variants incorporate Meta’s Llama, stacking a second set of restrictions on top.

Beyond licensing, Ropes & Gray’s January 2025 alert flags data sovereignty as the primary enterprise concern: DeepSeek’s hosted service routes data through Chinese-jurisdiction infrastructure, triggering CFIUS, export control, and data residency questions that the MIT code license does nothing to address.

The Apache 2.0 Convergence

The clearest trend in 2026 is the migration of major model families toward genuine Apache 2.0 licensing:

| Model Family | Current License | MAU Threshold | Output Restrictions | Enterprise Approval |
| --- | --- | --- | --- | --- |
| Gemma 4 (Google, 2026) | Apache 2.0 | None | None | Standard process |
| Qwen3/3.5 (Alibaba, 2025-2026) | Apache 2.0 | None | None | Standard process |
| Llama 3/4 (Meta, 2024-2026) | Custom community license | 700M MAU | Outputs restricted | Full legal review required |
| DeepSeek-R1 (DeepSeek, 2025) | MIT (code) + Custom (model) | None | Use-based restrictions | Full legal review + data sovereignty review |
| Mistral (various) | Apache 2.0 (most models) | None | None | Standard process |

Google’s shift with Gemma 4 is strategic. Earlier Gemma versions used a custom “Gemma Terms of Use” that prohibited uses harming minors or facilitating infrastructure attacks — reasonable in spirit but legally ambiguous enough that large-company legal teams could not sign off. Apache 2.0 eliminated that friction entirely.

Alibaba followed the same path. Qwen’s earlier license imposed a 100M MAU threshold and required commercial users to apply for permission. Qwen3 (April 2025) dropped both restrictions by moving to Apache 2.0.

The OpenMDW Standard: Early but Worth Watching

The Linux Foundation published the Open Model, Data and Weights (OpenMDW) License to address the fundamental mismatch between traditional software licenses and AI model artifacts. OpenMDW covers the full model stack — architecture, weights, training code, data, documentation — under a single attribution-based license with no copyleft requirements.

Two provisions matter for enterprise procurement:

  1. Output freedom: Outputs generated from OpenMDW-licensed models carry no licensing restrictions from the provider. This directly addresses the Llama output-obligation problem.
  2. Patent protection: A patent-litigation termination clause ends a licensee’s rights if it files an offensive patent suit, the same defensive mechanism that made Apache 2.0 enterprise-friendly.

OpenMDW aligns with the EU AI Act’s references to “free and open-source licenses” and the OSI’s 10 principles. Adoption remains early, but procurement teams should add it to their approved-license lists now to avoid re-review later.

The Redline Pattern: What GCs Are Actually Doing

Enterprise general counsel are converging on a practical framework for AI model licensing review:

  1. Approved by default: Apache 2.0, MIT, BSD — no additional review needed
  2. Review required: Any custom AI license — Llama Community License, DeepSeek Model License, pre-2026 Gemma Terms — gets the same scrutiny as a SaaS vendor contract
  3. Derivative-chain audit: Before fine-tuning or distilling, legal maps the full license chain back to base model weights. If any link in the chain carries use-based restrictions, those restrictions propagate
  4. Synthetic data quarantine: Training data generated from restricted-license models is flagged and tracked separately. Models trained on that data inherit the restrictions
  5. Acquisition due diligence: AI model licenses now appear in M&A IP schedules. A target company’s dependence on Llama-licensed models is a contractual liability that survives closing
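
Steps 3 and 4 amount to walking a dependency graph backwards. The sketch below shows one way to do that, under the conservative reading this article takes (restrictions propagate through both fine-tuning and synthetic data), which no court has tested. The license identifiers, artifact names, and the `parents`/`licenses` dictionaries are hypothetical stand-ins for whatever your model inventory records.

```python
# Hypothetical license identifiers; substitute whatever your inventory uses.
RESTRICTED_LICENSES = {"llama-community", "deepseek-model"}


def inherited_restrictions(artifact, parents, licenses):
    """Collect restricted licenses found anywhere upstream of an artifact.

    parents:  maps an artifact to the models or datasets it was built from
    licenses: maps an artifact to its own license identifier
    """
    found = set()
    seen = set()
    stack = [artifact]
    while stack:
        node = stack.pop()
        if node in seen:          # guard against cycles and repeated visits
            continue
        seen.add(node)
        if licenses.get(node) in RESTRICTED_LICENSES:
            found.add(licenses[node])
        stack.extend(parents.get(node, []))
    return found


# A proprietary model fine-tuned from an Apache 2.0 base, using a synthetic
# dataset that was generated with a Llama-licensed teacher model.
parents = {
    "support-bot-v2": ["apache-base-model", "synthetic-qa-set"],
    "synthetic-qa-set": ["llama-teacher"],
}
licenses = {
    "support-bot-v2": "proprietary",
    "apache-base-model": "apache-2.0",
    "synthetic-qa-set": "internal",
    "llama-teacher": "llama-community",
}

print(sorted(inherited_restrictions("support-bot-v2", parents, licenses)))
# ['llama-community']
```

The base model is clean, but the synthetic dataset links the final model back to the Llama-licensed teacher, which is exactly the link a review of the base model alone would miss.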

Key Data Points

| Data Point | Source | Date | Credibility |
| --- | --- | --- | --- |
| Llama 700M MAU threshold aggregates across affiliates with 50%+ ownership | Meta Llama Community License, Section 2 | 2024 | HIGH (primary source) |
| Meta can amend Llama AUP unilaterally, compliance immediate | Meta Llama Community License, Section 1.b.iv (AUP incorporated by reference) | 2024 | HIGH (primary source) |
| Llama output obligations extend to synthetic datasets and derivative models | Meta Llama Community License, Section 1.b.i | 2024 | HIGH (primary source) |
| DeepSeek “false or harmful content” restriction comparable to JSON “good, not evil” clause | Black Duck license review | 2025 | HIGH (independent analysis) |
| Gemma 4 shifted to genuine Apache 2.0 from custom terms | Google Gemma 4 release | 2026 | HIGH (primary source) |
| Qwen3 dropped 100M MAU threshold, moved to Apache 2.0 | Alibaba Qwen3 release | April 2025 | HIGH (primary source) |
| OpenMDW license grants output freedom: no provider restrictions on generated content | Linux Foundation OpenMDW draft | 2025 | HIGH (primary source) |
| Custom AI licenses create “chilling effect”; smaller firms avoid entirely | TechCrunch analysis | March 2025 | MEDIUM (journalism, expert-sourced) |
| Trump AI Action Plan describes open-weight models as having “geostrategic value” | U.S. AI Action Plan | July 2025 | HIGH (government source) |

What This Means for Your Organization

The immediate action is an audit of every AI model in your stack — production, pilot, and shadow deployments — against the license comparison table above. Most enterprises adopted Llama or DeepSeek models during 2024-2025 when “open source” marketing obscured the contract terms. Those deployments may be operating under restrictions that procurement never reviewed.
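
A first pass at that audit can be a simple triage over a model inventory. The sketch below assumes a hypothetical inventory format (the field names and deployment names are invented) and uses SPDX-style identifiers for the approved list. It keys on the license of the weights rather than the code, since the two can differ, and it sends anything with an unknown license to review.

```python
# SPDX-style identifiers for licenses treated as approved by default.
APPROVED_BY_DEFAULT = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}


def triage(inventory):
    """Split deployments into (approved, needs_review) by weights license.

    A missing or unrecognized weights license always lands in needs_review.
    """
    approved, needs_review = [], []
    for deployment in inventory:
        if deployment["weights_license"] in APPROVED_BY_DEFAULT:
            approved.append(deployment["name"])
        else:
            needs_review.append(deployment["name"])
    return approved, needs_review


# Hypothetical inventory covering production, pilot, and shadow deployments.
inventory = [
    {"name": "search-ranker", "stage": "production",
     "code_license": "Apache-2.0", "weights_license": "Apache-2.0"},
    {"name": "support-bot", "stage": "pilot",
     "code_license": "MIT", "weights_license": "LicenseRef-custom-model"},
    {"name": "notebook-experiment", "stage": "shadow",
     "code_license": "MIT", "weights_license": None},
]

approved, needs_review = triage(inventory)
print(approved)      # ['search-ranker']
print(needs_review)  # ['support-bot', 'notebook-experiment']
```

Note that `support-bot` has MIT-licensed code and would pass a check that looked only at the repository license.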

The derivative-chain risk is the one that catches organizations off guard. If your data science team generated synthetic training data using Llama, and then used that data to fine-tune a different model, Meta’s license terms may extend to your proprietary model. Map the chain now, before an acquisition or audit surfaces the exposure.

For new model selection, the decision is straightforward: Apache 2.0-licensed models (Gemma 4, Qwen3.5, most Mistral models) add no license terms beyond those already on most approved-license lists. Custom-licensed models (Llama, DeepSeek) require the same contract review as any vendor agreement, because that is exactly what they are.

If this raised questions about how your current AI model licenses interact with your procurement standards, I’d welcome the conversation — brandon@brandonsneider.com

Sources

  1. TechCrunch, “‘Open’ AI model licenses often carry concerning restrictions,” March 14, 2025. https://techcrunch.com/2025/03/14/open-ai-model-licenses-often-carry-concerning-restrictions/
  2. Crowell & Moring, “Artificial Intelligence and Open Source Data and Software: Contrasting Perspectives, Legal Risks, and Observations,” 2025. https://www.crowell.com/en/insights/client-alerts/artificial-intelligence-and-open-source-data-and-software-contrasting-perspectives-legal-risks-and-observations
  3. Open Source Guy, “Significant Risks in Using AI Models Governed by the Llama License,” January 27, 2025. https://shujisado.org/2025/01/27/significant-risks-in-using-ai-models-governed-by-the-llama-license/
  4. Linux Foundation, “The Open Source Legacy and AI’s Licensing Challenge,” 2025. https://www.linuxfoundation.org/blog/the-open-source-legacy-and-ais-licensing-challenge
  5. Black Duck, “DeepSeek Model License Review,” 2025. https://www.blackduck.com/blog/deepseek-license.html
  6. Meta, Llama Community License Agreement. https://github.com/meta-llama/llama-models/blob/main/LICENSE
  7. MindStudio, blog post on Gemma 4’s Apache 2.0 license and commercial deployment, 2026. https://www.mindstudio.ai/blog/what-is-gemma-4-apache-2-license-commercial-ai-deployment
  8. Wikipedia, “Qwen” (covers the Qwen3 Apache 2.0 release, April 2025). https://en.wikipedia.org/wiki/Qwen


Brandon Sneider | brandon@brandonsneider.com April 2026