OWASP, NIST, and CSA on AI Coding Tool Security: What the Standards Bodies Actually Say

Executive Summary

  • OWASP, NIST, and CSA have each published distinct but complementary frameworks for AI security in 2025-2026 — organizations need all three, not one
  • OWASP’s Top 10 for LLMs (2025) and new MCP Top 10 provide application-level security checklists; prompt injection remains the #1 risk, and the new MCP framework addresses 2026’s fastest-growing attack surface
  • NIST released its Cyber AI Profile (NIST IR 8596, December 2025) covering three overlapping domains: securing AI systems, AI-enabled defense, and countering AI-enabled attacks — with a final version expected mid-2026
  • CSA’s AI Controls Matrix (AICM, July 2025) provides the most granular governance tool: 243 control objectives across 18 security domains, mapped to the EU AI Act, ISO 42001, and NIST AI RMF
  • The gap between these frameworks and enterprise practice is severe: 73% of organizations lack confidence in executing secure AI strategies (CSA/Google Cloud, n=300, December 2025), while Veracode finds 45% of AI-generated code contains security flaws (100+ LLMs tested, 80 coding tasks, July 2025)

1. OWASP: Application Security for LLMs and MCP

1.1 OWASP Top 10 for LLM Applications (2025)

OWASP’s LLM Top 10, developed by 500+ international experts, is the de facto application security standard for AI-powered systems. The 2025 version reflects the shift toward agentic AI.

The 2025 List:

# Risk What It Means for AI Coding Tools
LLM01 Prompt Injection Malicious inputs in code repositories, pull requests, or issue trackers hijack AI assistant behavior. The “IDEsaster” disclosure (December 2025) found 30+ prompt injection vulnerabilities across Cursor, Windsurf, GitHub Copilot, Kiro.dev, and Cline — 24 received CVE identifiers.
LLM02 Sensitive Information Disclosure AI coding tools can leak proprietary code, API keys, and credentials from context windows. The “EchoLeak” vulnerability demonstrated zero-click exfiltration without user interaction.
LLM03 Supply Chain AI models recommend nonexistent packages ~20% of the time (756,000 code samples tested). Attackers register these hallucinated names with malicious payloads — the “slopsquatting” attack vector.
LLM04 Data and Model Poisoning Corrupted training or fine-tuning data produces systematically vulnerable code. CrowdStrike found that politically sensitive prompts pushed DeepSeek-R1’s vulnerability rate from 19% to 27.2% (6,050 prompts per model, 30,250 total).
LLM05 Improper Output Handling AI-generated code accepted without validation. Veracode’s study of 100+ LLMs found 86% of code samples failed to defend against cross-site scripting (CWE-80), and 88% were vulnerable to log injection (CWE-117).
LLM06 Excessive Agency AI agents with file system access, terminal execution, or API permissions acting without human approval. The IDEsaster research showed most AI IDEs auto-approve file writes by default, enabling undetected malicious configuration changes.
LLM07 System Prompt Leakage System prompts reveal tool configurations, policy boundaries, and workflow logic. This was the most common attacker objective in Q4 2025.
LLM08 Vector and Embedding Weaknesses RAG systems and enterprise knowledge bases introduce new attack paths for manipulating AI coding context.
LLM09 Misinformation Code hallucinations — syntactically plausible but functionally wrong — are the most frequent failure mode. Models produce confident assertions about “best practices” while generating insecure PHP, missing authentication, and absent password hashing.
LLM10 Unbounded Consumption Credit-based pricing models (Cursor, Windsurf, JetBrains) create cost explosion risk when AI agents enter recursive loops or excessive token consumption.

Source credibility: High. OWASP is a vendor-neutral nonprofit. The LLM Top 10 is peer-reviewed by 500+ contributors across industry and academia. It does not sell products.

1.2 OWASP MCP Top 10 (2025, Beta)

The Model Context Protocol — the standard for connecting AI models to external tools and data — has become the fastest-growing attack surface in enterprise AI. OWASP launched a dedicated MCP Top 10, currently in Phase 3 (beta testing).

# Risk Why It Matters
MCP01 Token Mismanagement & Secret Exposure Hard-coded credentials in MCP server configurations; tokens stored in model memory or logs.
MCP02 Privilege Escalation via Scope Creep MCP server permissions expand over time without review.
MCP03 Tool Poisoning Adversaries inject malicious descriptions into MCP tool registries, steering AI agents toward unsafe actions. 5% of open-source MCP servers already contain tool poisoning (Invariant Labs, 2025).
MCP04 Supply Chain Attacks & Dependency Tampering A malicious MCP server masquerading as a “Postmark MCP Server” was caught BCC-ing all email communications to an attacker’s address.
MCP05 Command Injection & Execution AI agents construct and execute system commands from untrusted input. JFrog disclosed CVE-2025-6514, a critical OS command-injection flaw in mcp-remote.
MCP06 Intent Flow Subversion Malicious instructions embedded in context redirect agent objectives. Invariant Labs demonstrated how a poisoned GitHub issue hijacked a GitHub MCP agent to exfiltrate private repository data into a public PR.
MCP07 Insufficient Authentication & Authorization 43% of MCP servers have flaws in OAuth authentication flows (eSentire, 2025).
MCP08 Lack of Audit and Telemetry Most MCP deployments lack logging for tool invocations and context changes.
MCP09 Shadow MCP Servers Developers spin up unapproved MCP servers with default credentials, bypassing security governance.
MCP10 Context Injection & Over-Sharing Shared context windows expose sensitive data across tasks, users, or tenants.

Source credibility: Medium-High. The MCP Top 10 is in beta and has not yet achieved the maturity of the LLM Top 10. It reflects emerging real-world incidents but is still under community refinement.


2. NIST: Federal Risk Management Frameworks

2.1 AI Risk Management Framework (AI RMF 1.0, January 2023)

The NIST AI RMF is the U.S. government’s foundational voluntary framework for AI risk management. Its four-function structure — GOVERN, MAP, MEASURE, MANAGE — has become the dominant language for enterprise AI risk discussions.

The Four Functions Applied to AI Coding Tools:

  • GOVERN: Establish AI-specific risk policies, define acceptable use, assign cross-functional ownership. The framework emphasizes that AI risk is not purely a technology problem — it spans legal, ethical, and organizational dimensions.
  • MAP: Identify and categorize AI systems by risk level, document intended uses and limitations, map data flows from developer workstations through AI vendor infrastructure.
  • MEASURE: Quantitative and qualitative risk assessment, including red-team testing, bias auditing, and accuracy benchmarks. NIST explicitly calls for adversarial testing of AI systems.
  • MANAGE: Risk treatment, monitoring, and AI-specific incident response. Track effectiveness of controls over time.

Adoption status: Federal agencies are adopting AI RMF under executive order requirements. Sector regulators (SEC, FTC, CFPB, FDA, EEOC) increasingly reference NIST AI RMF principles in enforcement guidance. RMF 1.1 updates are expected through 2026.

2.2 Generative AI Risk Profile (NIST AI 600-1, July 2024)

NIST AI 600-1 is the companion profile specifically addressing generative AI risks. It identifies 12 risk categories with over 200 recommended actions.

The risks most relevant to AI coding tools:

  • Confabulation/Hallucination: AI models generate plausible but incorrect code. This maps directly to OWASP LLM09 and the slopsquatting supply chain vector.
  • Information Security: Training data leakage, prompt/completion data exposure, and model memorization of sensitive inputs.
  • Data Privacy: Personal or proprietary data processed by AI models without adequate consent or control mechanisms.
  • Information Integrity: AI-generated code that introduces vulnerabilities, bypasses validation, or produces insecure defaults.
  • CBRN/Dual-Use: While primarily concerned with weapons, this category extends to code that could be used for offensive cyber operations.

Source credibility: High. NIST is a U.S. federal standards body with no commercial interest. The AI RMF was developed with input from over 6,500 stakeholders. However, NIST frameworks are voluntary — they carry moral authority but not legal force unless adopted by regulators.

2.3 Cybersecurity Framework Profile for AI (NIST IR 8596, December 2025, Preliminary Draft)

Released December 16, 2025, this is NIST’s most recent and most directly applicable guidance for securing AI systems. It applies the CSF 2.0 structure to AI-specific cybersecurity challenges.

Three Focus Areas:

  1. Securing AI Systems: Protecting AI infrastructure, models, and data from adversarial attacks, unauthorized access, and supply chain compromise. Covers the full AI lifecycle from development through deployment and retirement.
  2. AI-Enabled Cyber Defense: Using AI to strengthen detection, investigation, and response capabilities. Addresses the challenge of trusting AI outputs in security-critical decisions.
  3. Thwarting AI-Enabled Cyberattacks: Building organizational resilience against threats that use AI for reconnaissance, exploit development, social engineering, and attack automation.

Timeline: The preliminary draft had a 45-day comment window through January 30, 2026. A workshop was held January 14, 2026. The initial public draft is expected in 2026.

Source credibility: High. Developed over a year with input from 6,500+ community members. However, this is still a preliminary draft — the final version may differ materially.


3. CSA: Enterprise Governance and Controls

3.1 AI Controls Matrix (AICM, July 2025)

The CSA AI Controls Matrix is the most operationally detailed framework available. It provides 243 control objectives across 18 security domains — a level of granularity that neither OWASP nor NIST currently offers.

The 18 Security Domains:

The AICM covers both traditional security domains adapted for AI and AI-specific domains:

Traditional Security (AI-Adapted):

  • Identity & Access Management (AI-specific privileged access)
  • Data Security & Privacy Lifecycle Management (training and inference data)
  • Network & Communications Security (AI service architectures)
  • Audit Assurance & Compliance (AI regulatory requirements)
  • Encryption & Key Management
  • Business Continuity & Disaster Recovery
  • Change Control & Configuration Management
  • Infrastructure & Virtualization Security

AI-Specific Domains:

  • Model Security (adversarial attacks, model integrity)
  • AI Supply Chain Management (model and data provenance)
  • Transparency & Accountability (explainable AI)
  • Human Oversight & Control (high-risk AI systems)
  • AI Application Security
  • Threat & Vulnerability Management (AI-specific threats)
  • Logging, Monitoring & Incident Management
  • Governance, Risk & Compliance (AI GRC integration)
  • Interoperability & Portability
  • Security Engineering & Architecture

Five Analytical Pillars per Control: Each of the 243 controls is assessed across: Control Type, Control Applicability and Ownership, Architectural Relevance, LLM Lifecycle Relevance, and Threat Category. The shared responsibility model maps controls across Cloud Service Providers, Model Providers, Orchestrated Service Providers, and Application Providers.

Regulatory Mappings: The AICM maps to ISO 42001, ISO 27001, NIST AI RMF 1.0, BSI AIC4, and the EU AI Act. The ISO 42001 and EU AI Act mappings were released August 2025.

Source credibility: High. CSA is a vendor-neutral nonprofit with deep enterprise security credibility. The AICM won the 2026 CSO Award. However, CSA’s December 2025 survey was sponsored by Google Cloud (n=300), which introduces potential bias in the governance maturity findings.

3.2 The State of AI Security and Governance (CSA/Google Cloud, December 2025)

CSA’s most recent survey (n=300 IT and security professionals, Summer 2025) reveals the gap between framework availability and enterprise readiness.

Key Findings:

  • Governance maturity is the strongest predictor of AI readiness — not budget, not tool selection, not headcount
  • Organizations with comprehensive governance policies show 46% early agentic AI adoption, vs. 25% with partial guidelines and 12% still developing policies
  • 70% of governed organizations have tested AI security capabilities, vs. 43% with partial governance and 39% still developing
  • 52% cite data exposure as their top AI security concern
  • 73% lack confidence in executing secure AI strategies — even organizations that have policies in place
  • 90%+ of security teams are exploring AI for detection, investigation, and response
  • Only 12% of organizations rank model integrity compromise as a top concern — a dangerous blind spot given real-world poisoning attacks

AI Model Adoption: GPT (70%), Gemini (48%), Claude (29%), LLaMA (20%).

Source credibility: Medium. Small sample size (n=300), Google Cloud sponsorship, and self-reported data. Directionally useful but not definitive.

3.3 CSA CISO Implementation Roadmap

CSA provides a four-phase implementation path for the AICM:

  1. Self-Assessment: Catalog all AI systems including shadow deployments and vendor services. Use the AI-CAIQ (Consensus Assessment Initiative Questionnaire for AI) for structured evaluation.
  2. Supply Chain Integration: Develop vendor questionnaires based on AICM controls, incorporate AI security requirements into procurement contracts, establish continuous monitoring of AI tool providers.
  3. STAR for AI Certification: Progress from Level 1 (self-assessment documentation) through Level 2 (third-party validation). CSA STAR for AI is the first AI-specific assurance certification program.
  4. Advanced Operations: Establish an AI Governance Center of Excellence with cross-functional oversight. Integrate AI-specific monitoring into security operations centers.

4. The Evidence on AI-Generated Code Quality

The three frameworks exist for good reason. The evidence on AI-generated code security is stark:

  • 45% of AI-generated code contains security flaws aligned with OWASP Top 10 categories (Veracode, 100+ LLMs, 80 coding tasks, July 2025). Independent testing lab. High credibility.
  • Java is the riskiest language: 70%+ security failure rate across tasks. 86% of samples failed to defend against cross-site scripting (CWE-80). 88% were vulnerable to log injection (CWE-117). (Veracode, 2025)
  • Larger models do not meaningfully outperform smaller ones on security — Veracode found this is “a systemic issue rather than an LLM scaling problem.” This means you cannot buy your way to secure AI-generated code.
  • AI-generated code contains 2.74x more vulnerabilities than human-written code, with 322% more privilege escalation paths, 153% more design flaws, and a 40% increase in secrets exposure (SoftwareSeni analysis, 2025). Methodology not independently verified.
  • Supply chain attacks are accelerating: IBM X-Force reports a nearly 4x increase in supply chain and third-party compromises since 2020, driven by CI/CD automation abuse and trust relationship exploitation (IBM X-Force 2026 Threat Index). Over 300,000 ChatGPT credentials were exposed via infostealer malware in 2025.
  • 30+ vulnerabilities in AI IDEs disclosed in a single research effort (Ari Marzouk, “IDEsaster,” December 2025), affecting Cursor, Windsurf, GitHub Copilot, Kiro.dev, Zed.dev, Roo Code, Junie, and Cline. 24 received CVE identifiers.
  • Georgetown CSET found that all five LLMs they tested produced “similar and severe bugs” when prompted with MITRE Top 25 CWE scenarios. Their recommendation: “the burden of ensuring AI-generated code is secure should not rest solely on individual users” (Georgetown CSET, November 2024). Academic source. High credibility.

Key Data Points

Metric Value Source Credibility
AI-generated code with security flaws 45% Veracode (100+ LLMs, 80 tasks), Jul 2025 High — independent lab
Java security failure rate 70%+ Veracode, Jul 2025 High
XSS defense failure rate (CWE-80) 86% Veracode, Jul 2025 High
Log injection vulnerability rate (CWE-117) 88% Veracode, Jul 2025 High
Vulnerability multiplier vs. human code 2.74x SoftwareSeni, 2025 Medium — methodology unclear
Developers using AI coding tools 84% (51% daily) Stack Overflow (n=49,000), 2025 High
Employees using unsanctioned AI tools 41-49% Cisco Security / survey (n=2,000), 2025 Medium-High
Organizations lacking AI security confidence 73% CSA/Google Cloud (n=300), Dec 2025 Medium — small sample, sponsor bias
MCP servers with OAuth flaws 43% eSentire, 2025 Medium
Open-source MCP servers with tool poisoning 5% Invariant Labs, 2025 Medium-High
CVEs from IDEsaster disclosure 24 Ari Marzouk, Dec 2025 High — verified CVEs
ChatGPT credentials exposed via infostealers 300,000+ IBM X-Force 2026 High
Supply chain compromise increase since 2020 ~4x IBM X-Force 2026 High
CSA AICM control objectives 243 across 18 domains CSA, Jul 2025 High
NIST AI RMF community size 6,500+ NIST, Dec 2025 High

What This Means for Your Organization

The three frameworks serve different purposes, and organizations need a layered approach:

OWASP tells you what to test. The LLM Top 10 and MCP Top 10 are checklists for application security teams. If your developers use AI coding tools — and 84% of developers do — your security team should be testing for prompt injection, sensitive information disclosure, and supply chain vulnerabilities in your AI tool configurations. The MCP Top 10 is particularly urgent: if your organization uses Cursor, Claude Code, or any tool with MCP server connections, the 43% OAuth flaw rate and 5% tool poisoning rate demand immediate audit.

NIST tells you how to think about risk. The AI RMF’s GOVERN-MAP-MEASURE-MANAGE structure provides the strategic framework. The Cyber AI Profile (IR 8596) adds AI-specific cybersecurity guidance. For regulated industries — financial services, healthcare, government — NIST alignment is increasingly expected, not optional, as sector regulators reference these frameworks in enforcement actions and examination priorities.

CSA tells you what controls to implement. The AICM’s 243 control objectives across 18 domains provide the most granular operational framework available. The shared responsibility model is particularly valuable: it clarifies who owns each control across your cloud providers, model providers, and your own application teams. For organizations pursuing ISO 42001 certification or EU AI Act compliance, the AICM’s regulatory mappings provide a practical bridge.

The uncomfortable reality: 73% of organizations lack confidence in their ability to execute secure AI strategies, yet 84% of developers are already using AI tools daily. The frameworks exist. The guidance is clear. The gap is execution — which is where most organizations stall. Shadow AI, auto-approved agent actions, unaudited MCP servers, and untested AI-generated code represent immediate, measurable risk. The question is not whether your organization has an AI security problem. The question is whether you know the shape and size of the one you already have.


Sources

OWASP

NIST

CSA

AI-Generated Code Security Evidence


Created by Brandon Sneider | brandon@brandonsneider.com March 2026