OWASP, NIST, and CSA on AI Coding Tool Security: What the Standards Bodies Actually Say
Executive Summary
- OWASP, NIST, and CSA have each published distinct but complementary frameworks for AI security in 2025-2026 — organizations need all three, not one
- OWASP’s Top 10 for LLMs (2025) and new MCP Top 10 provide application-level security checklists; prompt injection remains the #1 risk, and the new MCP framework addresses 2026’s fastest-growing attack surface
- NIST released its Cyber AI Profile (NIST IR 8596, December 2025) covering three overlapping domains: securing AI systems, AI-enabled defense, and countering AI-enabled attacks — with a final version expected mid-2026
- CSA’s AI Controls Matrix (AICM, July 2025) provides the most granular governance tool: 243 control objectives across 18 security domains, mapped to the EU AI Act, ISO 42001, and NIST AI RMF
- The gap between these frameworks and enterprise practice is severe: 73% of organizations lack confidence in executing secure AI strategies (CSA/Google Cloud, n=300, December 2025), while Veracode finds 45% of AI-generated code contains security flaws (100+ LLMs tested, 80 coding tasks, July 2025)
1. OWASP: Application Security for LLMs and MCP
1.1 OWASP Top 10 for LLM Applications (2025)
OWASP’s LLM Top 10, developed by 500+ international experts, is the de facto application security standard for AI-powered systems. The 2025 version reflects the shift toward agentic AI.
The 2025 List:
| # | Risk | What It Means for AI Coding Tools |
|---|---|---|
| LLM01 | Prompt Injection | Malicious inputs in code repositories, pull requests, or issue trackers hijack AI assistant behavior. The “IDEsaster” disclosure (December 2025) found 30+ prompt injection vulnerabilities across Cursor, Windsurf, GitHub Copilot, Kiro.dev, and Cline — 24 received CVE identifiers. |
| LLM02 | Sensitive Information Disclosure | AI coding tools can leak proprietary code, API keys, and credentials from context windows. The “EchoLeak” vulnerability demonstrated zero-click exfiltration without user interaction. |
| LLM03 | Supply Chain | AI models recommend nonexistent packages ~20% of the time (756,000 code samples tested). Attackers register these hallucinated names with malicious payloads — the “slopsquatting” attack vector. |
| LLM04 | Data and Model Poisoning | Corrupted training or fine-tuning data produces systematically vulnerable code. CrowdStrike found that politically sensitive prompts pushed DeepSeek-R1’s vulnerability rate from 19% to 27.2% (6,050 prompts per model, 30,250 total). |
| LLM05 | Improper Output Handling | AI-generated code accepted without validation. Veracode’s study of 100+ LLMs found 86% of code samples failed to defend against cross-site scripting (CWE-80), and 88% were vulnerable to log injection (CWE-117). |
| LLM06 | Excessive Agency | AI agents with file system access, terminal execution, or API permissions acting without human approval. The IDEsaster research showed most AI IDEs auto-approve file writes by default, enabling undetected malicious configuration changes. |
| LLM07 | System Prompt Leakage | System prompts reveal tool configurations, policy boundaries, and workflow logic. This was the most common attacker objective in Q4 2025. |
| LLM08 | Vector and Embedding Weaknesses | RAG systems and enterprise knowledge bases introduce new attack paths for manipulating AI coding context. |
| LLM09 | Misinformation | Code hallucinations — syntactically plausible but functionally wrong — are the most frequent failure mode. Models produce confident assertions about “best practices” while generating insecure PHP that omits authentication and password hashing. |
| LLM10 | Unbounded Consumption | Credit-based pricing models (Cursor, Windsurf, JetBrains) create cost explosion risk when AI agents enter recursive loops or excessive token consumption. |
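The LLM05 numbers above are easy to reproduce in miniature. The sketch below is an illustration, not a control prescribed by OWASP: it shows how an unsanitized value forges log entries (CWE-117) and one common mitigation, escaping CR/LF before the value reaches the logger. The `auth` logger name and the sample payload are invented for the example.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("auth")  # illustrative logger name

def sanitize_for_log(value: str) -> str:
    # Escape CR/LF so attacker-controlled input cannot forge extra log lines (CWE-117).
    return value.replace("\r", "\\r").replace("\n", "\\n")

def record_login(username: str) -> str:
    safe = sanitize_for_log(username)
    log.info("login attempt for user=%s", safe)
    return safe

# A crafted username tries to smuggle a fake "login succeeded" entry;
# after sanitization it stays on one visibly escaped line.
record_login("alice\nINFO login succeeded for user=admin")
```

The same escaping step is what the 88% of failing samples in Veracode's study omitted: they interpolated untrusted strings into log calls verbatim.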
Source credibility: High. OWASP is a vendor-neutral nonprofit. The LLM Top 10 is peer-reviewed by 500+ contributors across industry and academia. It does not sell products.
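The ~20% hallucinated-package rate behind LLM03 suggests a cheap procedural control: never install an AI-suggested dependency directly. A minimal sketch, assuming a team-maintained allowlist; the allowlist contents are illustrative, and `flask-security-utils-pro` is a deliberately fictitious "hallucinated" name.

```python
# Guard against slopsquatting: vet AI-suggested dependencies against a pinned
# allowlist before they ever reach `pip install`.
APPROVED_PACKAGES = {"requests", "numpy", "flask"}  # illustrative team allowlist

def vet_dependency(name: str) -> bool:
    # PyPI treats names case-insensitively and "-"/"_" as equivalent (PEP 503),
    # so normalize before comparing.
    normalized = name.strip().lower().replace("_", "-")
    return normalized in APPROVED_PACKAGES

suggestions = ["Requests", "flask-security-utils-pro"]  # second name is fictitious
for pkg in suggestions:
    verdict = "approved" if vet_dependency(pkg) else "REJECTED: not on allowlist"
    print(f"{pkg}: {verdict}")
```

An allowlist does not scale to every dependency, but it forces a human review step at exactly the point where attackers expect automated trust.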
1.2 OWASP MCP Top 10 (2025, Beta)
The Model Context Protocol — the standard for connecting AI models to external tools and data — has become the fastest-growing attack surface in enterprise AI. OWASP launched a dedicated MCP Top 10, currently in Phase 3 (beta testing).
| # | Risk | Why It Matters |
|---|---|---|
| MCP01 | Token Mismanagement & Secret Exposure | Hard-coded credentials in MCP server configurations; tokens stored in model memory or logs. |
| MCP02 | Privilege Escalation via Scope Creep | MCP server permissions expand over time without review. |
| MCP03 | Tool Poisoning | Adversaries inject malicious descriptions into MCP tool registries, steering AI agents toward unsafe actions. 5% of open-source MCP servers already contain tool poisoning (Invariant Labs, 2025). |
| MCP04 | Supply Chain Attacks & Dependency Tampering | A malicious MCP server masquerading as a “Postmark MCP Server” was caught BCC-ing all email communications to an attacker’s address. |
| MCP05 | Command Injection & Execution | AI agents construct and execute system commands from untrusted input. JFrog disclosed CVE-2025-6514, a critical OS command-injection flaw in mcp-remote. |
| MCP06 | Intent Flow Subversion | Malicious instructions embedded in context redirect agent objectives. Invariant Labs demonstrated how a poisoned GitHub issue hijacked a GitHub MCP agent to exfiltrate private repository data into a public PR. |
| MCP07 | Insufficient Authentication & Authorization | 43% of MCP servers have flaws in OAuth authentication flows (eSentire, 2025). |
| MCP08 | Lack of Audit and Telemetry | Most MCP deployments lack logging for tool invocations and context changes. |
| MCP09 | Shadow MCP Servers | Developers spin up unapproved MCP servers with default credentials, bypassing security governance. |
| MCP10 | Context Injection & Over-Sharing | Shared context windows expose sensitive data across tasks, users, or tenants. |
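MCP05 is the classic shell-injection problem resurfacing in agent tooling. A minimal Python sketch of the mitigation, assuming a hypothetical MCP-style tool that runs `git log` on an agent-supplied path; the function and its guard are illustrative, not part of any MCP SDK.

```python
import shlex
import subprocess

SHELL_METACHARS = set(";|&$`\n<>")

def run_git_log(repo_path: str) -> str:
    """Run `git log` for a repository path supplied by an AI agent."""
    # Defense in depth: reject obviously hostile input up front...
    if set(repo_path) & SHELL_METACHARS:
        raise ValueError(f"suspicious path rejected: {shlex.quote(repo_path)}")
    # ...and pass argv as a list, so no shell ever parses the path and
    # metacharacters like `;` or `$(...)` stay literal text.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--oneline", "-5"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout

# UNSAFE pattern seen in vulnerable tools; never do this with agent input:
#   subprocess.run(f"git -C {repo_path} log", shell=True)
```

The list-form argv is the load-bearing part: CVE-2025-6514-class bugs arise precisely when agent-constructed strings are handed to a shell for parsing.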
Source credibility: Medium-High. The MCP Top 10 is in beta and has not yet achieved the maturity of the LLM Top 10. It reflects emerging real-world incidents but is still under community refinement.
2. NIST: Federal Risk Management Frameworks
2.1 AI Risk Management Framework (AI RMF 1.0, January 2023)
The NIST AI RMF is the U.S. government’s foundational voluntary framework for AI risk management. Its four-function structure — GOVERN, MAP, MEASURE, MANAGE — has become the dominant language for enterprise AI risk discussions.
The Four Functions Applied to AI Coding Tools:
- GOVERN: Establish AI-specific risk policies, define acceptable use, assign cross-functional ownership. The framework emphasizes that AI risk is not purely a technology problem — it spans legal, ethical, and organizational dimensions.
- MAP: Identify and categorize AI systems by risk level, document intended uses and limitations, map data flows from developer workstations through AI vendor infrastructure.
- MEASURE: Quantitative and qualitative risk assessment, including red-team testing, bias auditing, and accuracy benchmarks. NIST explicitly calls for adversarial testing of AI systems.
- MANAGE: Risk treatment, monitoring, and AI-specific incident response. Track effectiveness of controls over time.
Adoption status: Federal agencies are adopting the AI RMF under executive order requirements. Sector regulators (SEC, FTC, CFPB, FDA, EEOC) increasingly reference NIST AI RMF principles in enforcement guidance. RMF 1.1 updates are expected through 2026.
2.2 Generative AI Risk Profile (NIST AI 600-1, July 2024)
NIST AI 600-1 is the companion profile specifically addressing generative AI risks. It identifies 12 risk categories with over 200 recommended actions.
The risks most relevant to AI coding tools:
- Confabulation/Hallucination: AI models generate plausible but incorrect code. This maps directly to OWASP LLM09 and the slopsquatting supply chain vector.
- Information Security: Training data leakage, prompt/completion data exposure, and model memorization of sensitive inputs.
- Data Privacy: Personal or proprietary data processed by AI models without adequate consent or control mechanisms.
- Information Integrity: AI-generated code that introduces vulnerabilities, bypasses validation, or produces insecure defaults.
- CBRN/Dual-Use: While primarily concerned with weapons, this category extends to code that could be used for offensive cyber operations.
Source credibility: High. NIST is a U.S. federal standards body with no commercial interest. The AI RMF was developed with input from over 6,500 stakeholders. However, NIST frameworks are voluntary — they carry moral authority but not legal force unless adopted by regulators.
2.3 Cybersecurity Framework Profile for AI (NIST IR 8596, December 2025, Preliminary Draft)
Released December 16, 2025, this is NIST’s most recent and most directly applicable guidance for securing AI systems. It applies the CSF 2.0 structure to AI-specific cybersecurity challenges.
Three Focus Areas:
- Securing AI Systems: Protecting AI infrastructure, models, and data from adversarial attacks, unauthorized access, and supply chain compromise. Covers the full AI lifecycle from development through deployment and retirement.
- AI-Enabled Cyber Defense: Using AI to strengthen detection, investigation, and response capabilities. Addresses the challenge of trusting AI outputs in security-critical decisions.
- Thwarting AI-Enabled Cyberattacks: Building organizational resilience against threats that use AI for reconnaissance, exploit development, social engineering, and attack automation.
Timeline: The preliminary draft had a 45-day comment window through January 30, 2026. A workshop was held January 14, 2026. The initial public draft is expected in 2026.
Source credibility: High. Developed over a year with input from 6,500+ community members. However, this is still a preliminary draft — the final version may differ materially.
3. CSA: Enterprise Governance and Controls
3.1 AI Controls Matrix (AICM, July 2025)
The CSA AI Controls Matrix is the most operationally detailed framework available. It provides 243 control objectives across 18 security domains — a level of granularity that neither OWASP nor NIST currently offers.
The 18 Security Domains:
The AICM covers both traditional security domains adapted for AI and AI-specific domains:
Traditional Security (AI-Adapted):
- Identity & Access Management (AI-specific privileged access)
- Data Security & Privacy Lifecycle Management (training and inference data)
- Network & Communications Security (AI service architectures)
- Audit Assurance & Compliance (AI regulatory requirements)
- Encryption & Key Management
- Business Continuity & Disaster Recovery
- Change Control & Configuration Management
- Infrastructure & Virtualization Security
AI-Specific Domains:
- Model Security (adversarial attacks, model integrity)
- AI Supply Chain Management (model and data provenance)
- Transparency & Accountability (explainable AI)
- Human Oversight & Control (high-risk AI systems)
- AI Application Security
- Threat & Vulnerability Management (AI-specific threats)
- Logging, Monitoring & Incident Management
- Governance, Risk & Compliance (AI GRC integration)
- Interoperability & Portability
- Security Engineering & Architecture
Five Analytical Pillars per Control: Each of the 243 controls is assessed along five dimensions: Control Type, Control Applicability and Ownership, Architectural Relevance, LLM Lifecycle Relevance, and Threat Category. The shared responsibility model maps each control across Cloud Service Providers, Model Providers, Orchestrated Service Providers, and Application Providers.
Regulatory Mappings: The AICM maps to ISO 42001, ISO 27001, NIST AI RMF 1.0, BSI AIC4, and the EU AI Act. The ISO 42001 and EU AI Act mappings were released August 2025.
Source credibility: High. CSA is a vendor-neutral nonprofit with deep enterprise security credibility. The AICM won the 2026 CSO Award. However, CSA’s December 2025 survey was sponsored by Google Cloud (n=300), which introduces potential bias in the governance maturity findings.
3.2 The State of AI Security and Governance (CSA/Google Cloud, December 2025)
CSA’s most recent survey (n=300 IT and security professionals, fielded Summer 2025, published December 2025) reveals the gap between framework availability and enterprise readiness.
Key Findings:
- Governance maturity is the strongest predictor of AI readiness — not budget, not tool selection, not headcount
- Organizations with comprehensive governance policies show 46% early agentic AI adoption, vs. 25% with partial guidelines and 12% still developing policies
- 70% of governed organizations have tested AI security capabilities, vs. 43% with partial governance and 39% still developing
- 52% cite data exposure as their top AI security concern
- 73% lack confidence in executing secure AI strategies — even organizations that have policies in place
- 90%+ of security teams are exploring AI for detection, investigation, and response
- Only 12% of organizations rank model integrity compromise as a top concern — a dangerous blind spot given real-world poisoning attacks
AI Model Adoption: GPT (70%), Gemini (48%), Claude (29%), LLaMA (20%).
Source credibility: Medium. Small sample size (n=300), Google Cloud sponsorship, and self-reported data. Directionally useful but not definitive.
3.3 CSA CISO Implementation Roadmap
CSA provides a four-phase implementation path for the AICM:
- Self-Assessment: Catalog all AI systems including shadow deployments and vendor services. Use the AI-CAIQ (Consensus Assessment Initiative Questionnaire for AI) for structured evaluation.
- Supply Chain Integration: Develop vendor questionnaires based on AICM controls, incorporate AI security requirements into procurement contracts, establish continuous monitoring of AI tool providers.
- STAR for AI Certification: Progress from Level 1 (self-assessment documentation) through Level 2 (third-party validation). CSA STAR for AI is the first AI-specific assurance certification program.
- Advanced Operations: Establish an AI Governance Center of Excellence with cross-functional oversight. Integrate AI-specific monitoring into security operations centers.
4. The Evidence on AI-Generated Code Quality
The three frameworks exist for good reason. The evidence on AI-generated code security is stark:
- 45% of AI-generated code contains security flaws aligned with OWASP Top 10 categories (Veracode, 100+ LLMs, 80 coding tasks, July 2025). Independent testing lab. High credibility.
- Java is the riskiest language: 70%+ security failure rate across tasks. 86% of samples failed to defend against cross-site scripting (CWE-80). 88% were vulnerable to log injection (CWE-117). (Veracode, 2025)
- Larger models do not meaningfully outperform smaller ones on security — Veracode found this is “a systemic issue rather than an LLM scaling problem.” This means you cannot buy your way to secure AI-generated code.
- AI-generated code contains 2.74x more vulnerabilities than human-written code, with 322% more privilege escalation paths, 153% more design flaws, and a 40% increase in secrets exposure (SoftwareSeni analysis, 2025). Methodology not independently verified.
- Supply chain attacks are accelerating: IBM X-Force reports a nearly 4x increase in supply chain and third-party compromises since 2020, driven by CI/CD automation abuse and trust relationship exploitation (IBM X-Force 2026 Threat Index). Over 300,000 ChatGPT credentials were exposed via infostealer malware in 2025.
- 30+ vulnerabilities in AI IDEs disclosed in a single research effort (Ari Marzouk, “IDEsaster,” December 2025), affecting Cursor, Windsurf, GitHub Copilot, Kiro.dev, Zed.dev, Roo Code, Junie, and Cline. 24 received CVE identifiers.
- Georgetown CSET found that all five LLMs they tested produced “similar and severe bugs” when prompted with MITRE Top 25 CWE scenarios. Their recommendation: “the burden of ensuring AI-generated code is secure should not rest solely on individual users” (Georgetown CSET, November 2024). Academic source. High credibility.
Key Data Points
| Metric | Value | Source | Credibility |
|---|---|---|---|
| AI-generated code with security flaws | 45% | Veracode (100+ LLMs, 80 tasks), Jul 2025 | High — independent lab |
| Java security failure rate | 70%+ | Veracode, Jul 2025 | High |
| XSS defense failure rate (CWE-80) | 86% | Veracode, Jul 2025 | High |
| Log injection vulnerability rate (CWE-117) | 88% | Veracode, Jul 2025 | High |
| Vulnerability multiplier vs. human code | 2.74x | SoftwareSeni, 2025 | Medium — methodology unclear |
| Developers using AI coding tools | 84% (51% daily) | Stack Overflow (n=49,000), 2025 | High |
| Employees using unsanctioned AI tools | 41-49% | Cisco Security / survey (n=2,000), 2025 | Medium-High |
| Organizations lacking AI security confidence | 73% | CSA/Google Cloud (n=300), Dec 2025 | Medium — small sample, sponsor bias |
| MCP servers with OAuth flaws | 43% | eSentire, 2025 | Medium |
| Open-source MCP servers with tool poisoning | 5% | Invariant Labs, 2025 | Medium-High |
| CVEs from IDEsaster disclosure | 24 | Ari Marzouk, Dec 2025 | High — verified CVEs |
| ChatGPT credentials exposed via infostealers | 300,000+ | IBM X-Force 2026 | High |
| Supply chain compromise increase since 2020 | ~4x | IBM X-Force 2026 | High |
| CSA AICM control objectives | 243 across 18 domains | CSA, Jul 2025 | High |
| NIST AI RMF community size | 6,500+ | NIST, Dec 2025 | High |
What This Means for Your Organization
The three frameworks serve different purposes, and organizations need a layered approach:
OWASP tells you what to test. The LLM Top 10 and MCP Top 10 are checklists for application security teams. If your developers use AI coding tools — and 84% of developers do — your security team should be testing for prompt injection, sensitive information disclosure, and supply chain vulnerabilities in your AI tool configurations. The MCP Top 10 is particularly urgent: if your organization uses Cursor, Claude Code, or any tool with MCP server connections, the 43% OAuth flaw rate and 5% tool poisoning rate demand immediate audit.
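For the MCP audit specifically, even a crude scan pays off. The sketch below is an assumption-laden illustration, not an official tool: it walks an MCP-client-style JSON config (the `mcpServers`/`env` layout mirrors common clients but may not match your tool's schema) and flags string values shaped like embedded credentials, the MCP01 risk above.

```python
import json
import re

# Shapes of common credential formats (illustrative, far from exhaustive).
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9]{20,}"),   # generic "sk-" API key shape
]

def find_hardcoded_secrets(config_text: str) -> list[str]:
    """Return JSON paths in an MCP-style config whose values look like secrets."""
    findings: list[str] = []

    def walk(node, path=""):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]")
        elif isinstance(node, str):
            if any(p.search(node) for p in SECRET_PATTERNS):
                findings.append(path)

    walk(json.loads(config_text))
    return findings

# Hypothetical config with a token pasted straight into a server's env block:
config = json.dumps(
    {"mcpServers": {"github": {"env": {"GITHUB_TOKEN": "ghp_" + "x" * 36}}}}
)
print(find_hardcoded_secrets(config))  # ['mcpServers.github.env.GITHUB_TOKEN']
```

A scan like this only catches known token shapes; pair it with secret managers and short-lived credentials rather than treating it as the control itself.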
NIST tells you how to think about risk. The AI RMF’s GOVERN-MAP-MEASURE-MANAGE structure provides the strategic framework. The Cyber AI Profile (IR 8596) adds AI-specific cybersecurity guidance. For regulated industries — financial services, healthcare, government — NIST alignment is increasingly expected, not optional, as sector regulators reference these frameworks in enforcement actions and examination priorities.
CSA tells you what controls to implement. The AICM’s 243 control objectives across 18 domains provide the most granular operational framework available. The shared responsibility model is particularly valuable: it clarifies who owns each control across your cloud providers, model providers, and your own application teams. For organizations pursuing ISO 42001 certification or EU AI Act compliance, the AICM’s regulatory mappings provide a practical bridge.
The uncomfortable reality: 73% of organizations lack confidence in their ability to execute secure AI strategies, yet 84% of developers already use AI tools, half of them daily. The frameworks exist. The guidance is clear. The gap is execution — which is where most organizations stall. Shadow AI, auto-approved agent actions, unaudited MCP servers, and untested AI-generated code represent immediate, measurable risk. The question is not whether your organization has an AI security problem. The question is whether you know the shape and size of the one you already have.
Sources
OWASP
- OWASP Top 10 for LLM Applications 2025 — Vendor-neutral, peer-reviewed by 500+ experts. High credibility.
- OWASP MCP Top 10 (Beta) — Phase 3 beta. Medium-high credibility; still under community refinement.
- OWASP Top 10 2025 Detailed Analysis (Confident AI) — Secondary analysis.
NIST
- NIST AI Risk Management Framework — Federal standards body. High credibility.
- NIST AI 600-1: Generative AI Risk Profile, July 2024 — 200+ recommended actions across 12 risk categories.
- Draft NIST IR 8596: Cybersecurity Framework Profile for AI, December 2025 — Preliminary draft; final version expected 2026. High credibility.
- NIST IR 8596 Public Comment Portal
CSA
- CSA AI Controls Matrix (AICM) — 243 controls, 18 domains. 2026 CSO Award winner. High credibility.
- Strategic Implementation of the CSA AICM: A CISO’s Guide
- CSA/Google Cloud: The State of AI Security and Governance, December 2025 — n=300, Google-sponsored. Medium credibility.
- CSA AI Organizational Responsibilities, January 2025
AI-Generated Code Security Evidence
- Veracode 2025 GenAI Code Security Report — 100+ LLMs, 80 tasks. Independent lab. High credibility.
- IBM 2026 X-Force Threat Index — Annual enterprise threat report. High credibility.
- CrowdStrike: Hidden Vulnerabilities in AI-Coded Software — 6,050 prompts per model, 30,250 total. High credibility.
- Georgetown CSET: Cybersecurity Risks of AI-Generated Code, November 2024 — Academic. High credibility.
- IDEsaster: 30+ Flaws in AI Coding Tools (The Hacker News, December 2025) — 24 CVEs issued.
- Top 5 Real-World AI Security Threats (CSO Online, December 2025)
- AI Coding Tools Security Exploits (Fortune, December 2025)
- MCP Security Vulnerabilities Timeline (AuthZed)
- MCP Attack Vectors (Palo Alto Unit 42)
Created by Brandon Sneider | brandon@brandonsneider.com | March 2026