
AI and Customer Data: What Already Left the Building



Executive Summary

  • 46% of sensitive data entered into AI tools is customer data — billing information, authentication credentials, and account details. Employee data accounts for another 27%. The single largest category of AI data exposure is the data your customers trusted you with (Harmonic Security, Q4 2024, tens of thousands of enterprise prompts analyzed).
  • 77% of employees have pasted company information into AI tools, and 82% of those paste events went through personal accounts outside any enterprise monitoring. GenAI now accounts for 32% of all corporate-to-personal data exfiltration, making it the number-one channel for data leaving sanctioned environments (LayerX Enterprise AI & SaaS Data Security Report, 2025).
  • Shadow AI breaches cost $670,000 more than standard incidents and take longer to detect (247 days vs. 241 days). Customer PII is compromised in 65% of shadow AI breaches, compared to 53% in non-AI breaches (IBM/Ponemon Cost of a Data Breach Report, 600 organizations, 3,470 interviews, July 2025).
  • Whether you must notify customers depends on a legal distinction most CISOs have not evaluated. State breach notification laws generally require “unauthorized access” to personal information, and many also require a “reasonable risk of harm.” An employee pasting customer records into ChatGPT may not satisfy either threshold, but contractual obligations with business partners and clients often have broader triggers. The gap between regulatory and contractual definitions is where mid-market companies get surprised (Debevoise & Plimpton, April 2025).

The 15-Minute Diagnostic

This is not a security audit. It is a triage exercise designed for a CISO or IT director at a company without a SIEM to run with two or three people in a conference room. The goal: determine whether customer data has already been processed by AI tools and whether anyone needs to be told.

Part 1: Has Customer Data Left the Building?

Answer these five questions. If the answer to any of questions 1-4 is “yes” or “I don’t know,” customer data has likely been processed by AI tools. Question 5 works the other way: a “no” there means nothing would have caught it.

# Question Why It Matters
1 Do any employees process customer records — billing, support tickets, account details, correspondence — as part of their daily work? These employees are the highest-probability source of AI-processed customer data. Harmonic’s analysis found that insurance claims, billing disputes, and support transcripts are the document types most frequently pasted into AI tools.
2 Do those employees have access to ChatGPT, Claude, Gemini, or Copilot through personal accounts on their work devices? 82% of AI data paste events happen through personal accounts (LayerX, 2025). Enterprise plans with training opt-outs do not matter if employees use the free tier. 54% of sensitive prompts in Harmonic’s study went to ChatGPT’s free tier. 58.2% of Claude usage and 60.9% of Perplexity usage happens through personal accounts (Cyberhaven, February 2026).
3 Has anyone used AI to summarize meeting notes, draft client communications, or generate reports that reference specific customers by name? One of Samsung’s 2023 incidents followed exactly this pattern: an employee converted a meeting recording to a document and pasted it into ChatGPT to produce minutes. Samsung recorded three separate data exposure incidents within 20 days (Bloomberg, May 2023).
4 Does your customer support team use AI for drafting responses, categorizing tickets, or analyzing complaint patterns? Support teams handle the densest concentration of customer PII: names, account numbers, complaint details, contact information. If any support tool has an AI feature enabled — or if support staff use separate AI tools — customer data is being processed.
5 Do you have any technical controls (DLP, browser extensions, endpoint monitoring) that detect or block sensitive data from being pasted into AI tools? 83% of organizations lack automated controls for AI data flows (Reco, 2025). Only 17% have implemented automated blocking combined with DLP scanning. If you do not have controls, the answer to questions 1-4 is effectively “yes.”

Scoring: If you answered “yes” or “I don’t know” to questions 1-4 and “no” to question 5, customer data has almost certainly been processed by AI tools. The question is no longer whether it happened — it is how much and what kind.
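For teams that want to record the result the same way every time, the scoring rule can be written down as a small function. This is a minimal Python sketch, not a validated instrument. The names `Answer` and `score_part1` are hypothetical; it reads “yes or I don’t know to questions 1-4” as all four questions, uses the any-of-questions-1-4 rule at the top of Part 1 for the “likely” tier, and assumes an “I don’t know” on question 5 counts as “no.”

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "I don't know"


def score_part1(answers: dict[int, Answer]) -> str:
    """Score the Part 1 answers, keyed by question number 1-5.

    Returns "almost certain", "likely", or "unlikely" that customer data
    has already been processed by AI tools.
    """
    missing = {1, 2, 3, 4, 5} - set(answers)
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")

    # Questions 1-4 are exposure indicators: "yes" and "I don't know" both count.
    flagged = [q for q in (1, 2, 3, 4) if answers[q] is not Answer.NO]

    # Question 5 asks about controls. Assumption: "I don't know" counts as "no",
    # because controls nobody can confirm cannot be relied on.
    has_controls = answers[5] is Answer.YES

    if len(flagged) == 4 and not has_controls:
        return "almost certain"
    if flagged:
        return "likely"
    return "unlikely"


# Example: Q2 is unknown, Q1, Q3 and Q4 are "yes", and there are no controls.
example = {1: Answer.YES, 2: Answer.UNKNOWN, 3: Answer.YES, 4: Answer.YES, 5: Answer.NO}
print(score_part1(example))  # almost certain
```

Keeping “I don’t know” on the risky side is deliberate: the diagnostic is meant to surface uncertainty, not resolve it in the company’s favor.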

Part 2: What Kind of Data?

Determine which categories of customer data are most likely exposed. This matters for the notification analysis in Part 3.

Data Category Where It Lives AI Exposure Risk
Names + contact information CRM, support tickets, email High — appears in virtually every customer-facing document
Billing / payment data Invoicing systems, AR, payment processors High — Harmonic found billing data is the #1 category of customer data in AI prompts
Account credentials / authentication Support escalations, onboarding docs Medium-high — enters AI tools during password reset workflows and access troubleshooting
Health information (PHI) Benefits administration, insurance, wellness programs Medium — HIPAA adds a separate notification requirement with a 60-day deadline
Financial records Contracts, proposals, statements of work Medium — enters AI tools when employees use AI to draft or summarize agreements
Proprietary business information Strategy documents, pricing, product roadmaps shared by clients Medium — 40% of shadow AI breaches involve intellectual property (IBM, 2025)
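If you can obtain a sample of prompts or chat transcripts, a crude pattern scan can suggest which of these categories appear in them. The Python sketch below is an illustrative assumption, not a DLP product. It only detects proxies: email addresses and phone numbers for contact data, Luhn-valid card numbers for payment data, and `password:`-style strings for credentials. The regular expressions and the `classify` helper are hypothetical and will miss names, health information, and many real-world formats.

```python
import re

# Hypothetical detection patterns, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
CREDENTIAL = re.compile(r"(?i)\b(?:password|passwd|api[_-]?key|secret|token)\s*[:=]\s*\S+")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set[str]:
    """Tag a text sample with the Part 2 categories it appears to contain."""
    found = set()
    if EMAIL.search(text) or PHONE.search(text):
        found.add("Names + contact information")
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # Require a valid checksum to cut down on false positives.
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.add("Billing / payment data")
            break
    if CREDENTIAL.search(text):
        found.add("Account credentials / authentication")
    return found


sample = "Refund for jane@example.com, card 4111 1111 1111 1111, password: hunter2"
print(sorted(classify(sample)))
```

A clean result from a scan like this is weak evidence; it says only that these particular patterns did not match.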

Part 3: The Notification Decision Tree

The legal question is not “did data go into an AI tool?” The question is “does this trigger a notification obligation?” The answer depends on three factors.

Factor 1: Was there unauthorized access?

Most state breach notification laws require unauthorized access to or acquisition of personal information. An employee who is authorized to view customer records and pastes them into ChatGPT may not have committed “unauthorized access” under many state statutes — the employee had permission to access the data, even though they used it improperly.

However, the protection this distinction offers narrows significantly if: (a) the AI tool’s terms of service allow training on user inputs (most free tiers do), which means the data is now accessible to the AI provider; (b) data was exposed to other users through a platform vulnerability (ChatGPT’s March 2023 bug showed some users the titles of other users’ chat histories); or (c) the data included credentials or authentication tokens that could enable downstream access.

Factor 2: Is there a reasonable risk of harm?

Many states apply a risk-of-harm standard. Debevoise & Plimpton’s April 2025 analysis concludes that this condition “may not be satisfied in many cases of unauthorized uploads to an AI system” because there is no reason to believe any human at the AI provider will see the data. The data sits in a model’s training corpus, not in an accessible database.

This analysis weakens if: the AI provider suffers its own breach (OpenAI has experienced multiple security incidents); the data includes financial or authentication information with direct exploitation value; or the volume is large enough that regulatory attention is likely regardless of technical risk.

Factor 3: What do your contracts say?

This is where mid-market companies get caught. Contractual notification obligations are typically broader than regulatory ones. Debevoise notes that contractual triggers are “usually activated by unauthorized access to any confidential corporate information that was provided by the third party.” If a client shared proprietary data with your company and your employee pasted it into an AI tool, the contractual notification threshold is almost certainly met — regardless of whether state law requires it.
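Before counsel reviews the contracts, a quick pass can pull out the clauses worth reading first. This is a hypothetical Python helper, not legal analysis. It splits text into sentences naively, keeps those that mention “security incident” or “breach,” and flags a hand-picked list of broad trigger phrases. Both phrase lists are assumptions to be tuned with counsel, and a clause the helper misses is not evidence that no obligation exists.

```python
import re

# Hypothetical phrase lists; tune them with counsel. A hit marks a clause
# for review and does not decide whether notice is owed.
DEFINED_TERMS = ("security incident", "breach")
BROAD_TRIGGERS = (
    "confidential information",
    "unauthorized use",
    "unauthorized disclosure",
    "unauthorized processing",
)


def flag_clauses(contract_text: str) -> list[tuple[str, list[str]]]:
    """Return (sentence, triggers found) for sentences that mention an incident term."""
    flagged = []
    # Naive split after a period or semicolon followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", contract_text):
        lower = sentence.lower()
        if not any(term in lower for term in DEFINED_TERMS):
            continue
        hits = [phrase for phrase in BROAD_TRIGGERS if phrase in lower]
        if hits:
            flagged.append((sentence.strip(), hits))
    return flagged


msa = ('"Security Incident" means any unauthorized use or disclosure of '
       "Confidential Information. Invoices are due within 30 days.")
for clause, hits in flag_clauses(msa):
    print(hits, "->", clause)
```

Plain substring matching keeps the helper transparent: anyone in the room can see why a clause was flagged.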

The decision:

Customer PII entered AI tool?
├── YES → Was it on an enterprise plan with training opt-out?
│   ├── YES → Lower risk. Document the incident. Review platform audit logs.
│   │         Contractual notification may still apply.
│   └── NO (free tier / personal account) → Higher risk.
│       ├── Does the data include financial, health, or authentication info?
│       │   ├── YES → Consult counsel. Notification likely required under
│       │   │         contract and possibly under state law.
│       │   └── NO (names/contact only) → Document. Assess contract language.
│       │         Notification may not be legally required but may be
│       │         prudent if volume is significant.
│       └── Do your client contracts define "breach" or "security incident"
│           broadly enough to include unauthorized processing?
│           ├── YES → Notify per contract terms.
│           └── NO / UNCLEAR → Legal review recommended before deciding.
└── NO → Document that you checked. This is your audit trail.
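The same tree can be encoded so that every run of the diagnostic produces the same list of next steps. This Python sketch mirrors the branches above. The `Exposure` fields and the `notification_actions` function are hypothetical names. Following the way the tree is drawn, it treats the two free-tier questions (data type and contract language) as independent checks that both apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exposure:
    customer_pii_entered: bool
    enterprise_opt_out: bool          # enterprise plan with training opt-out
    sensitive_data: bool              # financial, health, or authentication info
    broad_contract_definition: Optional[bool]  # None means unclear


def notification_actions(e: Exposure) -> list[str]:
    """Walk the Part 3 decision tree and return the recommended next steps."""
    if not e.customer_pii_entered:
        return ["Document that you checked; this is your audit trail."]
    if e.enterprise_opt_out:
        return [
            "Lower risk: document the incident and review platform audit logs.",
            "Check contracts: contractual notification may still apply.",
        ]

    # Free tier or personal account: the tree asks two separate questions.
    actions = []
    if e.sensitive_data:
        actions.append("Consult counsel: notification likely required under contract "
                       "and possibly under state law.")
    else:
        actions.append("Document and assess contract language; notification may be "
                       "prudent if volume is significant.")
    if e.broad_contract_definition:
        actions.append("Notify per contract terms.")
    else:  # False or None (unclear)
        actions.append("Legal review recommended before deciding.")
    return actions


for step in notification_actions(Exposure(True, False, True, None)):
    print(step)
```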

Key Data Points

Metric Value Source
Share of AI prompts containing sensitive data 8.5% Harmonic Security, Q4 2024
Customer data as share of all sensitive AI prompts 46% Harmonic Security, Q4 2024
Employees who have pasted data into AI tools 77% LayerX, 2025
AI paste events via personal accounts 82% LayerX, 2025
Sensitive prompts on ChatGPT free tier 54% Harmonic Security, Q4 2024
AI interactions involving sensitive data overall 39.7% Cyberhaven AI Adoption & Risk Report, February 2026
Shadow AI breach additional cost +$670,000 IBM/Ponemon, 600 orgs, July 2025
Shadow AI breach detection time 247 days IBM/Ponemon, July 2025
Customer PII compromised in shadow AI breaches 65% IBM/Ponemon, July 2025
Organizations lacking AI data flow controls 83% Reco, 2025
GenAI share of corporate-to-personal data exfiltration 32% LayerX, 2025
State breach notification deadline range 30-60 days 20 states with numeric deadlines
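Deadlines in the table are counted in days, so it helps to turn them into calendar dates as soon as exposure is discovered. A minimal Python sketch, assuming every clock starts on the discovery date and counts calendar days. Statutes differ on both the trigger and the counting rule, so treat the output as a planning aid. The labels and the `notification_dates` helper are hypothetical; the 60-day HIPAA figure comes from Part 2.

```python
from datetime import date, timedelta

# Assumption: each clock starts on the discovery date and counts calendar days.
# Statutes differ on both points, so confirm every deadline with counsel.
DEADLINES_DAYS = {
    "state (30-day)": 30,
    "state (60-day)": 60,
    "HIPAA (60-day outer limit)": 60,
}


def notification_dates(discovered: date) -> dict[str, date]:
    """Map each deadline label to the date that many calendar days after discovery."""
    return {label: discovered + timedelta(days=days) for label, days in DEADLINES_DAYS.items()}


for label, due in notification_dates(date(2026, 3, 2)).items():
    print(f"{label}: {due.isoformat()}")
```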

What This Means for Your Organization

The uncomfortable finding in this data is not that employees are putting customer data into AI tools. That was predictable the moment ChatGPT reached 100 million users. The uncomfortable finding is that most companies cannot answer the basic questions: did it happen, how much, and do I need to tell anyone?

The diagnostic above takes 15 minutes. It will not give you a precise count of exposed records. What it will give you is a defensible starting point: you asked the right questions, documented what you found, and made a reasoned decision about notification. That documentation matters because the regulatory environment is shifting from “did you prevent it?” to “did you know and what did you do?”
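Because the value of the exercise is the record it leaves behind, it is worth capturing each run in a consistent structure. This Python sketch shows one possible shape, assuming a JSON document is an acceptable store. The `DiagnosticRecord` class and its fields are hypothetical and simply follow the three parts of the diagnostic; the example values are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DiagnosticRecord:
    """Hypothetical audit-trail entry for one run of the diagnostic."""
    run_date: str
    participants: list[str]
    part1_answers: dict[str, str]        # question number -> answer given
    exposed_categories: list[str]        # Part 2 categories judged likely exposed
    notification_decision: str           # outcome of the Part 3 decision tree
    rationale: str
    follow_ups: list[str] = field(default_factory=list)


record = DiagnosticRecord(
    run_date=date(2026, 3, 2).isoformat(),
    participants=["CISO", "IT director", "Outside counsel"],
    part1_answers={"1": "yes", "2": "I don't know", "3": "yes", "4": "yes", "5": "no"},
    exposed_categories=["Names + contact information", "Billing / payment data"],
    notification_decision="Legal review recommended before deciding",
    rationale="Free-tier use likely; client contract definitions unclear.",
    follow_ups=["Pull client MSAs", "Evaluate browser-level DLP"],
)

# Serialize for whatever system holds incident documentation.
print(json.dumps(asdict(record), indent=2))
```

Recording the rationale alongside the decision matters as much as the decision itself, since the question a regulator or client asks later is why you concluded what you did.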

The companies that handle this well share three characteristics. First, they run the diagnostic before a client or regulator asks — because the worst time to discover exposure is during an audit or a breach response. Second, they separate the legal question (must I notify?) from the relationship question (should I notify?). Proactive disclosure to a key client builds trust. Waiting until they find out from a third party destroys it. Third, they use the findings to build the case for controls — browser-level DLP, enterprise AI plans with training opt-outs, and acceptable use policies that address AI specifically.

If this diagnostic raised questions about your specific contractual obligations or notification thresholds, I’d welcome the conversation — brandon@brandonsneider.com.

Sources

  1. Harmonic Security, “The Data Leaking into GenAI Tools,” Q4 2024 analysis. Analyzed tens of thousands of prompts across ChatGPT, Copilot, Gemini, Claude, and Perplexity. Found 8.5% of prompts contain sensitive data, with customer data (billing, authentication) at 46% of sensitive prompts. Independent security vendor research. Credibility: Independent vendor with direct telemetry access — moderate-high. Sample size disclosed qualitatively (“tens of thousands”) but not precisely.

  2. LayerX, “Enterprise AI & SaaS Data Security Report,” 2025. Found 77% of employees paste data into AI prompts, 82% via personal accounts. GenAI accounts for 32% of corporate-to-personal data exfiltration. Based on real enterprise browsing telemetry. Credibility: Vendor-funded but based on observed telemetry, not surveys — moderate-high.

  3. IBM/Ponemon Institute, “Cost of a Data Breach Report,” July 2025. 600 organizations globally, 3,470 interviews, breaches from March 2024 through February 2025. Shadow AI breaches cost $670K more, take 247 days to detect, compromise customer PII in 65% of cases. Credibility: High — independent research institute, large sample, established methodology, 20th annual report.

  4. Cyberhaven, “2026 AI Adoption & Risk Report,” February 2026. Found 39.7% of AI interactions involve sensitive data. Personal account usage: 32.3% of ChatGPT, 58.2% of Claude, 60.9% of Perplexity. Credibility: Vendor-funded but based on endpoint telemetry — moderate.

  5. Debevoise & Plimpton, “An Employee Just Uploaded Sensitive Data to a Consumer AI Tool — Now What?,” April 2025. Legal analysis of breach notification triggers for AI data exposure. Concludes that most regulatory notification thresholds may not be met, but contractual obligations are often broader. Credibility: High — Am Law 50 firm, data privacy practice group, no commercial interest in the conclusion.

  6. Samsung/ChatGPT data exposure incidents, March-May 2023. Three separate incidents of employees pasting source code and meeting notes into ChatGPT within 20 days. Samsung subsequently banned ChatGPT use. Credibility: High — confirmed by Bloomberg and multiple outlets, Samsung acknowledged the incidents.

  7. Reco, “AI & Cloud Security Breaches: 2025 Year in Review.” Analysis of 80 organizations found ChatGPT accounts for 53% of shadow AI activity. 83% of organizations lack controls to detect sensitive data uploads to AI platforms. Credibility: Vendor research — moderate. Useful for operational benchmarks.


Brandon Sneider | brandon@brandonsneider.com | March 2026