Legacy Data Remediation TCO: What It Actually Costs to Make Your Data AI-Ready

Executive Summary

  • Data preparation consumes 40–60% of total AI project cost and 60–80% of project time, according to O’Reilly analysis. Most budgets underprice this line item by an order of magnitude.
  • 96% of companies begin AI projects without sufficient high-quality training data (Optimus AI, 2025), triggering unplanned remediation spend of $10K–$90K per project — typically absorbed as scope creep rather than a planned capital expense.
  • Industry context matters: healthcare EHR migration (Epic, Oracle Health) runs $1M–$500M depending on system size; manufacturing ERP/MES normalization and financial services multi-core-banking integration carry their own distinct cost curves and failure rates.
  • Build-vs-buy: Fivetran starts at $500/month for the first 1M monthly active rows (MAR) but escalates sharply at scale. MuleSoft and Informatica deals land at $100K–$2M ARR at enterprise scope. dbt Cloud starts at $100/developer/month on top of a free open-source core.
  • 83% of data migration projects fail or exceed budget/timeline (Gartner). The predictable cause is underscoping remediation before AI workloads begin.

The Data Problem Is the AI Problem

The evidence is consistent across every credible analysis of enterprise AI ROI: model performance is downstream of data quality. O’Reilly reports that data preparation consumes 60–80% of AI project time and 40–60% of project cost. Anaconda’s survey of data scientists puts the time share at 45%. That is lower, but it is still the largest single claim on practitioner time in any serious deployment.

The framing that matters for a CFO: when a vendor quotes a $200K enterprise license for an AI platform, the honest TCO is closer to $400K–$500K once legacy data is actually ready to feed it. Most AI programs stall here, at integration rather than at inference; BCG’s AI at Work 2025 survey (n=10,635) finds that only 5% capture substantial financial gains.
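The $400K–$500K figure can be reproduced with a simple gross-up. This is a minimal sketch, assuming the license is the only cost other than data preparation and that preparation takes 50–60% of total cost (the upper half of the O’Reilly range). The function name is illustrative, not taken from any cited source.

```python
def grossed_up_tco(license_cost: float, prep_share: float) -> float:
    """Total cost when data prep is `prep_share` of the total.

    Assumption: the license is the entire remainder of the budget.
    """
    if not 0 <= prep_share < 1:
        raise ValueError("prep_share must be in [0, 1)")
    # license = total * (1 - prep_share), so total = license / (1 - prep_share)
    return license_cost / (1 - prep_share)


low = grossed_up_tco(200_000, 0.50)
high = grossed_up_tco(200_000, 0.60)
print(f"${low:,.0f} to ${high:,.0f}")
```

A real program carries other cost lines (infrastructure, change management), so treat the result as a floor for discussion rather than a forecast.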

Cost Structure by Industry

The published benchmarks are uneven. Healthcare has the most transparent cost data because EHR migrations are capital projects with RFP paper trails. Financial services and manufacturing cost curves are harder to pin down because most remediation happens inside existing ERP/core-banking programs and never gets isolated as a line item.

Healthcare (Epic, Oracle Health / Cerner)

Epic commands 42.3% of the acute-care EHR market and manages over 305 million patient records (IntuitionLabs, 2025). Implementation costs range from $1M for small practices to $500M for large integrated delivery networks. Annual maintenance runs 15–20% of initial license cost.

Legacy data migration inside Epic uses Data Courier and HL7 batch loads; Oracle Health / Cerner uses HL7 ADT and FHIR where the source system supports it. Both require significant custom transformation work — there is no “lift and shift.” AI feature readiness (ambient scribe, clinical documentation automation) sits downstream of a clean migration, not alongside it.

Financial Services

Innoflexion’s 2025 Data Readiness Index places financial services organizations in the 0.65–0.74 range — strong source-system data quality, weak cross-system coherence due to multi-core-banking and multi-ERP sprawl. Above 0.75 is considered autonomous-AI-ready. 0.50–0.74 supports AI with human review checkpoints. Below 0.50 requires structured remediation before production AI.
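The three bands above translate directly into a lookup. This is a sketch under stated assumptions: a score of exactly 0.75 is treated as AI-ready (the text says "above 0.75", the summary table says "0.75+"), scores between 0.74 and 0.75 fall into the human-review band, and scores are assumed to lie between 0 and 1. The function name and tier labels are illustrative.

```python
def readiness_tier(score: float) -> str:
    """Map a data readiness score to the deployment posture it supports."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie in [0, 1]")
    if score >= 0.75:  # assumption: the threshold itself counts as ready
        return "autonomous-AI-ready"
    if score >= 0.50:
        return "AI with human review checkpoints"
    return "structured remediation before production AI"


# The financial services range cited above sits entirely in the middle band.
for s in (0.65, 0.74):
    print(s, "->", readiness_tier(s))
```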

The practical implication: most FS firms need a data fabric or data product layer before agent-based AI is safe to deploy. That layer is typically a $500K–$3M multi-quarter program before the AI tooling clock even starts.

Manufacturing

Manufacturing’s data-readiness problem is ERP/MES integration — OEE data in the MES, financial data in SAP or Oracle, quality data in a third system, none of them natively joined. Published TCO benchmarks are scarce because remediation tends to be bundled inside ERP modernization projects. Gartner’s 2025 data classifies 25% of ERP implementations as catastrophic failures; running parallel systems during a migration costs $50K–$200K/month in duplicated infrastructure alone.
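To see how quickly the parallel-run figure compounds, multiply the monthly range by the overlap period. The six-month overlap below is a hypothetical input chosen for illustration, not a benchmark from the sources; the function name is likewise illustrative.

```python
def parallel_run_cost(months: int,
                      monthly_low: float = 50_000,
                      monthly_high: float = 200_000) -> tuple[float, float]:
    """Low and high estimates of duplicated-infrastructure cost."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return months * monthly_low, months * monthly_high


# Hypothetical six-month overlap between legacy and new systems.
low, high = parallel_run_cost(6)
print(f"${low:,.0f} to ${high:,.0f}")
```

Every month of slippage adds another $50K–$200K, which is why the overlap period belongs in the budget as an explicit variable.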

Legal

The primary legacy-data problem in legal is document management system (DMS) preparation: iManage and NetDocuments as source systems, with inconsistent matter-level metadata, conflicting tagging schemas across practice groups, and privilege markers that must survive any AI ingestion. Remediation here is less about platform cost and more about matter-partner time spent normalizing taxonomy before ingestion. Published benchmarks are rare; real-world scope typically runs 6–12 months of part-time knowledge management work plus a $50K–$250K integration spend.

Build vs. Buy: The Integration Platform Landscape

| Platform | Entry Pricing | Typical Enterprise ARR | Best Fit |
|---|---|---|---|
| Fivetran | $500/mo for 1M MAR | $100K–$500K | High-volume SaaS source ingestion |
| dbt Cloud | $100/developer/mo | $50K–$300K | Analytics/transformation layer; pairs with Fivetran |
| MuleSoft | No public pricing | $100K–$500K | Complex API orchestration, Salesforce stack |
| Informatica | Enterprise-only | $250K–$2M | Regulated industries, governance-heavy |

Fivetran’s 2026 pricing change bills at the connector level, which means cost grows faster than data volume as source systems multiply. This is the single largest hidden cost in mid-market Fivetran deployments — budget it explicitly.

Where Remediation Budgets Go Wrong

Three failure modes account for most of the 83% of data migrations that Gartner reports as failing outright or exceeding budget or timeline:

  1. Remediation treated as an IT expense, not a capital project. Data cleanup without an executive sponsor and a dated charter drifts indefinitely. The work is tedious, thankless, and never ranked above whatever ticket came in yesterday.
  2. Buying the integration platform before scoping the data. Teams sign the Fivetran or Informatica contract, then discover the source systems require six months of normalization before the platform adds value. The meter runs; the value doesn’t.
  3. No target-state data model. Remediation with no reference architecture produces clean data in the shape of the legacy system. AI workloads then can’t use it without another remediation pass.

Key Data Points

| Metric | Value | Source | Date |
|---|---|---|---|
| Data prep share of AI project cost | 40–60% | O’Reilly | 2025 |
| Data prep share of AI project time | 60–80% | O’Reilly | 2025 |
| Data scientists’ time on data prep | 45% | Anaconda | 2024 |
| Companies starting AI without quality data | 96% | Optimus AI | 2025 |
| Unplanned remediation per project | $10K–$90K | Optimus AI | 2025 |
| Epic implementation range | $1M–$500M | DashTech | 2025 |
| Data migration project failure rate | 83% | Gartner | 2025 |
| ERP implementation catastrophic failure rate | 25% | Gartner | 2025 |
| Parallel ERP run cost during migration | $50K–$200K/mo | Software Modernization Services | 2025 |
| FS AI readiness range (DRI score) | 0.65–0.74 | Innoflexion | 2025 |
| AI-ready threshold | 0.75+ | Innoflexion | 2025 |

What This Means for Your Organization

The question a CIO should ask before approving any AI platform contract: what is our data readiness score, and what would it cost to move it above the 0.75 threshold? If the answer is “we don’t know,” the AI project is not ready to start. Starting anyway is how pilots stall in POC — McKinsey’s November 2025 survey (n=1,933) puts two-thirds of enterprises still at that stage with only 6% capturing >5% EBIT impact.

The cost model to build internally: a 3-line TCO that separates (1) integration platform spend, (2) remediation labor (internal and external), and (3) target-state data product investment. Most teams collapse these into one line and lose the ability to forecast when each cost comes online. Healthcare, financial services, and manufacturing all have distinct remediation curves; pretending they are the same produces bad budgets.
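One way to keep the three lines separate is to give each its own start month and duration, then read the forecast month by month. This is a minimal sketch: every dollar figure, start month, and duration below is a hypothetical placeholder, spend is assumed to be spread evenly across each line's duration, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    name: str
    total: float      # total spend for this line, in dollars
    start_month: int  # first month the cost is incurred (1-based)
    months: int       # months over which the spend is spread evenly

    def monthly(self, month: int) -> float:
        """Spend attributed to the given month (0 outside the active window)."""
        if self.start_month <= month < self.start_month + self.months:
            return self.total / self.months
        return 0.0


def forecast(lines: list[CostLine], horizon: int) -> list[dict[str, float]]:
    """One dict per month, keyed by cost-line name."""
    return [{line.name: line.monthly(m) for line in lines}
            for m in range(1, horizon + 1)]


# Hypothetical inputs: remediation starts first, the platform contract later.
lines = [
    CostLine("integration platform", 120_000, start_month=4, months=12),
    CostLine("remediation labor", 300_000, start_month=1, months=6),
    CostLine("target-state data product", 180_000, start_month=3, months=9),
]
plan = forecast(lines, horizon=15)
for month, row in enumerate(plan, start=1):
    print(month, {name: round(cost) for name, cost in row.items()})
```

Keeping the lines apart makes sequencing visible: in this example the platform meter does not start until month 4, after remediation is already under way.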

If this raised questions about how to scope remediation specific to your industry and data environment, I welcome the conversation — brandon@brandonsneider.com.

Brandon Sneider | brandon@brandonsneider.com | April 2026