Executive Summary
- The medallion architecture (bronze = raw, silver = cleaned, gold = business-ready) is the de facto standard pattern for preparing data for AI and analytics. Microsoft, Databricks, Snowflake, and every major systems integrator pitch it.
- Vendor decks promise “48 hours from bronze to gold” on a no-code platform. Real mid-market implementations take 6 to 18 months, and most companies stall at silver — the layer where deduplication, entity resolution, and business-rule definition actually live.
- The hard work is not the pipeline. It is deciding what a “customer” is across four systems, agreeing on which revenue number is the revenue number, and writing business rules that survive the CFO’s review. Tools do not decide these things. People do.
- Gartner predicts that organizations will abandon 60% of AI projects unsupported by AI-ready data through 2026 (Feb 2025; accompanying survey n=248). The cause of death is almost always the silver layer.
- For a 300–500 person company, a credible medallion program is 2 to 4 priority domains, 6 to 12 months to reliable gold on the first domain, and $400K–$1.2M in first-year cost across tooling, integration, and internal time. The honest plan is sequenced domain by domain, not staged layer by layer across the whole business.
What Each Layer Actually Is
The medallion architecture is a data design pattern, not a product. Microsoft’s own documentation is explicit: “Following the medallion architecture is a recommended best practice but not a requirement” (Azure Databricks, updated March 2026).
Bronze (raw ingestion). Data lands in its original form from source systems — Salesforce, NetSuite, the warehouse management system, ticketing tool, website. No cleanup. No validation. Stored as append-only history so you can reprocess later. Databricks recommends storing fields as strings or VARIANT to survive schema changes upstream. Cost: mostly storage. Technical lift: moderate. This is the layer vendors demo in 30 minutes.
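The bronze behavior described above can be sketched in a few lines. This is a minimal illustration, not a vendor API: the names `bronze_log`, `to_bronze_row`, and `append_to_bronze` are hypothetical, and an in-memory list stands in for an append-only table.

```python
import json
from datetime import datetime, timezone

# Stand-in for an append-only bronze table (hypothetical; a real one
# would live in object storage or a warehouse).
bronze_log = []

def to_bronze_row(source_system, record, ingested_at=None):
    """Wrap one source record for bronze without cleaning or casting it."""
    ts = ingested_at or datetime.now(timezone.utc)
    return {
        "source_system": source_system,
        "ingested_at": ts.isoformat(),
        # The payload is kept as a single string, so a new or renamed
        # upstream field does not break ingestion.
        "raw_payload": json.dumps(record, sort_keys=True),
    }

def append_to_bronze(source_system, records):
    """Append only: existing rows are never updated or deleted."""
    for record in records:
        bronze_log.append(to_bronze_row(source_system, record))
    return len(bronze_log)

# Two versions of the same account; the second gained a field upstream.
append_to_bronze("salesforce", [{"Id": "001", "Name": "Acme Corp"}])
append_to_bronze("salesforce", [{"Id": "001", "Name": "Acme Corp", "Region": "EMEA"}])
```

Both versions of the record survive side by side, which is what makes later reprocessing possible.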
Silver (validated, deduplicated, joined). This is where the real work happens. The operations required: schema enforcement, null handling, deduplication, resolution of out-of-order and late-arriving records, type casting, joins across domains, and data-quality rule enforcement (Microsoft Learn, 2026). Entity resolution — deciding that “Acme Corp,” “ACME Corporation,” and “Acme, LLC” are the same customer — lives here. So does the business logic of “which Salesforce opportunity counts as pipeline.” These decisions are not technical. They require the CFO, the sales ops lead, and the CRO in the same room.
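To make the entity-resolution step concrete, here is a deliberately simple sketch. It assumes that stripping legal suffixes and punctuation is an acceptable matching rule, which is exactly the kind of decision the business has to sign off on. The suffix list, field names, and function names are hypothetical, and real programs typically add fuzzy matching and manual review.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"corp", "corporation", "llc", "inc", "incorporated", "ltd", "co", "company"}

def match_key(name):
    """Normalize a company name into a comparison key."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    core = [t for t in tokens if t not in LEGAL_SUFFIXES]
    # Fall back to all tokens if the name consists only of suffixes.
    return " ".join(core or tokens)

def resolve_entities(records):
    """Group records by match key and keep the most recently updated name."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec["name"])].append(rec)
    resolved = {}
    for key, recs in groups.items():
        # max() on the ISO date string picks the latest record even when
        # records arrive out of order.
        latest = max(recs, key=lambda r: r["updated_at"])
        resolved[key] = {
            "latest_name": latest["name"],
            "source_ids": sorted(r["id"] for r in recs),
        }
    return resolved

records = [
    {"id": "sf-1", "name": "Acme Corp", "updated_at": "2025-03-01"},
    {"id": "ns-9", "name": "ACME Corporation", "updated_at": "2025-06-15"},
    {"id": "tk-4", "name": "Acme, LLC", "updated_at": "2025-01-10"},
    {"id": "sf-2", "name": "Globex Inc", "updated_at": "2025-02-02"},
]
customers = resolve_entities(records)
```

Under this rule the three Acme variants collapse into one customer. Whether "Acme, LLC" really is the same legal entity is a call the code cannot make, which is why the matching rule needs a business owner.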
Gold (business-ready, aggregated). Dimensional modeling, aggregates, and performance optimization. Weekly sales, monthly recurring revenue, customer health scores, the metrics that drive dashboards and feed AI models. Gold often splits by domain — one gold layer for finance, one for sales, one for operations. Building gold is comparatively fast once silver is trustworthy. The constraint is always silver.
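Once silver is trustworthy, a gold metric is often little more than a grouped aggregate. The sketch below computes weekly sales from already-cleaned order rows. The field names `order_date` and `amount_cents` and the function `weekly_sales` are assumptions for illustration, and grouping by ISO week is one possible definition of "weekly".

```python
from collections import defaultdict
from datetime import date

def weekly_sales(silver_orders):
    """Sum order amounts (in cents) by ISO year and week."""
    totals = defaultdict(int)
    for order in silver_orders:
        iso = date.fromisoformat(order["order_date"]).isocalendar()
        week_key = f"{iso[0]}-W{iso[1]:02d}"
        totals[week_key] += order["amount_cents"]
    return dict(sorted(totals.items()))

orders = [
    {"order_date": "2025-01-06", "amount_cents": 10000},
    {"order_date": "2025-01-08", "amount_cents": 25000},
    {"order_date": "2025-01-13", "amount_cents": 7500},
]
gold_weekly = weekly_sales(orders)
```

The aggregation itself is short. Its correctness depends entirely on whether the silver rows feeding it were deduplicated and resolved first.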
Realistic Timelines
Vendor marketing claims 48 hours from bronze to gold on the right platform (Nexla, 2025 blog). That claim is literally true: you can stand up a three-layer pipeline in two days. What you cannot stand up in two days is a silver layer that your CFO will let you run the month-end close against.
The independent pattern, synthesized from Databricks and Microsoft documentation and the public consulting literature, is that traditional medallion implementations require “weeks or even months,” even by vendor admission. For a mid-market company (200–2,000 employees, 10–50 source systems), the honest ranges are:
- Bronze up and running across priority sources: 4–8 weeks. This is mostly connectors, schemas, and storage. Relatively commoditized.
- Silver to reliable-enough-for-BI on first domain: 3–6 months. This is the grind. Entity resolution alone on customer data can take a full quarter when source systems disagree on what a customer ID is.
- Gold for first domain (finance OR sales OR operations): 1–3 months after silver is stable. Fast compared to silver.
- Full coverage across 4+ business domains: 18–24 months. Most mid-market companies never finish — they reach good-enough gold on their two most important domains and move on.
A 300-person company that starts in Q1 and expects AI-ready gold data across the business by end of year is off by a factor of two to three.
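The first-domain ranges above can be added up as a sanity check. This sketch assumes the three phases run strictly back to back with no overlap and converts weeks to months at 52/12 weeks per month. Both are simplifying assumptions, and the names are illustrative.

```python
WEEKS_PER_MONTH = 52 / 12

# (low, high) duration of each phase in months, taken from the ranges above.
PHASES_MONTHS = {
    "bronze": (4 / WEEKS_PER_MONTH, 8 / WEEKS_PER_MONTH),  # 4 to 8 weeks
    "silver": (3.0, 6.0),
    "gold": (1.0, 3.0),
}

def first_domain_range(phases):
    """Return (low, high) total months if phases run sequentially."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high

low, high = first_domain_range(PHASES_MONTHS)
print(f"First domain to gold: {low:.1f} to {high:.1f} months")
```

The total comes to roughly 5 to 11 months for a single domain, in line with the 6 to 12 month figure in the summary, and that is before a second domain is started.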
Where Projects Actually Stall
Gartner’s February 2025 research (n=248 data management leaders) predicts 60% of AI projects unsupported by AI-ready data will be abandoned through 2026. Underlying survey data shows where projects actually break:
- 61% cite data quality as the top challenge
- 62% report incomplete data
- 58% cite capture inconsistencies across source systems
- 57% complain about data integration issues
- 75% of leaders do not trust their data for decision-making
These are all silver-layer problems. Not bronze. Not gold. The layer where humans have to decide what the data means.
The second failure mode: over-engineering. The critique from practitioners (Reliable Data Engineering, 2026 analysis): three layers multiply storage 3x and also multiply ETL jobs, monitoring surface area, and failure points. For a simple pipeline (a single source feeding a single dashboard), a two-layer approach ships in weeks, while the medallion approach ships in months with no improvement in output quality. Mid-market companies frequently buy lakehouse architecture before they need it because a vendor sold them the ceiling instead of the floor.
What Gold Tier Actually Requires
The silver-to-gold checklist, from Microsoft Learn and Databricks documentation:
| Silver requirements | Gold requirements |
|---|---|
| Schema enforcement and evolution | Dimensional model (facts, dimensions, measures) |
| Deduplication | Aggregates and materialized views |
| Null/missing value handling | Performance optimization (partitioning, Z-order) |
| Entity resolution across systems | Business-domain alignment |
| Late-arriving and out-of-order data resolution | Multiple domain-specific gold layers (HR, finance, sales) |
| Type casting and joins | Semantic layer for self-service analytics |
| Data quality rule enforcement | Materialized metrics definitions |
Tools accelerate most of the left column. Tools cannot do the right column alone — business-logic definition requires the people who own the business process. This is why “we bought Databricks” and “our data is AI-ready” are two different statements six months apart.
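Data-quality rule enforcement, the last item in the silver column, can be sketched as a list of named checks with a quarantine for failures. The rules and the `PIPELINE_STAGES` set are hypothetical. In practice the stage list is the kind of definition sales ops and finance would have to agree on, and a tool such as Great Expectations or Soda would run the checks.

```python
# Hypothetical agreed definition of which opportunity stages count as pipeline.
PIPELINE_STAGES = {"qualified", "proposal", "negotiation"}

# Each rule is (name, check); the check returns True when the record passes.
RULES = [
    ("customer_id present", lambda r: bool(r.get("customer_id"))),
    ("amount is a non-negative number",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("stage counts as pipeline", lambda r: r.get("stage") in PIPELINE_STAGES),
]

def enforce(records, rules=RULES):
    """Split records into passed rows and quarantined rows with reasons."""
    passed, quarantined = [], []
    for rec in records:
        failures = [name for name, check in rules if not check(rec)]
        if failures:
            quarantined.append({"record": rec, "failed_rules": failures})
        else:
            passed.append(rec)
    return passed, quarantined

rows = [
    {"customer_id": "C1", "amount": 500, "stage": "proposal"},
    {"customer_id": "", "amount": 100, "stage": "proposal"},
    {"customer_id": "C3", "amount": -5, "stage": "prospecting"},
]
ok_rows, bad_rows = enforce(rows)
```

Quarantining with a reason, rather than silently dropping rows, gives the business owners a concrete list to review when they dispute a number.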
Tooling Realities for Mid-Market
The pattern works at every scale, but the same stack does not. A 300-person company does not need Databricks Unity Catalog, a dbt Cloud team account, Monte Carlo observability, and Atlan governance on day one. The honest mid-market stack:
- Floor (under 500 employees, <10 core systems): Well-structured data warehouse (Snowflake, BigQuery, Redshift) + dbt Core + simple landing zone in S3/GCS = bronze/silver/gold logical tiers in one platform. First domain to reliable gold in 4–6 months, under $200K including internal time.
- Middle (500–2,000 employees, 10–50 systems): Add a lakehouse (Databricks SQL, Microsoft Fabric, Snowflake with external tables), dbt, a data-quality tool (Great Expectations or Soda), and lightweight observability. $500K–$1.5M first year. Two to three domains in 12 months.
- Ceiling (2,000+ employees, 50+ systems, multi-region): Full platform with catalog (Atlan/Collibra), observability (Monte Carlo), lineage, governance. The full stack most vendors pitch. Most mid-market companies do not belong here.
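The three tiers above can be read as a rough sizing rule. The sketch below encodes one interpretation. The thresholds come from the list, but how to combine them is an assumption: the ceiling requires every ceiling condition, while crossing either middle threshold is enough to leave the floor. The function name `stack_tier` is hypothetical.

```python
def stack_tier(employees, source_systems, multi_region=False):
    """Map company size to the floor / middle / ceiling tiers described above."""
    # Assumption: the ceiling applies only when every ceiling condition holds.
    if employees >= 2000 and source_systems >= 50 and multi_region:
        return "ceiling"
    # Assumption: crossing either middle threshold is enough to leave the floor.
    if employees >= 500 or source_systems >= 10:
        return "middle"
    return "floor"

print(stack_tier(300, 6))
print(stack_tier(300, 20))
print(stack_tier(5000, 80, multi_region=True))
```

Requiring every ceiling condition keeps the rule biased toward the smaller stack, consistent with the point that most mid-market companies do not belong at the ceiling.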
Key Data Points
| Metric | Value | Source | Date |
|---|---|---|---|
| AI projects abandoned through 2026 if not supported by AI-ready data | 60% | Gartner | Feb 2025 |
| Gen AI POCs abandoned by end of 2025 | 30%+ | Gartner | Jul 2024 |
| Data leaders citing data quality as top challenge | 61% | Gartner survey n=248 | 2024 |
| Leaders who don’t trust their data for decisions | 75% | Gartner | 2024 |
| Enterprises using 100+ data sources | 79% | Nexla industry ref | 2025 |
| Data leaders “barely keeping the lights on” | 37% | Nexla industry ref | 2025 |
| Storage multiplier for 3-layer medallion | 3x | Reliable Data Engineering | 2026 |
| Vendor-claimed fastest deployment (no-code platform) | 48 hours | Nexla (vendor) | 2025 |
| Realistic silver layer build for first domain (mid-market) | 3–6 months | Synthesized from practitioner literature | 2026 |
| AI spending delayed into 2027 from data reorientation | 25% | Forrester | 2025 |
Source credibility: Gartner data — HIGH (independent analyst, large n). Microsoft Learn / Databricks documentation — MEDIUM (authoritative on the pattern, vendor on the tooling). Nexla blog numbers — LOW (vendor marketing; use only for industry context, not as operational benchmarks). The 3–6 month silver range is a synthesized estimate from the practitioner literature and is offered with appropriate caveat — the single biggest variable is how many source systems disagree on entity identity.
What This Means for Your Organization
The conversation to have before signing any data platform contract is not “bronze/silver/gold” — it is “which two business domains matter enough that we will fund the silver layer all the way through, and which domains are we explicitly deciding to leave in bronze for now?” Companies that pick two domains and finish them in a year look like AI winners eighteen months later. Companies that try to medallion the whole business end up with a half-built lakehouse, a tired data team, and no AI wins to show the board.
The silver layer is where your business decisions get encoded. That is not something you outsource to a systems integrator without committing the internal people — the CFO, the sales ops lead, the operations VP — who own the definitions. The tool decisions are reversible. The entity-resolution decisions are load-bearing for every downstream AI use case for years.
If you are being pitched a medallion architecture by a vendor or SI and the proposal promises “bronze to gold in 90 days” without a line item for CFO and business-unit time, the proposal is selling you a pipeline, not a capability. If you are looking at your existing data environment and trying to decide whether the answer is a new platform, a data-quality investment on top of what you have, or a focused silver-layer sprint on two domains — that is a conversation worth having before the budget decision, not after. If this raised questions specific to your situation, I’d welcome the conversation — brandon@brandonsneider.com.
Sources
- Microsoft Learn / Azure Databricks, “What is the medallion lakehouse architecture?” (updated March 23, 2026). https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion — MEDIUM credibility (authoritative pattern documentation, vendor on tooling).
- Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk” (February 26, 2025, n=248). https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk — HIGH credibility.
- Gartner, “30% of Gen AI Projects Abandoned After POC by End of 2025” (July 29, 2024). https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025 — HIGH credibility.
- Reliable Data Engineering, “Medallion Architecture: Is It Still Relevant in 2026?” Medium, 2026. https://medium.com/@reliabledataengineering/medallion-architecture-bronze-silver-gold-is-it-still-relevant-in-2026-5e616fc03245 — MEDIUM credibility (practitioner analysis, independent).
- Nexla, “From Raw to Gold in 48 Hours: Building a Modern Medallion Architecture” (2025). https://nexla.com/blog/building-a-modern-medallion-architecture/ — LOW credibility (vendor marketing; cited for industry-context numbers only).
- Databricks, “What is Medallion Architecture?” glossary. https://www.databricks.com/glossary/medallion-architecture — MEDIUM credibility (vendor on pattern).
Brandon Sneider | brandon@brandonsneider.com | April 2026