Bronze, Silver, Gold: What the Medallion Architecture Actually Takes to Build

Brandon Sneider · April 2026

The medallion architecture is a data design pattern, not a product.

Executive Summary

The medallion architecture (bronze = raw, silver = cleaned, gold = business-ready) is the de facto standard pattern for preparing data for AI and analytics. Microsoft, Databricks, Snowflake, and every major systems integrator pitch it.
Vendor decks promise “48 hours from bronze to gold” on a no-code platform. Real mid-market implementations take 6 to 18 months, and most companies stall at silver — the layer where deduplication, entity resolution, and business-rule definition actually live.
The hard work is not the pipeline. It is deciding what a “customer” is across four systems, agreeing on which revenue number is the revenue number, and writing business rules that survive the CFO’s review. Tools do not decide these things. People do.
Gartner predicts 60% of AI projects unsupported by AI-ready data will be abandoned through 2026 (Feb 2025, n=248). The cause of death is almost always the silver layer.
For a 300–500 person company, a credible medallion program is 2 to 4 priority domains, 6 to 12 months to reliable gold on the first domain, and $400K–$1.2M in first-year cost across tooling, integration, and internal time. The honest plan sequences domains one at a time; it does not try to push every domain through all three layers at once.

What Each Layer Actually Is

The medallion architecture is a data design pattern, not a product. Microsoft’s own documentation is explicit: “Following the medallion architecture is a recommended best practice but not a requirement” (Azure Databricks, updated March 2026).

Bronze (raw ingestion). Data lands in its original form from source systems — Salesforce, NetSuite, the warehouse management system, ticketing tool, website. No cleanup. No validation. Stored as append-only history so you can reprocess later. Databricks recommends storing fields as strings or VARIANT to survive schema changes upstream. Cost: mostly storage. Technical lift: moderate. This is the layer vendors demo in 30 minutes.
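
The bronze behavior described above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory list stands in for the bronze table; the function names (to_bronze_record, append_to_bronze) and field names (source_system, raw_payload, ingested_at) are hypothetical, not part of any vendor API.

```python
import json
from datetime import datetime, timezone


def to_bronze_record(source_system: str, raw_payload: dict) -> dict:
    """Wrap one source record for bronze: no cleanup, no validation."""
    return {
        "source_system": source_system,
        # Store the payload as a single string so that upstream schema
        # changes (new or renamed fields) cannot break ingestion.
        "raw_payload": json.dumps(raw_payload, sort_keys=True),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_bronze(bronze_table: list, source_system: str, payloads: list) -> None:
    """Append-only: existing rows are never updated or deleted."""
    for payload in payloads:
        bronze_table.append(to_bronze_record(source_system, payload))


# Hypothetical in-memory stand-in for a bronze table.
bronze_customers = []
append_to_bronze(bronze_customers, "salesforce", [{"Id": "001A", "Name": "Acme Corp"}])
# A later load carries a new upstream field; it still lands with no schema work.
append_to_bronze(
    bronze_customers,
    "salesforce",
    [{"Id": "001A", "Name": "ACME Corporation", "Tier": 2}],
)
```

Both versions of the record are kept, which is what makes reprocessing possible later. Nothing here decides which version is correct; that question belongs to silver.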

Silver (validated, deduplicated, joined). This is where the real work happens. The operations required: schema enforcement, null handling, deduplication, resolution of out-of-order and late-arriving records, type casting, joins across domains, and data-quality rule enforcement (Microsoft Learn, 2026). Entity resolution — deciding that “Acme Corp,” “ACME Corporation,” and “Acme, LLC” are the same customer — lives here. So does the business logic of “which Salesforce opportunity counts as pipeline.” These decisions are not technical. They require the CFO, the sales ops lead, and the CRO in the same room.
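
To make the entity-resolution point concrete, here is a minimal sketch that uses the three Acme spellings above. It assumes a simple rule (lowercase, strip punctuation, drop trailing legal suffixes) and hypothetical record fields (system, id, name, updated_at). Real matching is usually fuzzier, and the suffix list itself is exactly the kind of decision the business has to sign off on.

```python
import re

# Hypothetical suffix list. Treating "Acme Corp" and "Acme, LLC" as one
# customer is a business decision, not a technical one.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co", "company"}


def match_key(name: str) -> str:
    """Normalize a company name into a comparison key."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def resolve_customers(records: list) -> dict:
    """Group records by match key and keep the most recently updated one."""
    resolved = {}
    for rec in records:
        key = match_key(rec["name"])
        entity = resolved.setdefault(key, {"latest": rec, "source_ids": []})
        entity["source_ids"].append((rec["system"], rec["id"]))
        # Late-arriving rows: compare timestamps, not arrival order.
        if rec["updated_at"] > entity["latest"]["updated_at"]:
            entity["latest"] = rec
    return resolved


records = [
    {"system": "salesforce", "id": "001A", "name": "Acme Corp", "updated_at": "2026-03-01"},
    {"system": "netsuite", "id": "C-77", "name": "ACME Corporation", "updated_at": "2026-03-15"},
    {"system": "ticketing", "id": "9912", "name": "Acme, LLC", "updated_at": "2026-02-10"},
]
customers = resolve_customers(records)
```

All three spellings collapse to one entity that keeps a cross-reference to each source ID. The code is short; agreeing that the rule is right for your customer base is the part that takes a quarter.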

Gold (business-ready, aggregated). Dimensional modeling, aggregates, and performance optimization. Weekly sales, monthly recurring revenue, customer health scores, the metrics that drive dashboards and feed AI models. Gold often splits by domain — one gold layer for finance, one for sales, one for operations. Building gold is comparatively fast once silver is trustworthy. The constraint is always silver.
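
A gold aggregate such as weekly sales is comparatively simple once the silver rows can be trusted. The sketch below assumes silver orders arrive as dictionaries with hypothetical order_date (ISO format) and amount fields, and groups by ISO year and week; a real implementation would more likely be a SQL model in the warehouse.

```python
from collections import defaultdict
from datetime import date


def weekly_sales(silver_orders: list) -> dict:
    """Sum order amounts per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for order in silver_orders:
        iso = date.fromisoformat(order["order_date"]).isocalendar()
        totals[(iso[0], iso[1])] += order["amount"]
    return dict(totals)


# Hypothetical, already-validated silver rows.
silver_orders = [
    {"order_date": "2026-03-02", "amount": 1200.0},
    {"order_date": "2026-03-04", "amount": 800.0},
    {"order_date": "2026-03-09", "amount": 500.0},
]
weekly = weekly_sales(silver_orders)
```

Note what the function does not do: it does not check for duplicates, missing dates, or mismatched currencies. It assumes silver already did.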

Realistic Timelines

Vendor marketing claims 48 hours from bronze to gold on the right platform (Nexla, 2025 blog). That claim is literally true: you can stand up a three-layer pipeline in two days. What you cannot stand up in two days is a silver layer that your CFO will let you run the month-end close against.

The pattern across Databricks and Microsoft documentation and the public consulting literature is consistent: traditional medallion implementations require “weeks or even months,” even by vendor admission. For a mid-market company (200–2,000 employees, 10–50 source systems), the honest ranges are:

Bronze up and running across priority sources: 4–8 weeks. This is mostly connectors, schemas, and storage. Relatively commoditized.
Silver to reliable-enough-for-BI on first domain: 3–6 months. This is the grind. Entity resolution alone on customer data can take a full quarter when source systems disagree on what a customer ID is.
Gold for first domain (finance OR sales OR operations): 1–3 months after silver is stable. Fast compared to silver.
Full coverage across 4+ business domains: 18–24 months. Most mid-market companies never finish — they reach good-enough gold on their two most important domains and move on.
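
Adding up the first three ranges gives a rough end-to-end figure for one domain. The arithmetic below is a sketch, assuming the phases run strictly back to back with no overlap and converting weeks to months at 52/12; both are simplifying assumptions, not claims from the sources.

```python
WEEKS_PER_MONTH = 52 / 12

# (low, high) duration in months for each phase of the first domain.
phases = {
    "bronze": (4 / WEEKS_PER_MONTH, 8 / WEEKS_PER_MONTH),  # 4 to 8 weeks
    "silver": (3, 6),
    "gold": (1, 3),
}

low_months = sum(low for low, _ in phases.values())
high_months = sum(high for _, high in phases.values())

print(f"First domain, end to end: {low_months:.1f} to {high_months:.1f} months")
```

Under those assumptions the total lands at roughly 5 to 11 months for a single domain, with silver accounting for more than half of it at either end of the range.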

A 300-person company that starts in Q1 and expects AI-ready gold data across the business by end of year is off by a factor of two to three.

Where Projects Actually Stall

Gartner’s February 2025 research (n=248 data management leaders) predicts 60% of AI projects unsupported by AI-ready data will be abandoned through 2026. Underlying survey data shows where projects actually break:

61% cite data quality as the top challenge
62% report incomplete data
58% cite capture inconsistencies across source systems
57% complain about data integration issues
75% of leaders do not trust their data for decision-making

These are all silver-layer problems. Not bronze. Not gold. The layer where humans have to decide what the data means.

The second failure mode: over-engineering. The critique from practitioners (Reliable Data Engineering, 2026 analysis): three layers multiply storage 3x and also multiply ETL jobs, monitoring surface area, and failure points. For a simple pipeline (a single source feeding a single dashboard), a two-layer approach ships in weeks and the medallion approach ships in months, with no improvement in output quality. Mid-market companies frequently buy lakehouse architecture before they need it because a vendor sold them the ceiling instead of the floor.

What Gold Tier Actually Requires

The silver-to-gold checklist, from Microsoft Learn and Databricks documentation:

Silver requirements	Gold requirements
Schema enforcement and evolution	Dimensional model (facts, dimensions, measures)
Deduplication	Aggregates and materialized views
Null/missing value handling	Performance optimization (partitioning, Z-order)
Entity resolution across systems	Business-domain alignment
Late-arriving and out-of-order data resolution	Multiple domain-specific gold layers (HR, finance, sales)
Type casting and joins	Semantic layer for self-service analytics
Data quality rule enforcement	Materialized metrics definitions

Tools accelerate most of the left column. Tools cannot do the right column alone — business-logic definition requires the people who own the business process. This is why “we bought Databricks” and “our data is AI-ready” are two different statements six months apart.
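
Data-quality rule enforcement, from the silver column, shows the split between tool work and people work. The sketch below assumes hypothetical order rows with customer_id, amount, and order_date fields, and quarantines failing rows instead of dropping them. The rules themselves are illustrative assumptions; whether a negative amount is an error or a legitimate refund, for example, is a call for the business owner.

```python
from datetime import date


def check_order(row: dict) -> list:
    """Return the rule violations for one row; an empty list means it passes."""
    problems = []
    if not row.get("customer_id"):
        problems.append("customer_id is missing")
    try:
        if float(row.get("amount", "")) < 0:
            # Assumed rule: refunds are modeled elsewhere.
            problems.append("amount is negative")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    try:
        date.fromisoformat(row.get("order_date", ""))
    except (TypeError, ValueError):
        problems.append("order_date is not an ISO date")
    return problems


def enforce(rows: list) -> tuple:
    """Split rows into those that pass and those quarantined with reasons."""
    passed, quarantined = [], []
    for row in rows:
        problems = check_order(row)
        if problems:
            quarantined.append({"row": row, "problems": problems})
        else:
            passed.append(row)
    return passed, quarantined


rows = [
    {"customer_id": "001A", "amount": "1200.00", "order_date": "2026-03-02"},
    {"customer_id": "", "amount": "n/a", "order_date": "03/02/2026"},
]
passed, quarantined = enforce(rows)
```

Keeping the reasons next to each quarantined row gives the business owner something specific to rule on, which is usually faster than debating data quality in the abstract.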

Tooling Realities for Mid-Market

The pattern works at every scale, but the same stack does not. A 300-person company does not need Databricks Unity Catalog, a dbt Cloud team account, Monte Carlo observability, and Atlan governance on day one. The honest mid-market stack:

Floor (under 500 employees, <10 core systems): Well-structured data warehouse (Snowflake, BigQuery, Redshift) + dbt Core + simple landing zone in S3/GCS = bronze/silver/gold logical tiers in one platform. First domain to reliable gold in 4–6 months, under $200K including internal time.
Middle (500–2,000 employees, 10–50 systems): Add a lakehouse (Databricks SQL, Microsoft Fabric, Snowflake with external tables), dbt, a data-quality tool (Great Expectations or Soda), and lightweight observability. $500K–$1.5M first year. Two to three domains in 12 months.
Ceiling (2,000+ employees, 50+ systems, multi-region): Full platform with catalog (Atlan/Collibra), observability (Monte Carlo), lineage, governance. The full stack most vendors pitch. Most mid-market companies do not belong here.

Key Data Points

Metric	Value	Source	Date
AI projects abandoned through 2026 if not supported by AI-ready data	60%	Gartner	Feb 2025
Gen AI projects abandoned after proof of concept by end of 2025	30%+	Gartner	Jul 2024
Data leaders citing data quality as top challenge	61%	Gartner survey n=248	2024
Leaders who don’t trust their data for decisions	75%	Gartner	2024
Enterprises using 100+ data sources	79%	Nexla industry ref	2025
Data leaders “barely keeping the lights on”	37%	Nexla industry ref	2025
Storage multiplier for 3-layer medallion	3x	Reliable Data Engineering	2026
Vendor-claimed fastest deployment (no-code platform)	48 hours	Nexla (vendor)	2025
Realistic silver layer build for first domain (mid-market)	3–6 months	Synthesized from practitioner literature	2026
AI spending delayed into 2027 from data reorientation	25%	Forrester	2025

Source credibility: Gartner data — HIGH (independent analyst, large n). Microsoft Learn / Databricks documentation — MEDIUM (authoritative on the pattern, vendor on the tooling). Nexla blog numbers — LOW (vendor marketing; use only for industry context, not as operational benchmarks). The 3–6 month silver range is a synthesized estimate from the practitioner literature and is offered with appropriate caveat — the single biggest variable is how many source systems disagree on entity identity.

What This Means for Your Organization

The conversation to have before signing any data platform contract is not “bronze/silver/gold” — it is “which two business domains matter enough that we will fund the silver layer all the way through, and which domains are we explicitly deciding to leave in bronze for now?” Companies that pick two domains and finish them in a year look like AI winners eighteen months later. Companies that try to medallion the whole business end up with a half-built lakehouse, a tired data team, and no AI wins to show the board.

The silver layer is where your business decisions get encoded. That is not something you outsource to a systems integrator without committing the internal people — the CFO, the sales ops lead, the operations VP — who own the definitions. The tool decisions are reversible. The entity-resolution decisions are load-bearing for every downstream AI use case for years.

If you are being pitched a medallion architecture by a vendor or SI and the proposal promises “bronze to gold in 90 days” without a line item for CFO and business-unit time, the proposal is selling you a pipeline, not a capability. If you are looking at your existing data environment and trying to decide whether the answer is a new platform, a data-quality investment on top of what you have, or a focused silver-layer sprint on two domains — that is a conversation worth having before the budget decision, not after. If this raised questions specific to your situation, I’d welcome the conversation — brandon@brandonsneider.com.

Sources

Microsoft Learn / Azure Databricks, “What is the medallion lakehouse architecture?” (updated March 23, 2026). https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion — MEDIUM credibility (authoritative pattern documentation, vendor on tooling).
Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk” (February 26, 2025, n=248). https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk — HIGH credibility.
Gartner, “30% of Gen AI Projects Abandoned After POC by End of 2025” (July 29, 2024). https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025 — HIGH credibility.
Reliable Data Engineering, “Medallion Architecture: Is It Still Relevant in 2026?” Medium, 2026. https://medium.com/@reliabledataengineering/medallion-architecture-bronze-silver-gold-is-it-still-relevant-in-2026-5e616fc03245 — MEDIUM credibility (practitioner analysis, independent).
Nexla, “From Raw to Gold in 48 Hours: Building a Modern Medallion Architecture” (2025). https://nexla.com/blog/building-a-modern-medallion-architecture/ — LOW credibility (vendor marketing; cited for industry-context numbers only).
Databricks, “What is Medallion Architecture?” glossary. https://www.databricks.com/glossary/medallion-architecture — MEDIUM credibility (vendor on pattern).
