Alloy: A New Architecture for Declarative Data Engineering

Today, DataForge is excited to introduce the Alloy Architecture, the most significant upgrade to our platform since we first launched. Alloy represents a reengineered foundation for how data pipelines are structured, executed, and refined - addressing long-standing challenges with hidden layers, unpredictable performance, and inconsistent transformation patterns.

For years, data teams have been asked to deliver reliable, governed, high-quality data products using architectural patterns that look clean in diagrams but break apart in practice. Medallion-style models are often framed as three simple layers: Bronze, Silver, Gold - but real pipelines contain dozens or even hundreds of hidden transformation stages buried inside:

  • Common Table Expressions

  • Temp tables

  • dbt models and sub-models

  • Spark/Dataset transformations

  • User-defined staging tables

  • Ad-hoc orchestration logic

These implicit layers introduce operational risk, slow down teams, and make it nearly impossible to achieve end-to-end lineage or enforce consistent transformation logic across domains.

The Alloy Architecture solves this problem by introducing a structured, statically defined processing flow that eliminates hidden layers entirely.


A Five-Layer Refinement Model - Designed for Clarity and Consistency

At the heart of Alloy is a deterministic refinement sequence: ORE → MINERAL → ALLOY → INGOT → PRODUCT.

Each stage has one clear purpose, and every pipeline follows the same structured flow.

ORE

Unrefined data captured exactly as delivered from the source.

MINERAL

Purpose-built change detection isolates only the new or updated records.

ALLOY

Business logic, joins, and enrichment are applied only to the incremental batch - improving both performance and predictability.

INGOT

The enriched batch is merged into the full dataset through a consistent refinement process that ensures clean, canonical results.

PRODUCT

Final data outputs are materialized for analytics, operational systems, and downstream consumers.

This architecture makes the refinement process explicit, stable, and uniformly repeatable across every domain and every data source.
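To make the five stages concrete, here is a minimal, hypothetical sketch of one pipeline run in plain Python. The stage names come from this post; the record shapes, function signatures, and enrichment logic are illustrative assumptions, not DataForge's actual API.

```python
# Hypothetical sketch of Alloy's five-stage refinement sequence.
# Record shapes and function bodies are illustrative assumptions.

def ore(source_rows):
    """ORE: capture data exactly as delivered from the source."""
    return list(source_rows)

def mineral(ore_rows, known_values):
    """MINERAL: isolate only new or updated records (change detection)."""
    return [r for r in ore_rows
            if r["id"] not in known_values or known_values[r["id"]] != r["value"]]

def alloy(changed_rows):
    """ALLOY: apply business logic/enrichment to the incremental batch only."""
    return [{**r, "value_upper": r["value"].upper()} for r in changed_rows]

def ingot(full_table, enriched_batch):
    """INGOT: merge the enriched batch into the full canonical dataset."""
    merged = {r["id"]: r for r in full_table}
    for r in enriched_batch:
        merged[r["id"]] = r
    return list(merged.values())

def product(ingot_rows):
    """PRODUCT: materialize final outputs for downstream consumers."""
    return sorted(ingot_rows, key=lambda r: r["id"])

# One pipeline run over a toy source: id=1 is unchanged, id=2 is new.
existing = [{"id": 1, "value": "a", "value_upper": "A"}]
known = {1: "a"}
raw = ore([{"id": 1, "value": "a"}, {"id": 2, "value": "b"}])
batch = alloy(mineral(raw, known))   # only id=2 flows through
table = product(ingot(existing, batch))
```

Note how only the incremental batch passes through MINERAL and ALLOY, while INGOT reconciles it against the full dataset — the shape of the flow, not the toy logic, is the point.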


Incremental Processing - Built In, Not Bolted On

Traditional data engineering frameworks treat incremental loads as an optimization problem left to developers:

  • Custom merge logic

  • Timestamp parsing

  • Soft/hard delete handling

  • State tracking tables

  • Conditional pipeline branches

  • Fragile assumptions buried in code

This approach creates unnecessary variability and technical risk across domains.

Alloy takes a fundamentally different approach.

Incremental behavior is architecturally enforced, not optional.
MINERAL and ALLOY are designed to process only new or updated data, while INGOT performs the structured, full-dataset refinement.

This design provides:

  • Predictable scaling

  • No per-table incremental logic

  • Consistent change handling

  • Simplified failure recovery

  • Less custom code and fewer edge-case rules

Incrementalism isn’t an optimization. It’s a first-class architectural principle of Alloy.
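The contrast with hand-rolled incremental logic can be sketched with a toy watermark-plus-idempotent-merge pattern. This is an illustrative assumption about what "architecturally enforced" incrementalism looks like, not DataForge internals: change detection selects only rows past the last watermark, and the merge is an upsert that can be safely replayed after a failure.

```python
# Hypothetical sketch: watermark-based change detection plus an
# idempotent merge. All names and semantics here are illustrative
# assumptions, not DataForge's implementation.

def detect_changes(source, watermark):
    """Select only rows modified after the last processed watermark."""
    batch = [r for r in source if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

def merge(target, batch):
    """Idempotent upsert: re-applying the same batch changes nothing."""
    rows = {r["id"]: r for r in target}
    rows.update({r["id"]: r for r in batch})
    return list(rows.values())

source = [
    {"id": 1, "updated_at": 10, "v": "old"},
    {"id": 1, "updated_at": 20, "v": "new"},    # update to id=1
    {"id": 2, "updated_at": 25, "v": "fresh"},  # brand-new row
]
target = [{"id": 1, "updated_at": 10, "v": "old"}]

batch, wm = detect_changes(source, watermark=10)
once = merge(target, batch)
twice = merge(once, batch)   # replay after a failure: same result
```

Because the merge is idempotent, recovering from a failed run is just re-running it — which is the "simplified failure recovery" property listed above, enforced by the pattern rather than by per-table custom code.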


Modernizing the Foundation - A Major Upgrade from DataForge’s Legacy Architecture

Prior versions of DataForge used a file-based refinement model, generating RAW, CDC, ENR, and HUB datasets as files in cloud storage.
This approach enabled rapid early adoption but introduced challenges around:

  • Navigability

  • Governance and metadata consistency

  • Managing partition evolution

  • Cross-layer relationships

  • Efficient incremental processing patterns

  • Integration with native lakehouse governance tools

Alloy modernizes this foundation by transitioning the entire refinement flow to a fully table-native architecture.

This upgrade improves:

  • Layer-to-layer transparency

  • Debugging and troubleshooting

  • Partitioning strategy consistency

  • Observability across the refinement process

  • Incremental execution reliability

  • Cross-domain structural alignment

This modernization is a major upgrade for DataForge customers and represents a generational shift in how the platform models and processes data.


How Alloy Relates to Medallion

The Medallion Architecture is intentionally broad and flexible, which is why it has been widely adopted across the industry. Its three-tier model (Bronze, Silver, Gold) can describe a wide range of pipeline patterns depending on how teams choose to implement it.

The Alloy Architecture is something different.

While Alloy can conceptually map to Medallion, it introduces:

  • Explicit, enforceable refinement stages

  • A deterministic sequence without hidden layers

  • Incremental processing built directly into the architecture

  • A uniform pattern that applies consistently across domains

Alloy is not simply an interpretation of Medallion; it is an evolution.

It is a new architectural model designed to solve a different set of problems: reducing hidden complexity, improving consistency, and enabling predictable, declarative data engineering.

We will publish a dedicated follow-on article that provides a deeper comparison between Alloy and Medallion, including where they align and where they meaningfully diverge.


Why Alloy Matters

1. No More Hidden Layers

A static, five-step process replaces scattered logical operations inside notebooks, CTEs, or ephemeral tables.

2. Predictable Performance from Structural Incrementalism

Incremental behavior is part of the architecture itself - not a pattern left to individual developers.

3. Consistency Across Every Pipeline

Each source, domain, and transformation path follows the same refinement model.

4. Better Governance and Debugging

Structured refinement produces cleaner, more discoverable lineage and easier troubleshooting.

5. A Foundation for Declarative Transformation

Less code. More structure. More repeatability. More resilience.


The Beginning of a New Chapter

The Alloy Architecture brings structure to what was previously ambiguous, simplicity to what was previously fragmented, and determinism to what was previously unpredictable.

This release marks a major milestone for DataForge - and the foundation for everything that comes next.

More updates later this week.
