Introducing Ember: A Structured Data Catalog Built for the Alloy Architecture

Earlier this week, we introduced the Alloy Architecture: a new, structured refinement model designed to eliminate hidden layers and bring predictability to data engineering.

Today, we’re excited to introduce the system that makes Alloy possible:

Ember is DataForge’s structured data catalog and declarative metadata engine.

Ember is more than a catalog of tables and fields. It is the definition layer that tells Alloy exactly how each step of the refinement process should behave.

Where Alloy provides the architectural flow, Ember provides the detailed instructions that drive it: clearly, consistently, and without ambiguity.


Why Ember Is Different From Traditional Data Catalogs

Most data catalogs work the same way:
They scan SQL, code, notebooks, or dbt models to infer what pipelines are doing and attempt to document them after the fact.

This creates challenges most data teams know well:

  • Hidden logic scattered across CTEs, temp tables, scripts, and models

  • Catalogs drifting out of sync as code evolves

  • Lineage that describes what happened, not what should happen

  • Difficulty enforcing consistency across domains

  • Metadata that reflects intentions rather than guarantees

Ember is designed for the opposite model.

Instead of scanning pipelines, Ember stores the rules that create them.

It doesn’t observe developer logic.
It defines it — up front, in a structured, relational model.

This shift from descriptive to prescriptive metadata is what makes Ember fundamentally different from observability tools and traditional catalogs.
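
To make the distinction concrete, here is a minimal sketch of what prescriptive metadata means in practice. This is illustrative Python, not Ember’s actual API; every name here is a hypothetical stand-in. The point is that the rule exists before any pipeline code, and the pipeline is derived from the rule rather than inferred after the fact.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AttributeRule:
        """A prescriptive rule: declared up front, then executed."""
        target: str         # attribute the rule produces
        expression: str     # transformation to apply
        depends_on: tuple   # attributes the expression reads

    # Declared before any pipeline exists. An engine generates the
    # pipeline from rules like this, instead of scanning existing
    # SQL afterward to guess what the pipeline intended.
    full_name = AttributeRule(
        target="customer.full_name",
        expression="concat(first_name, ' ', last_name)",
        depends_on=("customer.first_name", "customer.last_name"),
    )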


Ember as a New Kind of Data Catalog

Ember replaces the traditional “after-the-fact” catalog with one that defines behavior rather than merely documenting it.

Ember is:

  • A structured data catalog

  • A metadata engine

  • A definition layer for transformation rules

  • A control plane for the Alloy Architecture

  • A single source of truth for how data should be refined

It pairs the familiarity of a catalog with the precision of a declarative execution model.

This makes Ember both recognizable and entirely new.


What Ember Stores and Manages

Ember contains a set of metadata objects that describe:

  • What each data source looks like

  • How new and changed records should be detected

  • How attributes relate to one another

  • What validation or quality rules apply

  • Which enrichment steps should occur

  • How data should be merged and refined

  • What outputs should be produced for downstream systems

Each object has a well-defined purpose.
Each relationship is encoded explicitly.
Each rule is unambiguous.

Ember becomes the source of truth for how data is processed end to end, without relying on handwritten pipeline code or scattered configuration files.
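
As a rough illustration, metadata objects like the ones described above could be modeled as follows. The shapes and names here are hypothetical, chosen only to make the idea tangible; they are not Ember’s actual schema.

    from dataclasses import dataclass

    @dataclass
    class SourceDefinition:
        """What a source looks like and how changes are detected."""
        name: str
        key_columns: list       # columns that identify a record
        change_detection: str   # e.g. "timestamp", "hash", "cdc"

    @dataclass
    class ValidationRule:
        """A quality rule applied to a single attribute."""
        attribute: str
        check: str              # e.g. "not_null"

    @dataclass
    class OutputDefinition:
        """What downstream systems should receive."""
        name: str
        attributes: list

    orders = SourceDefinition(
        name="orders",
        key_columns=["order_id"],
        change_detection="timestamp",
    )
    rules = [ValidationRule(attribute="order_id", check="not_null")]
    mart = OutputDefinition(name="orders_refined",
                            attributes=["order_id", "total"])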


How Ember Enables Alloy’s “No Hidden Layers” Approach

A major benefit of the Alloy Architecture is that it eliminates the hidden intermediate layers typically found in notebooks, models, and procedural pipelines.

This is only possible because Ember defines transformation behavior in a way that:

  • Keeps related logic together rather than spreading it across tables

  • Stores rules at the attribute level instead of burying them inside CTE chains

  • Separates incremental processing rules from full-dataset refinements

  • Provides clear instructions for how to enrich and merge data

  • Removes the need for ad-hoc staging tables and temporary constructs

In other words:

Ember centralizes transformation rules so that Alloy can execute them cleanly — without creating additional layers behind the scenes.

This is the foundation of Alloy’s predictability and the core reason we built Ember.
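
One way to picture the mechanics, as a simplified sketch rather than DataForge’s implementation: when every rule lives at the attribute level with explicit dependencies, an engine can order and apply the rules in a single pass, with no ad-hoc staging tables or CTE chains in between. All names below are hypothetical.

    from graphlib import TopologicalSorter  # Python 3.9+

    # Hypothetical attribute-level rules with explicit dependencies.
    rules = {
        "full_name": {"deps": ["first_name", "last_name"],
                      "fn": lambda r: f"{r['first_name']} {r['last_name']}"},
        "greeting":  {"deps": ["full_name"],
                      "fn": lambda r: f"Hello, {r['full_name']}"},
    }

    def refine(record: dict) -> dict:
        """Apply every rule in dependency order, in one pass,
        without materializing intermediate layers."""
        graph = {name: set(rule["deps"]) & rules.keys()
                 for name, rule in rules.items()}
        for name in TopologicalSorter(graph).static_order():
            record[name] = rules[name]["fn"](record)
        return record

    refined = refine({"first_name": "Ada", "last_name": "Lovelace"})
    # refined["greeting"] == "Hello, Ada Lovelace"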


Ember + Alloy: Definition + Execution

Alloy establishes a structured model for data refinement.
Ember provides the definitions that make that model consistent across every domain.

Together, they deliver:

  • Clear, predictable refinement behavior

  • Built-in incremental processing (see the sketch after this list)

  • Uniform patterns across all sources

  • Zero hidden layers or ad-hoc transformations

  • Faster onboarding of new data domains

  • Stronger governance and lineage

  • A foundation for true declarative data engineering

This pairing represents a major step forward in how data pipelines can be modeled and executed.
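
The built-in incremental processing listed above can be pictured with a small sketch. This is a simplified illustration under assumed semantics, not DataForge’s actual mechanism: a declared timestamp-based change-detection rule tells the engine which records to refine on each run, so incrementality falls out of the definition instead of hand-written watermark logic.

    from datetime import datetime

    def incremental_batch(records, last_watermark, ts_field="updated_at"):
        """Select only records newer than the last processed watermark,
        as dictated by a declared change-detection rule."""
        fresh = [r for r in records if r[ts_field] > last_watermark]
        new_watermark = max((r[ts_field] for r in fresh),
                            default=last_watermark)
        return fresh, new_watermark

    records = [
        {"order_id": 1, "updated_at": datetime(2024, 1, 1)},
        {"order_id": 2, "updated_at": datetime(2024, 1, 3)},
    ]
    batch, wm = incremental_batch(records, datetime(2024, 1, 2))
    # batch contains only order 2; wm advances to 2024-01-03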


What’s Coming Next

If you’d like to explore the foundations behind Ember and the Alloy Architecture, we’ve published two in-depth introductions that provide background on the principles guiding the design of Ember’s metadata structures and Alloy’s refinement flow.

Next week, we’ll begin detailing the physical implementation of Alloy and Ember inside DataForge version 10.0 — starting with our Databricks experience early in the week, and concluding with a second major announcement on Thursday.

More to come soon.
