Bringing Alloy and Ember to Databricks in DataForge 10.0
Last week, we introduced the Alloy Architecture: a structured refinement model designed to eliminate hidden layers and create predictable, repeatable workflows. We also unveiled Ember, the metadata catalog that defines Alloy’s behavior using clear, declarative configuration.
Today, we’re announcing how Alloy and Ember come to life inside DataForge 10.0 for Databricks. This release reflects a major reinvestment in our Databricks architecture, delivering Unity Catalog integration, table-native refinement, federated metadata access, and a modernized compute layer.
Table-Native Refinement Inside Databricks
Older versions of DataForge relied on file-based refinement stages. While functional, this approach made it harder for teams to understand how data moved through the system or to align with Databricks-native governance tools.
In DataForge 10.0, each stage of the Alloy Architecture (ORE, MINERAL, ALLOY, INGOT, and PRODUCT) is now represented as a Delta table or view. This change makes refinement far more transparent and intuitive. Teams can inspect intermediate results directly, query refinement stages using Databricks SQL, and leverage Unity Catalog controls without needing to interpret abstract internal flows.
The end result is an Alloy execution model that feels native on Databricks, with refinement exposed through standard lakehouse primitives rather than internal file structures.
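As a rough illustration of what table-native refinement enables, suppose each Alloy stage materializes under a `catalog.schema.stage` naming convention in Unity Catalog (the convention here is invented for illustration, not DataForge's documented layout). A small helper could then resolve the table name for any stage, ready to be queried with ordinary Databricks SQL:

```python
# Sketch only: the catalog.schema.<stage> layout below is an assumption,
# not DataForge's actual naming scheme.
STAGES = ("ore", "mineral", "alloy", "ingot", "product")

def stage_table(catalog: str, domain: str, stage: str) -> str:
    """Return the fully qualified Unity Catalog name of a refinement stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown Alloy stage: {stage!r}")
    return f"{catalog}.{domain}.{stage}"

# With a name in hand, intermediate results can be inspected directly, e.g.
#   spark.sql(f"SELECT * FROM {stage_table('main', 'orders', 'mineral')} LIMIT 10")
print(stage_table("main", "orders", "mineral"))
```

The point is that every intermediate stage is addressable the same way any other lakehouse table is, so inspection requires no product-specific tooling.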
Ember: The Structured Definition Layer Behind Alloy
Alloy’s consistency is enabled by Ember, the relational metadata repository that defines how refinement should behave across domains. Ember stores explicit definitions for:
Source interpretation and normalization
Change detection logic
Attribute relationships and enrichment rules
Merge behavior in INGOT
Output shaping and delivery patterns
This metadata is stored in a Postgres-backed relational schema designed specifically for Alloy’s five-layer model. Rather than inferring logic from code, Ember provides an authoritative, declarative description of how data should be transformed.
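To make "declarative description" concrete, here is a minimal sketch of the kind of record Ember might hold for a single attribute. The field names and the `AttributeRule` type are invented for illustration; Ember's actual Postgres schema is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: these field names are assumptions, not Ember's real schema.
@dataclass(frozen=True)
class AttributeRule:
    source_column: str          # how the raw input column is interpreted
    change_key: bool            # whether this attribute participates in change detection
    enrich_expr: Optional[str]  # enrichment expression applied at the ALLOY layer
    merge_strategy: str         # how INGOT reconciles updates ("upsert", "append", ...)

rule = AttributeRule(
    source_column="order_total",
    change_key=False,
    enrich_expr="order_total * exchange_rate",
    merge_strategy="upsert",
)
print(rule.merge_strategy)
```

The value of this shape is that behavior lives in data rather than code: the engine reads rules like this at run time, so changing refinement behavior means updating a row, not redeploying a pipeline.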
Ember Available Directly in Unity Catalog via Lakehouse Federation
One of the most meaningful enhancements in DataForge 10.0 is that Ember’s metadata can now be queried directly in Databricks. Through Lakehouse Federation, Ember’s catalog appears as Unity Catalog tables without requiring extra connectors or APIs. Teams can join metadata to operational data, build governance dashboards, or analyze domain relationships using standard SQL, all inside Databricks.
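Because Ember surfaces as ordinary Unity Catalog tables, joining metadata to operational data is just a SQL join. The sketch below mimics that join in plain Python with invented table and column names (Ember's real federated schema is not shown here):

```python
# Invented shapes for illustration: a federated Ember sources table and a
# set of operational hub tables tagged with the source they came from.
ember_sources = [
    {"source_id": 1, "domain": "sales",   "owner": "data-eng"},
    {"source_id": 2, "domain": "finance", "owner": "fin-ops"},
]
hub_tables = [
    {"table": "orders",   "source_id": 1, "row_count": 120_000},
    {"table": "invoices", "source_id": 2, "row_count": 45_000},
]

# Equivalent in spirit to:
#   SELECT h.table, e.domain, e.owner, h.row_count
#   FROM hub_tables h JOIN ember_sources e USING (source_id)
by_id = {e["source_id"]: e for e in ember_sources}
joined = [
    {"table": h["table"],
     "domain": by_id[h["source_id"]]["domain"],
     "owner": by_id[h["source_id"]]["owner"],
     "row_count": h["row_count"]}
    for h in hub_tables
]
print(joined[0]["domain"])
```

A governance dashboard built this way needs no extra connector: the metadata side of the join is simply another catalog in the workspace.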
Incremental Processing Built Into the Architecture
Incremental refinement is traditionally one of the most complex aspects of building pipelines. Alloy makes it a native part of the design. MINERAL isolates new and changed records; ALLOY enriches this reduced dataset before it scales; and INGOT merges updates back into the full table. Because Ember defines attribute behavior across these layers, Alloy can push heavy transformations earlier in the flow and reduce the amount of data processed at each stage.
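The MINERAL, ALLOY, INGOT flow described above can be sketched in plain Python. The record shapes and functions below are invented for illustration; the real engine operates on Delta tables, not dicts.

```python
# MINERAL: isolate records that are new or changed relative to the full table.
def mineral(full, incoming):
    current = {r["id"]: r for r in full}
    return [r for r in incoming
            if r["id"] not in current or current[r["id"]]["value"] != r["value"]]

# ALLOY: enrich only the reduced changeset, before it rejoins the full table.
def alloy(changed):
    return [{**r, "value_doubled": r["value"] * 2} for r in changed]

# INGOT: merge enriched updates back into the full table (upsert by id).
def ingot(full, enriched):
    merged = {r["id"]: r for r in full}
    for r in enriched:
        merged[r["id"]] = r
    return sorted(merged.values(), key=lambda r: r["id"])

full = [{"id": 1, "value": 10, "value_doubled": 20}]
incoming = [{"id": 1, "value": 10}, {"id": 2, "value": 5}]  # id 1 is unchanged

changed = mineral(full, incoming)        # only id 2 survives the diff
table = ingot(full, alloy(changed))      # enrichment ran on 1 row, not the whole table
print(len(changed), len(table))
```

Note where the expensive work happens: enrichment runs on the one changed row, and only the merge touches the full table. That ordering is the cost-saving lever the paragraph above describes.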
This built-in incremental pattern reduces compute costs, speeds up refresh cycles, and eliminates the custom branching logic that often accumulates in traditional pipelines.
Performance and Developer Experience Improvements
DataForge 10.0 includes a series of enhancements designed to improve reliability and performance on Databricks:
Faster hub table checks using metadata-driven pruning
Smarter Delta merge strategies with reduced conflicts
More responsive connection tests to cut down on debugging time
Quicker Data Profile introspection across new sources
Improved job orchestration with clearer stage boundaries
These improvements shorten iteration loops and deliver a smoother developer experience.
Deep Alignment with Unity Catalog
Unity Catalog has become the governance backbone of Databricks deployments, and DataForge 10.0 aligns tightly with that model. New workspaces build hub tables natively under UC, enabling consistent lineage, access control, and oversight. Existing Hive metastore deployments can be migrated smoothly, bringing older environments under the new governance model without major disruption.
DataForge now fits naturally into UC-driven lakehouse architectures, offering:
Unified governance across both data (Delta) and metadata (Ember)
Easier auditing and exploration
Consistent behavior across environments
Modernized Compute Layer
We’ve also refreshed the compute experience:
Support for DBR 16.4 LTS
Updated instance type recommendations
Ability to restart Talos directly from the UI
Terminology aligned with Databricks (“Compute” replacing “Clusters”)
These updates simplify administration and ensure a modern operational foundation.
A Unified Databricks Experience
Alloy provides the structure. Ember provides the metadata. Databricks provides the runtime and governance environment. Together, they create a version of DataForge that is more transparent, more predictable, and easier to operate than ever before.
Every stage is inspectable.
Every definition is queryable.
Every refinement behaves consistently across domains.
And this is only the first half of the 10.0 rollout.
What’s Coming Next
Later this week, we’ll extend Alloy and Ember to a new ecosystem as part of the second major platform announcement for DataForge 10.0.
Thursday’s announcement completes the release. Stay tuned!