Matthew Kosovec

Bringing Alloy and Ember to Snowflake: DataForge Expands to a New Ecosystem

With DataForge 10.0, the Alloy Architecture and Ember metadata catalog now run natively on Snowflake. This release gives Snowflake users a predictable, governed refinement model, built-in incremental processing, and Snowpark-based extensibility, while maintaining a unified development experience across platforms.

Matthew Kosovec

Bringing Alloy and Ember to Databricks in DataForge 10.0

With DataForge 10.0, the Alloy Architecture and Ember metadata catalog are now implemented natively on Databricks. Refinement flows through Delta tables, metadata is queryable through Unity Catalog via Lakehouse Federation, and the entire pipeline becomes more transparent, governable, and scalable.

Matthew Kosovec

Introducing Ember: A Structured Data Catalog Built for the Alloy Architecture

Ember reimagines the traditional data catalog as a declarative definition layer for modern pipelines. By storing explicit rules for refinement, enrichment, and merging, Ember drives Alloy’s structured five-layer architecture without relying on handwritten transformation code. The result is predictable execution, simpler governance, and no hidden intermediate logic. Ember is the metadata core of DataForge’s new architecture.

Matthew Kosovec

Alloy: A New Architecture for Declarative Data Engineering

The Alloy Architecture introduces a structured, five-layer refinement model that eliminates hidden pipeline complexity. By replacing ad-hoc transformation logic with a consistent, predictable flow, Alloy brings clarity, performance, and governance to modern data engineering.

Alec Judd

DataForge Launches Talos AI and Cloud 9.0

DataForge today unveiled Cloud 9.0, a major platform update powered by Talos—its embedded AI agent that enables users to build data models, pipelines, and workflows using natural language. With Cloud 9.0, teams can now go from business question to production-ready data infrastructure in minutes—no code required.

Vadim Orlov

Refresh Strategies in DataForge

Discover the power of DataForge Cloud's refresh patterns to streamline your data pipelines. In this video, you'll learn about six key refresh methods: full refresh for initial dataset ingestion, append-only for incremental data updates, and advanced options like timestamp, sequence, and custom patterns for handling time-series data or unique scenarios. Watch as we demonstrate configurations, simulate dataset changes, and explore features like watermarks for tracking updates, historical data preservation, and atomic processing. Whether managing small datasets or complex time-series data, DataForge Cloud empowers you to optimize data transformations with precision and flexibility.
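The core idea behind the timestamp-based refresh and watermarking described above can be sketched in a few lines of plain Python. This is an illustrative toy, not DataForge's implementation; the function and field names are invented for the example:

```python
def incremental_refresh(target, source_rows, watermark, ts_field="updated_at"):
    """Append only source rows newer than the stored watermark, then advance it.

    The watermark records the latest timestamp already ingested, so
    re-running the refresh never re-processes rows it has already seen.
    """
    new_rows = [r for r in source_rows if r[ts_field] > watermark]
    target.extend(new_rows)
    # If nothing new arrived, the watermark stays where it was.
    return max((r[ts_field] for r in new_rows), default=watermark)
```

Running the same refresh twice with an unchanged source appends nothing the second time, which is the property that makes incremental patterns safe to re-run.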

Joe Swanson

Engineering Choices and Stage Design with Traditional ETL

In this demo, Joe Swanson, Co-founder and Lead Developer at DataForge, guides viewers through building a BI data model using the Coalesce ETL platform. He explains key stages of the process, such as defining data types, grouping customer data, and unpivoting item data for better reporting. Joe discusses crucial decision points, like when to use typed staging tables, group stages, or CTEs to optimize data transformations. He concludes by hinting at Part 2, where he will show how DataForge simplifies and automates these steps, making data modeling more efficient and reusable.

Vadim Orlov

Data Transformation at Scale: Rule Templates & Cloning

Vadim Orlov, CTO of DataForge, tackles common data transformation challenges like repetitive coding and platform complexity in this video. He introduces DataForge Cloud’s rule templates and cloning features to streamline data management through a DRY (Don’t Repeat Yourself) approach.

Vadim walks through setting up data connections, creating reusable rule templates across datasets, and calculating metrics like sale prices and totals. He then demonstrates configuring an output table for reporting and, when the company adds a subsidiary, shows how the cloning feature replicates configurations for new platforms effortlessly.

This demonstration reveals how DataForge Cloud’s tools save time and centralize code management, enabling efficient, scalable, and reusable data engineering without constant rewrites.

Vadim Orlov

Mastering Schema Evolution & Type Safety with DataForge

Schema changes are a common cause of pipeline failures. DataForge addresses this by focusing on type safety and schema evolution.

Type safety ensures reliable transformations through compile-time validation, preventing unexpected errors. Schema evolution automates handling of changes like new columns, data type updates, and nested structures.

With DataForge’s configurable strategies, such as upcasting and cloning, pipelines adapt smoothly to schema changes, reducing manual effort and improving reliability.
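The upcasting strategy mentioned above can be illustrated with a minimal Python sketch (not DataForge's actual logic; the type ranking and function names are assumptions for the example). A column may widen to a broader type but never silently narrow:

```python
# Rank of safe ("upcast") conversions: a column may only widen.
UPCAST_RANK = {"int": 0, "float": 1, "string": 2}

def evolve_schema(current, incoming):
    """Merge an incoming schema into the current one.

    New columns are added automatically; columns present in both schemas
    keep the wider of the two types, so historic data is never truncated.
    """
    evolved = dict(current)
    for col, typ in incoming.items():
        if col not in evolved:
            evolved[col] = typ  # schema evolution: new column appears
        else:
            # Upcast: keep whichever type is wider, e.g. int -> float.
            evolved[col] = max(evolved[col], typ, key=UPCAST_RANK.__getitem__)
    return evolved
```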

Joe Swanson

Introducing Stream Processing in DataForge: Real-Time Data Integration and Enrichment

DataForge introduces Stream Processing, enabling seamless integration of real-time and batch data for dynamic, scalable pipelines. Leveraging Lambda Architecture, users can enrich streaming data with historical insights, facilitating comprehensive real-time analytics. Key features include Kafka integration, batch enrichment, and downstream processing. This advancement simplifies real-time data management, enhances analytics capabilities, and accelerates AI/ML applications, all within a fully managed, automated platform.
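The Lambda-style batch enrichment described above boils down to joining each streaming event against a lookup built from historical data. A toy Python sketch (field names and structure are illustrative assumptions, not DataForge's API):

```python
def enrich_stream(events, history):
    """Join each real-time event against a batch-built lookup table.

    The "speed layer" events gain context from the "batch layer"
    history, yielding enriched records ready for downstream analytics.
    """
    for event in events:
        past = history.get(event["user_id"], {})
        yield {**event, "lifetime_orders": past.get("orders", 0)}
```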

Vadim Orlov

Sub-Sources: Simplifying Complex Data Structures with DataForge

In DataForge Cloud 8.1, we introduced Sub-Sources, simplifying the handling of nested complex arrays (NCAs) like ARRAY<STRUCT<..>>. This feature allows you to use standard SQL syntax on NCAs without needing to normalize or modify the underlying data. Sub-Sources act as "virtual" tables, enabling easy transformations while preserving the original structure. This innovation saves time and effort for data engineers working with complex, semi-structured data.
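Conceptually, a Sub-Source presents each struct inside a nested array as its own row in a "virtual" table. A minimal Python sketch of that flattening (illustrative only; DataForge does this declaratively over ARRAY<STRUCT<..>> columns without materializing anything):

```python
def sub_source(rows, array_col, parent_key="id"):
    """Expose a nested array-of-structs column as a flat virtual table.

    Each struct becomes its own row carrying the parent key, so ordinary
    row-wise logic (filters, joins, aggregations) applies directly while
    the underlying nested data stays untouched.
    """
    for row in rows:
        for item in row[array_col]:
            yield {parent_key: row[parent_key], **item}
```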

Vadim Orlov

DataForge vs. Databricks Delta Live Tables for Change Data Capture

Check out our latest video where Vadim Orlov, CTO of DataForge, compares automating Change Data Capture (CDC) in DataForge Cloud versus Databricks Delta Live Tables. Discover how DataForge simplifies CDC processes, saving time and effort with automation, and watch a live demo showcasing its efficiency in real-world use cases.
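At its core, CDC automation means replaying a stream of change records (inserts, updates, deletes) into a keyed target table. A simplified Python sketch of that merge step, not the implementation of either platform:

```python
def apply_cdc(table, changes, key="id"):
    """Apply a batch of change-data-capture records to a keyed table.

    Each change carries an operation flag; the merge upserts or deletes
    in arrival order -- the bookkeeping a CDC automation layer handles
    for you across every source table.
    """
    for change in changes:
        op, row = change["op"], change["row"]
        if op == "delete":
            table.pop(row[key], None)
        else:  # "insert" and "update" both become an upsert
            table[row[key]] = row
    return table
```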

Matthew Kosovec

Introducing Our New Plus Subscription Plan: Elevate Your Data Engineering Capabilities

We’re excited to unveil our new Plus plan, tailored for startups and small enterprises. At just $400 per month, this plan offers a comprehensive suite of features including a dedicated DataForge workspace, up to 50 data sources, automated orchestration, and a browser-based IDE. Enjoy a 30-day free trial to experience its benefits firsthand. The Plus plan provides an excellent balance of functionality and affordability to support your data engineering needs and drive growth. Start your trial today and see how Plus can elevate your data operations!

Matthew Kosovec

Introduction to the DataForge Framework Object Model

Part 2 of the DataForge blog series explores the implementation of the DataForge Core framework, which enhances data transformation through the use of column-pure and row-pure functions. It introduces the core components, such as Raw Attributes, Rules, Sources, and Relations, that streamline data engineering workflows and ensure code purity, extensibility, and easier management compared to traditional SQL-based approaches.
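The row-pure idea can be sketched in a few lines of Python: each rule derives one new attribute from the current row and nothing else, so rules compose without hidden side effects. This is an illustrative analogy, not DataForge Core's actual object model:

```python
def apply_rules(rows, rules):
    """Evaluate declarative rules over a source, one derived column each.

    Every rule is a row-pure function: its output depends only on the
    row it receives, so rules can be added or removed independently.
    Later rules may read attributes produced by earlier ones.
    """
    out = []
    for row in rows:
        enriched = dict(row)
        for name, fn in rules.items():
            enriched[name] = fn(enriched)
        out.append(enriched)
    return out
```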

Matthew Kosovec

Introduction to the DataForge Declarative Transformation Framework

Discover how to build better data pipelines with DataForge. Our latest article explores breaking down monolithic data engineering solutions with modular, declarative programming. Explore the power of column-pure and row-pure functions for more manageable and scalable data transformations.

Paula David

Introducing DataForge Core: The first functional code framework for data engineering

In the fast-paced world of data engineering, agility and efficiency are paramount. However, traditional approaches often fall short, leading to convoluted pipelines, skyrocketing costs, and endless headaches for data engineers. Enter DataForge Core – a game-changing open-source framework designed to streamline data transformations while adhering to modern software engineering best practices.
