Gold Step Reference¶
Gold steps produce consumption-ready models, usually via SQL. The Semantic step applies table properties and metadata on top of Gold outputs.
Modes¶
| Mode | Description |
|---|---|
| `memory` | View-only result; no table is written. |
| `append` | Append rows to the target table. |
| `complete` | Full refresh/overwrite of the target table. |
| `update` | Merge/upsert semantics (typically used with CDC). |
| `invoke` | Run a notebook instead of SQL (configure via `invoker_options`). |
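A minimal sketch of how `mode` is selected per job (hypothetical `topic`/`item` names, following the job layout used throughout this page):

```yaml
# Hypothetical jobs illustrating mode selection
- job:
    step: gold
    topic: sales            # hypothetical topic
    item: daily_summary     # hypothetical item
    options:
      mode: complete        # full refresh on every run
- job:
    step: gold
    topic: sales
    item: events
    options:
      mode: append          # append-only load; no merge
```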
See Gold CDC input fields below for the required `__`-prefixed columns when using SCD.
Quick CDC overview

- Strategies: `nocdc`, `scd1`, `scd2`
- SCD1: uses soft-delete flags `__is_current`, `__is_deleted`
- SCD2: uses validity windows `__valid_from`, `__valid_to`
- Gold inputs and operations: `__operation` values are `'upsert' | 'delete' | 'reload'`. If `__operation` is omitted for SCD in `mode: update`, Fabricks injects `'reload'` and enables rectification automatically.
- See the CDC reference for details and examples: Change Data Capture (CDC)
Options at a glance¶
| Option | Purpose |
|---|---|
| `type` | `default` vs `manual` (manual: you manage persistence yourself). |
| `mode` | One of `memory`, `append`, `complete`, `update`, `invoke`. |
| `change_data_capture` | CDC strategy: `nocdc`, `scd1`, or `scd2`. |
| `update_where` | Additional predicate limiting updates during merges. |
| `parents` | Explicit upstream dependencies for scheduling/recomputation. |
| `deduplicate` | Drop duplicate keys in the result before writing. |
| `persist_last_timestamp` | Persist the last processed timestamp for incremental loads. |
| `correct_valid_from` | Adjust SCD2 timestamps that would otherwise start at a sentinel date. |
| `table` | Target table override (useful for semantic/table-copy scenarios). |
| `table_options` | Delta table options and metadata (identity, clustering, properties, comments). |
| `spark_options` | Per-job Spark session/SQL options mapping. |
| `udfs` | Path to a UDFs registry to load before executing the job. |
| `check_options` | Configure DQ checks (`pre_run`, `post_run`, `max_rows`, `min_rows`, `count_must_equal`, `skip`). |
| `notebook` | Mark the job to run as a notebook (used with `mode: invoke`). |
| `invoker_options` | Configure `pre_run` / `run` / `post_run` notebook execution and arguments. |
| `requirements` | If true, install/resolve additional requirements for this job. |
| `timeout` | Per-job timeout in seconds (overrides step defaults). |
Option matrix (types • defaults • required)¶
| Option | Type | Default | Required | Description |
|---|---|---|---|---|
| `type` | enum: `default`, `manual` | `default` | no | `manual` disables auto DDL/DML; you manage persistence yourself. |
| `mode` | enum: `memory`, `append`, `complete`, `update`, `invoke` | — | yes | Processing behavior. |
| `change_data_capture` | enum: `nocdc`, `scd1`, `scd2` | `nocdc` | no | CDC strategy applied when writing (effective in `mode: update`). |
| `update_where` | string (SQL predicate) | — | no | Additional predicate to limit rows affected during merge/upsert. |
| `parents` | array[string] | — | no | Upstream dependencies to enforce scheduling/order. |
| `deduplicate` | boolean | `false` | no | Drop duplicate keys in the result prior to write. |
| `persist_last_timestamp` | boolean | `false` | no | Persist the last processed timestamp for incremental loads. |
| `correct_valid_from` | boolean | — | no | Adjust SCD2 start timestamps when needed to avoid sentinel starts. |
| `table` | string | — | no | Target table override (useful for semantic/table-copy scenarios). |
| `table_options` | map | — | no | Delta table options/metadata (identity, clustering, properties, comments). |
| `spark_options` | map | — | no | Per-job Spark session/SQL options. |
| `udfs` | string (path) | — | no | Path to a UDFs registry to load before executing the job. |
| `check_options` | map | — | no | Data quality checks (see Checks & Data Quality). |
| `notebook` | boolean | — | no | Mark the job to run as a notebook (used with `mode: invoke`). |
| `invoker_options` | map | — | no | Configure `pre_run` / `run` / `post_run` notebook execution and arguments. |
| `requirements` | boolean | `false` | no | Resolve/install additional requirements for this job. |
| `timeout` | integer (seconds) | — | no | Per-job timeout; overrides step defaults. |
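A sketch combining several of these options in a single job (hypothetical `topic`/`item` and parent names; the exact `parents` identifier format is an assumption):

```yaml
- job:
    step: gold
    topic: sales                       # hypothetical
    item: merged_orders                # hypothetical
    options:
      mode: update
      change_data_capture: scd1
      update_where: "country = 'BE'"   # limit the merge to a subset
      deduplicate: true
      parents: [silver.orders_scd1]    # assumed identifier format
      timeout: 3600
```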
Field reference¶
- Core
  - `type`: `default` vs `manual`. Manual means Fabricks will not auto-generate DDL/DML; you control persistence.
  - `mode`: Processing behavior (`memory`, `append`, `complete`, `update`, `invoke`).
  - `timeout`: Per-job timeout in seconds; overrides step defaults.
  - `requirements`: Resolve/install additional requirements for this job.
- CDC
  - `change_data_capture`: `nocdc`, `scd1`, or `scd2`. Governs merge semantics when `mode: update`.
- Incremental & merge
  - `update_where`: Predicate that constrains rows affected during merge/upsert.
  - `deduplicate`: Drop duplicate keys prior to write.
  - `persist_last_timestamp`: Persist the last processed timestamp to support incremental loads.
  - `correct_valid_from`: Adjusts start timestamps for SCD2 validity windows when needed.
- Dependencies & targets
  - `parents`: Explicit upstream jobs that must complete before this job runs.
  - `table`: Target table override (commonly used when coordinating with Semantic/table-copy flows).
- Table & Spark
  - `table_options`: Delta table options and metadata (e.g., identity, clustering, properties, comments).
  - `spark_options`: Per-job Spark session/SQL options mapping.
- UDFs
  - `udfs`: Path to a UDFs registry to load before executing the job.
- Checks
  - `check_options`: Configure DQ checks (e.g., `pre_run`, `post_run`, `max_rows`, `min_rows`, `count_must_equal`, `skip`).
- Notebook invocation
  - `notebook`: When coupled with `mode: invoke`, marks this job to run a notebook.
  - `invoker_options`: Configure `pre_run`, `run`, and `post_run` notebooks and pass arguments.

Notes:

- Memory outputs ignore columns starting with `__` (e.g., `__it_should_not_be_found`).
Minimal examples
Gold SCD1 update
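A minimal SCD1 merge job (same shape as the SCD1 example under Gold jobs below):

```yaml
- job:
    step: gold
    topic: scd1
    item: update
    options:
      mode: update
      change_data_capture: scd1
```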
Gold full refresh
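A minimal full-refresh job (hypothetical `item` name), using `mode: complete`:

```yaml
- job:
    step: gold
    topic: fact
    item: full_refresh   # hypothetical item name
    options:
      mode: complete
```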
Scoped merge with update_where
```yaml
- job:
    step: gold
    topic: scd1
    item: scoped_update
    options:
      mode: update
      change_data_capture: scd1
      update_where: "region = 'EMEA'"  # limit merge to a subset
```
Dependencies
```sql
with cte as (select d.time, d.hour from gold.dim_time d)

select
  udf_key(array(f.id, d.time)) as __key,
  f.id as id,
  f.monarch as monarch,
  s.__source as role,
  f.value as value,
  d.time as time
from cte d
cross join transf.fact_memory f
left join silver.king_and_queen_scd1__current s on f.id = s.id
where d.hour = 10
```
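Dependencies like `gold.dim_time` above are usually inferred from the SQL itself; when lineage cannot be detected, `parents` can declare them explicitly. A sketch (the parent identifier format is an assumption):

```yaml
- job:
    step: gold
    topic: fact
    item: dependencies
    options:
      mode: memory
      parents: [gold.dim_time, transf.fact_memory]   # assumed identifier format
```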
Invoke (notebooks)
```yaml
- job:
    step: gold
    topic: invoke
    item: post_run
    options: { mode: memory }
    invoker_options:
      post_run:
        - notebook: gold/gold/invoke/post_run/exe
          arguments: { arg1: 1, arg2: 2, arg3: 3 }
- job:
    step: gold
    topic: invoke
    item: notebook
    options: { mode: invoke }
    invoker_options:
      run:
        notebook: gold/gold/invoke/notebook
        arguments: { arg1: 1, arg2: 2, arg3: 3 }
```
Checks and DQ
```yaml
- job:
    step: gold
    topic: check
    item: max_rows
    options: { mode: complete }
    check_options: { max_rows: 2 }
```
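The other documented numeric keys follow the same shape; for instance, a lower bound (hypothetical job, value chosen for illustration):

```yaml
- job:
    step: gold
    topic: check
    item: min_rows       # hypothetical item name
    options: { mode: complete }
    check_options: { min_rows: 1 }
```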
Check SQL contracts
```sql
-- fail.pre_run.sql
select "fail" as __action, "Please don't fail on me :(" as __message
```

```sql
-- warning.post_run.sql
select "I want you to warn me !" as __message, "warning" as __action
```
Gold options¶
See Options at a glance and Field reference above.
CDC input fields for gold jobs¶
- scd2 (required): `__key`, `__timestamp`, `__operation` (`'upsert' | 'delete' | 'reload'`).
- scd1 (required): `__key`; optional `__timestamp` / `__operation` (`'upsert' | 'delete' | 'reload'`) for delete/rectify handling. If `__operation` is omitted for SCD in `mode: update`, Fabricks injects `'reload'` and enables rectification.
- Optional helpers: `__order_duplicate_by_asc` / `__order_duplicate_by_desc`, `__identity`.
- See: Change Data Capture (CDC) for full details and examples.
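A minimal SCD1 input sketch (hypothetical source table and columns). `__operation` is omitted here, so in `mode: update` Fabricks injects `'reload'` and rectifies:

```sql
-- Hypothetical source table and columns
select
  c.customer_id as __key,   -- business key required for scd1
  c.customer_id as id,
  c.name as name,
  c.country as country
from silver.customer_scd1__current c
```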
Gold jobs¶
- SCD1/SCD2 with modes and options:
```yaml
- job:
    step: gold
    topic: scd1
    item: update
    options:
      change_data_capture: scd1
      mode: update
- job:
    step: gold
    topic: scd1
    item: last_timestamp
    options:
      change_data_capture: scd1
      mode: update
      persist_last_timestamp: true
- job:
    step: gold
    topic: scd2
    item: correct_valid_from
    options:
      change_data_capture: scd2
      mode: memory
      correct_valid_from: false
```
Example SCD2 SQL using `__key`, `__timestamp`, `__operation`:
Note: credit to the Temporal Snapshot Fact Table session (SQLBits 2012); recommended viewing for understanding SCD2 modeling in Gold.
Slides: Temporal Snapshot Fact Tables (slides)
```sql
with dates as (
  select id as id, __valid_from as __timestamp, 'upsert' as __operation
  from silver.monarch_scd2
  where __valid_from > '1900-01-02'
  union
  select id as id, __valid_to as __timestamp, 'delete' as __operation
  from silver.monarch_scd2
  where __is_deleted
)
select
  d.id as __key,
  s.id as id,
  s.name as monarch,
  s.doubleField as value,
  d.__operation,
  -- shift deletes 1 second past __valid_to so the closing change point
  -- falls just after the last valid segment
  if(d.__operation = 'delete', d.__timestamp + interval 1 second, d.__timestamp) as __timestamp
from dates d
left join silver.monarch_scd2 s
  on d.id = s.id
  and d.__timestamp between s.__valid_from and s.__valid_to
```
When to model SCD2 in Gold (different grain)¶
- Use SCD2 in Gold when the base SCD2 tables do not align with the required consumption grain, and you need to derive a slowly changing history at a coarser/different grain (e.g., roll up line-item SCD2 to order-level SCD2, or derive a customer segment SCD2 from underlying attribute histories).
Example: roll up line-item SCD2 (silver.order_item_scd2) to order-level SCD2 (total_amount by order_id)
```yaml
- job:
    step: gold
    topic: order
    item: total_amount_scd2
    options:
      mode: update
      change_data_capture: scd2
```
```sql
-- Derive Gold SCD2 change points (upserts/deletes) at the order_id grain
with change_points as (
  select order_id as __key, __valid_from as __timestamp, 'upsert' as __operation
  from silver.order_item_scd2
  union
  select order_id as __key, __valid_to as __timestamp, 'delete' as __operation
  from silver.order_item_scd2
  where __is_deleted
  union
  -- article-level price change boundaries that fall within the order_item validity window
  select oi.order_id as __key, a.__valid_from as __timestamp, 'upsert' as __operation
  from silver.order_item_scd2 oi
  join silver.article_scd2 a
    on oi.article_id = a.article_id
    and a.__valid_from between oi.__valid_from and oi.__valid_to
  union
  select oi.order_id as __key, a.__valid_to as __timestamp, 'upsert' as __operation
  from silver.order_item_scd2 oi
  join silver.article_scd2 a
    on oi.article_id = a.article_id
    and a.__valid_to between oi.__valid_from and oi.__valid_to
)
-- Compute the order-level attributes as of each change point
select
  cp.__key,
  oh.customer_id as customer_id,
  seg.segment as customer_segment,
  sum(oi.amount) as total_amount,
  cp.__operation,
  if(cp.__operation = 'delete', cp.__timestamp + interval 1 second, cp.__timestamp) as __timestamp
from change_points cp
left join silver.order_item_scd2 oi
  on cp.__key = oi.order_id
  and cp.__timestamp between oi.__valid_from and oi.__valid_to
left join silver.order_header oh
  on cp.__key = oh.order_id
left join silver.customer_segment_scd2 seg
  on oh.customer_id = seg.customer_id
  and cp.__timestamp between seg.__valid_from and seg.__valid_to
-- aggregate at the Gold grain; all non-aggregated columns must be grouped
group by cp.__key, oh.customer_id, seg.segment, cp.__operation, cp.__timestamp
```
Notes

- The change_points CTE promotes underlying SCD2 intervals to the desired Gold grain by emitting 'upsert' at each interval start and 'delete' at each interval end (or for deleted keys).
- At each change point, compute the Gold attributes as-of that timestamp. Fabricks renders the SCD2 merge using `__key`, `__timestamp`, and `__operation`.
- Article/price changes: the extra unions in change_points add `silver.article_scd2` validity boundaries (within each item window) so totals recompute when article prices change.
- Joining extra tables:
  - Static/non-SCD2 tables (e.g., `silver.order_header`) can be joined directly on business keys (e.g., `order_id`).
  - SCD2 dimensions (e.g., `silver.customer_segment_scd2`) must be joined as-of the change point using a validity-window predicate: `cp.__timestamp between seg.__valid_from and seg.__valid_to`.
- If your derived attribute can also disappear (e.g., no remaining items), the 'delete' operation correctly closes the last interval for that Gold key.
SCD2 with Aggregation Example (composite keys: order_id × customer_id)¶
This example demonstrates how to derive a Gold SCD2 stream with aggregation at a different grain than the source, by:

- Computing change points (upsert at interval starts, delete at terminal interval ends) at the target business key grain.
- Aggregating measures as of each change point over all source SCD2 segments that are valid at that timestamp.
- Zeroing out measures at delete timestamps to correctly close intervals.
```sql
with
-- 1) Establish the target business key for the Gold grain (order_id × customer_id)
source_with_business_key as (
  select
    concat_ws('*', order_id, customer_id) as composite_business_key,
    *
  from silver.order_item_scd2
),
-- 2) Identify terminal segments at the target grain.
--    For terminal segments we emit a 'delete' change point at __valid_to.
scd2_segments as (
  select
    if(
      coalesce(
        lead(composite_business_key) over (
          partition by __key
          order by __valid_from
        ),
        composite_business_key
      ) = composite_business_key,
      __is_deleted,
      true
    ) as is_terminal_segment,
    *
  from source_with_business_key
),
-- 3) Build change points at the Gold grain:
--    - 'upsert' at each __valid_from
--    - 'delete' at __valid_to only for terminal segments
change_points as (
  select
    order_id,
    customer_id,
    __valid_from as change_timestamp,
    'upsert' as change_op
  from scd2_segments
  union
  select
    order_id,
    customer_id,
    __valid_to as change_timestamp,
    'delete' as change_op
  from scd2_segments
  where is_terminal_segment
),
-- 4) Aggregate measures as of each change point over active segments
snapshot_aggregates as (
  select
    concat_ws('*', d.order_id, d.customer_id) as __key,
    d.order_id,
    d.customer_id,
    d.change_timestamp as __timestamp,
    'upsert' as __operation,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.item_qty)) as total_item_qty,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.item_amount)) as total_item_amount,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.item_cost)) as total_item_cost,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.pending_item_qty)) as pending_total_item_qty,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.pending_item_amount)) as pending_total_item_amount,
    sum(if(d.change_op = 'delete' and d.change_timestamp = r.__valid_to, 0, r.pending_item_cost)) as pending_total_item_cost
  from change_points d
  inner join scd2_segments r
    on d.change_timestamp between r.__valid_from and r.__valid_to
    and d.customer_id = r.customer_id
    and d.order_id = r.order_id
  group by d.order_id, d.customer_id, d.change_timestamp, d.change_op
)
select *
from snapshot_aggregates;
```
Notes

- `composite_business_key` ensures a clear target-grain identity for reasoning about terminal segments.
- `is_terminal_segment` marks final intervals at the target grain, causing a 'delete' change point at `__valid_to`.
- At delete timestamps, measures are zeroed to close the interval cleanly and avoid double counting.
- The output conforms to the Gold SCD2 inputs (`__key`, `__timestamp`, `__operation`), with `__operation` fixed to 'upsert' since Fabricks handles rectification in update mode.
- Adjust function names (`if`, `concat_ws`) to your SQL engine's equivalents if needed.
- Fact options: table options, clustering, properties, and Spark options:
```yaml
- job:
    step: gold
    topic: fact
    item: option
    options:
      mode: complete
      table_options:
        identity: true
        liquid_clustering: true
        cluster_by: [monarch]
        properties:
          country: Belgium
        comment: Strength lies in unity
      spark_options:
        sql:
          spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite: false
          spark.databricks.delta.properties.defaults.enableChangeDataFeed: true
```
Example SQL for this job:
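The SQL body for this job is not included on this page; a minimal illustrative query (hypothetical columns, chosen to be consistent with `cluster_by: [monarch]`) could look like:

```sql
-- Hypothetical body; the actual SQL for this job is not shown here
select
  s.id as id,
  s.name as monarch,
  s.doubleField as value
from silver.monarch_scd2 s
```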
- Manual, memory, append, overwrite:
```yaml
- job: { step: gold, topic: fact, item: manual, options: { type: manual, mode: complete } }
- job: { step: gold, topic: fact, item: memory, options: { mode: memory } }
- job: { step: gold, topic: fact, item: append, options: { mode: append } }
- job: { step: gold, topic: dim, item: overwrite, options: { change_data_capture: scd1, mode: update }, table_options: { identity: true } }
```
Memory outputs ignore columns starting with `__` (e.g., `__it_should_not_be_found`).
- Deduplicate and order-by-duplicate examples:
```sql
-- order_duplicate.sql
select 1 as __key, 1 as dummy, 1 as __order_duplicate_by_desc
union all
select 1 as __key, 2 as dummy, 2 as __order_duplicate_by_desc
```
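A matching job sketch (hypothetical wiring; assumes `deduplicate: true` keeps one row per `__key`, with `__order_duplicate_by_desc` deciding which row wins):

```yaml
# Sketch: assumes deduplicate drops duplicate __key rows,
# keeping the row ranked first by __order_duplicate_by_desc
- job:
    step: gold
    topic: fact
    item: order_duplicate   # hypothetical; matches the SQL file above
    options:
      mode: complete
      deduplicate: true
```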
SCD2 header-line change points with header closure (order_id × order_line_id)¶
This example shows how to generate SCD2 change points when combining header and line SCD2 sources, including:

- Emitting upserts at each validity start across header and lines.
- Emitting a delete for the header-only row as soon as a corresponding line exists (to close the sentinel header row).
- Emitting deletes at validity end for deleted intervals.
- Joining an additional SCD2 stream (e.g., item price) to ensure change points also reflect price boundaries.
```sql
with
-- 1) Gather validity boundaries across header, line, and related SCD2 (price)
hdr_line_dates as (
  select h.order_id, l.order_line_id, h.__valid_to, h.__valid_from, h.__is_deleted
  from silver.order_header_scd2 h
  left join silver.order_line_scd2 l
    on h.order_id = l.order_id
    and h.__valid_from between l.__valid_from and l.__valid_to
  union
  select order_id, order_line_id, __valid_to, __valid_from, __is_deleted
  from silver.order_line_scd2
  union
  -- ensure we have entries for the latest price window intersecting current header/line
  select h.order_id, l.order_line_id, ip.__valid_to, ip.__valid_from, false
  from silver.order_header_scd2__current h
  inner join silver.order_line_scd2__current l
    on h.order_id = l.order_id
  inner join silver.item_price_scd2 ip
    on ip.item_id = l.item_id
    and ip.price_list_id = h.price_list_id
    and ip.__is_current
),
-- 2) Build change points
change_points as (
  -- upsert at each validity start
  select order_id, order_line_id, __valid_from as __timestamp, 'upsert' as __operation
  from hdr_line_dates
  union
  -- as soon as a line exists, close the header-only row (order_line_id is the NULL sentinel)
  select order_id, null as order_line_id, __valid_from as __timestamp, 'delete' as __operation
  from hdr_line_dates
  where order_line_id is not null
  union
  -- delete at validity end for deleted intervals
  select order_id, order_line_id, __valid_to as __timestamp, 'delete' as __operation
  from hdr_line_dates
  where __is_deleted
)
-- 3) Project attributes as of each change point
select
  concat_ws('*', p.order_id, coalesce(p.order_line_id, -1)) as __key,
  p.order_id,
  p.order_line_id,
  -- header attributes (examples)
  h.customer_id,
  h.price_list_id,
  h.order_date,
  -- line attributes (examples)
  l.item_id,
  l.quantity,
  l.unit_price,
  -- CDC fields
  p.__operation,
  p.__timestamp
from change_points p
left join silver.order_header_scd2 h
  on p.order_id = h.order_id
  and p.__timestamp between h.__valid_from and h.__valid_to
left join silver.order_line_scd2 l
  on p.order_id = l.order_id
  and coalesce(p.order_line_id, -1) = coalesce(l.order_line_id, -1)
  and p.__timestamp between l.__valid_from and l.__valid_to
```
Notes

- The null-sentinel header row (`order_line_id` = null, coerced to -1 in `__key`) is closed via a 'delete' when any line appears at the same boundary.
- Additional SCD2 inputs (e.g., `item_price_scd2`) can be unioned into the dates set to force change points whenever related dimensions change.
- The output stream conforms to the Gold SCD2 input fields: `__key`, `__timestamp`, `__operation`.
Related¶
- Next steps: Table Options
- Data quality: Checks & Data Quality
- Extensibility: Extenders, UDFs & Views
- Sample runtime: Sample runtime