Checks and Data Quality¶

Fabricks provides built-in mechanisms to validate data before and/or after a job runs. Checks can enforce row counts, equality constraints, and custom SQL contracts that return actions and messages.

Configuring checks¶

Enable checks per job using check_options:

- job:
    step: gold
    topic: check
    item: max_rows
    options: { mode: complete }
    check_options:
      pre_run: true         # run checks before writing
      post_run: true        # run checks after writing
      min_rows: 10          # minimum expected row count
      max_rows: 10000       # maximum expected row count
      count_must_equal: fabricks.dummy  # enforce row count equality with another table
      skip: false           # skip all checks if true

Supported options: - pre_run/post_run: Run checks around the execution of the job. - min_rows/max_rows: Row count bounds. - count_must_equal: Assert row count equals a reference table. - skip: Disable checks for this job.

Catalog of checks¶

These are the built-in switches you can use in check_options. Built-in checks evaluate simple properties of the dataset and will fail the job on violation. Use SQL contracts when you need richer logic or to emit warnings instead of hard failures.

Name	Type	Default	Required	Description	Failure behavior
pre_run	boolean	false	no	Run SQL contracts and built-in checks before the write.	Contract: `__action` controls outcome.
post_run	boolean	false	no	Run SQL contracts and built-in checks after the write.	Contract: `__action` controls outcome.
min_rows	integer	unset	no	Minimum expected row count of the dataset being written.	Fail if `count < min_rows`.
max_rows	integer	unset	no	Maximum allowed row count of the dataset being written.	Fail if `count > max_rows`.
count_must_equal	string (table/view)	unset	no	Require row count to equal a reference table or view.	Fail if counts differ.
skip	boolean	false	no	Disable all checks for this job.	No checks executed.

Examples¶

Bounds only:

check_options:
  pre_run: true
  min_rows: 10
  max_rows: 100000

Equality against a reference:

check_options:
  post_run: true
  count_must_equal: analytics.dim_customer

Mixing built-ins with SQL contracts: - Use pre_run/post_run to enable contract execution and add your contract SQL files (see “SQL contracts” below).

SQL contracts¶

When pre_run or post_run checks are enabled, provide SQL files named after the table or job: - table_name.pre_run.sql - table_name.post_run.sql

Each SQL must return: - __action: either 'fail' or 'warning' - __message: human-readable message

Example:

-- fail.pre_run.sql
select "fail" as __action, "Please don't fail on me :(" as __message;

-- warning.post_run.sql
select "warning" as __action, "I want you to warn me !" as __message;

Behavior and rollback¶

For physical tables (append/complete/update modes), a failed check triggers an automatic rollback to the previous successful version.
For memory mode (views), results are not persisted and failures are logged only.

CI integration¶

Checks produce two observable signals that are useful in CI and orchestration systems:

Exit code
Any hard failure (built-in bound violations or a contract returning __action = 'fail') causes the job to exit with a non‑zero status.
For physical tables (append/complete/update modes), failures also trigger an automatic rollback to the previous successful version.
Most CI systems will mark the pipeline as failed automatically on non‑zero exit codes.
Log messages
Contracts should emit a single row with __action and __message. These are logged so you can surface them in CI logs and artifacts.
When __action = 'warning', the job continues and exits with code 0. Ensure your CI surfaces logs so warnings are visible in pull requests.

Example contract outputs (see “SQL contracts”):

-- Fails the job and rolls back (physical tables)
select 'fail'    as __action, 'Row count below threshold' as __message;

-- Logs a warning; job continues
select 'warning' as __action, 'Null rate above target'    as __message;

Tips: - Prefer post_run for checks that must consider the final persisted state. - Use built-in bounds for simple invariants; reserve contracts for multi-table logic and nuanced policies. - If you need to gate on warnings, implement a small CI step that scans logs for known warning markers and decides policy for your repository.

Examples¶

Working scenarios are available in the sample runtime:
Browsable: Sample runtime
Rich integration scenarios (repository layout):
framework/tests/integration/runtime/gold/gold/

Steps: Bronze • Silver • Gold
Table properties and physical layout: Table Options
Custom logic integration: Extenders, UDFs & Parsers

Checks and Data Quality¶

Configuring checks¶

Catalog of checks¶

Examples¶

SQL contracts¶

Behavior and rollback¶

CI integration¶

Examples¶

Related topics¶