dbt Cheat Sheet

Quick reference for data build tool — models, tests, Jinja, CLI & more.

📁 Project Structure

# Typical dbt project layout
my_project/
├── dbt_project.yml          # project config
├── profiles.yml             # connection profiles (usually ~/.dbt/)
├── packages.yml             # third-party packages
├── models/
│   ├── staging/             # stg_* — raw → typed/renamed
│   │   ├── _sources.yml
│   │   └── stg_orders.sql
│   ├── intermediate/        # int_* — joins, business logic
│   └── marts/               # fct_* / dim_* — final tables
│       ├── _schema.yml
│       └── fct_orders.sql
├── tests/                   # singular SQL tests
├── macros/                  # reusable Jinja functions
├── seeds/                   # static CSV → tables
├── snapshots/               # SCD type-2 history
├── analyses/                # ad-hoc queries (not materialised)
└── docs/                    # doc blocks (*.md)
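
A minimal snapshot to pair with the snapshots/ folder above, sketched with the timestamp strategy (file, schema and column names are illustrative):

```sql
-- snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{ config(
    target_schema = 'snapshots',
    unique_key    = 'order_id',
    strategy      = 'timestamp',
    updated_at    = 'updated_at'
) }}

-- dbt tracks row changes over time as SCD type-2 history
SELECT * FROM {{ source('raw_shopify', 'orders') }}

{% endsnapshot %}
```

Run it with dbt snapshot; dbt adds dbt_valid_from / dbt_valid_to columns to record each version of a row.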

⌨️ CLI Commands

Core commands
| Command | What it does |
| --- | --- |
| dbt run | Execute models (SQL) |
| dbt test | Run all tests |
| dbt build | Run seeds, snapshots, models & tests in DAG order |
| dbt compile | Compile Jinja → SQL (no execution) |
| dbt seed | Load CSV seeds into the warehouse |
| dbt snapshot | Execute snapshots |
| dbt docs generate | Build the documentation site |
| dbt docs serve | Serve docs locally |
| dbt debug | Check connection & config |
| dbt deps | Install packages |
| dbt source freshness | Check source freshness against configured thresholds |
| dbt clean | Delete target/ & dbt_packages/ |
Useful flags
| Flag | What it does |
| --- | --- |
| --select / -s | Run only the specified models (name, path, tag…) |
| --exclude | Exclude matching models from the selection |
| --target / -t | Use a specific profile target (dev, prod, ci…) |
| --threads | Number of models to build in parallel |
| --full-refresh | Drop and rebuild incremental models from scratch |
| --defer | Resolve unselected refs from another target (e.g. prod) instead of building them |
| --state | Path to a folder containing a reference manifest.json (used by --defer and state: selectors) |
| --vars | Pass key/value variables at runtime: --vars '{"key": "val"}' |
| --no-version-check | Skip the dbt version compatibility check |
Examples
# Build specific models on prod with more parallelism
dbt build -s stg_orders fct_orders --target prod --threads 8

# Full refresh an incremental model, passing a runtime variable
dbt run -s fct_events --full-refresh --vars '{"start_date": "2024-01-01"}'

# Test a subset of models, excluding a known flaky one
dbt test -s stg_orders fct_orders --exclude stg_legacy
Slim CI pipeline
# Build only modified models and their descendants — avoids rebuilding everything
dbt build --project-dir "$DBT_PROJECT_DIR" \
          --target ci \
          --state state \
          --defer \
          --select "+state:modified+"
Prerequisite: the state/ folder must contain the production manifest.json before running (e.g. downloaded from an artifact store at the start of the pipeline). --defer resolves unselected models from the prod target instead of rebuilding them.
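
One way to stage that manifest, sketched as a hypothetical GitHub Actions step (the bucket and paths are illustrative, not part of any standard setup):

```yaml
# Hypothetical CI step: fetch the prod manifest before dbt build
- name: Fetch prod manifest
  run: |
    mkdir -p state
    aws s3 cp s3://my-dbt-artifacts/manifest.json state/manifest.json
```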

🎯 Selector Syntax

Graph operators
| Syntax | Meaning |
| --- | --- |
| model_name | Exact model |
| +model_name | Model + all ancestors |
| model_name+ | Model + all descendants |
| +model_name+ | Full lineage graph |
| 2+model_name | Model + 2 parent levels |
| model_name+3 | Model + 3 child levels |
| @model_name | Model + descendants + all ancestors of those descendants |
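For example, @ is handy when a selected model's descendants need upstream models you didn't select:

```shell
# stg_orders, its descendants, and the ancestors of those descendants
dbt run -s @stg_orders
```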
Selector methods
| Method | Example |
| --- | --- |
| tag: | tag:daily |
| path: | path:models/staging |
| config: | config.materialized:table |
| package: | package:dbt_utils |
| source: | source:raw_shopify+ |
| exposure: | exposure:orders_dashboard+ |
| fqn: | fqn:my_project.marts.* |
| state: | state:modified+ |
| result: | result:error+ |
Unions & intersections
| Syntax | Meaning |
| --- | --- |
| -s model_a model_b | Union: model_a or model_b (space = OR) |
| -s tag:daily,config.materialized:table | Intersection: models tagged daily AND materialized as table (, = AND) |
Examples
# Run all sources from raw_shopify and their descendants
dbt run -s source:raw_shopify+

# Build only models modified since last prod run
dbt build -s state:modified+ --state ./prod_artifacts

# Rerun models that errored in the previous run
dbt run -s result:error+ --state ./target

⚙️ Model Configuration

config() block
-- Top of any .sql model file
{{
  config(
    materialized  = 'table',
    schema        = 'marts',
    alias         = 'orders_final',
    tags          = ['daily', 'finance'],
    meta          = {'owner': 'data_team'},
    post_hook     = "grant select on {{ this }} to role reporter"
  )
}}

SELECT * FROM ...
Materialization types
| Type | Behaviour | Use when |
| --- | --- | --- |
| view | Virtual — no data stored | Default, cheap staging |
| table | Full rebuild each run | Marts, performance |
| incremental | Append/merge new rows | Large fact tables |
| ephemeral | CTE, no object created | Intermediate reuse |
Tip: ephemeral models are inlined into downstream models as CTEs and never exist as warehouse objects, so BI tools and ad-hoc queries can't reach them; use view if you need a queryable relation or lineage visibility.
dbt_project.yml — global config
# Apply config to all models in a folder
models:
  my_project:
    staging:
      +materialized: view
      +schema: staging
    marts:
      +materialized: table
      +tags: ['mart']
    intermediate:
      +materialized: ephemeral
Tip: use +persist_docs to push model and column descriptions from schema.yml as native comments on warehouse objects (supported on Snowflake, BigQuery, Redshift…).
models:
  my_project:
    +persist_docs:
      relation: true   # model description → table comment
      columns:  true   # column descriptions → column comments

🔗 ref() & source()

ref() — model dependencies
-- Reference another model
SELECT *
FROM {{ ref('stg_orders') }}

-- Reference a model in another project (cross-project)
FROM {{ ref('other_project', 'stg_orders') }}

-- ref() resolves to the full relation name:
-- analytics.staging.stg_orders (database.schema.table)
Always use ref() instead of hard-coded table names — it builds the DAG and handles schema routing automatically.
source() — raw data
# _sources.yml definition first:
version: 2

sources:
  - name: raw_shopify
    database: raw
    schema: shopify
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

-- Then in a model:
SELECT *
FROM {{ source('raw_shopify', 'orders') }}

# Test source freshness
dbt source freshness

✅ Testing

Generic tests (schema.yml)
| Test | What it checks |
| --- | --- |
| unique | No duplicate values in the column |
| not_null | No null values in the column |
| accepted_values | All values belong to a defined list |
| relationships | Every value exists in another model's column (referential integrity) |
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
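Generic tests also accept configuration; a sketch adding a severity downgrade and a row filter (severity and where are standard dbt test configs):

```yaml
columns:
  - name: customer_id
    tests:
      - relationships:
          to: ref('dim_customers')
          field: customer_id
          config:
            severity: warn                       # failures become warnings
            where: "created_at >= '2024-01-01'"  # test only recent rows
```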
Singular tests

Custom SQL files in tests/ that express business-specific assertions generic tests can't cover. dbt runs the query — any row returned = failure.

-- tests/assert_no_negative_revenue.sql
SELECT order_id, revenue
FROM {{ ref('fct_orders') }}
WHERE revenue < 0
Convention: name singular tests assert_*. More flexible than generic tests but not reusable across models.
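
For a reusable middle ground, dbt also supports custom generic tests; a minimal sketch (the test name is illustrative):

```sql
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
-- any rows returned = failure, same as a singular test
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```

Reference it like any built-in test: add - is_positive under a column's tests in schema.yml.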

🧩 Jinja & Macros

Jinja syntax
| Tag | Purpose |
| --- | --- |
| {{ expression }} | Output a value |
| {% statement %} | Control flow (no output) |
| {# comment #} | Comment (not rendered) |
| - (dash) | Strip whitespace: {{- val -}} |
-- Variables
{% set my_date = '2024-01-01' %}
{% set cols = ['a', 'b', 'c'] %}

-- Conditionals
{% if target.name == 'prod' %}
  WHERE is_test = false
{% endif %}

-- Loop
{% for col in cols %}
  sum({{ col }}) as total_{{ col }}
  {% if not loop.last %},{% endif %}
{% endfor %}
Macros
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, decimals=2) %}
  round({{ column_name }} / 100.0, {{ decimals }})
{% endmacro %}

-- Usage in a model:
SELECT
  order_id,
  {{ cents_to_dollars('amount_cents') }} as amount_usd
FROM {{ ref('stg_orders') }}
-- Macro with run_query (meta-programming)
{% macro get_column_values(table, column) %}
  {# run_query returns None at parse time; guard with execute #}
  {% if execute %}
    {% set query %}
      SELECT DISTINCT {{ column }} FROM {{ table }}
    {% endset %}
    {% set results = run_query(query) %}
    {{ return(results.columns[0].values()) }}
  {% endif %}
{% endmacro %}
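
A sketch of calling the macro above from a model, assuming stg_orders has status and amount columns and that status values are valid identifiers:

```sql
-- Pivot amounts into one column per status value
SELECT
  order_id
  {% for status in get_column_values(ref('stg_orders'), 'status') %}
  , sum(case when status = '{{ status }}' then amount else 0 end)
      as amount_{{ status }}
  {% endfor %}
FROM {{ ref('stg_orders') }}
GROUP BY order_id
```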
Built-in variables & functions
| Variable / function | Purpose |
| --- | --- |
| target.name | Active profile target (dev, prod…) |
| target.schema | Target schema |
| this | Current model relation (in hooks) |
| env_var('VAR') | Read an environment variable |
| var('name', default) | Project variable from dbt_project.yml |
| log('msg', info=True) | Print to console |
| run_started_at | Timestamp of the current run |
| execute | True during run, False during parse |
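A sketch combining var() and env_var() in a model (variable names are illustrative; both accept a fallback default as the second argument):

```sql
-- dbt_project.yml would define:  vars: {start_date: '2020-01-01'}
SELECT *
FROM {{ ref('stg_events') }}
WHERE event_at >= '{{ var("start_date", "2020-01-01") }}'
  -- read deployment-specific detail from the environment
  AND region = '{{ env_var("DBT_REGION", "eu") }}'
```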

📈 Incremental Models

Standard pattern
-- unique_key is used by the merge / delete+insert strategies
{{
  config(
    materialized     = 'incremental',
    unique_key       = 'event_id',
    on_schema_change = 'sync_all_columns'
  )
}}

SELECT
  event_id,
  user_id,
  event_at,
  event_type
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- Only process new rows on incremental runs
  WHERE event_at > (
    SELECT max(event_at) FROM {{ this }}
  )
{% endif %}
Incremental strategies
| Strategy | Behaviour | Needs unique_key |
| --- | --- | --- |
| append | Insert new rows only | No |
| merge | Upsert on unique_key | Yes |
| delete+insert | Delete matching rows, then insert | Yes |
| insert_overwrite | Replace partitions (BigQuery, Spark) | No |
| microbatch | Process in time-based batches (dbt 1.9+) | No |
{{ config(
  materialized      = 'incremental',
  incremental_strategy = 'merge',
  unique_key        = ['order_id', 'item_id'],
  merge_update_columns = ['status', 'updated_at']
) }}
Full refresh: dbt run --full-refresh drops and rebuilds the table entirely, ignoring is_incremental().
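
For microbatch (dbt 1.9+), a sketch of the time-based config (column and source names are illustrative):

```sql
-- event_time tells dbt which column slices the table into batches
{{ config(
    materialized         = 'incremental',
    incremental_strategy = 'microbatch',
    event_time           = 'event_at',
    batch_size           = 'day',
    begin                = '2024-01-01'
) }}

SELECT * FROM {{ source('raw', 'events') }}
```

Note: microbatch models don't need an is_incremental() filter; dbt filters on event_time for each batch automatically.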

📄 YAML / Properties

_schema.yml structure
version: 2

models:
  - name: fct_orders
    description: "One row per order."
    meta:
      owner: "data-team@company.com"
    config:
      tags: ['finance', 'daily']
    columns:
      - name: order_id
        description: "Surrogate key for orders."
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed','shipped','returned']
Doc blocks
-- docs/orders.md
{% docs order_id %}
The unique identifier for an order.
Generated by the upstream ERP system.
{% enddocs %}

{% docs order_status %}
Lifecycle status of the order.
Possible values: **placed**, **shipped**, **returned**.
{% enddocs %}
# Reference in schema.yml
columns:
  - name: order_id
    description: "{{ doc('order_id') }}"
  - name: status
    description: "{{ doc('order_status') }}"
Exposures
exposures:
  - name: orders_dashboard
    type: dashboard       # dashboard | notebook | analysis | ml | application
    maturity: high
    url: "https://metabase.company.com/dashboard/42"
    description: "Finance orders dashboard"
    owner:
      name: "Alice"
      email: "alice@company.com"
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')

📦 Useful Packages

Package summary
| Package | Best for |
| --- | --- |
| dbt_utils | Macros: star, surrogate key, date spine, pivot, tests |
| dbt_expectations | Rich data-quality tests (Great Expectations style) |
| codegen | Auto-generate YAML & base model SQL |
| audit_helper | Compare relations during refactors |
| dbt_project_evaluator | Enforce dbt best practices as tests |
| elementary | Data observability & anomaly detection |
packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.12.0", "<1.0.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.9.0", "<1.0.0"]
dbt_utils — highlights
-- Star: expand all columns except…
SELECT {{ dbt_utils.star(ref('stg_orders'), except=["_loaded_at"]) }}

-- Surrogate key
{{ dbt_utils.generate_surrogate_key(['order_id', 'item_id']) }} as sk

-- Date spine
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="cast('2025-01-01' as date)"
) }}

-- Pivot
{{ dbt_utils.pivot('status', dbt_utils.get_column_values(ref('stg_orders'), 'status')) }}
codegen — generate boilerplate
# Generate source YAML from existing tables
dbt run-operation codegen.generate_source \
  --args '{"schema_name": "shopify", "database_name": "raw"}'

# Generate model YAML (columns + types)
dbt run-operation codegen.generate_model_yaml \
  --args '{"model_names": ["stg_orders"]}'

# Generate base model SQL from a source
dbt run-operation codegen.generate_base_model \
  --args '{"source_name": "raw_shopify", "table_name": "orders"}'
audit_helper — compare models
-- analyses/compare_orders.sql
-- compare_relations renders a comparison query; call it from an
-- analysis file so that ref() resolves to real relations
{{ audit_helper.compare_relations(
    a_relation  = ref('fct_orders'),
    b_relation  = ref('fct_orders_v2'),
    primary_key = 'order_id'
) }}

-- Useful during refactors to validate parity