dbt Cheat Sheet

Quick reference for data build tool — models, tests, Jinja, CLI & more.

📁 Project Structure

# Typical dbt project layout
my_project/
├── dbt_project.yml          # project config
├── profiles.yml             # connection profiles (usually ~/.dbt/)
├── packages.yml             # third-party packages
├── models/
│   ├── staging/             # stg_* — raw → typed/renamed
│   │   ├── _sources.yml
│   │   └── stg_orders.sql
│   ├── intermediate/        # int_* — joins, business logic
│   └── marts/               # fct_* / dim_* — final tables
│       ├── _schema.yml
│       └── fct_orders.sql
├── tests/                   # singular SQL tests
├── macros/                  # reusable Jinja functions
├── seeds/                   # static CSV → tables
├── snapshots/               # SCD type-2 history
├── analyses/                # ad-hoc queries (not materialised)
└── docs/                    # doc blocks (*.md)
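
A minimal snapshot to pair with the snapshots/ folder above, sketched with the timestamp strategy (file, schema and column names are illustrative):

```sql
-- snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{ config(
    target_schema = 'snapshots',
    unique_key    = 'order_id',
    strategy      = 'timestamp',
    updated_at    = 'updated_at'
) }}

-- dbt tracks row changes over time as SCD type-2 history
SELECT * FROM {{ source('raw_shopify', 'orders') }}

{% endsnapshot %}
```

Run it with dbt snapshot; dbt adds dbt_valid_from / dbt_valid_to columns to record each version of a row.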

⌨️ CLI Commands

Core commands
| Command | What it does |
| --- | --- |
| dbt run | Execute models (SQL) |
| dbt test | Run all tests |
| dbt build | Run seeds, snapshots, models & tests in DAG order |
| dbt compile | Compile Jinja → SQL (no execution) |
| dbt seed | Load CSV seeds into the warehouse |
| dbt snapshot | Execute snapshots |
| dbt docs generate | Build the documentation site |
| dbt docs serve | Serve docs locally |
| dbt debug | Check connection & config |
| dbt deps | Install packages |
| dbt source freshness | Check source freshness against configured thresholds |
| dbt clean | Delete target/ & dbt_packages/ |
Useful flags
| Flag | What it does |
| --- | --- |
| --select / -s | Run only the specified models (name, path, tag…) |
| --exclude | Exclude matching models from the selection |
| --target / -t | Use a specific profile target (dev, prod, ci…) |
| --threads | Number of models to build in parallel |
| --full-refresh | Drop and rebuild incremental models from scratch |
| --defer | Resolve unselected refs from another target (e.g. prod) instead of building them |
| --state | Path to a folder containing a reference manifest.json (used by --defer and state: selectors) |
| --vars | Pass key/value variables at runtime: --vars '{"key": "val"}' |
| --no-version-check | Skip the dbt version compatibility check |
Examples
# Build specific models on prod with more parallelism
dbt build -s stg_orders fct_orders --target prod --threads 8

# Full refresh an incremental model, passing a runtime variable
dbt run -s fct_events --full-refresh --vars '{"start_date": "2024-01-01"}'

# Test a subset of models, excluding a known flaky one
dbt test -s stg_orders fct_orders --exclude stg_legacy
Slim CI pipeline
# Build only modified models and their descendants — avoids rebuilding everything
dbt build --project-dir "$DBT_PROJECT_DIR" \
          --target ci \
          --state state \
          --defer \
          --select "+state:modified+"
Prerequisite: the state/ folder must contain the production manifest.json before running (e.g. downloaded from an artifact store at the start of the pipeline). --defer resolves unselected models from the prod target instead of rebuilding them.
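
One way to stage that manifest, sketched as a hypothetical GitHub Actions step (the bucket and paths are illustrative, not part of any standard setup):

```yaml
# Hypothetical CI step: fetch the prod manifest before dbt build
- name: Fetch prod manifest
  run: |
    mkdir -p state
    aws s3 cp s3://my-dbt-artifacts/manifest.json state/manifest.json
```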

🎯 Selector Syntax

Graph operators
| Syntax | Meaning |
| --- | --- |
| model_name | Exact model |
| +model_name | Model + all ancestors |
| model_name+ | Model + all descendants |
| +model_name+ | Full lineage graph |
| 2+model_name | Model + 2 parent levels |
| model_name+3 | Model + 3 child levels |
| @model_name | Model + descendants + all ancestors of those descendants |
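For example, @ is handy when a selected model's descendants need upstream models you didn't select:

```shell
# stg_orders, its descendants, and the ancestors of those descendants
dbt run -s @stg_orders
```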
Selector methods
| Method | Example |
| --- | --- |
| tag: | tag:daily |
| path: | path:models/staging |
| config: | config.materialized:table |
| package: | package:dbt_utils |
| source: | source:raw_shopify+ |
| exposure: | exposure:orders_dashboard+ |
| fqn: | fqn:my_project.marts.* |
| state: | state:modified+ |
| result: | result:error+ |
Unions & intersections
| Syntax | Meaning |
| --- | --- |
| -s model_a model_b | Union: model_a or model_b (space = OR) |
| -s tag:daily,config.materialized:table | Intersection: models tagged daily AND materialized as table (, = AND) |
Examples
# Run all sources from raw_shopify and their descendants
dbt run -s source:raw_shopify+

# Build only models modified since last prod run
dbt build -s state:modified+ --state ./prod_artifacts

# Rerun models that errored in the previous run
dbt run -s result:error+ --state ./target

⚙️ Model Configuration

config() block
-- Top of any .sql model file
{{
  config(
    materialized  = 'table',
    schema        = 'marts',
    alias         = 'orders_final',
    tags          = ['daily', 'finance'],
    meta          = {'owner': 'data_team'},
    post_hook     = "grant select on {{ this }} to role reporter"
  )
}}

SELECT * FROM ...
Materialization types
| Type | Behaviour | Use when |
| --- | --- | --- |
| view | Virtual — no data stored | Default, cheap staging |
| table | Full rebuild each run | Marts, performance |
| incremental | Append/merge new rows | Large fact tables |
| ephemeral | CTE, no object created | Intermediate reuse |
Tip: ephemeral models are inlined into downstream models as CTEs and never exist as warehouse objects, so BI tools and ad-hoc queries can't reach them; use view if you need a queryable relation or lineage visibility.
dbt_project.yml — global config
# Apply config to all models in a folder
models:
  my_project:
    staging:
      +materialized: view
      +schema: staging
    marts:
      +materialized: table
      +tags: ['mart']
    intermediate:
      +materialized: ephemeral
Tip: use +persist_docs to push model and column descriptions from schema.yml as native comments on warehouse objects (supported on Snowflake, BigQuery, Redshift…).
models:
  my_project:
    +persist_docs:
      relation: true   # model description → table comment
      columns:  true   # column descriptions → column comments

🔗 ref() & source()

ref() — model dependencies
-- Reference another model
SELECT *
FROM {{ ref('stg_orders') }}

-- Reference a model in another project (cross-project)
FROM {{ ref('other_project', 'stg_orders') }}

-- ref() resolves to the full relation name:
-- analytics.staging.stg_orders (database.schema.table)
Always use ref() instead of hard-coded table names — it builds the DAG and handles schema routing automatically.
source() — raw data
# _sources.yml definition first:
version: 2

sources:
  - name: raw_shopify
    database: raw
    schema: shopify
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

-- Then in a model:
SELECT *
FROM {{ source('raw_shopify', 'orders') }}

# Test source freshness
dbt source freshness

✅ Testing

Generic tests (schema.yml)
| Test | What it checks |
| --- | --- |
| unique | No duplicate values in the column |
| not_null | No null values in the column |
| accepted_values | All values belong to a defined list |
| relationships | Every value exists in another model's column (referential integrity) |
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
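Generic tests also accept configuration; a sketch adding a severity downgrade and a row filter (severity and where are standard dbt test configs):

```yaml
columns:
  - name: customer_id
    tests:
      - relationships:
          to: ref('dim_customers')
          field: customer_id
          config:
            severity: warn                       # failures become warnings
            where: "created_at >= '2024-01-01'"  # test only recent rows
```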
Singular tests

Custom SQL files in tests/ that express business-specific assertions generic tests can't cover. dbt runs the query — any row returned = failure.

-- tests/assert_no_negative_revenue.sql
SELECT order_id, revenue
FROM {{ ref('fct_orders') }}
WHERE revenue < 0
Convention: name singular tests assert_*. More flexible than generic tests but not reusable across models.
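
For a reusable middle ground, dbt also supports custom generic tests; a minimal sketch (the test name is illustrative):

```sql
-- tests/generic/is_positive.sql
{% test is_positive(model, column_name) %}
-- any rows returned = failure, same as a singular test
SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} < 0
{% endtest %}
```

Reference it like any built-in test: add - is_positive under a column's tests in schema.yml.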

🧩 Jinja & Macros

Jinja syntax
| Tag | Purpose |
| --- | --- |
| {{ expression }} | Output a value |
| {% statement %} | Control flow (no output) |
| {# comment #} | Comment (not rendered) |
| - (dash) | Strip whitespace: {{- val -}} |
-- Variables
{% set my_date = '2024-01-01' %}
{% set cols = ['a', 'b', 'c'] %}

-- Conditionals
{% if target.name == 'prod' %}
  WHERE is_test = false
{% endif %}

-- Loop
{% for col in cols %}
  sum({{ col }}) as total_{{ col }}
  {% if not loop.last %},{% endif %}
{% endfor %}
Macros
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, decimals=2) %}
  round({{ column_name }} / 100.0, {{ decimals }})
{% endmacro %}

-- Usage in a model:
SELECT
  order_id,
  {{ cents_to_dollars('amount_cents') }} as amount_usd
FROM {{ ref('stg_orders') }}
-- Macro with run_query (meta-programming)
{% macro get_column_values(table, column) %}
  {# run_query returns None at parse time; guard with execute #}
  {% if execute %}
    {% set query %}
      SELECT DISTINCT {{ column }} FROM {{ table }}
    {% endset %}
    {% set results = run_query(query) %}
    {{ return(results.columns[0].values()) }}
  {% endif %}
{% endmacro %}
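
A sketch of calling the macro above from a model, assuming stg_orders has status and amount columns and that status values are valid identifiers:

```sql
-- Pivot amounts into one column per status value
SELECT
  order_id
  {% for status in get_column_values(ref('stg_orders'), 'status') %}
  , sum(case when status = '{{ status }}' then amount else 0 end)
      as amount_{{ status }}
  {% endfor %}
FROM {{ ref('stg_orders') }}
GROUP BY order_id
```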
Built-in variables & functions
| Variable / function | Purpose |
| --- | --- |
| target.name | Active profile target (dev, prod…) |
| target.schema | Target schema |
| this | Current model relation (in hooks) |
| env_var('VAR') | Read an environment variable |
| var('name', default) | Project variable from dbt_project.yml |
| log('msg', info=True) | Print to console |
| run_started_at | Timestamp of the current run |
| execute | True during run, False during parse |
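A sketch combining var() and env_var() in a model (variable names are illustrative; both accept a fallback default as the second argument):

```sql
-- dbt_project.yml would define:  vars: {start_date: '2020-01-01'}
SELECT *
FROM {{ ref('stg_events') }}
WHERE event_at >= '{{ var("start_date", "2020-01-01") }}'
  -- read deployment-specific detail from the environment
  AND region = '{{ env_var("DBT_REGION", "eu") }}'
```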

📈 Incremental Models

Standard pattern
-- unique_key is used by the merge / delete+insert strategies
{{
  config(
    materialized     = 'incremental',
    unique_key       = 'event_id',
    on_schema_change = 'sync_all_columns'
  )
}}

SELECT
  event_id,
  user_id,
  event_at,
  event_type
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- Only process new rows on incremental runs
  WHERE event_at > (
    SELECT max(event_at) FROM {{ this }}
  )
{% endif %}
Incremental strategies
| Strategy | Behaviour | Needs unique_key |
| --- | --- | --- |
| append | Insert new rows only | No |
| merge | Upsert on unique_key | Yes |
| delete+insert | Delete matching rows, then insert | Yes |
| insert_overwrite | Replace partitions (BigQuery, Spark) | No |
| microbatch | Process in time-based batches (dbt 1.9+) | No |
{{ config(
  materialized      = 'incremental',
  incremental_strategy = 'merge',
  unique_key        = ['order_id', 'item_id'],
  merge_update_columns = ['status', 'updated_at']
) }}
Full refresh: dbt run --full-refresh drops and rebuilds the table entirely, ignoring is_incremental().
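
For microbatch (dbt 1.9+), a sketch of the time-based config (column and source names are illustrative):

```sql
-- event_time tells dbt which column slices the table into batches
{{ config(
    materialized         = 'incremental',
    incremental_strategy = 'microbatch',
    event_time           = 'event_at',
    batch_size           = 'day',
    begin                = '2024-01-01'
) }}

SELECT * FROM {{ source('raw', 'events') }}
```

Note: microbatch models don't need an is_incremental() filter; dbt filters on event_time for each batch automatically.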

📄 YAML / Properties

_schema.yml structure
version: 2

models:
  - name: fct_orders
    description: "One row per order."
    meta:
      owner: "data-team@company.com"
    config:
      tags: ['finance', 'daily']
    columns:
      - name: order_id
        description: "Surrogate key for orders."
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed','shipped','returned']
Doc blocks
-- docs/orders.md
{% docs order_id %}
The unique identifier for an order.
Generated by the upstream ERP system.
{% enddocs %}

{% docs order_status %}
Lifecycle status of the order.
Possible values: **placed**, **shipped**, **returned**.
{% enddocs %}
# Reference in schema.yml
columns:
  - name: order_id
    description: "{{ doc('order_id') }}"
  - name: status
    description: "{{ doc('order_status') }}"
Exposures
exposures:
  - name: orders_dashboard
    type: dashboard       # dashboard | notebook | analysis | ml | application
    maturity: high
    url: "https://metabase.company.com/dashboard/42"
    description: "Finance orders dashboard"
    owner:
      name: "Alice"
      email: "alice@company.com"
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')

📦 Useful Packages

Package summary
| Package | Best for |
| --- | --- |
| dbt_utils | Macros: star, surrogate key, date spine, pivot, tests |
| dbt_expectations | Rich data-quality tests (Great Expectations style) |
| codegen | Auto-generate YAML & base model SQL |
| audit_helper | Compare relations during refactors |
| dbt_project_evaluator | Enforce dbt best practices as tests |
| elementary | Data observability & anomaly detection |
packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.12.0", "<1.0.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.9.0", "<1.0.0"]
dbt_utils — highlights
-- Star: expand all columns except…
SELECT {{ dbt_utils.star(ref('stg_orders'), except=["_loaded_at"]) }}

-- Surrogate key
{{ dbt_utils.generate_surrogate_key(['order_id', 'item_id']) }} as sk

-- Date spine
{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2020-01-01' as date)",
    end_date="cast('2025-01-01' as date)"
) }}

-- Pivot
{{ dbt_utils.pivot('status', dbt_utils.get_column_values(ref('stg_orders'), 'status')) }}
codegen — generate boilerplate
# Generate source YAML from existing tables
dbt run-operation codegen.generate_source \
  --args '{"schema_name": "shopify", "database_name": "raw"}'

# Generate model YAML (columns + types)
dbt run-operation codegen.generate_model_yaml \
  --args '{"model_names": ["stg_orders"]}'

# Generate base model SQL from a source
dbt run-operation codegen.generate_base_model \
  --args '{"source_name": "raw_shopify", "table_name": "orders"}'
audit_helper — compare models
-- analyses/compare_orders.sql
-- compare_relations renders a comparison query; call it from an
-- analysis file so that ref() resolves to real relations
{{ audit_helper.compare_relations(
    a_relation  = ref('fct_orders'),
    b_relation  = ref('fct_orders_v2'),
    primary_key = 'order_id'
) }}

-- Useful during refactors to validate parity