dbt Cheat Sheet
Quick reference for data build tool — models, tests, Jinja, CLI & more.
📁 Project Structure
# Typical dbt project layout
my_project/
├── dbt_project.yml          # project config
├── profiles.yml             # connection profiles (usually ~/.dbt/)
├── packages.yml             # third-party packages
├── models/
│   ├── staging/             # stg_* — raw → typed/renamed
│   │   ├── _sources.yml
│   │   └── stg_orders.sql
│   ├── intermediate/        # int_* — joins, business logic
│   └── marts/               # fct_* / dim_* — final tables
│       ├── _schema.yml
│       └── fct_orders.sql
├── tests/                   # singular SQL tests
├── macros/                  # reusable Jinja functions
├── seeds/                   # static CSV → tables
├── snapshots/               # SCD type-2 history
├── analyses/                # ad-hoc queries (not materialised)
└── docs/                    # doc blocks (*.md)
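A minimal dbt_project.yml tying this layout together (values are illustrative):

```yaml
name: my_project
version: '1.0.0'
profile: my_project          # must match an entry in profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
macro-paths: ["macros"]
test-paths: ["tests"]
analysis-paths: ["analyses"]
```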
⌨️ CLI Commands
Core commands
| Command | What it does |
|---|---|
| dbt run | Execute models (SQL) |
| dbt test | Run all tests |
| dbt build | seeds → snapshots → models → tests |
| dbt compile | Compile Jinja → SQL (no execution) |
| dbt seed | Load CSV seeds to warehouse |
| dbt snapshot | Execute snapshots |
| dbt docs generate | Build documentation site |
| dbt docs serve | Serve docs locally |
| dbt debug | Check connection & config |
| dbt deps | Install packages |
| dbt source freshness | Check source freshness against configured thresholds |
| dbt clean | Delete target/ & dbt_packages/ |
Useful flags
| Flag | What it does |
|---|---|
| --select / -s | Run only the specified models (name, path, tag…) |
| --exclude | Exclude matching models from the selection |
| --target / -t | Use a specific profile target (dev, prod, ci…) |
| --threads | Number of models to build in parallel |
| --full-refresh | Drop and rebuild incremental models from scratch |
| --defer | Resolve unselected refs from another target (prod) instead of building them |
| --state | Path to a folder containing a reference manifest.json (used by --defer and state: selectors) |
| --vars | Pass key/value variables at runtime: --vars '{"key": "val"}' |
| --no-version-check | Skip the dbt version compatibility check |
Examples
# Build specific models on prod with more parallelism
dbt build -s stg_orders fct_orders --target prod --threads 8
# Full refresh an incremental model, passing a runtime variable
dbt run -s fct_events --full-refresh --vars '{"start_date": "2024-01-01"}'
# Test a subset of models, excluding a known flaky one
dbt test -s stg_orders fct_orders --exclude stg_legacy
Slim CI pipeline
# Build only modified models and their descendants — avoids rebuilding everything
dbt build --project-dir "$DBT_PROJECT_DIR" \
  --target ci \
  --state state \
  --defer \
  --select "+state:modified+"
Prerequisite: the state/ folder must contain the production manifest.json before running (e.g. downloaded from an artifact store at the start of the pipeline). --defer resolves unselected models from the prod target instead of rebuilding them.
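One way to stage that manifest at pipeline start (bucket and path are hypothetical; adjust to your artifact store):

```shell
# Fetch the manifest produced by the last production run
mkdir -p state
aws s3 cp s3://my-artifacts/dbt/prod/manifest.json state/manifest.json
```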
🎯 Selector Syntax
Graph operators
| Syntax | Meaning |
|---|---|
| model_name | Exact model |
| +model_name | Model + all ancestors |
| model_name+ | Model + all descendants |
| +model_name+ | Full lineage graph |
| 2+model_name | Model + 2 parent levels |
| model_name+3 | Model + 3 child levels |
| @model_name | Model + descendants + all ancestors of those descendants |
Selector methods
| Method | Example |
|---|---|
| tag: | tag:daily |
| path: | path:models/staging |
| config: | config.materialized:table |
| package: | package:dbt_utils |
| source: | source:raw_shopify+ |
| exposure: | exposure:orders_dashboard+ |
| fqn: | fqn:my_project.marts.* |
| state: | state:modified+ |
| result: | result:error+ |
Unions & intersections
| Syntax | Meaning |
|---|---|
| -s model_a model_b | Union (space = OR) |
| -s tag:daily,config.materialized:table | Intersection: models tagged daily AND materialized as table (, = AND) |
Examples
# Run all sources from raw_shopify and their descendants
dbt run -s source:raw_shopify+
# Build only models modified since last prod run
dbt build -s state:modified+ --state ./prod_artifacts
# Rerun models that errored in the previous run
dbt run -s result:error+ --state ./target
⚙️ Model Configuration
config() block
-- Top of any .sql model file
{{
  config(
    materialized = 'table',
    schema = 'marts',
    alias = 'orders_final',
    tags = ['daily', 'finance'],
    meta = {'owner': 'data_team'},
    post_hook = "grant select on {{ this }} to role reporter"
  )
}}

SELECT * FROM ...
Materialization types
| Type | Behaviour | Use when |
|---|---|---|
| view | Virtual — no data stored | Default, cheap staging |
| table | Full rebuild each run | Marts, performance |
| incremental | Append/merge new rows | Large fact tables |
| ephemeral | CTE, no object created | Intermediate reuse |
Tip: ephemeral models can still be ref()-ed by other models, but they compile to inline CTEs: no warehouse object is created, so BI tools can't query them and some tests/docs features don't apply. Use view if you need lineage visibility.
dbt_project.yml — global config
# Apply config to all models in a folder
models:
  my_project:
    staging:
      +materialized: view
      +schema: staging
    marts:
      +materialized: table
      +tags: ['mart']
    intermediate:
      +materialized: ephemeral
Tip: use +persist_docs to push model and column descriptions from schema.yml as native comments on warehouse objects (supported on Snowflake, BigQuery, Redshift…).
models:
  my_project:
    +persist_docs:
      relation: true     # model description → table comment
      columns: true      # column descriptions → column comments
🔗 ref() & source()
ref() — model dependencies
-- Reference another model
SELECT *
FROM {{ ref('stg_orders') }}
-- Reference a model in another project (cross-project)
FROM {{ ref('other_project', 'stg_orders') }}
-- ref() resolves to the full relation name:
-- analytics.staging.stg_orders (database.schema.table)
Always use ref() instead of hard-coded table names — it builds the DAG and handles schema routing automatically.
source() — raw data
# _sources.yml (define the source first):
version: 2

sources:
  - name: raw_shopify
    database: raw
    schema: shopify
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
-- Then in a model:
SELECT *
FROM {{ source('raw_shopify', 'orders') }}
# Test source freshness
dbt source freshness
✅ Testing
Generic tests (schema.yml)
| Test | What it checks |
|---|---|
| unique | No duplicate values in the column |
| not_null | No null values in the column |
| accepted_values | All values belong to a defined list |
| relationships | Every value exists in another model's column (referential integrity) |
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
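Generic tests also accept configuration such as a severity level and a where filter; a minimal sketch (the amount column is illustrative):

```yaml
- name: amount
  tests:
    - not_null:
        config:
          severity: warn                        # fail as a warning, not an error
          where: "order_date >= '2024-01-01'"   # only test recent rows
```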
Singular tests
Custom SQL files in tests/ that express business-specific assertions generic tests can't cover. dbt runs the query — any row returned = failure.
-- tests/assert_no_negative_revenue.sql
SELECT order_id, revenue
FROM {{ ref('fct_orders') }}
WHERE revenue < 0
Convention: name singular tests assert_*. More flexible than generic tests but not reusable across models.
🧩 Jinja & Macros
Jinja syntax
| Tag | Purpose |
|---|---|
| {{ expression }} | Output a value |
| {% statement %} | Control flow (no output) |
| {# comment #} | Comment (not rendered) |
| - (dash) | Strip whitespace: {{- val -}} |
-- Variables
{% set my_date = '2024-01-01' %}
{% set cols = ['a', 'b', 'c'] %}
-- Conditionals
{% if target.name == 'prod' %}
WHERE is_test = false
{% endif %}
-- Loop
{% for col in cols %}
sum({{ col }}) as total_{{ col }}
{% if not loop.last %},{% endif %}
{% endfor %}
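With cols = ['a', 'b', 'c'], the loop above compiles to:

```sql
sum(a) as total_a,
sum(b) as total_b,
sum(c) as total_c
```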
Macros
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, decimals=2) %}
round({{ column_name }} / 100.0, {{ decimals }})
{% endmacro %}
-- Usage in a model:
SELECT
order_id,
{{ cents_to_dollars('amount_cents') }} as amount_usd
FROM {{ ref('stg_orders') }}
-- Macro with run_query (meta-programming)
{% macro get_column_values(table, column) %}
  {% set query %}
    SELECT DISTINCT {{ column }} FROM {{ table }}
  {% endset %}
  {# run_query only returns results at execution time, not during parsing #}
  {% if execute %}
    {% set results = run_query(query) %}
    {{ return(results.columns[0].values()) }}
  {% endif %}
{% endmacro %}
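A possible use of that macro, pivoting counts per status (model and column names assumed from earlier examples):

```sql
{% set statuses = get_column_values(ref('stg_orders'), 'status') %}
SELECT
  {% for s in statuses %}
  count(case when status = '{{ s }}' then 1 end) as n_{{ s }}
  {%- if not loop.last %},{% endif %}
  {% endfor %}
FROM {{ ref('stg_orders') }}
```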
Built-in variables & functions
| Variable / function | Purpose |
|---|---|
| target.name | Active profile target (dev, prod…) |
| target.schema | Target schema |
| this | Current model relation (in hooks) |
| env_var('VAR') | Read environment variable |
| var('name', default) | Project variable from dbt_project.yml |
| log('msg', info=True) | Print to console |
| run_started_at | Timestamp of current run |
| execute | True during run, False during parse |
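A few of these in action (the region variable is hypothetical, and the date arithmetic is Postgres-style):

```sql
SELECT * FROM {{ ref('stg_orders') }}
WHERE region = '{{ var("region", "emea") }}'
{% if target.name != 'prod' %}
  -- Sample recent data only outside prod to keep dev runs cheap
  AND created_at >= current_date - interval '7 day'
{% endif %}
```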
📈 Incremental Models
Standard pattern
-- unique_key enables the merge / delete+insert strategies
{{
  config(
    materialized = 'incremental',
    unique_key = 'event_id',
    on_schema_change = 'sync_all_columns'
  )
}}
SELECT
event_id,
user_id,
event_at,
event_type
FROM {{ source('raw', 'events') }}
{% if is_incremental() %}
-- Only process new rows on incremental runs
WHERE event_at > (
SELECT max(event_at) FROM {{ this }}
)
{% endif %}
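A common variant reprocesses a sliding window to catch late-arriving events instead of filtering on the strict max (interval syntax varies by warehouse):

```sql
{% if is_incremental() %}
-- Reprocess the last 3 days so late-arriving rows are picked up
WHERE event_at > (
  SELECT max(event_at) - interval '3 day' FROM {{ this }}
)
{% endif %}
```

Pair this with a merge strategy and a unique_key so reprocessed rows upsert rather than duplicate.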
Incremental strategies
| Strategy | Behaviour | Needs unique_key |
|---|---|---|
| append | Insert new rows only | No |
| merge | Upsert on unique_key | Yes |
| delete+insert | Delete matching rows, then insert | Yes |
| insert_overwrite | Replace partitions (BigQuery, Spark) | No |
| microbatch | dbt 1.9+ batch processing | No |
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key = ['order_id', 'item_id'],
    merge_update_columns = ['status', 'updated_at']
) }}
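The microbatch strategy (dbt 1.9+) mentioned above uses an event_time column instead of a unique key; a sketch (column name and dates are illustrative):

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'microbatch',
    event_time = 'event_at',
    batch_size = 'day',
    begin = '2024-01-01',
    lookback = 3
) }}
```

event_time is the column dbt filters batches on, begin is the earliest batch on first build, and lookback is how many recent batches to reprocess each run.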
Full refresh: dbt run --full-refresh drops and rebuilds the table entirely, ignoring is_incremental().
📄 YAML / Properties
_schema.yml structure
version: 2

models:
  - name: fct_orders
    description: "One row per order."
    meta:
      owner: "data-team@company.com"
    config:
      tags: ['finance', 'daily']
    columns:
      - name: order_id
        description: "Surrogate key for orders."
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
Doc blocks
-- docs/orders.md
{% docs order_id %}
The unique identifier for an order.
Generated by the upstream ERP system.
{% enddocs %}
{% docs order_status %}
Lifecycle status of the order.
Possible values: **placed**, **shipped**, **returned**.
{% enddocs %}
# Reference in schema.yml
columns:
  - name: order_id
    description: "{{ doc('order_id') }}"
  - name: status
    description: "{{ doc('order_status') }}"
Exposures
exposures:
  - name: orders_dashboard
    type: dashboard          # dashboard | notebook | analysis | ml | application
    maturity: high
    url: "https://metabase.company.com/dashboard/42"
    description: "Finance orders dashboard"
    owner:
      name: "Alice"
      email: "alice@company.com"
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
📦 Useful Packages
Package summary
| Package | Best for |
|---|---|
| dbt_utils | Macros: star, surrogate key, date spine, pivot, tests |
| dbt_expectations | Rich data-quality tests (Great Expectations style) |
| codegen | Auto-generate YAML & base model SQL |
| audit_helper | Compare relations during refactors |
| dbt_project_evaluator | Enforce dbt best practices as tests |
| elementary | Data observability & anomaly detection |
packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.12.0", "<1.0.0"]
  - package: dbt-labs/audit_helper
    version: [">=0.9.0", "<1.0.0"]
dbt_utils — highlights
-- Star: expand all columns except…
SELECT {{ dbt_utils.star(ref('stg_orders'), except=["_loaded_at"]) }}
-- Surrogate key
{{ dbt_utils.generate_surrogate_key(['order_id', 'item_id']) }} as sk
-- Date spine
{{ dbt_utils.date_spine(
    datepart = "day",
    start_date = "cast('2020-01-01' as date)",
    end_date = "cast('2025-01-01' as date)"
) }}
-- Pivot
{{ dbt_utils.pivot('status', dbt_utils.get_column_values(ref('stg_orders'), 'status')) }}
codegen — generate boilerplate
# Generate source YAML from existing tables
dbt run-operation codegen.generate_source \
--args '{"schema_name": "shopify", "database_name": "raw"}'
# Generate model YAML (columns + types)
dbt run-operation codegen.generate_model_yaml \
--args '{"model_names": ["stg_orders"]}'
# Generate base model SQL from a source
dbt run-operation codegen.generate_base_model \
--args '{"source_name": "raw_shopify", "table_name": "orders"}'
audit_helper — compare models
compare_relations expects Relation objects, so call it from an analysis file rather than via dbt run-operation with string arguments:
-- analyses/compare_fct_orders.sql
{% set old_rel = ref('fct_orders') %}
{% set new_rel = ref('fct_orders_v2') %}

{{ audit_helper.compare_relations(
    a_relation = old_rel,
    b_relation = new_rel,
    primary_key = "order_id"
) }}
-- dbt compile renders the comparison SQL; run it to validate parity during refactors