REFRESH MATERIALIZED VIEW CONCURRENTLY makes a promise: update the view without locking it, touching only the rows that actually changed. Spectrum leans on that promise hard: four analytics materialized views refresh on every pipeline run, every five minutes, while Grafana keeps reading them.
For a while, the promise was quietly broken. The refresh worked, the dashboards were correct, but every run was rewriting the entire view, holding heavier locks than it should, and taking as long as a plain non-concurrent refresh. The feature I was paying for wasn’t doing anything.
The cause was a single column. This is the entry where I explain it, because the bug is invisible until you understand how CONCURRENTLY decides what “changed” means.
// 01 — THE SETUP
Spectrum’s analytics schema is materialized views: funnels, retention, revenue, experiments. They exist so dashboards never run heavy aggregations against fact_events directly. The tradeoff is freshness: a materialized view is a snapshot, so it has to be refreshed.
CONCURRENTLY is what makes refreshing on a live system safe. A plain REFRESH takes an ACCESS EXCLUSIVE lock: every reader blocks until it finishes. CONCURRENTLY instead builds the new version alongside the old, diffs them, and applies only the delta. Readers never block. On a view that mostly doesn’t change between runs, that delta should be tiny.
// 02 — THE SYMPTOM
It wasn’t tiny. Every refresh behaved like a full rebuild:
▸ pipeline run — Phase 3
REFRESH MATERIALIZED VIEW CONCURRENTLY analytics.mv_conversion_funnels 1.9s
REFRESH MATERIALIZED VIEW CONCURRENTLY analytics.mv_retention_cohorts 2.2s
REFRESH MATERIALIZED VIEW CONCURRENTLY analytics.mv_revenue_summary 1.7s
REFRESH MATERIALIZED VIEW CONCURRENTLY analytics.mv_experiment_results 2.0s
Those numbers barely moved whether one event had arrived since the last run or ten thousand had. A truly incremental refresh should get cheaper when little changes. This one didn’t, which meant CONCURRENTLY was finding every row “changed,” every time.
// 03 — THE CULPRIT
Each MV carried a freshness column. The idea was reasonable: stamp every row with when it was last refreshed, so the health check could read it.
-- the original MV definition (abridged)
SELECT
account_id,
funnel_step,
sessions,
NOW() AS last_refreshed_at -- the problem
FROM ...
NOW() returns the current transaction’s timestamp. It is different on every single refresh. And here is the part that makes the bug invisible: CONCURRENTLY decides what changed by comparing the new rows to the old ones, value by value. Every row now carried a last_refreshed_at that differed from its previous value, so to the diff engine, every row had changed, even when nothing else about it had. The optimization had nothing left to skip. It dutifully deleted and reinserted the whole view, every run.
The column meant to observe freshness was destroying the thing that made refreshing cheap.
// 04 — THE FIX
Freshness tracking doesn’t belong inside the data it’s tracking. I pulled last_refreshed_at out of every MV and gave it its own home: one tiny table, one row per view, updated after each refresh:
-- one small write per MV per refresh, outside the view
INSERT INTO analytics.mv_refresh_log (mv_name, refreshed_at)
VALUES ('analytics.mv_conversion_funnels', NOW())
ON CONFLICT (mv_name) DO UPDATE SET refreshed_at = NOW();
The materialized views are now pure aggregations with no per-row timestamp, nothing that changes when the underlying numbers don’t. CONCURRENTLY can finally skip the rows that didn’t move, and freshness is still tracked, just in the one place that’s supposed to change every run.
TAKEAWAYS
REFRESH MATERIALIZED VIEW CONCURRENTLYdiffs row values to find changes. Anything non-deterministic inside the view (NOW(),random(),clock_timestamp()) makes every row look changed and defeats the optimization entirely.- Keep observability about your data out of the data itself. A separate
*_refresh_logtable tracks freshness without poisoning the diff. - “It works and the numbers are right” is not the same as “it works efficiently.” This bug never produced a wrong dashboard, only a slow, lock-heavy one. Those are the ones you have to go looking for.
NEXT
- Concept: REFRESH MATERIALIZED VIEW CONCURRENTLY: how it works, and how to break it (the general version of this story).
- Build log 03: the idempotent ELT pipeline that schedules these refreshes, and the partition bug that lived next door.
