DOC: spectrum-n
STATUS: ● PUBLISHED
SYSTEM SPECTRUM

NULL ≠ NULL: A Duplicate-Geography Bug

Why dedup silently failed, and the Postgres 15 feature that fixed it.

Cover image — NULL ≠ NULL: A Duplicate-Geography Bug

dim_geography was supposed to hold one row per location. Instead it slowly multiplied: two, three, several rows for what should have been the same place. The unique constraint that was meant to prevent this was working exactly as SQL specifies. That was the problem.

// 01 — THE SETUP

Geography is deduplicated on (country_code, region, city) with a unique constraint, and upsert_geo() relies on that constraint to return the existing geo_id on conflict. Plenty of events have a country but no region or city, for example (US, NULL, NULL).

// 02 — THE SYMPTOM

Two events both with (country='US', region=NULL, city=NULL) produced two rows. Over time, all-NULL-subdivision locations piled up, and upsert_geo() could return different geo_id values for what was semantically the same place, quietly fragmenting any analysis grouped by geography.

// 03 — THE CULPRIT

Standard SQL treats NULL as not equal to NULL for uniqueness. Two rows of (US, NULL, NULL) both pass the unique check, because NULL = NULL is not true. It’s NULL. The constraint never fires, so the upsert inserts a fresh row every time instead of conflicting onto the existing one.

// 04 — THE FIX

PostgreSQL 15 added the exact knob for this: NULLS NOT DISTINCT, which makes NULL equal to NULL for constraint purposes:

UNIQUE NULLS NOT DISTINCT (country_code, region, city)

Now (US, NULL, NULL) conflicts with itself, the upsert returns the existing row, and one place is one row again.

TAKEAWAYS

NEXT

@frogwebp brand mark
ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them, the architecture, the failures, what each one teaches me. Documenting in public since 2021: the process, not just the result.

// NEWSLETTER — THE BUILD LOG SIGNAL

When I ship something or learn something worth keeping, it lands here first — build logs, concepts, and the honest process behind them. Come along; no spam, leave anytime.