Anonymous events, ones with no account_name, were being filed under a real account named “Unknown.” It looked harmless. It was the most dangerous row in the warehouse.
// 01 — THE SETUP
Events arrive with an optional account_name. The pipeline upserts an account dimension and links each event to it. The original code had a reasonable-looking fallback for missing names:
account_name = payload.get("account_name") or "Unknown"
// 02 — THE SYMPTOM
Every anonymous event got upserted into dim_accounts as account_name = 'Unknown' with a real account_id. So all anonymous traffic, from everywhere, all tenants, all time, merged into a single fake account that then showed up in every dashboard as if it were a customer. Worse: any real company actually named “Unknown” would silently fuse with all of that anonymous traffic and have its metrics destroyed.
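The merge can be reproduced in a few lines. This is a minimal sketch, not the pipeline's actual code: the dict standing in for dim_accounts and the helpers upsert_account and resolve_account_id are hypothetical names invented for illustration.

```python
# Hypothetical in-memory stand-in for dim_accounts: account_name -> account_id.
dim_accounts = {}

def upsert_account(name):
    """Return the account_id for name, creating a row if it does not exist."""
    if name not in dim_accounts:
        dim_accounts[name] = len(dim_accounts) + 1
    return dim_accounts[name]

def resolve_account_id(payload):
    # The original fallback: a missing name becomes the string "Unknown".
    account_name = payload.get("account_name") or "Unknown"
    return upsert_account(account_name)

events = [
    {"tenant": "a"},                             # anonymous, key absent
    {"tenant": "b", "account_name": None},       # anonymous, explicit null
    {"tenant": "c", "account_name": "Unknown"},  # a real company named Unknown
]

# Every event resolves to the same account_id, across tenants.
ids = [resolve_account_id(event) for event in events]
```

Under these assumptions, two anonymous events from different tenants and one event from a genuinely named company all land on the same dimension row.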
// 03 — THE CULPRIT
The or "Unknown" fallback mapped a missing value to a present one. In a dimensional model, that’s the cardinal sin: it invents a member that doesn’t exist and attributes real events to it. A null (“we don’t know the account”) was being laundered into a fact (“the account is Unknown”).
// 04 — THE FIX
Let absence stay absent:
account_name = payload.get("account_name") or None
When account_name is None, account_id is left NULL and no dim_accounts row is created. Anonymous events are simply excluded from account-level aggregations rather than polluting a fake bucket. The dashboards now answer “conversion by account” using only events that actually have an account. That is the only honest answer.
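One way the fixed ingest path might look, sketched against an in-memory SQLite database. The table fact_events, the schema, and the ingest function are assumptions made for illustration; only dim_accounts, account_id, and account_name come from the original pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_accounts (
        account_id   INTEGER PRIMARY KEY,
        account_name TEXT NOT NULL UNIQUE
    );
    -- Hypothetical fact table; a NULL account_id means "anonymous".
    CREATE TABLE fact_events (
        event_id   INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES dim_accounts(account_id)
    );
""")

def ingest(payload):
    # Missing or empty names stay absent instead of becoming a fake member.
    account_name = payload.get("account_name") or None
    account_id = None
    if account_name is not None:
        # Upsert the dimension row only when there is a real name.
        conn.execute(
            "INSERT OR IGNORE INTO dim_accounts (account_name) VALUES (?)",
            (account_name,),
        )
        account_id = conn.execute(
            "SELECT account_id FROM dim_accounts WHERE account_name = ?",
            (account_name,),
        ).fetchone()[0]
    conn.execute("INSERT INTO fact_events (account_id) VALUES (?)", (account_id,))

for payload in [
    {},                           # anonymous
    {"account_name": ""},         # anonymous (empty string)
    {"account_name": "Acme"},
    {"account_name": "Unknown"},  # a real company named Unknown
    {"account_name": "Acme"},
]:
    ingest(payload)

# Account-level aggregation: the inner join drops NULL account_ids.
events_by_account = conn.execute("""
    SELECT a.account_name, COUNT(*)
    FROM fact_events e
    JOIN dim_accounts a ON a.account_id = e.account_id
    GROUP BY a.account_name
    ORDER BY a.account_name
""").fetchall()

# Anonymous traffic is still countable, just not attributed to any account.
anonymous_count = conn.execute(
    "SELECT COUNT(*) FROM fact_events WHERE account_id IS NULL"
).fetchone()[0]
```

In this sketch the real company named "Unknown" keeps its own row and its own single event, while the two anonymous events stay out of the per-account counts and can be reported separately with an IS NULL filter.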
TAKEAWAYS
- Never map “missing” to a synthetic present value. NULL means “unknown”; a row named “Unknown” means “a real thing called Unknown.” Conflating them corrupts every aggregation that touches the dimension.
- In dimensional models, anonymous/unattributed events belong as NULL foreign keys, excluded from grouped metrics. Do not bucket them into a fake member.
- A fallback default is a decision about truth. or "Unknown" looks defensive; it’s actually fabricating data.
NEXT
- Anomaly log: NULL ≠ NULL: a different way the same dimension multiplied rows.
