Small accounts had suspiciously good retention. Not a little good. Implausibly good, the kind of numbers a low-user tenant shouldn’t post. The cohort math was right. The scope was wrong.
// 01 — THE SETUP
mv_retention_cohorts buckets users by first-seen week, then checks whether each cohort returned in W+1, W+2, and W+4. The return-check is a correlated subquery: for each cohort user, did they fire an event in the later week?
// 02 — THE SYMPTOM
A user who belonged to Account A and returned in a later week was counted toward Account B’s retention too, whenever both accounts shared cohort weeks. The fewer users an account had, the more a single cross-counted return distorted its percentage, so small tenants showed inflated, contaminated retention.
// 03 — THE CULPRIT
The return-check subquery joined on user_id and week alone:
-- a user returning on ANY account satisfied the check for ALL their accounts
WHERE fe.user_id = cohort.user_id
AND fe.event_week = cohort.target_week
No account_id filter. So “did this user return?” was answered globally, not per-tenant. A return anywhere counted everywhere.
// 04 — THE FIX
One clause, restoring tenant isolation:
WHERE fe.user_id = cohort.user_id
AND fe.event_week = cohort.target_week
AND fe.account_id = cohort.account_id -- the missing guard
Each tenant’s retention is now computed strictly within that tenant. The implausible numbers collapsed back to reality.
TAKEAWAYS
- In multi-tenant analytics, every correlated subquery needs an explicit tenant guard. The natural join key (here, user + week) is rarely enough. The account boundary has to be stated.
- Cross-tenant contamination hides in plain sight: the math is correct, only the scope is wrong, so tests on single-tenant data pass while real multi-tenant data lies.
- Suspiciously good metrics deserve the same scrutiny as suspiciously bad ones. Both are signals.
NEXT
- Concept: star schema vs snowflake: the modeling choice underneath all of this.
