Most analytics stacks look like this: ingest events → ship them to a data warehouse (Snowflake, BigQuery, Redshift) → run dbt transforms → visualize in Grafana or Metabase. That stack works great if you’re at scale and have the budget. At hobby-project scale, you’re paying $50/month for a warehouse that holds 50,000 rows.
Spectrum is a different bet: everything in PostgreSQL. Events, dimensions, aggregations, materialized views, dashboards: all of it, one database. This is the first build log entry, where I explain what I’m building and why.
// 01 — THE PROBLEM
I wanted product analytics on ContextFlow: funnels, retention, A/B experiments. The obvious choice was Amplitude, Mixpanel, or PostHog. But I also write a build-in-public series and I wanted the analytics system to be part of the content, something I could explain, show the bugs on, and document the architecture of.
A SaaS analytics tool is a black box. Spectrum is glass.
// 02 — THE ARCHITECTURE
Ingestion: events arrive via a small FastAPI endpoint as JSON. Each event carries a type, account_id, user_id, and a properties blob. The endpoint validates and queues them into a raw_events staging table, one row per event.
Warehouse layer: a PostgreSQL schema called warehouse holds the star schema: fact_events at the center, dim_accounts, dim_users, dim_geography, dim_event_types as dimensions. fact_events is range-partitioned by month.
Analytics layer: a separate analytics schema holds materialized views: mv_conversion_funnels, mv_retention_cohorts, mv_revenue_summary, mv_experiment_results. These are refreshed concurrently every five minutes by the pipeline.
Dashboards: Grafana reads from the analytics schema. Dashboards never touch fact_events directly. They only see pre-aggregated views.
// 03 — WHY PostgreSQL ALL THE WAY DOWN
The answer is simplicity, observability, and forced learning.
Simplicity: One database, one language, one mental model. No Kafka, no dbt, no Airflow. The pipeline is a Python script that runs as a service. The entire system fits in a single docker-compose.yml.
Observability: Every intermediate state is inspectable with psql. I can SELECT into raw_events to see what arrived, SELECT into fact_events to see what was processed, SELECT into the analytics views to see what dashboards will show. Nothing is opaque.
Forced learning: Building it myself forces me to understand the pieces I’d otherwise buy: deduplication, partitioning, idempotent ELT, sessionization in SQL, A/B statistics. Every feature is a concept I have to implement and can write about.
// 04 — WHAT I’M NOT SOLVING
Spectrum is not trying to be production analytics at scale. It doesn’t support:
- Millions of events per day (partitioned PostgreSQL saturates somewhere around tens of millions before you need a columnar store)
- Real-time streaming (five-minute materialized view refresh is the latency floor)
- Multi-database federation or external tables
If you need those things, you need a different stack. Spectrum is for the zone where PostgreSQL is enough and where the learning value is worth more than the time saved by buying a SaaS.
NEXT
- Build log 02: the JSONB staging pattern that keeps raw events unmodified while the warehouse processes them.
