DOC: spectrum-s
STATUS: ● PUBLISHED
SYSTEM SPECTRUM

I Built a Self-Hosted Amplitude With Pure PostgreSQL

Kicking off the Spectrum build: why I went all-in on SQL for product analytics.

Cover image — I Built a Self-Hosted Amplitude With Pure PostgreSQL

Most analytics stacks look like this: ingest events → ship them to a data warehouse (Snowflake, BigQuery, Redshift) → run dbt transforms → visualize in Grafana or Metabase. That stack works great if you’re at scale and have the budget. At hobby-project scale, you’re paying $50/month for a warehouse that holds 50,000 rows.

Spectrum is a different bet: everything in PostgreSQL. Events, dimensions, aggregations, materialized views, dashboards: all of it, one database. This is the first build log entry, where I explain what I’m building and why.

// 01 — THE PROBLEM

I wanted product analytics on ContextFlow: funnels, retention, A/B experiments. The obvious choice was Amplitude, Mixpanel, or PostHog. But I also write a build-in-public series and I wanted the analytics system to be part of the content, something I could explain, show the bugs on, and document the architecture of.

A SaaS analytics tool is a black box. Spectrum is glass.

// 02 — THE ARCHITECTURE

Ingestion: events arrive via a small FastAPI endpoint as JSON. Each event carries a type, account_id, user_id, and a properties blob. The endpoint validates and queues them into a raw_events staging table, one row per event.

Warehouse layer: a PostgreSQL schema called warehouse holds the star schema: fact_events at the center, dim_accounts, dim_users, dim_geography, dim_event_types as dimensions. fact_events is range-partitioned by month.

Analytics layer: a separate analytics schema holds materialized views: mv_conversion_funnels, mv_retention_cohorts, mv_revenue_summary, mv_experiment_results. These are refreshed concurrently every five minutes by the pipeline.

Dashboards: Grafana reads from the analytics schema. Dashboards never touch fact_events directly. They only see pre-aggregated views.

// 03 — WHY PostgreSQL ALL THE WAY DOWN

The answer is simplicity, observability, and forced learning.

Simplicity: One database, one language, one mental model. No Kafka, no dbt, no Airflow. The pipeline is a Python script that runs as a service. The entire system fits in a single docker-compose.yml.

Observability: Every intermediate state is inspectable with psql. I can SELECT into raw_events to see what arrived, SELECT into fact_events to see what was processed, SELECT into the analytics views to see what dashboards will show. Nothing is opaque.

Forced learning: Building it myself forces me to understand the pieces I’d otherwise buy: deduplication, partitioning, idempotent ELT, sessionization in SQL, A/B statistics. Every feature is a concept I have to implement and can write about.

// 04 — WHAT I’M NOT SOLVING

Spectrum is not trying to be production analytics at scale. It doesn’t support:

If you need those things, you need a different stack. Spectrum is for the zone where PostgreSQL is enough and where the learning value is worth more than the time saved by buying a SaaS.

NEXT

@frogwebp brand mark
ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them, the architecture, the failures, what each one teaches me. Documenting in public since 2021: the process, not just the result.

// NEWSLETTER — THE BUILD LOG SIGNAL

When I ship something or learn something worth keeping, it lands here first — build logs, concepts, and the honest process behind them. Come along; no spam, leave anytime.