PROJECT CODEX

Three systems, shipped and operational. Each codex is the complete build record: architecture decisions, anomalies, fixes, and the reasoning behind every one. This is the portfolio, written as documentation.

INDEX: CODEX_ROOT
SYSTEMS: 3 OPERATIONAL
SCAN: 2026-06-12
3 SYSTEMS SHIPPED · 20 BUILD LOGS · 92 TESTS GREEN · 19 CONTAINERS

// SYSTEM 01 — SPECTRUM

PROJECT CODEX · ANALYTICS WAREHOUSE

SPECTRUM ● OPERATIONAL · 32/32 TESTS · CI GREEN

A self-hosted alternative to Amplitude / Mixpanel, built entirely on PostgreSQL 16 and async Python: no Kafka, no Spark, no ClickHouse. Clickstream events land in a JSONB staging zone in under 5ms. An idempotent ELT pipeline then claims them with FOR UPDATE SKIP LOCKED, upserts five dimensions, inserts into a monthly-partitioned fact table, and refreshes four analytics materialized views: funnels, retention cohorts, revenue, and A/B experiment results with Wilson confidence intervals. Grafana reads the analytics schema directly.
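
The Wilson confidence interval mentioned for A/B results can be sketched in a few lines of Python. This is an illustration of the standard Wilson score formula, not the code behind the materialized view; the function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are assumptions made for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # The interval is centred on a value pulled toward 0.5, not on p_hat itself.
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example: 50 conversions out of 100 exposures in one experiment arm.
low, high = wilson_interval(50, 100)
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples and for rates near 0 or 1, which is presumably why it suits experiment arms with little traffic.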

INGEST <5MS ──→ JSONB STAGING ──→ ELT ──→ STAR SCHEMA ──→ ANALYTICS MVs
Grafana :3000
Codex 100% written · 7 build logs · Started Apr 27 · last updated Jun 11, 2026
PostgreSQL 16 · Star Schema · FastAPI · asyncpg · Alembic · Grafana · Podman

// SYSTEM 02 — PHRONIS

PROJECT CODEX · AI AGENT OBSERVABILITY

PHRONIS ● OPERATIONAL · 11 CONTAINERS · K8S READY

Real-time observability and circuit breaking for AI agents. When an agent misbehaves (tool-call storms, runaway spend, or output drift), streaming SQL detects it in under 500ms and trips a circuit breaker that halts the agent in ~600ms end to end. Existing tools batch their aggregations every 30–60 seconds; by then a runaway agent can have made thousands of calls. Redpanda ingests at ~10ms p99, RisingWave TUMBLE windows run the detection, Iceberg on MinIO keeps cold history, and an MCP server lets Claude query it all in plain English.
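
The tumbling-window detection and breaker trip can be illustrated in plain Python. This is a batch sketch of the idea only: the real system runs it as streaming SQL in RisingWave, and the 500ms window size, the threshold of 20 calls, and every name below (`CircuitBreaker`, `detect_storms`) are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

WINDOW_MS = 500             # assumed tumbling-window size
MAX_CALLS_PER_WINDOW = 20   # assumed tool-call threshold per agent per window


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: remembers which agents have been halted."""
    open_agents: set = field(default_factory=set)

    def trip(self, agent_id: str) -> None:
        self.open_agents.add(agent_id)

    def allows(self, agent_id: str) -> bool:
        return agent_id not in self.open_agents


def detect_storms(events, breaker, window_ms=WINDOW_MS, max_calls=MAX_CALLS_PER_WINDOW):
    """Count tool calls per (agent, tumbling window) and trip the breaker on a storm.

    events: iterable of (agent_id, timestamp_ms) pairs.
    Returns a sorted list of (agent_id, window_start_ms, call_count) alerts.
    """
    counts = defaultdict(int)
    for agent_id, ts_ms in events:
        # Tumbling windows are fixed-size and non-overlapping:
        # each event falls into exactly one window.
        window_start = (ts_ms // window_ms) * window_ms
        counts[(agent_id, window_start)] += 1

    alerts = []
    for (agent_id, window_start), n in sorted(counts.items()):
        if n > max_calls:
            alerts.append((agent_id, window_start, n))
            breaker.trip(agent_id)
    return alerts


# Agent "a" fires 25 calls inside one window; agent "c" spreads 30 calls
# over two windows (15 each), so only "a" is halted.
events = [("a", 10 * i) for i in range(25)]
events += [("c", 350 + 10 * i) for i in range(30)]
breaker = CircuitBreaker()
alerts = detect_storms(events, breaker)
```

Because each window closes after a fixed interval, the worst-case detection delay is bounded by the window size, which is what makes a sub-second budget plausible where 30–60 second batch aggregation is not.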

@AGENT ──→ REDPANDA ──→ RISINGWAVE ──→ ALERTS ──→ BREAKER ~600MS
Grafana · Streamlit · MCP
Codex 100% written · 7 build logs · Started Apr 30 · last updated Jun 05, 2026
Redpanda · RisingWave · Iceberg · MinIO · Schema Registry · MCP · Kubernetes

// SYSTEM 03 — CONTEXTFLOW

PROJECT CODEX · SEMANTIC SEARCH ETL

CONTEXTFLOW ● OPERATIONAL · 60/60 TESTS · CI GREEN

An ETL pipeline that turns multilingual PDFs into a local semantic index. Built for the hard case: LaTeX-compiled academic papers, IPA symbols, broken Unicode. Pages stream through extraction and cleaning, are chunked at 512 characters with overlap, embedded into 384-dim vectors locally (zero API cost), and upserted into ChromaDB with deterministic SHA-256 IDs, so re-running never duplicates. Query in plain English, get ranked answers with page-level provenance in under 2 seconds. Airflow orchestrates; Streamlit serves the RAG dashboard.
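
Overlapping chunks and deterministic IDs are what make re-runs safe, and the idea fits in a short Python sketch. Only the 512-character chunk size comes from the description above. The 64-character overlap, the fields hashed into the ID, and the plain dict standing in for the vector store are assumptions for illustration, not the pipeline's actual code.

```python
import hashlib

CHUNK_SIZE = 512   # from the description above
OVERLAP = 64       # assumed; the actual overlap is not stated


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def chunk_id(source: str, page: int, index: int, text: str) -> str:
    """Deterministic ID: the same source, page, position and text always hash the same."""
    payload = f"{source}|{page}|{index}|{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def upsert_page(store: dict, source: str, page: int, text: str) -> None:
    """Write chunks keyed by their hash; a dict stands in for the vector store here."""
    for i, chunk in enumerate(chunk_text(text)):
        store[chunk_id(source, page, i, chunk)] = {
            "text": chunk,
            "source": source,
            "page": page,
        }


# Re-running the same page overwrites the same keys instead of adding new ones.
page_text = "".join(chr(97 + i % 26) for i in range(1000))
store: dict = {}
upsert_page(store, "paper.pdf", 1, page_text)
upsert_page(store, "paper.pdf", 1, page_text)
```

Keying each chunk by a content-derived hash turns the write into an overwrite, so idempotency comes from the ID scheme itself and needs no separate "already processed" bookkeeping.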

EXTRACT ──→ CLEAN + CHUNK ──→ EMBED 384-DIM ──→ CHROMADB ──→ QUERY <2S
Airflow · Streamlit
Codex 100% written · 6 build logs · Started May 04 · last updated Jun 02, 2026
ChromaDB · sentence-transformers · pypdf · LangChain · Airflow · Streamlit · structlog

// REFERENCE — CONCEPTS INDEX

REFERENCE DOCS · STANDALONE

CONCEPTS · 12 REFERENCES

Standalone engineering references: star schemas, streaming vs batch, idempotency patterns, vector embeddings. Written from the systems above, useful outside any of them. Filter the log by CONCEPT to browse.

Star Schema · Streaming · Idempotency · Embeddings