Real-time detection only needs a short window of data. But the full history, every agent event forever, is worth keeping for forensics, cost analysis, and training. Phronis lands all of it in Apache Iceberg on MinIO, so the hot path stays lean and the cold path stays complete.
// 01 — THE SINK
A passthrough materialized view, mv_iceberg_staging, feeds every event into RisingWave’s native Iceberg sink, which writes Parquet files to MinIO at s3://phronis-iceberg/ on a ~6-second commit cadence. RisingWave does the hot aggregation and the cold archival from the same stream, with no separate ETL job to maintain.
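As a rough sketch of what that wiring might look like, the Python helper below builds the two SQL statements involved: the passthrough view and the sink. Only mv_iceberg_staging and s3://phronis-iceberg/ come from this log. The source relation, sink name, database and table names, MinIO endpoint, region, and credential placeholders are all hypothetical. The WITH option names follow my reading of RisingWave's Iceberg sink documentation and should be checked against the version you run. Catalog settings are left out entirely.

```python
def build_cold_path_sql(source: str, warehouse: str, endpoint: str) -> list[str]:
    """Return SQL for a passthrough staging view and an Iceberg sink (sketch only)."""
    # Passthrough view: no filtering or aggregation, every event flows through.
    staging_mv = (
        "CREATE MATERIALIZED VIEW IF NOT EXISTS mv_iceberg_staging AS "
        f"SELECT * FROM {source};"
    )

    # Option names are assumptions to verify against your RisingWave version.
    options = {
        "connector": "iceberg",
        "type": "append-only",
        "warehouse.path": warehouse,
        "s3.endpoint": endpoint,
        "s3.region": "us-east-1",           # hypothetical
        "s3.access.key": "<access-key>",    # placeholder, never hard-code secrets
        "s3.secret.key": "<secret-key>",    # placeholder
        "database.name": "phronis",         # hypothetical
        "table.name": "agent_events",       # hypothetical
    }
    with_clause = ",\n    ".join(f"{key} = '{value}'" for key, value in options.items())
    sink = (
        "CREATE SINK IF NOT EXISTS iceberg_sink FROM mv_iceberg_staging\n"
        f"WITH (\n    {with_clause}\n);"
    )
    return [staging_mv, sink]


statements = build_cold_path_sql(
    source="agent_events",               # hypothetical upstream relation
    warehouse="s3://phronis-iceberg/",
    endpoint="http://minio:9000",        # hypothetical MinIO address
)
```

Keeping the statements in one function means the hot-path views and the cold-path sink can be created from the same bootstrap script, which matches the single-pipeline design described above.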
// 02 — WHY ICEBERG, NOT RAW PARQUET
You could dump Parquet straight to object storage. Iceberg adds the things raw files lack: time-travel (query the table as of a past snapshot), schema evolution (add a field without rewriting history), and ACID commits (readers never see a half-written batch). For an audit trail of autonomous agents, “what did the data look like at the moment that incident happened?” is a question you want to be able to answer, and time-travel answers it directly.
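To make the time-travel idea concrete, here is a toy model of the lookup it implies: given a list of committed snapshots, pick the latest one committed at or before the incident. This is not Iceberg's API. The Snapshot class, the snapshot_as_of function, and the sample timestamps are all made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: float  # commit time, seconds since some epoch


def snapshot_as_of(snapshots: Sequence[Snapshot], ts: float) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before ts, or None if there is none."""
    ordered = sorted(snapshots, key=lambda s: s.committed_at)
    commit_times = [s.committed_at for s in ordered]
    # bisect_right counts how many snapshots have committed_at <= ts.
    visible = bisect_right(commit_times, ts)
    return ordered[visible - 1] if visible else None


# Hypothetical history with commits roughly six seconds apart.
history = [Snapshot(1, 0.0), Snapshot(2, 6.0), Snapshot(3, 12.0)]
at_incident = snapshot_as_of(history, 10.0)
```

In this example an incident at t = 10 resolves to snapshot 2: the table as readers saw it at that moment, without the batch committed at t = 12.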
// 03 — SELF-HOSTED, ZERO CLOUD
MinIO provides the S3-compatible object store locally, so the entire cold tier runs on your own hardware at zero cloud cost. (Wiring RisingWave’s Iceberg sink to MinIO was, candidly, the hardest debugging of the whole project, a multi-session credential saga that gets its own anomaly log.) The sink is also treated as non-fatal: if it fails, detection and every dashboard keep working. Cold storage degrading never takes the hot path down.
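One way to express the non-fatal rule is in the startup code itself. The sketch below is not Phronis's actual bootstrap. The function names, logger name, and status dictionary are hypothetical. It shows only the shape of the idea: a detection failure propagates, while a sink failure is logged and reported as degraded.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("phronis.startup")  # hypothetical logger name


def start_pipeline(
    start_detection: Callable[[], None],
    start_iceberg_sink: Callable[[], None],
) -> Dict[str, str]:
    """Start the hot path, then try the cold path without letting it block."""
    # Hot path is required: any exception here propagates to the caller.
    start_detection()

    # Cold path is best-effort: log the failure and carry on.
    try:
        start_iceberg_sink()
    except Exception as exc:
        logger.warning("Iceberg sink unavailable, continuing without it: %s", exc)
        return {"detection": "up", "cold_storage": "degraded"}
    return {"detection": "up", "cold_storage": "up"}


def broken_sink() -> None:
    # Simulates the kind of credential failure described above.
    raise ConnectionError("MinIO rejected the credentials")


status = start_pipeline(start_detection=lambda: None, start_iceberg_sink=broken_sink)
```

Returning a status rather than swallowing the error silently lets a dashboard show that cold storage is degraded while detection carries on.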
TAKEAWAYS
- Split hot and cold off the same stream. Real-time views serve detection; an Iceberg sink archives everything: one pipeline, two horizons.
- Iceberg over bare Parquet buys time-travel, schema evolution, and ACID commits: exactly the properties an audit trail of autonomous systems needs.
- Make the archival path non-fatal. Losing cold storage for a while is an inconvenience; taking down detection because of it is an incident.
NEXT
- Build log 06: the MCP server, for asking your pipeline questions in plain English.
