
384 Dimensions, Zero API Cost: Local Embeddings

Why a small local model and L2 normalization are the right default for RAG.

The embedding step turns each chunk into a vector. ContextFlow does it locally with all-MiniLM-L6-v2: 384 dimensions, no API, no per-call cost, no data leaving the machine. For document retrieval, that’s not a compromise; it’s the right default.

// 01 — SMALL MODEL, LOCAL

all-MiniLM-L6-v2 produces 384-dimensional vectors and runs comfortably on CPU. Compared with a large hosted model, you give up some precision that retrieval mostly doesn’t need. In exchange you get no API cost, no rate limits, no documents sent to a third party, and full offline operation once the model weights are on disk. For a private corpus of academic PDFs, “nothing leaves the box” is a feature you can’t buy back later.
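As a rough illustration of what local embedding can look like, here is a minimal sketch that assumes the sentence-transformers library. The article does not say which library ContextFlow uses, and the names load_model and embed_chunks are hypothetical.

```python
from functools import lru_cache
from typing import Sequence

MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384  # output size stated in the article


@lru_cache(maxsize=1)
def load_model():
    """Load the model once; the import is lazy so this file loads without the library."""
    from sentence_transformers import SentenceTransformer  # assumed dependency

    return SentenceTransformer(MODEL_NAME, device="cpu")


def embed_chunks(chunks: Sequence[str]):
    """Return one L2-normalized vector per chunk, shape (len(chunks), EMBEDDING_DIM)."""
    return load_model().encode(
        list(chunks),
        normalize_embeddings=True,  # ask the library for unit-length rows
        convert_to_numpy=True,
    )
```

With this setup the first call typically downloads the model weights and caches them. Later calls read from the local cache, which is what makes offline operation possible.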

// 02 — L2 NORMALIZATION

Embeddings are L2-normalized, which makes cosine similarity reduce to a dot product. Cosine similarity divides the dot product by the two vector lengths; once every vector has length 1, that division does nothing, so the cosine of the angle between two vectors (their semantic closeness) is just their dot product. That is cheaper to compute and exactly what the vector store wants for a cosine collection. A small math choice that makes every query faster.
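The identity is easy to check by hand. The sketch below uses plain Python lists and two made-up 3-dimensional vectors instead of real 384-dimensional embeddings; the helper names are illustrative only.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to length 1 (its L2 norm)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity on raw vectors: dot product divided by both lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]  # length 5
b = [4.0, 3.0, 0.0]  # length 5
a_hat, b_hat = l2_normalize(a), l2_normalize(b)

# After normalization the two computations agree, up to float rounding.
same = math.isclose(cosine(a, b), dot(a_hat, b_hat))
```

Here both routes give 24/25 = 0.96, but the normalized route skips two square roots and a division on every comparison.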

// 03 — BATCHED THROUGH A PROTOCOL

All of a document’s chunks are encoded in a single batched encode() call. The model processes a batch of chunks in one forward pass, which is far faster than encoding them one at a time. And the embedder sits behind a Protocol:

from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class Embedder(Protocol):
    def encode(self, texts: list[str]) -> np.ndarray: ...

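Anything with an encode() method satisfies it. Swapping MiniLM for OpenAI, Cohere, or Ollama is one new class and zero changes to the calling code, though vectors from different models aren’t comparable, so the corpus has to be re-embedded after a swap. (That pattern gets its own concept post.)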
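To make the swap concrete, here is a minimal sketch of a second class that satisfies the same Protocol. HashingEmbedder and embed_document are hypothetical names, and the class produces deterministic pseudo-random vectors rather than meaningful embeddings; it only stands in for a real backend.

```python
import hashlib
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class Embedder(Protocol):
    def encode(self, texts: list[str]) -> np.ndarray: ...


class HashingEmbedder:
    """Hypothetical stand-in backend: no model, just repeatable unit vectors."""

    def __init__(self, dim: int = 384) -> None:
        self.dim = dim

    def encode(self, texts: list[str]) -> np.ndarray:
        rows = []
        for text in texts:
            # Seed a generator from the text so equal inputs give equal vectors.
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            seed = int.from_bytes(digest[:8], "big")
            rows.append(np.random.default_rng(seed).standard_normal(self.dim))
        vectors = np.asarray(rows, dtype=np.float32).reshape(len(texts), self.dim)
        # L2-normalize each row so a dot product equals cosine similarity.
        return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


def embed_document(chunks: list[str], embedder: Embedder) -> np.ndarray:
    """Encode all chunks of one document in a single batched call."""
    return embedder.encode(chunks)


embedder = HashingEmbedder()
vectors = embed_document(["first chunk", "second chunk"], embedder)
```

Note that isinstance(embedder, Embedder) only checks that an encode attribute exists. It does not verify the signature or the shape of the returned array, so a dimension check at the call site is still worth having.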

ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them: the architecture, the failures, what each one teaches me. Documenting in public since 2021, with a focus on the process and not just the result.