The embedding model you start with is rarely the one you end with: local today, hosted tomorrow, something new next year. A Protocol makes that swap a one-class change instead of a refactor.
What a Protocol is
A typing.Protocol defines an interface by shape, not by inheritance. Anything with the right methods satisfies it, with no base class to subclass and no registration:
from typing import Protocol, runtime_checkable

import numpy as np

@runtime_checkable
class Embedder(Protocol):
    # Any class with a matching encode() satisfies this; no subclassing needed.
    def encode(self, texts: list[str]) -> np.ndarray: ...
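Any object with an encode(list[str]) -> ndarray method is an Embedder, as far as a static type checker and the rest of the code are concerned. This is structural typing: “duck typing” with static checking. One caveat: @runtime_checkable only makes isinstance() confirm that an encode attribute exists. Argument and return types are verified statically, by a checker such as mypy or pyright, not at runtime.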
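To make the structural check concrete, here is a minimal sketch. HashEmbedder and NotAnEmbedder are hypothetical toy classes invented for illustration, not real embedding models. Neither one inherits from Embedder, and only the one with an encode() method passes the isinstance() check.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class Embedder(Protocol):
    def encode(self, texts: list[str]) -> np.ndarray: ...


class HashEmbedder:
    """Hypothetical toy embedder. Note: no Embedder base class."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def encode(self, texts: list[str]) -> np.ndarray:
        # One row per text; bucket character codes into `dim` slots.
        out = np.zeros((len(texts), self.dim), dtype=np.float32)
        for row, text in enumerate(texts):
            for ch in text:
                out[row, ord(ch) % self.dim] += 1.0
        return out


class NotAnEmbedder:
    """Hypothetical class with the wrong method name."""

    def embed(self, texts: list[str]) -> np.ndarray:
        return np.zeros((len(texts), 1))


embedder = HashEmbedder()
vectors = embedder.encode(["hello", "world"])

# isinstance() works only because of @runtime_checkable, and it checks
# only that an `encode` attribute exists, not its signature.
is_embedder = isinstance(embedder, Embedder)
is_not_embedder = isinstance(NotAnEmbedder(), Embedder)
```

Without @runtime_checkable the isinstance() calls would raise TypeError; static checkers do not need the decorator at all.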
Why it beats inheritance here
The rest of the pipeline depends only on the Embedder shape, never on a concrete class. So swapping all-MiniLM-L6-v2 for OpenAI, Cohere, or a local Ollama model means writing one new class with an encode() method and changing nothing else. No abstract base to import, no class hierarchy to fit into; a third-party object you don’t control can satisfy the Protocol without knowing it exists.
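A rough sketch of what that swap looks like in practice. Everything here is assumed for illustration: most_similar is a hypothetical pipeline function, and LetterCountEmbedder and BigramHashEmbedder are toy stand-ins for real models such as a local or hosted embedder. The point is that most_similar is never edited when the embedder changes.

```python
from typing import Protocol

import numpy as np


class Embedder(Protocol):
    def encode(self, texts: list[str]) -> np.ndarray: ...


def most_similar(embedder: Embedder, query: str, docs: list[str]) -> str:
    """Pipeline code: knows only the Embedder shape, never a concrete class."""
    doc_vecs = embedder.encode(docs)
    query_vec = embedder.encode([query])[0]
    # Cosine similarity; the small epsilon guards against zero vectors.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    scores = (doc_vecs @ query_vec) / norms
    return docs[int(np.argmax(scores))]


class LetterCountEmbedder:
    """Hypothetical stand-in for one model: 26-dim letter counts."""

    def encode(self, texts: list[str]) -> np.ndarray:
        out = np.zeros((len(texts), 26))
        for row, text in enumerate(texts):
            for ch in text.lower():
                if "a" <= ch <= "z":
                    out[row, ord(ch) - ord("a")] += 1
        return out


class BigramHashEmbedder:
    """Hypothetical stand-in for a different model: hashed character bigrams."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def encode(self, texts: list[str]) -> np.ndarray:
        out = np.zeros((len(texts), self.dim))
        for row, text in enumerate(texts):
            for a, b in zip(text, text[1:]):
                out[row, (ord(a) * 31 + ord(b)) % self.dim] += 1
        return out


docs = ["cats purr", "dogs bark", "birds sing"]
query = "a cat purring"

# Swapping the model is a change at the call site only.
first = most_similar(LetterCountEmbedder(), query, docs)
second = most_similar(BigramHashEmbedder(), query, docs)
```

A real hosted-API embedder would follow the same pattern: wrap the client call in a class with an encode() method that returns an array, and pass it in.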
When to reach for it
Use a Protocol wherever you have a seam, a place where one implementation might be swapped for another: embedders, storage backends, notifiers, payment providers. It keeps the dependency pointing at a shape rather than a concrete class, which is about as loose as coupling gets in Python.
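The same pattern applies to the other seams listed above. As a small sketch, here is a notifier seam; Notifier, ConsoleNotifier, RecordingNotifier, alert_on_failure, and the address used are all hypothetical names chosen for this example. The recording class shows a common payoff: a test double drops in without any mocking library.

```python
from typing import Protocol


class Notifier(Protocol):
    def send(self, recipient: str, message: str) -> None: ...


class ConsoleNotifier:
    """Hypothetical implementation that just prints."""

    def send(self, recipient: str, message: str) -> None:
        print(f"to {recipient}: {message}")


class RecordingNotifier:
    """Hypothetical test double: stores messages instead of delivering them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


def alert_on_failure(notifier: Notifier, job: str, ok: bool) -> None:
    # Depends only on the Notifier shape, so any implementation works here.
    if not ok:
        notifier.send("oncall@example.com", f"{job} failed")


recorder = RecordingNotifier()
alert_on_failure(recorder, "nightly-reindex", ok=False)  # recorded
alert_on_failure(recorder, "nightly-reindex", ok=True)   # nothing sent
alert_on_failure(ConsoleNotifier(), "nightly-reindex", ok=False)  # printed
```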
Takeaway
Protocol defines interfaces by shape, so anything with the right methods fits, no inheritance required. It’s the cleanest way to build a swappable seam in Python: depend on the Protocol, and every implementation, including ones you didn’t write, drops in for free.
