The Other Half of LLM Cost: Context You Never Needed to Send

Multi-agent stacks reinvent coordination badly

Five Claude Code tabs on the same project. An overnight cron. A CI runner. A dispatched server-side agent on another host. Each one needs to see what the others are doing, avoid duplicating work, hand off findings, and recover when a tab crashes.

The default answer today is some combination of: a Slack channel a human watches, ad-hoc REST endpoints between services, "remember to message me when you're done", or a custom queue per project. None of it composes. None of it survives a tab close.

We built Kronaxis Fabric because we hit this wall building everything else. The session that built Fabric was itself five concurrent tabs on two hosts, coordinated through Fabric's own coord broadcasts. The orchestrator (one tab) posted findings. The workers (four tabs) replied. The whole audit trail lives in Postgres. No bot in a Slack channel. No Redis pub/sub. No ad-hoc queue.

The preload that grew up

One of our project repos had a CLAUDE.md that grew to 715 lines (97 KB), plus a MEMORY.md index that hit 1,050 lines (300 KB). Together they preloaded into every coding-agent session: 397 KB of background context, every turn, in every tab.

None of it was wrong. All of it was reference. Almost none of it was relevant to the specific task at hand.

At frontier model rates ($15 per million input tokens) that preload alone costs about $1.38 per turn. A typical 30-turn session costs about $41 in preload, before the model has read a single line of new code. Multiplied across a working day, a small team, a year: tens of thousands of dollars spent on context the model did not need to see.

The filing cabinet principle

A surgeon does not read the whole filing cabinet before answering a question about a patient. They open the one file. The same logic applies to agent context.

Every preload-heavy convention (hand-curated MEMORY files, fat CLAUDE.md, sprawling system prompts) is the equivalent of reading the cabinet front-to-back to find one fact. It works, but it is unbelievably expensive at any scale that matters.

What we built

Kronaxis Fabric is a single Go binary that sits behind a Postgres database. It exposes nine HTTP endpoints (memo CRUD, hybrid search, embeddings backfill, a coord channel for cross-session events) and a thin Python MCP shim that drops into ~/.claude.json as mcp__fabric__search.

The agent calls search instead of preloading notes. The cabinet stays closed. Only the relevant file opens.

# Before
agent session preload: 397 KB of notes
   sent to model: every token, every turn

# After
agent session preload: 4 KB of pointer + the rule "use mcp__fabric__search"
   sent to model: the 3 memos that match this turn's question
   typical: 2 KB, returned in 90 ms

The cost arithmetic

Pattern	Preload per turn	Cost per turn at $15/M input tokens	Cost per 30-turn session
Hand-curated MEMORY + CLAUDE.md (typical)	~92K tokens	$1.38	$41.40
Fabric on-demand search (3 memos avg)	~500 tokens	$0.0075	$0.22
Saving	~91.5K tokens	$1.37	$41.18

The numbers above are real, measured on our own repo on the day we cut over. CLAUDE.md went from 715 to 172 lines (76% fewer lines, with 24 pitfall sections extracted as individual memos). MEMORY.md went from 1,050 to 51 lines (95% fewer lines, replaced by a pointer to the Fabric MCP).
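The table's arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming the flat $15 per million input tokens used in the table and no prompt-caching discount; the function name `preload_cost` is ours, for illustration only.

```python
def preload_cost(tokens_per_turn, usd_per_million=15.0, turns=30):
    """Return (cost per turn, cost per session) in USD for a fixed preload."""
    per_turn = tokens_per_turn * usd_per_million / 1_000_000
    return per_turn, per_turn * turns


# Token counts are the approximate figures from the table above.
curated_turn, curated_session = preload_cost(92_000)
fabric_turn, fabric_session = preload_cost(500)

print(f"curated preload: ${curated_turn:.2f}/turn, ${curated_session:.2f}/session")
print(f"fabric search:   ${fabric_turn:.4f}/turn")
print(f"ratio:           {curated_turn / fabric_turn:.0f}x fewer preload dollars per turn")
```

The ratio depends only on the token counts, so it holds at any per-token price; the absolute dollar figures scale with whatever rate your provider charges.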

This compounds beautifully when you also run Kronaxis Router. The router decides which model gets the request. Fabric decides what context goes into it. Together: cheaper model, smaller context, same answer quality. We saw a 100x-600x drop in per-session bills when both ran together on the easy 80% of agent work.

How the search works

Three signals get combined into one score per memo:

Semantic similarity (50%): cosine similarity against an Ollama-generated embedding (nomic-embed-text, 768 dimensions, stored in pgvector with an ivfflat index). A query like "database connection pool exhaustion fix" finds a memo titled "postgres max_connections raised after 503 burst" without sharing a single word.
Lexical match (30%): Postgres tsvector full-text rank, weighted A on title, B on body. Catches exact technical terms that embeddings sometimes blur over.
Recency (20%): 30-day half-life decay. A fresh half-baked note does not outrank a verified three-week-old reference unless it is much more relevant on the other two axes.

The defaults work for a small-team operator. The weights are configurable if you want to bias the ranking differently.
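The blend described above can be sketched as a weighted sum. This is an illustration, not Fabric's actual implementation: it assumes the semantic and lexical signals are already normalised to the range 0 to 1 (how Fabric normalises the tsvector rank is not stated here), and the names `Weights`, `recency_score` and `hybrid_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    semantic: float = 0.5   # default weights from the list above
    lexical: float = 0.3
    recency: float = 0.2


def recency_score(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new memo, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def hybrid_score(semantic, lexical, age_days, w=Weights()):
    """Blend the three signals; semantic and lexical are assumed to be in [0, 1]."""
    return (w.semantic * semantic
            + w.lexical * lexical
            + w.recency * recency_score(age_days))


# A fresh but weakly matching note vs. a three-week-old strong match.
fresh_note = hybrid_score(semantic=0.55, lexical=0.20, age_days=0)
old_reference = hybrid_score(semantic=0.80, lexical=0.60, age_days=21)
print(f"fresh note:    {fresh_note:.3f}")
print(f"old reference: {old_reference:.3f}")
```

With these made-up inputs the older reference still ranks first, because recency contributes at most 0.2 to the total; a fresh note only wins when it is close on the other two signals.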

The other half: cross-session coordination

Memory solves "what did I learn last week". The other half of any multi-agent workflow is "what is the agent in the other tab doing right now". A second tab, a CI runner, an overnight cron: they all need to see each other's findings without polling a Slack channel a human runs.

Fabric ships a coord channel on the same Postgres database: POST /v1/coord writes a row to public.coord_messages and fires pg_notify('coord', ...); GET /v1/coord/recent lets any session poll the last N events filtered by sender, recipient, or timestamp. One Bearer token, one schema, two tables.
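From a client's point of view the two calls might look like the sketch below. The endpoint paths and Bearer auth come from the paragraph above; the JSON field names (`sender`, `recipient`, `body`) and the query parameters (`limit`, `sender`) are assumptions made for illustration, so check the repo for the real request shape. The functions only build the requests; pass them to `urllib.request.urlopen` to send.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8201"   # port taken from the Getting started examples
TOKEN = "secret-bearer-token"


def coord_post(sender: str, recipient: str, body: str) -> Request:
    """Build (but do not send) a POST /v1/coord request.

    The JSON field names are hypothetical.
    """
    payload = json.dumps({"sender": sender, "recipient": recipient, "body": body})
    return Request(
        f"{BASE_URL}/v1/coord",
        data=payload.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )


def coord_recent(limit: int = 20, sender: Optional[str] = None) -> Request:
    """Build a GET /v1/coord/recent request; parameter names are hypothetical."""
    params = {"limit": limit}
    if sender is not None:
        params["sender"] = sender
    return Request(
        f"{BASE_URL}/v1/coord/recent?{urlencode(params)}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )


broadcast = coord_post("orchestrator", "worker-2", "audit of auth module done")
poll = coord_recent(limit=5, sender="orchestrator")
print(broadcast.get_method(), broadcast.full_url)
print(poll.get_method(), poll.full_url)
```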

We used it in the session that built this. Five concurrent agent tabs (an orchestrator and four workers) coordinated on the same project: broadcasts went out, replies came back, the orchestrator tracked who was doing what. agentmemory, mem0 and Letta do not have a cross-session event bus; they only have memory. Multi-agent workflows on one project need both.

Why Postgres

Most memory-for-agents systems ship with their own storage (custom SQLite extensions, an embedded vector DB, a bespoke key-value file). That is one more thing to back up, restore, monitor, and explain to a future on-call engineer.

Fabric uses Postgres because anyone running a production stack already has it, already backs it up, and already knows what to do when something goes wrong. The schema is two tables and three indexes. SELECT * FROM fabric.memos WHERE deleted_at IS NULL is your audit trail. pg_dump is your backup. Standard Postgres replication is your DR.

pgvector is an extension available on most managed Postgres offerings, and tsvector full-text search is built into Postgres itself. Embeddings via local Ollama mean zero per-query cost and zero data leaving your box.

How this compares to alternatives

Fair warning before reading these tables: most memory-for-agents tools are memory-only. Fabric does three jobs in one binary (coord channel, memory store, and an orchestrator coming in v0.6). Comparing those tools on a feature they do not try to have is, by definition, unfair to them. We have split the tables by job area so you can see what Fabric uniquely covers and where the alternatives match it.

1. Memory (where agentmemory, mem0, Letta actually compete)

Dimension	Fabric	agentmemory	mem0	Letta
Search	Hybrid (cosine + tsvector + recency)	BM25 + vector + graph (RRF)	Vector + graph	Vector only
Embeddings	Local Ollama, free at query time	Local Xenova, free	Cloud-provider call per query	Cloud-provider call per query
Storage	Postgres + pgvector	SQLite + iii-engine	Qdrant or pgvector	Postgres + vector DB
MCP wire-up	5-line stanza	5-line stanza	Manual	Manual
Auto-capture hooks	No (explicit remember calls)	Yes (12 lifecycle hooks)	No	Limited (agent self-edits)
Real-time viewer	No (psql is your viewer)	Yes (port 3113)	Cloud dash	Cloud dash

agentmemory wins on auto-capture and the real-time viewer. Fabric wins on stack simplicity (one binary on the Postgres you already run) and ties on retrieval quality in our 6-query A/B (equal relevance, 2-3x lower latency).

2. Coordination + pub/sub (real competitors here are brokers, not memory tools)

Dimension	Fabric	Redis Streams	NATS JetStream	RabbitMQ	Raw pg_notify
Adds a service to run	No (reuses Postgres)	Yes (Redis)	Yes (NATS cluster)	Yes (RabbitMQ)	No
Persistent event log	Yes (coord_messages table)	Yes (XADD)	Yes (JetStream)	Yes (with disk queue)	No (fires once)
Audit via SQL	Yes (SELECT * FROM coord_messages)	No (custom CLI)	No (nats stream)	No (mgmt UI)	N/A
Push notifications	Yes (pg_notify trigger)	Yes (XREAD BLOCK)	Yes	Yes	Yes
Bearer-auth same as memory	Yes (one token)	Separate	Separate	Separate	Postgres role
Sweet spot	Multi-agent project coord	High-throughput cache+stream	Microservice fleets	Enterprise messaging	Quick hacks

For real high-throughput broker workloads (millions of msg/s) use NATS or Redis. For multi-agent project coord (events per second, not per millisecond), Fabric gives you a sufficient subset on the Postgres you already run.

3. Multi-agent orchestration (real competitors are workflow engines + agent frameworks)

Dimension	Fabric v0.6	Temporal	CrewAI	LangGraph	AutoGen
Adds a new service	No (Postgres)	Yes (cluster + Cassandra/ES)	No (library)	No (library)	No (library)
Cross-language agents	Yes (HTTP from anything)	Yes (SDKs)	Python only	Python, JS/TS	Python, .NET
Task graph storage	Postgres (fabric.tasks)	Internal events DB	In-memory	In-memory	In-memory
Capability-typed dispatch	Yes	Workflow workers	Role-based	State-based	Agent types
Presence + heartbeat	Yes (90 s auto-offline)	Yes (workers)	No	No	No
Built-in pub/sub	Yes (same coord channel)	Signals	No	No	No
Built-in semantic memory	Yes (same memo store)	No	No (DIY RAG)	Some (state)	No (DIY)
Sweet spot	Multi-agent coding sessions on one project	1000-worker production pipelines	LLM agent crews	LLM agent state machines	LLM agent conversations

Orchestrating a 1000-worker production payment pipeline? Take Temporal. Orchestrating five Claude Code tabs and a CI runner on one project? The Temporal cluster is overkill and you don't already have it running. Fabric is the small-team multi-agent-coding subset of Temporal's job, on the database you already have.
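The presence row in the table (90 s auto-offline) amounts to a timestamp comparison. The sketch below is an in-memory illustration of that rule, not Fabric's v0.6 code: the class name `Presence` and its methods are hypothetical, and a real implementation would presumably keep the timestamps in Postgres.

```python
import time

OFFLINE_AFTER_SECONDS = 90  # from the table: 90 s auto-offline


class Presence:
    """Tracks the last heartbeat per agent; agents drop offline after a silence."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the rule can be tested
        self._last_seen = {}

    def heartbeat(self, agent_id):
        self._last_seen[agent_id] = self._clock()

    def online(self):
        now = self._clock()
        return sorted(agent for agent, seen in self._last_seen.items()
                      if now - seen <= OFFLINE_AFTER_SECONDS)


# Simulated clock: the orchestrator goes silent, worker-1 keeps reporting.
now = [0.0]
presence = Presence(clock=lambda: now[0])
presence.heartbeat("orchestrator")
now[0] = 60.0
presence.heartbeat("worker-1")
now[0] = 100.0   # orchestrator silent for 100 s, worker-1 for 40 s
print(presence.online())   # ['worker-1']
```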

4. Code graph + symbol search (v0.5; real competitors are code-indexing tools)

Dimension	Fabric v0.5	Sourcegraph	codegraph (MCP)	ast-grep	ripgrep+ctags
Self-host friction	One binary	Docker compose / SaaS	One binary	One binary	Native
MCP-native	Yes	No (HTTP/GQL)	Yes	No	No
Symbol embeddings	Yes (cosine on signature)	Yes (Cody)	No	No	No
Cross-symbol edges (calls / imports)	Yes	Yes	Yes	No	Limited (ctags)
Same auth as memory + coord	Yes (one Bearer)	Separate	Different MCP	N/A	N/A
Languages day-one	Go + Python	50+	Many	Many	All (lexical only)

Need cross-50-language enterprise code search? Sourcegraph wins. Want symbol search and relationships in the same Bearer-authed MCP namespace your agents already use for memory and coord? Fabric is the integrated choice.

vs CLAUDE.md / cursorrules-style preload (the default)

Free, simple, no service required. Falls over the moment the file gets to a few hundred lines because every turn re-sends the whole thing. We were the proof: 715-line CLAUDE.md preloading 17K tokens, 1,050-line MEMORY.md preloading another 75K. Fabric is what we built when we got tired of paying that bill.

The pairing with Kronaxis Router

If you already run Kronaxis Router you have decided to stop paying frontier rates for tasks a 9B model handles. Fabric closes the loop on the OTHER half of LLM cost: the context window.

Router decides which model handles a request. Fabric decides what context goes into the request. Pair them and the typical "how does X work in this codebase" turn drops from $0.60 (Opus + 120 K-token preload) to $0.001 (sovereign 7B + 2 KB of relevant memos). Same answer quality, because the model receives the three specific things it needs instead of being asked to grep the filing cabinet.

What it is not

It does not auto-capture session activity. agentmemory's lifecycle hooks do that well; we deliberately stayed narrow. You bank memos explicitly via mcp__fabric__remember when you learn something worth keeping.

It does not run an agent loop. Letta does that. Fabric is one HTTP service the agent calls; the agent runtime stays whatever you already use.

It does not unify N provider APIs (Router does that for LLM calls; LiteLLM does that more broadly).

Getting started

# Schema (one-time, in your existing Postgres)
psql -h db.example.com -U postgres -d kronaxis \
  -c "CREATE SCHEMA IF NOT EXISTS fabric; CREATE EXTENSION IF NOT EXISTS vector;"

# Pull the embedding model (one-time)
ollama pull nomic-embed-text

# Build + run
go build -o /tmp/fabric ./cmd/fabric
FABRIC_KEY=secret-bearer-token \
FABRIC_PG_DSN="postgres://user:pass@db:5432/kronaxis" \
/tmp/fabric &

# Bank a memo
curl -X POST -H "Authorization: Bearer secret-bearer-token" \
  -H "Content-Type: application/json" \
  -d '{"title":"prod db pass location","content":"in 1Password under Infra-Prod","type":"reference"}' \
  http://localhost:8201/v1/memo

# Find it later (semantic, lexical, recency, all blended)
curl -X POST -H "Authorization: Bearer secret-bearer-token" \
  -H "Content-Type: application/json" \
  -d '{"query":"where is the prod database password","top_k":1}' \
  http://localhost:8201/v1/memo/search
# returns your memo with score 0.92

The Python MCP shim is included in the repo. Drop it into ~/.claude.json under mcpServers.fabric and the agent gets mcp__fabric__search as a native tool.
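For callers that are not going through MCP, the same search call can be made from Python with the standard library. This is a sketch, not the shim from the repo: the `query` and `top_k` fields and the port mirror the curl example above, while the function name `search_memos`, the injectable `opener` parameter and the canned reply shape are assumptions for illustration.

```python
import io
import json
from urllib.request import Request, urlopen


def search_memos(query, top_k=3, base_url="http://localhost:8201",
                 token="secret-bearer-token", opener=urlopen):
    """POST the query to /v1/memo/search and return the decoded JSON reply."""
    req = Request(
        f"{base_url}/v1/memo/search",
        data=json.dumps({"query": query, "top_k": top_k}).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.loads(resp.read())


# Offline demonstration: a stand-in opener records the request and returns a
# canned reply, so nothing here needs a running Fabric instance.
sent = []


def fake_opener(req):
    sent.append(req)
    return io.BytesIO(b'{"results": []}')   # reply shape is hypothetical


reply = search_memos("where is the prod database password", top_k=1,
                     opener=fake_opener)
print(sent[0].full_url, json.loads(sent[0].data), reply)
```

Against a live instance, drop the `opener` argument and the function uses `urlopen` directly.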

Single binary. Postgres-backed. 2-3x faster than the alternatives we tested on identical queries. BSL 1.1.

GitHub: github.com/Kronaxis/kronaxis-fabric
