
The Other Half of LLM Cost: Context You Never Needed to Send

Multi-agent stacks reinvent coordination badly

Five Claude Code tabs on the same project. An overnight cron. A CI runner. A dispatched server-side agent on another host. Each one needs to see what the others are doing, avoid duplicating work, hand off findings, and recover when a tab crashes.

The default answer today is some combination of a Slack channel a human watches, ad-hoc REST endpoints between services, "remember to message me when you're done", or a custom queue per project. None of it composes. None of it survives a tab close.

We built Kronaxis Fabric because we hit this wall building everything else. The session that built Fabric was itself five concurrent tabs on two hosts, coordinated through Fabric's own coord broadcasts. The orchestrator (one tab) posted findings. The workers (four tabs) replied. The whole audit trail lives in Postgres. No bot in a Slack channel. No Redis pub/sub. No ad-hoc queue.

The preload that grew up

One of our project repos had a CLAUDE.md that grew to 715 lines (97 KB), plus a MEMORY.md index that hit 1,050 lines (300 KB). Together they preloaded into every coding-agent session: 397 KB of background context, every turn, in every tab.

None of it was wrong. All of it was reference. Almost none of it was relevant to the specific task at hand.

At frontier-model rates, that preload alone costs around 1.5 pence per turn. A typical 30-turn session costs about 45 pence in preload, before the model has read a single line of new code. Multiply that across a working day, a small team and a year, and you have tens of thousands of pounds spent on context the model did not need to see.

The filing cabinet principle

A surgeon does not read the whole filing cabinet before answering a question about a patient. They open the one file. The same logic applies to agent context.

Every preload-heavy convention (hand-curated MEMORY files, fat CLAUDE.md, sprawling system prompts) is the equivalent of reading the cabinet front-to-back to find one fact. It works, but it is unbelievably expensive at any scale that matters.

What we built

Kronaxis Fabric is a single Go binary backed by a Postgres database. It exposes nine HTTP endpoints (memo CRUD, hybrid search, embeddings backfill, a coord channel for cross-session events), and it ships with a thin Python MCP shim that you register in ~/.claude.json to give the agent an mcp__fabric__search tool.

The agent calls search instead of preloading notes. The cabinet stays closed. Only the relevant file opens.

# Before
agent session preload: 397 KB of notes
   sent to model: every token, every turn

# After
agent session preload: 4 KB of pointer + the rule "use mcp__fabric__search"
   sent to model: the 3 memos that match this turn's question
   typical: 2 KB, returned in 90 ms

The cost arithmetic

| Pattern | Preload per turn | Cost per turn ($15/M input tokens) | Per 30-turn session |
|---|---|---|---|
| Hand-curated MEMORY + CLAUDE.md (typical) | ~92K tokens | $1.38 | $41.40 |
| Fabric on-demand search (3 memos avg) | ~500 tokens | $0.0075 | $0.22 |
| Saving | ~91.5K tokens | $1.37 | $41.18 |
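
The table's arithmetic is simply tokens × price, re-sent every turn. A minimal Go sketch that reproduces the per-turn and per-session figures, assuming a flat $15 per million input tokens, no output tokens, and no prompt-caching discount (the table does not say whether caching was in play):

```go
package main

import "fmt"

// pricePerMillionUSD is the input-token price the table assumes ($15/M).
// Output tokens and prompt-caching discounts are deliberately ignored.
const pricePerMillionUSD = 15.0

// turnCost returns the input cost in USD of sending `tokens` tokens once.
func turnCost(tokens float64) float64 {
	return tokens * pricePerMillionUSD / 1_000_000
}

// sessionCost returns the preload cost of a session with `turns` turns,
// assuming the same context is re-sent on every turn.
func sessionCost(tokens float64, turns int) float64 {
	return turnCost(tokens) * float64(turns)
}

func main() {
	const turns = 30
	preload := 92_000.0 // hand-curated MEMORY + CLAUDE.md
	search := 500.0     // ~3 memos returned by on-demand search

	fmt.Printf("preload: $%.4f/turn, $%.2f/session\n", turnCost(preload), sessionCost(preload, turns))
	fmt.Printf("search:  $%.4f/turn, $%.3f/session\n", turnCost(search), sessionCost(search, turns))
	fmt.Printf("saving:  $%.2f/session\n", sessionCost(preload, turns)-sessionCost(search, turns))
}
```

Because the preload is paid on every turn, the session cost scales linearly with turn count; shrinking the per-turn context is the only lever that moves both columns at once.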

The numbers above are real, measured on our own repo on the day we cut over. CLAUDE.md went from 715 to 172 lines (53% smaller, with 24 pitfall sections extracted as individual memos). MEMORY.md went from 1,050 to 51 lines (95% smaller, replaced by a pointer to the Fabric MCP server).

This compounds beautifully when you also run Kronaxis Router. The router decides which model gets the request. Fabric decides what context goes into it. Together: cheaper model, smaller context, same answer quality. We saw a 100x-600x drop in per-session bills when both ran together on the easy 80% of agent work.

How the search works

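Three signals get combined into one score per memo:

- Semantic similarity: cosine similarity between the query's embedding and each memo's embedding (pgvector, with embeddings from a local Ollama model).
- Lexical match: Postgres full-text search over each memo's tsvector.
- Recency: more recently written memos score higher.

The defaults work for a small-team operator. The weights are configuration knobs if you want to bias the blend differently.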

The other half: cross-session coordination

Memory solves "what did I learn last week". The other half of any multi-agent workflow is "what is the agent in the other tab doing right now". A second tab, a CI runner, an overnight cron: they all need to see each other's findings without polling a Slack channel a human runs.

Fabric ships a coord channel on the same Postgres database: POST /v1/coord writes a row to public.coord_messages and fires pg_notify('coord', ...); GET /v1/coord/recent lets any session poll the last N events filtered by sender, recipient, or timestamp. One Bearer token, one schema, two tables.
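
A minimal Go sketch of a session posting to and polling the coord channel. The endpoint paths and Bearer auth come from the paragraph above, and the port is the 8201 used in the Getting started example; the JSON field names (sender, recipient, body) and the query parameters (sender, limit) are hypothetical placeholders, so check the repo for the real shapes before relying on them:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// CoordMessage is a hypothetical payload shape, not Fabric's documented schema.
type CoordMessage struct {
	Sender    string `json:"sender"`
	Recipient string `json:"recipient,omitempty"` // assumption: empty means broadcast
	Body      string `json:"body"`
}

// newPostCoord builds POST /v1/coord authenticated with the shared Bearer token.
func newPostCoord(baseURL, token string, msg CoordMessage) (*http.Request, error) {
	payload, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/coord", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newRecentCoord builds GET /v1/coord/recent. The parameter names
// "sender" and "limit" are assumptions, not documented API.
func newRecentCoord(baseURL, token, sender string, limit int) (*http.Request, error) {
	q := url.Values{}
	if sender != "" {
		q.Set("sender", sender)
	}
	q.Set("limit", strconv.Itoa(limit))
	req, err := http.NewRequest(http.MethodGet, baseURL+"/v1/coord/recent?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	base, token := "http://localhost:8201", "secret-bearer-token"
	post, err := newPostCoord(base, token, CoordMessage{Sender: "worker-2", Body: "auth module audit done"})
	if err != nil {
		panic(err)
	}
	poll, err := newRecentCoord(base, token, "worker-2", 20)
	if err != nil {
		panic(err)
	}
	// Send either with http.DefaultClient.Do(req) once a Fabric server is running.
	fmt.Println(post.Method, post.URL)
	fmt.Println(poll.Method, poll.URL)
}
```

Keeping request construction separate from sending makes the client easy to test without a live server, and the same helpers work from a CI job or cron script as from an agent tab.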

We used it in the session that built this. Five concurrent agent tabs (an orchestrator and four workers) coordinated on the same project: broadcasts went out, replies came back, the orchestrator tracked who was doing what. agentmemory, mem0 and Letta do not have a cross-session event bus; they only have memory. Multi-agent workflows on one project need both.

Why Postgres

Most memory-for-agents systems ship with their own storage (custom SQLite extensions, an embedded vector DB, a bespoke key-value file). That is one more thing to back up, restore, monitor, and explain to a future on-call engineer.

Fabric uses Postgres because anyone running a production stack already has it, already backs it up, and already knows what to do when something goes wrong. The schema is two tables and three indexes. SELECT * FROM fabric.memos WHERE deleted_at IS NULL is your audit trail. pg_dump is your backup. Standard Postgres replication is your DR.

pgvector is a standard extension available on most managed Postgres services, and tsvector full-text search is built into Postgres itself, so it needs no extension at all. Embeddings via local Ollama mean no per-query API fees and no data leaving your box.

How this compares to alternatives

Fair warning before reading these tables: most memory-for-agents tools are memory-only. Fabric does three jobs in one binary (coord channel, memory store, and an orchestrator coming in v0.6). Comparing a tool on a feature it never tried to offer is unfair by definition, so we've split the comparison into one table per job, showing where Fabric covers ground alone and where the alternatives match it.

1. Memory (where agentmemory, mem0, Letta actually compete)

| Dimension | Fabric | agentmemory | mem0 | Letta |
|---|---|---|---|---|
| Search | Hybrid (cosine + tsvector + recency) | BM25 + vector + graph (RRF) | Vector + graph | Vector only |
| Embeddings | Local Ollama, free at query time | Local Xenova, free | Cloud-provider call per query | Cloud-provider call per query |
| Storage | Postgres + pgvector | SQLite + iii-engine | Qdrant or pgvector | Postgres + vector DB |
| MCP wire-up | 5-line stanza | 5-line stanza | Manual | Manual |
| Auto-capture hooks | No (explicit remember calls) | Yes (12 lifecycle hooks) | No | Limited (agent self-edits) |
| Real-time viewer | No (psql is your viewer) | Yes (port 3113) | Cloud dashboard | Cloud dashboard |

agentmemory wins on auto-capture and the real-time viewer. Fabric wins on stack simplicity (one binary on the Postgres you already run). On retrieval, our 6-query A/B test put the two level on relevance, with Fabric returning results 2-3x faster.

2. Coordination + pub/sub (real competitors here are brokers, not memory tools)

| Dimension | Fabric | Redis Streams | NATS JetStream | RabbitMQ | Raw pg_notify |
|---|---|---|---|---|---|
| Adds a service to run | No (reuses Postgres) | Yes (Redis) | Yes (NATS cluster) | Yes (RabbitMQ) | No |
| Persistent event log | Yes (coord_messages table) | Yes (XADD) | Yes (JetStream) | Yes (with disk queue) | No (fires once) |
| Audit via SQL | Yes (SELECT * FROM coord_messages) | No (custom CLI) | No (nats stream) | No (management UI) | N/A |
| Push notifications | Yes (pg_notify trigger) | Yes (XREAD BLOCK) | Yes | Yes | Yes |
| Same Bearer auth as memory | Yes (one token) | Separate | Separate | Separate | Postgres role |
| Sweet spot | Multi-agent project coordination | High-throughput cache + stream | Microservice fleets | Enterprise messaging | Quick hacks |

For genuinely high-throughput broker workloads (millions of messages per second), use NATS or Redis. For multi-agent project coordination (events per second, not per millisecond), Fabric gives you a sufficient subset on the Postgres you already run.

3. Multi-agent orchestration (real competitors are workflow engines + agent frameworks)

| Dimension | Fabric v0.6 | Temporal | CrewAI | LangGraph | AutoGen |
|---|---|---|---|---|---|
| Adds a new service | No (Postgres) | Yes (cluster + Cassandra/ES) | No (library) | No (library) | No (library) |
| Cross-language agents | Yes (HTTP from anything) | Yes (SDKs) | Python only | Python only | Python only |
| Task graph storage | Postgres (fabric.tasks) | Internal events DB | In-memory | In-memory | In-memory |
| Capability-typed dispatch | Yes | Workflow workers | Role-based | State-based | Agent types |
| Presence + heartbeat | Yes (90 s auto-offline) | Yes (workers) | No | No | No |
| Built-in pub/sub | Yes (same coord channel) | Signals | No | No | No |
| Built-in semantic memory | Yes (same memo store) | No | No (DIY RAG) | Some (state) | No (DIY) |
| Sweet spot | Multi-agent coding sessions on one project | 1000-worker production pipelines | LLM agent crews | LLM agent state machines | LLM agent conversations |

Orchestrating a 1000-worker production payment pipeline? Use Temporal. Orchestrating five Claude Code tabs and a CI runner on one project? A Temporal cluster is overkill, and you probably aren't running one already. Fabric covers the small-team, multi-agent-coding subset of Temporal's job, on the database you already have.

4. Code graph + symbol search (v0.5; real competitors are code-indexing tools)

| Dimension | Fabric v0.5 | Sourcegraph | codegraph (MCP) | ast-grep | ripgrep + ctags |
|---|---|---|---|---|---|
| Self-host friction | One binary | Docker Compose / SaaS | One binary | One binary | Native |
| MCP-native | Yes | No (HTTP/GraphQL) | Yes | No | No |
| Symbol embeddings | Yes (cosine on signature) | Yes (Cody) | No | No | No |
| Cross-symbol edges (calls / imports) | Yes | Yes | Yes | No | Limited (ctags) |
| Same auth as memory + coord | Yes (one Bearer) | Separate | Different MCP | N/A | N/A |
| Languages at launch | Go + Python | 50+ | Many | Many | All (lexical only) |

Need enterprise code search across 50+ languages? Sourcegraph wins. Want symbol search and call/import relationships in the same Bearer-authenticated MCP namespace your agents already use for memory and coord? Fabric is the integrated choice.

vs CLAUDE.md / cursorrules-style preload (the default)

Free, simple, no service required. Falls over the moment the file gets to a few hundred lines because every turn re-sends the whole thing. We were the proof: 715-line CLAUDE.md preloading 17K tokens, 1,050-line MEMORY.md preloading another 75K. Fabric is what we built when we got tired of paying that bill.

The pairing with Kronaxis Router

If you already run Kronaxis Router you have decided to stop paying frontier rates for tasks a 9B model handles. Fabric closes the loop on the OTHER half of LLM cost: the context window.

Router decides which model handles a request. Fabric decides what context goes into the request. Pair them and the typical "how does X work in this codebase" turn drops from $0.60 (Opus + 120K-token preload) to $0.001 (sovereign 7B + 2 KB of relevant memos). Same answer quality, because the model receives the three specific things it needs instead of being asked to grep the filing cabinet.

What it is not

It does not auto-capture session activity. agentmemory's lifecycle hooks do that well; we deliberately stayed narrow. You bank memos explicitly via mcp__fabric__remember when you learn something worth keeping.

It does not run an agent loop. Letta does that. Fabric is one HTTP service the agent calls; the agent runtime stays whatever you already use.

It does not unify N provider APIs (Router does that for LLM calls; LiteLLM does that more broadly).

Getting started

# Schema (one-time, in your existing Postgres)
psql -h db.example.com -U postgres -d kronaxis \
  -c "CREATE SCHEMA IF NOT EXISTS fabric; CREATE EXTENSION IF NOT EXISTS vector;"

# Pull the embedding model (one-time)
ollama pull nomic-embed-text

# Build + run
go build -o /tmp/fabric ./cmd/fabric
FABRIC_KEY=secret-bearer-token \
FABRIC_PG_DSN="postgres://user:pass@db:5432/kronaxis" \
/tmp/fabric &

# Bank a memo
curl -X POST -H "Authorization: Bearer secret-bearer-token" \
  -H "Content-Type: application/json" \
  -d '{"title":"prod db pass location","content":"in 1Password under Infra-Prod","type":"reference"}' \
  http://localhost:8201/v1/memo

# Find it later (semantic, lexical, recency, all blended)
curl -X POST -H "Authorization: Bearer secret-bearer-token" \
  -H "Content-Type: application/json" \
  -d '{"query":"where is the prod database password","top_k":1}' \
  http://localhost:8201/v1/memo/search
# returns your memo with score 0.92

The Python MCP shim is included in the repo. Drop it into ~/.claude.json under mcpServers.fabric and the agent gets mcp__fabric__search as a native tool.

Single binary. Postgres-backed. 2-3x faster than the alternatives we tested on identical queries. BSL 1.1.

GitHub: github.com/Kronaxis/kronaxis-fabric

Try Kronaxis Fabric

One Go binary. Postgres + pgvector. BSL 1.1. Stop preloading what the model never needed to see.
