Working Memory & Session Start

The agent's context window is its working memory: fast, volatile, and strictly size-limited. Brain doesn't manage the context window itself (that's the host agent's job), but everything Brain injects at session start consumes working memory. So Brain has to be budget-aware: it must decide what's worth loading and guarantee that its injected payload never exceeds the configured token budget.

info

This is working memory in the CoALA agent-memory model — the one type Brain deliberately delegates to the host, while taking responsibility for not bloating it.

The session-start aggregator

When a session begins, the agent makes a single call:

brain session-start --project "<current project>"

It returns one deterministic, token-budget-bounded JSON payload — replacing what used to be a hand-rolled sequence of separate recall, review, and low-confidence checks:

{
  "memory_count": 142,
  "pinned": [{ "id": "mem_…", "title": "…", "content": "…", "tokens": 48 }],
  "skills_index": [{ "name": "structured-code-review", "description": "…" }],
  "context_recall": [{ "id": "mem_…", "title": "…", "score": 0.72 }],
  "due_for_review": 3,
  "low_confidence_alerts": [],
  "budget": { "used": 1840, "cap": 3000 }
}

The agent treats pinned facts as active constraints, notes which skills are listed in skills_index (loading a skill's full instructions only on a match), and keeps context_recall in mind, all without dumping any of it to the user.
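As a rough illustration of how a host agent might consume this payload, here is a minimal Python sketch. It assumes the CLI prints the JSON to stdout; the helper names `fetch_session_start` and `summarize_session_start` are hypothetical and not part of Brain.

```python
import json
import subprocess


def fetch_session_start(project: str) -> str:
    """Run the documented command; assumes the JSON payload arrives on stdout."""
    result = subprocess.run(
        ["brain", "session-start", "--project", project],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_session_start(raw: str) -> dict:
    """Pull out the parts of the payload the agent acts on (hypothetical helper)."""
    payload = json.loads(raw)
    budget = payload.get("budget", {})
    return {
        # Pinned facts become active constraints.
        "constraints": [p["content"] for p in payload.get("pinned", [])],
        # Only skill names are noted; full instructions load on a match.
        "skill_names": [s["name"] for s in payload.get("skills_index", [])],
        "recall_ids": [r["id"] for r in payload.get("context_recall", [])],
        "within_budget": budget.get("used", 0) <= budget.get("cap", 0),
    }
```

A caller would pass the output of `fetch_session_start(...)` straight into `summarize_session_start(...)` and keep the result internal rather than showing it to the user.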

The budget

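A separate ~/.brain/config.json (created lazily with safe defaults) caps how many tokens each part of the payload may consume. By default the three sub-budgets add up exactly to the overall cap (1500 + 800 + 700 = 3000):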

{
  "working_memory_budget_tokens": 3000,
  "pin_budget_tokens": 1500,
  "skills_index_budget_tokens": 800,
  "recall_budget_tokens": 700
}

Setting	Default (tokens)	What it caps
`working_memory_budget_tokens`	3000	The entire `session-start` payload
`pin_budget_tokens`	1500	Pinned always-present memories
`skills_index_budget_tokens`	800	The skills L0 index
`recall_budget_tokens`	700	Context-relevant recalled memories
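The "created lazily with safe defaults" behaviour could look something like the sketch below. The setting names and default values come from the table above. The function names, and the choice to fall back to a default for any key missing from an existing file, are assumptions made for illustration.

```python
import json
from pathlib import Path

DEFAULTS = {
    "working_memory_budget_tokens": 3000,
    "pin_budget_tokens": 1500,
    "skills_index_budget_tokens": 800,
    "recall_budget_tokens": 700,
}


def load_config(path: Path) -> dict:
    """Return the budget config, writing the defaults on first use."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(DEFAULTS, indent=2))
        return dict(DEFAULTS)
    stored = json.loads(path.read_text())
    # Assumption: keys missing from the file fall back to the defaults.
    return {**DEFAULTS, **stored}


def sub_budgets_fit(config: dict) -> bool:
    """Check that the three sub-budgets do not exceed the overall cap."""
    parts = (
        config["pin_budget_tokens"]
        + config["skills_index_budget_tokens"]
        + config["recall_budget_tokens"]
    )
    return parts <= config["working_memory_budget_tokens"]
```

Calling `load_config(Path.home() / ".brain" / "config.json")` would create the file on first run and read it afterwards. `sub_budgets_fit` is a sanity check you might run after editing the values by hand.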

Token estimation

Token counts use a dependency-free heuristic computed once per memory at write time and stored on its index entry — no tokenizer, no runtime dependencies, consistent with Brain's zero-dependency design.
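The exact heuristic isn't spelled out here, so the following is only a sketch of what a dependency-free estimate can look like. It assumes roughly four characters per token, a common rule of thumb for English text. The ratio, `estimate_tokens`, and `make_index_entry` are all hypothetical and may differ from what Brain actually does.

```python
import math

# Assumed ratio; Brain's real heuristic is not documented in this section.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate a token count from character length alone (no tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def make_index_entry(memory_id: str, title: str, content: str) -> dict:
    """Build a hypothetical index entry, storing the estimate at write time."""
    return {
        "id": memory_id,
        "title": title,
        # Computed once here so session start never has to re-count.
        "tokens": estimate_tokens(title + "\n" + content),
    }
```

Storing the count on the index entry means the aggregator can budget by summing integers without reading each memory's content.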

Selection and overflow

The aggregator fills the budget in priority order: applicable pins first (by priority then strength), then the skills index, then top-ranked context recall. When a section exceeds its sub-budget, the items that don't fit are left out of that section, and the overflow is reported in the payload rather than dropped silently, so you always know what didn't make it in.
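The fill order described above can be sketched as follows. This is an illustration under several assumptions, not Brain's implementation: every item carries a precomputed `tokens` count, higher `priority`, `strength`, and `score` values rank first, items that don't fit are skipped greedily, and the `overflow` field name is hypothetical.

```python
def fill_section(items: list[dict], budget: int) -> tuple[list[dict], list[dict]]:
    """Keep best-first items that fit the budget; return (kept, overflow)."""
    kept, overflow, used = [], [], 0
    for item in items:
        if used + item["tokens"] <= budget:
            kept.append(item)
            used += item["tokens"]
        else:
            overflow.append(item)
    return kept, overflow


def build_payload(pins, skills, recall, config) -> dict:
    """Fill pins, then the skills index, then recall, reporting any overflow."""
    total = config["working_memory_budget_tokens"]
    remaining = total
    sections = [
        ("pinned",
         sorted(pins, key=lambda p: (-p["priority"], -p["strength"])),
         "pin_budget_tokens"),
        ("skills_index", skills, "skills_index_budget_tokens"),
        ("context_recall",
         sorted(recall, key=lambda r: -r["score"]),
         "recall_budget_tokens"),
    ]
    payload, overflow = {}, {}
    for name, items, key in sections:
        # A section never gets more than its sub-budget or what is left overall.
        kept, dropped = fill_section(items, min(config[key], remaining))
        remaining -= sum(i["tokens"] for i in kept)
        payload[name] = kept
        overflow[name] = [i.get("id", i.get("name")) for i in dropped]
    payload["overflow"] = overflow  # hypothetical field name
    payload["budget"] = {"used": total - remaining, "cap": total}
    return payload
```

Capping each section by both its own sub-budget and the remaining overall budget keeps the total within `working_memory_budget_tokens` even if someone configures sub-budgets that sum to more than the overall cap.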

tip

Raise working_memory_budget_tokens if you run with very large context windows and want more memory loaded up front; lower it to keep Brain's footprint minimal on smaller models.

Primacy & recency ordering

Long contexts suffer from "lost in the middle" — models attend best to the start and end of their input. The aggregator orders recalled memories so the highest-ranked land at the edges of the payload, where they're most likely to be used, rather than buried in the middle.
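One simple way to get this edge-weighted ordering is to deal the ranked items alternately to the front and the back of the list. The sketch below shows that scheme; it is an assumed approach for illustration, and `order_for_edges` is a hypothetical name.

```python
def order_for_edges(ranked: list) -> list:
    """Place best-first items alternately at the front and the back.

    The top item ends up first, the runner-up last, and the weakest
    items collect in the middle.
    """
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


# Ranks 1 (best) to 5 (worst):
print(order_for_edges([1, 2, 3, 4, 5]))  # prints [1, 3, 5, 4, 2]
```

With this scheme the two strongest memories sit at the very start and very end of the recalled list, and relevance falls off toward the centre.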