A hierarchical, file-system memory that decays on an Ebbinghaus curve, strengthens through recall, and consolidates during sleep. Deterministic across Claude Code, Gemini CLI, Codex CLI, and OpenCode — one brain, any model, every agent.
The directory tree is the semantic structure. Browse memory in any file explorer — no opaque embeddings.
Memories fade along an exponential forgetting curve. Recall arrests decay and pushes strength back up.
Weighted edges link related memories. Recalling one activates its neighbours via spreading activation.
Longer intervals between recalls produce larger, more durable boosts. The spacing effect, by design.
Episodic, semantic, and procedural memories each carry their own decay rate and consolidation rules.
Claude Code, Gemini CLI, Codex CLI, and OpenCode share one store, with any LLM underneath. Switch model or agent, keep your memory. Identical scoring, deterministic recall everywhere.
A nine-phase nightly cycle that includes replay, consolidation, pruning, reorganization, and REM-style recombination.
Plain files in a folder. Point BRAIN_DIR at Google Drive, Dropbox, or iCloud — or sync via git or encrypted export. No account required.
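Because the store is plain files, any script can browse it. The sketch below walks a folder and groups Markdown memories by their directory, which doubles as the category. `list_memories` is a hypothetical helper written for illustration, not part of Brain Memory; the only name taken from the text above is the `BRAIN_DIR` environment variable.

```python
import os
from pathlib import Path


def list_memories(brain_dir):
    """Group Markdown files by their directory, relative to the store root.

    Hypothetical helper: it assumes memories are `.md` files and that the
    directory path is the category, as the text above describes.
    """
    root = Path(brain_dir)
    by_category = {}
    for path in sorted(root.rglob("*.md")):
        category = path.parent.relative_to(root).as_posix()
        by_category.setdefault(category, []).append(path.name)
    return by_category


def memories_from_env():
    """Read the store location from BRAIN_DIR; no default path is assumed."""
    brain_dir = os.environ.get("BRAIN_DIR")
    if brain_dir is None:
        raise RuntimeError("BRAIN_DIR is not set")
    return list_memories(brain_dir)
```

Sorting the paths keeps the listing stable across runs and platforms, in keeping with the deterministic behaviour described above.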
Agents write decisions, learnings, and preferences as Markdown with YAML frontmatter. Initial strength is set by type and impact.
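As a rough illustration of that write step, the sketch below renders a memory as Markdown with YAML frontmatter and derives an initial strength from type and impact. Every field name (`title`, `type`, `impact`, `strength`, `created`) and every number in the two lookup tables is an assumption made for this example; the real schema and values may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Assumed values for illustration only; the real tables are not documented here.
BASE_STRENGTH = {"episodic": 0.5, "semantic": 0.7, "procedural": 0.9}
IMPACT_FACTOR = {"low": 0.8, "medium": 1.0, "high": 1.2}


def initial_strength(memory_type, impact):
    """Combine type and impact into a starting strength, capped at 1.0."""
    return min(1.0, BASE_STRENGTH[memory_type] * IMPACT_FACTOR[impact])


def render_memory(title, body, memory_type, impact, created):
    """Return the file text: YAML frontmatter followed by the Markdown body."""
    strength = initial_strength(memory_type, impact)
    frontmatter = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(title)}",
        f"type: {memory_type}",
        f"impact: {impact}",
        f"strength: {strength:.2f}",
        f"created: {created.isoformat()}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + body.strip() + "\n"


def write_memory(brain_dir, relative_path, text):
    """Write the memory under the store root, creating folders as needed."""
    path = Path(brain_dir) / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

A call such as `render_memory("Use Postgres", "We chose Postgres.", "semantic", "high", datetime.now(timezone.utc))` produces text that can be passed straight to `write_memory`.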
Strength decays exponentially at each memory's own rate. Episodic fades fast; procedural is sticky.
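The decay step can be sketched as a standard exponential forgetting curve, strength × e^(−rate × time). The per-type rates below are placeholders chosen only to show that episodic memories fade faster than procedural ones; they are not Brain Memory's actual values.

```python
import math

# Placeholder decay rates per day, assumed for illustration.
DECAY_PER_DAY = {"episodic": 0.10, "semantic": 0.03, "procedural": 0.01}


def decayed_strength(strength, decay_rate, days_elapsed):
    """Exponential decay: strength * exp(-decay_rate * days_elapsed)."""
    return strength * math.exp(-decay_rate * days_elapsed)


def half_life_days(decay_rate):
    """Days until strength halves at the given rate: ln(2) / rate."""
    return math.log(2) / decay_rate
```

Under these assumed rates, an episodic memory would halve in about a week, while a procedural one would take roughly ten weeks.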
Deterministic scoring: TF-IDF × decayed-strength + spreading-activation × context-match. Identical results across every agent.
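The scoring formula above can be sketched as follows. The TF-IDF variant (smoothed IDF plus one) is one common choice and an assumption here, as are all function names. The tie-break on memory id is a guess at one way identical inputs could yield identical rankings.

```python
import math
from collections import Counter


def tfidf_score(query_terms, doc_terms, doc_freq, n_docs):
    """Sum of TF * IDF over the query terms, using a smoothed IDF (assumed variant)."""
    counts = Counter(doc_terms)
    total = len(doc_terms) or 1
    score = 0.0
    for term in query_terms:
        tf = counts[term] / total
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        score += tf * idf
    return score


def recall_score(tfidf, strength, activation, context_match):
    """TF-IDF x decayed strength + spreading activation x context match."""
    return tfidf * strength + activation * context_match


def rank(candidates):
    """Sort (memory_id, score) pairs by score descending, then id ascending.

    The secondary key makes ties deterministic regardless of input order.
    """
    return sorted(candidates, key=lambda c: (-c[1], c[0]))
```

No randomness or model call appears anywhere in the sketch, so the same store and query always give the same order.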
Recalled memories strengthen via spaced reinforcement. Longer gaps → larger boosts; the decay rate itself improves with each recall.
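One way to model that reinforcement step is a boost that grows with the gap since the last recall and saturates at a ceiling, paired with a decay rate that shrinks on every recall. The functional form and all parameters (`max_boost`, `gap_scale_days`, `rate_shrink`) are assumptions for illustration, not the project's actual update rule.

```python
import math


def reinforce(strength, decay_rate, days_since_last_recall,
              max_boost=0.3, gap_scale_days=7.0, rate_shrink=0.9, cap=1.0):
    """Return (new_strength, new_decay_rate) after a recall.

    `strength` is the current, already decayed strength. The boost approaches
    `max_boost` as the gap grows, so longer intervals earn larger boosts.
    """
    boost = max_boost * (1.0 - math.exp(-days_since_last_recall / gap_scale_days))
    new_strength = min(cap, strength + boost)
    # Each recall makes the memory stickier by slowing its future decay.
    new_decay_rate = decay_rate * rate_shrink
    return new_strength, new_decay_rate
```

With these placeholder parameters, a recall after two weeks earns a much larger boost than a recall after one day, which is the spacing effect the text describes.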
The nine-phase cycle runs maintenance, including replay, synaptic homeostasis, knowledge propagation, semantic crystallization, pruning, REM recombination, and expertise detection.
Grounded in 2025–2026 long-term-memory evaluation methodology: LongMemEval, MemoryAgentBench, SWE-Bench-CL, and Mem0 / BEAM. The benchmark uses a cross-family LLM judge, distractor haystacks, and an N-arm matrix.
| Scenario | What it tests |
|---|---|
| A. Noisy Project Folder: “200 memories from 6 projects. Does brain find the 3 relevant ones?” | Retrieval under distractors · LongMemEval-S analog |
| B. Three Sessions, One Decision: “Postgres Monday, gRPC rewrite Wednesday, new resource Friday. Still Postgres?” | Multi-session continuity · pinned-tier ablation |
| C. The Contradiction Test: “Tabs, then spaces, then tabs again. Which version wins?” | Decay-weighted recency · contradiction handling |
| D. Skill Progressive Disclosure: “Five skills indexed, one needed. Does brain load just the one?” | Procedural skills (L0/L1/L2) token efficiency |
| E. Continual Coding: “Five bugs in order. Does bug 5 finish faster than bug 1?” | Forward transfer · agent writes its own memories |
| F. Abstention: “No deployment target in memory. Does the agent ask or invent?” | Confabulation resistance · stale-fact rejection |
| arm | tokens | tok / success | recall@5 | pass |
|---|---|---|---|---|
| bare | 24.1K | 24.1K | — | 100% |
| fixture-only | 16.8K | 16.8K | — | 100% |
| brain-real | 27.8K | 27.8K | 0.33 | 100% |
| brain-no-recall | 43.6K | 43.6K | — | 100% |
| brain-no-pin | 86.3K | 86.3K | 0.33 | 100% |
| context-dump | 50.8K | 50.8K | — | 100% |
Scenario A — Noisy Project Folder · 1 run · all arms judged by a cross-family LLM. Results in progress; numbers update as runs complete.
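The two derived columns can be computed along the following lines. This is a sketch of the usual definitions, assuming tok / success divides total tokens by the number of passing runs and recall@5 is the fraction of relevant memories found in the top five results. The benchmark's exact definitions may differ.

```python
def tokens_per_success(total_tokens, successes):
    """Total tokens spent divided by the number of successful runs."""
    if successes == 0:
        return float("inf")  # no success: cost per success is unbounded
    return total_tokens / successes


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant ids that appear in the first k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)
```

With a single run that passes, tokens per success equals total tokens, which is consistent with the two token columns matching in the table above.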
| Mechanism | Implementation in Brain Memory |
|---|---|
| Spreading activation | Recalling memory A automatically surfaces its linked neighbours B and C along weighted edges. |
| Hebbian learning | Memories recalled together strengthen their mutual link — neurons that fire together, wire together. |
| Context-dependent recall | Memories encoded in a similar context score higher at retrieval time. |
| Spacing effect | Longer recall intervals produce larger, longer-lasting strength boosts. |
| Ebbinghaus decay | Exponential forgetting with per-memory decay rates set by cognitive type. |
| Synaptic homeostasis | Global strength down-scaling during sleep prevents runaway inflation. |
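Three of the mechanisms in the table operate on the link graph and the strength values, and can be sketched in a few lines each. The data layout (a dict of weighted edges), the function names, and the parameters (`decay`, `max_hops`, `learning_rate`, `factor`) are all assumptions for illustration.

```python
def spread_activation(edges, source, decay=0.5, max_hops=2):
    """Propagate activation from `source` along weighted edges.

    `edges` maps node -> {neighbour: weight in [0, 1]}. Each hop multiplies
    the energy by the edge weight and by `decay`; a node keeps the strongest
    activation that reaches it. The source itself is not returned.
    """
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in edges.get(node, {}).items():
                passed = energy * weight * decay
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    activation.pop(source)
    return activation


def hebbian_update(edges, co_recalled, learning_rate=0.1):
    """Strengthen the mutual link between every pair recalled together.

    Weights move toward 1.0 by a fraction `learning_rate` of the remaining gap.
    """
    ids = sorted(co_recalled)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            old = edges.setdefault(a, {}).get(b, 0.0)
            new = old + learning_rate * (1.0 - old)
            edges[a][b] = new
            edges.setdefault(b, {})[a] = new
    return edges


def homeostatic_downscale(strengths, factor=0.95):
    """Scale every strength by the same factor to prevent runaway inflation."""
    return {memory_id: s * factor for memory_id, s in strengths.items()}
```

For example, with `edges = {"A": {"B": 0.8, "C": 0.4}, "B": {"D": 0.5}}`, recalling A activates B and C directly and reaches D on the second hop at a lower level.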
Brain Memory's architecture and benchmark methodology draw on the academic literature on language-agent memory and evaluation. Selected citations below.
Sumers, Yao, Narasimhan, Griffiths — the agent-memory model Brain Memory implements. Pinned tier, procedural skills, and budget-aware working memory map to CoALA's semantic / procedural / episodic decomposition.
Packer et al. — paging-style memory management that motivated budget-bounded working memory.
Park et al. — recency · importance · relevance retrieval that inspired the recall scoring weights.
The original forgetting curve. Brain Memory's exponential decay and spaced-reinforcement boosts follow it directly.
Distractor-haystack design (Scenario A) and the rubric-based LLM judge.
Four-competency framework; FactConsolidation inspired Scenario C (The Contradiction Test).
Forward transfer in continual coding — basis for Scenario E.
Tokens-per-query co-reported with accuracy — source of the tokens-per-successful-task metric.
Documents same-family judging cost. Brain's benchmark enforces a cross-family judge map.
Empirical position-bias study. Brain's benchmark uses position-swap on every pairwise judgment.
Synthetic, decay-driven scenarios guard against memorised public-set answers.
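The position-swap check mentioned above can be sketched as follows. The `judge` callable is hypothetical: it stands in for an LLM judge that sees two answers in order and returns "first", "second", or "tie". Treating a disagreement between the two orderings as a tie is an assumption, not necessarily the benchmark's rule.

```python
def judge_with_position_swap(judge, answer_a, answer_b):
    """Run a pairwise judge in both orders and keep only consistent verdicts.

    Returns "a", "b", or "tie". If the verdict flips when the answers swap
    positions, the judge is showing position bias and the result is a tie.
    """
    forward = judge(answer_a, answer_b)
    backward = judge(answer_b, answer_a)
    forward_winner = {"first": "a", "second": "b", "tie": "tie"}[forward]
    backward_winner = {"first": "b", "second": "a", "tie": "tie"}[backward]
    if forward_winner == backward_winner:
        return forward_winner
    return "tie"
```

A judge that always prefers whichever answer it sees first yields a tie under this check, so position bias cannot decide a comparison on its own.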
Then run brain --claude (or --gemini / --codex / --opencode, or --all) to configure your runtime(s). One store, deterministic recall, all agents.