beta · live on npm

Memory for AI agents,
modeled on the brain.

A hierarchical, file-system memory that decays on an Ebbinghaus curve, strengthens through recall, and consolidates during sleep. Deterministic across Claude Code, Gemini CLI, Codex CLI, and OpenCode — one brain, any model, every agent.

Get started
npm: beta · license: MIT · deterministic recall
01 What's inside

A memory architecture, not a vector store.

Hierarchical memory

The directory tree is the semantic structure. Browse memory in any file explorer — no opaque embeddings.

Strength & decay

Memories fade along an exponential forgetting curve. Recall arrests decay and pushes strength back up.

Associative network

Weighted edges link related memories. Recalling one activates its neighbours via spreading activation.
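As an illustration of spreading activation over weighted edges, here is a minimal sketch. The graph layout, the `decay` damping factor, the two-hop limit, and the "keep the strongest path" rule are all assumptions made for the example, not Brain Memory's actual parameters.

```python
def spread_activation(graph, seed, decay=0.5, max_hops=2):
    """Propagate activation from a recalled memory along weighted edges.

    graph maps a memory id to {neighbour_id: edge_weight}.
    decay and max_hops are hypothetical tuning values.
    """
    activation = {seed: 1.0}
    frontier = [seed]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            # Sorted iteration keeps the traversal order deterministic.
            for neighbour, weight in sorted(graph.get(node, {}).items()):
                passed = activation[node] * weight * decay
                # Keep only the strongest activation that reaches a node.
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation


# Hypothetical network: recalling A activates B and C, and B activates D.
graph = {"A": {"B": 0.8, "C": 0.4}, "B": {"D": 0.5}}
activation = spread_activation(graph, "A")
print(activation)
```

Stronger edges pass on more activation, and each hop is damped, so distant neighbours contribute less than direct ones.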

Spaced reinforcement

Longer intervals between recalls produce larger, more durable boosts. The spacing effect, by design.

Cognitive types

Episodic, semantic, and procedural memories each carry their own decay rate and consolidation rules.

Cross-agent

Claude Code, Gemini CLI, Codex CLI, and OpenCode share one store, whatever LLM runs underneath. Switch model or agent, keep your memory. Identical scoring, deterministic recall everywhere.

Sleep & consolidation

A nine-phase nightly cycle that includes replay, consolidation, pruning, reorganization, and REM-style recombination.

Sync your way — no lock-in

Plain files in a folder. Point BRAIN_DIR at Google Drive, Dropbox, or iCloud — or sync via git or encrypted export. No account required.
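Because the store is plain files in a folder, any script can read it. The sketch below resolves `BRAIN_DIR` and lists the memory files under it. The `~/.brain` fallback, the directory names, and the file names are hypothetical, and the demo writes into a temporary folder rather than a real store.

```python
import os
import tempfile
from pathlib import Path


def brain_dir() -> Path:
    # BRAIN_DIR comes from the text; the ~/.brain fallback is an assumption.
    return Path(os.environ.get("BRAIN_DIR", Path.home() / ".brain"))


def list_memories(root: Path) -> list[str]:
    """Return every Markdown file under root as a sorted relative path."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))


# Demo against a throwaway folder with a hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "projects" / "orders").mkdir(parents=True)
    (root / "projects" / "orders" / "use-postgres.md").write_text("Use Postgres.\n")
    (root / "preferences").mkdir()
    (root / "preferences" / "indentation.md").write_text("Indent with tabs.\n")
    os.environ["BRAIN_DIR"] = tmp
    found = list_memories(brain_dir())

print(found)
```

The same listing works unchanged if `BRAIN_DIR` points at a synced folder, since nothing here depends on where the directory lives.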

02 How it works

The lifecycle of a memory.

01

Memorize

Agents write decisions, learnings, and preferences as Markdown with YAML frontmatter. Initial strength is set by type and impact.
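A minimal sketch of what such a memory file could look like when rendered. The frontmatter field names (`title`, `type`, `impact`, `strength`, `created`), the base strengths, and the way impact is combined with type are assumptions for illustration; the text only states that memories are Markdown with YAML frontmatter and that initial strength depends on type and impact.

```python
from datetime import datetime, timezone

# Hypothetical base strengths per cognitive type (illustrative values only).
BASE_STRENGTH = {"episodic": 0.5, "semantic": 0.7, "procedural": 0.8}


def initial_strength(mem_type: str, impact: float) -> float:
    """Combine a per-type base with an impact score in [0, 1], capped at 1.0."""
    return round(min(1.0, BASE_STRENGTH[mem_type] + 0.2 * impact), 3)


def render_memory(title: str, body: str, mem_type: str, impact: float,
                  created: datetime) -> str:
    """Render a memory as Markdown with a YAML frontmatter header."""
    frontmatter = [
        "---",
        f"title: {title}",
        f"type: {mem_type}",
        f"impact: {impact}",
        f"strength: {initial_strength(mem_type, impact)}",
        f"created: {created.isoformat()}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + body + "\n"


doc = render_memory(
    "Use Postgres for the orders service",
    "Decided on Postgres for the orders service.",
    "semantic",
    0.5,
    datetime(2026, 1, 5, tzinfo=timezone.utc),
)
print(doc)
```

Keeping the metadata in frontmatter means the file stays readable in any editor while still carrying the fields a scorer needs.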

02

Decay

Strength decays exponentially at each memory's own rate. Episodic fades fast; procedural is sticky.
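The decay step can be sketched as follows, assuming the forgetting curve is scaled by the memory's current strength. The per-type rates are hypothetical; the text says only that episodic memories fade fast and procedural ones are sticky.

```python
import math

# Hypothetical decay rates per day; the real values are not given in the text.
DECAY_RATE = {"episodic": 0.10, "semantic": 0.03, "procedural": 0.01}


def decayed_strength(strength: float, mem_type: str, days_elapsed: float) -> float:
    """Apply strength * e^(-lambda * t) using the rate for the memory's type."""
    return strength * math.exp(-DECAY_RATE[mem_type] * days_elapsed)


# Compare how much of a full-strength memory survives 30 days, by type.
after_30_days = {t: decayed_strength(1.0, t, 30) for t in DECAY_RATE}
for mem_type, value in after_30_days.items():
    print(mem_type, round(value, 3))
```

With these illustrative rates, the episodic memory retains the least strength after 30 days and the procedural memory the most.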

03

Recall

Deterministic scoring: TF-IDF × decayed-strength + spreading-activation × context-match. Identical results across every agent.
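One way to read the scoring formula is as two products added together: (TF-IDF × decayed strength) + (spreading activation × context match). The sketch below assumes that grouping, a simple smoothed TF-IDF, and an id-based tie-break; the memory fields and sample values are hypothetical.

```python
import math
from collections import Counter


def tf_idf(query: str, doc: str, corpus: list[str]) -> float:
    """Sum the TF-IDF weights of the query terms within one document."""
    words = doc.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    score = 0.0
    for term in sorted(set(query.lower().split())):
        df = sum(1 for d in corpus if term in d.lower().split())
        if df == 0:
            continue
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF
        score += (counts[term] / len(words)) * idf
    return score


def recall_score(tfidf, strength, activation, context_match):
    # Assumed grouping: two products, then a sum.
    return tfidf * strength + activation * context_match


def recall(query, memories, top_k=5):
    corpus = [m["text"] for m in memories]
    scored = [
        (recall_score(tf_idf(query, m["text"], corpus), m["strength"],
                      m["activation"], m["context_match"]), m["id"])
        for m in memories
    ]
    # Sort by descending score, then by id, so ties always resolve the same way.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [mem_id for _, mem_id in scored[:top_k]]


memories = [
    {"id": "style", "text": "indent with tabs",
     "strength": 0.9, "activation": 0.0, "context_match": 0.0},
    {"id": "old-db", "text": "tried sqlite for the orders database",
     "strength": 0.2, "activation": 0.0, "context_match": 0.0},
    {"id": "db-choice", "text": "use postgres for the orders database",
     "strength": 0.9, "activation": 0.4, "context_match": 0.5},
]
ranking = recall("orders database", memories)
print(ranking)
```

Because every input is a plain number and ties are broken by id, the same store and query produce the same ranking regardless of which agent runs the function.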

04

Reinforce

Recalled memories strengthen via spaced reinforcement. Longer gaps → larger boosts; the decay rate itself improves with each recall.
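A minimal sketch of the reinforcement step, under assumed functional forms: the boost saturates as the gap grows, and each recall multiplies the decay rate by a fixed factor. The constants (0.3, 7 days, 0.9) are hypothetical.

```python
import math


def reinforce(strength: float, decay_rate: float, days_since_last_recall: float):
    """Return (new_strength, new_decay_rate) after one recall."""
    # Longer gaps give larger boosts, saturating at 0.3 (hypothetical shape).
    boost = 0.3 * (1.0 - math.exp(-days_since_last_recall / 7.0))
    new_strength = min(1.0, strength + boost)
    # Each recall slows future forgetting (hypothetical 10% improvement).
    new_decay_rate = decay_rate * 0.9
    return new_strength, new_decay_rate


short_gap = reinforce(0.5, 0.10, days_since_last_recall=1)
long_gap = reinforce(0.5, 0.10, days_since_last_recall=14)
print(short_gap, long_gap)
```

Both recalls lower the decay rate by the same factor, but the recall after the longer gap adds more strength, which is the spacing effect the text describes.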

05

Sleep

The nine-phase cycle runs maintenance, including replay, synaptic homeostasis, knowledge propagation, semantic crystallization, pruning, REM recombination, and expertise detection.

Forgetting curve: strength(t) = e^(−λt)
[Figure: strength over the full timeline, with recall events marked]
03 Benchmark results

Six-scenario suite for long-term agent memory.

Grounded in 2025–2026 long-term-memory evaluation methodology: LongMemEval, MemoryAgentBench, SWE-Bench-CL, and Mem0 / BEAM. The suite uses a cross-family LLM judge, distractor haystacks, and an N-arm matrix.

brain-real: 27.8K tokens / success
context-dump: 50.8K tokens / success
brain-no-pin: 86.3K tokens / success
| Scenario | Question | What it tests |
| --- | --- | --- |
| A Noisy Project Folder | 200 memories from 6 projects: does brain find the 3 relevant ones? | Retrieval under distractors · LongMemEval-S analog |
| B Three Sessions, One Decision | Postgres Monday, gRPC rewrite Wednesday, new resource Friday: still Postgres? | Multi-session continuity · pinned-tier ablation |
| C The Contradiction Test | Tabs, then spaces, then tabs again: which version wins? | Decay-weighted recency · contradiction handling |
| D Skill Progressive Disclosure | Five skills indexed, one needed: does brain load just the one? | Procedural skills (L0/L1/L2) token efficiency |
| E Continual Coding | Five bugs in order: does bug 5 finish faster than bug 1? | Forward transfer · agent writes its own memories |
| F Abstention | No deployment target in memory: does the agent ask or invent? | Confabulation resistance · stale-fact rejection |

gemini-2.5-flash · Scenario A × 1 run

| arm | tokens | tok / success | recall@5 | pass |
| --- | --- | --- | --- | --- |
| bare | 24.1K | 24.1K | | 100% |
| fixture-only | 16.8K | 16.8K | | 100% |
| brain-real | 27.8K | 27.8K | 0.33 | 100% |
| brain-no-recall | 43.6K | 43.6K | | 100% |
| brain-no-pin | 86.3K | 86.3K | 0.33 | 100% |
| context-dump | 50.8K | 50.8K | | 100% |

Scenario A — Noisy Project Folder · 1 run · all arms judged by a cross-family LLM. Results in progress; numbers update as runs complete.
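For readers unfamiliar with the two reported metrics, the sketch below uses their conventional definitions, which are assumed here rather than taken from the benchmark's code. The memory ids and retrieval order are hypothetical; the token count is the brain-real figure reported above.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant memories that appear in the top-k results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def tokens_per_success(total_tokens, successes):
    """Total tokens spent divided by the number of passing runs."""
    if successes == 0:
        return float("inf")
    return total_tokens / successes


ranked = ["m17", "m03", "m88", "m41", "m09"]  # hypothetical retrieval order
relevant = ["m03", "m120", "m156"]            # hypothetical ids of the 3 relevant memories
r5 = round(recall_at_k(ranked, relevant), 2)  # one of three relevant memories retrieved
tps = tokens_per_success(27_800, 1)           # single passing run
print(r5, tps)
```

With a single run that passes, tokens per success equals the raw token count, which is why the two token figures match for every arm. Under this definition, a recall@5 of 0.33 corresponds to one of the three relevant memories appearing in the top five.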

04 Neuroscience foundations

Every mechanism maps to a published model.

| Mechanism | Implementation in Brain Memory |
| --- | --- |
| Spreading activation | Recalling memory A automatically surfaces its linked neighbours B and C along weighted edges. |
| Hebbian learning | Memories recalled together strengthen their mutual link: neurons that fire together, wire together. |
| Context-dependent recall | Memories encoded in a similar context score higher at retrieval time. |
| Spacing effect | Longer recall intervals produce larger, longer-lasting strength boosts. |
| Ebbinghaus decay | Exponential forgetting with per-memory decay rates set by cognitive type. |
| Synaptic homeostasis | Global strength down-scaling during sleep prevents runaway inflation. |
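Two of these mechanisms, Hebbian learning and synaptic homeostasis, can be sketched in a few lines. The learning rate, the scaling factor, and the edge representation are hypothetical choices for the example, not values from Brain Memory.

```python
def hebbian_update(edges, co_recalled, rate=0.1):
    """Strengthen the link between every pair of memories recalled together.

    edges maps a sorted (id_a, id_b) pair to a weight in [0, 1].
    """
    ids = sorted(co_recalled)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            w = edges.get((a, b), 0.0)
            edges[(a, b)] = w + rate * (1.0 - w)  # moves toward 1.0, never past it
    return edges


def homeostatic_downscale(strengths, factor=0.9):
    """Scale every strength by the same factor, preserving relative order."""
    return {mem_id: s * factor for mem_id, s in strengths.items()}


edges = {}
hebbian_update(edges, {"A", "B"})
hebbian_update(edges, {"A", "B", "C"})
scaled = homeostatic_downscale({"A": 1.0, "B": 0.5})
print(edges, scaled)
```

The Hebbian step only ever raises weights, so a global down-scaling pass is what keeps values from drifting toward the ceiling over many recalls.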
05 Research & references

Grounded in the literature.

Brain Memory's architecture and benchmark methodology draw on the academic literature on language-agent memory and evaluation. Selected citations below.

Foundations

Cognitive Architectures for Language Agents (CoALA) · arXiv 2309.02427

Sumers, Yao, Narasimhan, Griffiths — the agent-memory model Brain Memory implements. Pinned tier, procedural skills, and budget-aware working memory map to CoALA's semantic / procedural / episodic decomposition.

MemGPT: Towards LLMs as Operating Systems · arXiv 2310.08560

Packer et al. — paging-style memory management that motivated budget-bounded working memory.

Generative Agents: Interactive Simulacra of Human Behavior · arXiv 2304.03442

Park et al. — recency · importance · relevance retrieval that inspired the recall scoring weights.

Ebbinghaus, Über das Gedächtnis (1885) · foundational

The original forgetting curve. Brain Memory's exponential decay and spaced-reinforcement boosts follow it directly.

Memory benchmarks

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory · arXiv 2410.10813

Distractor-haystack design (Scenario A) and the rubric-based LLM judge.

MemoryAgentBench: A Unified Evaluation for Long-Term Memory Agents · arXiv 2507.05257

Four-competency framework; FactConsolidation inspired Scenario C (The Contradiction Test).

SWE-Bench-CL: Continual Learning for Coding Agents · arXiv 2507.00014

Forward transfer in continual coding — basis for Scenario E.

Mem0 / BEAM: Memory Architectures for Production Agents · arXiv 2504.19413

Tokens-per-query co-reported with accuracy — source of the tokens-per-successful-task metric.

Methodology

Preference Leakage: A Pitfall in LLM-as-a-Judge · arXiv 2502.01534

Documents the cost of judging with a model from the same family as the system under test. Brain's benchmark enforces a cross-family judge map.

When Judgment Becomes Noise: Position Bias in LLM Judges · arXiv 2509.20293

Empirical position-bias study. Brain's benchmark uses position-swap on every pairwise judgment.
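Position-swap judging can be sketched as below. The `judge` callable, its "first"/"second" return values, and the choice to treat an inconsistent verdict as a tie are assumptions for illustration; how Brain's benchmark resolves disagreements is not stated here.

```python
def judge_with_position_swap(judge, answer_a, answer_b):
    """Ask the judge twice with the order swapped; accept only a consistent verdict.

    judge(x, y) is a hypothetical callable returning "first" or "second".
    """
    first_pass = judge(answer_a, answer_b)
    second_pass = judge(answer_b, answer_a)
    a_wins_first = first_pass == "first"
    a_wins_second = second_pass == "second"
    if a_wins_first and a_wins_second:
        return "A"
    if not a_wins_first and not a_wins_second:
        return "B"
    return "tie"  # the verdict flipped with position, so it is not trusted


def always_first(x, y):
    # Stub judge with pure position bias.
    return "first"


def prefers_longer(x, y):
    # Stub judge whose preference does not depend on position.
    return "first" if len(x) >= len(y) else "second"


biased_verdict = judge_with_position_swap(always_first, "short", "a longer answer")
stable_verdict = judge_with_position_swap(prefers_longer, "short", "a longer answer")
print(biased_verdict, stable_verdict)
```

A judge that always favours one slot produces contradictory verdicts across the two passes, so its preference is discarded instead of being counted as a win.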

LastingBench: Defending Benchmarks Against Data LeakagearXiv 2506.21614

Synthetic, decay-driven scenarios guard against memorised public-set answers.

06 Compatibility

One memory store. Every agent.

Claude Code · Anthropic · brain --claude --global
Gemini CLI · Google · brain --gemini --global
Codex CLI · OpenAI · brain --codex --global
OpenCode · Any model · brain --opencode --global
07 Quick start

Install globally, then wire your runtime.

$ npm install -g brain-memory@beta

Then run brain --claude (or --gemini / --codex / --opencode, or --all) to configure your runtime(s). One store, deterministic recall, all agents.