What is Brain Memory?

Brain Memory is a hierarchical, file-system-based memory system for AI coding agents. It is modeled on human neuroscience: memories decay along an Ebbinghaus forgetting curve, strengthen through recall, connect via associative networks, and consolidate during a sleep cycle.

Which AI agents does Brain Memory support?

The universal path is the hosted MCP connector (https://mcp.brainmemory.ai/mcp) — one connector reaches Claude Code, OpenAI Codex CLI, OpenCode, GitHub Copilot CLI, Kilo, the Claude.ai apps, ChatGPT, Google Antigravity, OpenClaw, and Hermes Agent. There's also a free local-first native plugin for Claude Code, Codex, OpenCode, Copilot CLI, Kilo, and Antigravity (experimental), plus dedicated memory-engine integrations for OpenClaw and Hermes. A deterministic recall engine produces identical scoring across every agent and model — one brain, any model, every agent.

Is Brain Memory free and open source?

Yes. Brain Memory is free and open source, published on npm as brain-memory and developed in the open by Omelas on GitHub.

Where are my memories stored?

All memories live in a single global ~/.brain/ directory in your home folder as human-readable Markdown files with YAML frontmatter. There is no database and no server — the file system is the database, so it is fully browseable, Git-friendly, and portable.
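To make the storage format concrete, here is a minimal sketch of reading one such file. The field names (type, strength) and the sample content are hypothetical, since the page does not list the actual frontmatter schema, and the parser handles only flat key: value pairs; a real YAML library would be needed for anything nested.

```python
# Hypothetical memory file; the field names are illustrative, not Brain Memory's schema.
SAMPLE = """---
type: semantic
strength: 0.8
---
Use Postgres for the primary datastore.
"""

def split_frontmatter(text):
    """Split a Markdown document into (frontmatter dict, body string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain Markdown

meta, body = split_frontmatter(SAMPLE)
```

Because the file is plain text, the same content can be read with any editor, diffed in Git, or parsed with a few lines like these.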

How is Brain Memory different from a vector database?

Instead of opaque embeddings in a vector store, Brain Memory uses transparent Markdown files scored by a deterministic engine that combines TF-IDF relevance, neuroscience-inspired strength and decay, spreading activation across an associative network, and context-dependent recall. It requires no runtime dependencies and is readable by both humans and agents.
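As an illustration of the relevance component only, the sketch below scores a few documents with a basic smoothed TF-IDF sum. The tokenizer, the smoothing, and the function names are assumptions made for the example; the page does not describe the exact TF-IDF variant the engine uses.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_scores(query, docs):
    """Score each document against the query with a smoothed TF-IDF sum."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    return scores

docs = ["use postgres for storage", "tabs not spaces", "deploy with docker"]
scores = tfidf_scores("postgres storage", docs)
```

Nothing here depends on a model or a random seed, so the same query over the same files always yields the same numbers, which is the property the page calls deterministic.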

Brain Memory — Memory for AI Agents

01 What's inside

A memory architecture, not a vector store.

Hierarchical memory

The directory tree is the semantic structure. Browse memory in any file explorer — no opaque embeddings.

Strength & decay

Memories fade along an exponential forgetting curve. Recall arrests decay and pushes strength back up.

Associative network

Weighted edges link related memories. Recalling one activates its neighbours via spreading activation.

Spaced reinforcement

Longer intervals between recalls produce larger, more durable boosts. The spacing effect, by design.

Cognitive types

Episodic, semantic, and procedural memories each carry their own decay rate and consolidation rules.

Cross-agent

Claude Code, Codex CLI, OpenCode, Copilot CLI, Kilo, and Antigravity share one store — plus the Claude and ChatGPT apps via the hosted MCP connector, OpenClaw and Hermes as native memory engines, and any LLM underneath. Switch model or agent, keep your memory. Identical scoring, deterministic recall everywhere.

Sleep & consolidation

A nine-phase nightly cycle that includes replay, consolidation, pruning, reorganization, and REM-style recombination.

Sync your way — no lock-in

Plain files in a folder. Point BRAIN_DIR at Google Drive, Dropbox, or iCloud — or sync via git or encrypted export. No account required.
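A minimal sketch of how a tool might locate the store. BRAIN_DIR and ~/.brain/ come from the page; the fallback order shown here (environment variable first, home folder otherwise) and the function name are assumptions.

```python
import os
from pathlib import Path

def resolve_brain_dir(env=None):
    """Return the memory directory: BRAIN_DIR if set, else ~/.brain (assumed default)."""
    env = os.environ if env is None else env
    raw = env.get("BRAIN_DIR") or "~/.brain"
    return Path(raw).expanduser()

# Example: a store kept inside a synced cloud folder (hypothetical path).
synced = resolve_brain_dir({"BRAIN_DIR": "/tmp/drive/brain"})
default = resolve_brain_dir({})
```

Since the store is only a directory, moving it to a synced folder needs no migration step beyond changing the variable.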

02 How it works

The lifecycle of a memory.

Memorize

Agents write decisions, learnings, and preferences as Markdown with YAML frontmatter. Initial strength is set by type and impact.

Decay

Strength decays exponentially at each memory's own rate. Episodic fades fast; procedural is sticky.
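The decay step can be sketched as below. The per-type rates are hypothetical placeholders chosen only to show episodic memories fading faster than procedural ones, and scaling by the initial strength is an assumption layered on the page's strength(t) = e^(−λt).

```python
import math

# Hypothetical decay rates per day; the real values are not given in the text.
DECAY_RATES = {"episodic": 0.10, "semantic": 0.03, "procedural": 0.01}

def decayed_strength(initial, memory_type, days_elapsed):
    """Exponential forgetting: initial * e^(-lambda * t)."""
    lam = DECAY_RATES[memory_type]
    return initial * math.exp(-lam * days_elapsed)

after_month_episodic = decayed_strength(1.0, "episodic", 30)
after_month_procedural = decayed_strength(1.0, "procedural", 30)
```

With these placeholder rates, an episodic memory is down to about 5% of its strength after 30 days, while a procedural one still holds about 74%.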

Recall

Deterministic scoring: TF-IDF × decayed-strength + spreading-activation × context-match. Identical results across every agent.
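Read literally, the formula above combines four signals into one number. The sketch below assumes the two products are evaluated before the sum, and adds a name-based tie-break so that equal scores always sort the same way. The Candidate type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    tfidf: float          # text relevance to the query
    strength: float       # strength after decay
    activation: float     # energy received via spreading activation
    context_match: float  # similarity between encoding and recall context

def recall_score(c):
    """TF-IDF x decayed strength + spreading activation x context match."""
    return c.tfidf * c.strength + c.activation * c.context_match

def rank(candidates):
    """Sort by score (highest first), then by name so ties resolve identically."""
    return sorted(candidates, key=lambda c: (-recall_score(c), c.name))

a = Candidate("a", tfidf=0.5, strength=0.8, activation=0.2, context_match=0.5)
b = Candidate("b", tfidf=0.9, strength=0.1, activation=0.0, context_match=1.0)
ranked = rank([b, a])
```

In this example the weaker text match (a) outranks the stronger one (b) because b has almost fully decayed, which is the kind of behaviour a pure keyword retriever cannot express.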

Reinforce

Recalled memories strengthen via spaced reinforcement. Longer gaps → larger boosts; the decay rate itself improves with each recall.
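One way to picture spaced reinforcement is a boost that grows with the gap since the last recall and saturates, paired with a decay rate that shrinks on every recall. The saturating form and all constants below are assumptions for illustration; the page states the behaviour but not the formula.

```python
import math

def reinforce(strength, decay_rate, gap_days,
              base_boost=0.3, tau=7.0, rate_factor=0.9):
    """Return (new_strength, new_decay_rate) after one recall.

    The boost rises with the gap and levels off at base_boost;
    the decay rate is multiplied by rate_factor, so it slows with each recall.
    """
    boost = base_boost * (1.0 - math.exp(-gap_days / tau))
    new_strength = min(1.0, strength + boost)  # strength is capped at 1.0
    new_rate = decay_rate * rate_factor
    return new_strength, new_rate

s_short, r_short = reinforce(0.5, 0.1, gap_days=1)
s_long, r_long = reinforce(0.5, 0.1, gap_days=14)
```

Under these assumptions a recall after 14 days lifts strength noticeably more than a recall after one day, while both reduce the decay rate by the same factor.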

Sleep

The nine-phase cycle runs maintenance, including replay, synaptic homeostasis, knowledge propagation, semantic crystallization, pruning, REM recombination, and expertise detection.

Forgetting curve: strength(t) = e^(−λt)

[Figure: strength over the full timeline, with recall events marked]

03 Benchmark results

Six-scenario suite for long-term agent memory.

The suite is grounded in 2025–2026 long-term-memory evaluation methodology: LongMemEval, MemoryAgentBench, SWE-Bench-CL, Mem0, and BEAM. DeepSeek V4 Pro is the agent under test, graded by a cross-family judge panel (Gemini + Gemma-4 + Qwen-3.5, majority vote) over distractor haystacks and a floor-to-oracle arm matrix.

100%

brain pass-rate on the 1000-distractor haystack — where BM25 & vector retrievers both score 0%

3,199

brain tokens / success on Scenario A — the leanest of every passing arm

−31%

tokens vs dumping all skills — load only the one needed (Scenario D)

Scenario	Example question	What it tests
A Noisy Project Folder	“200 memories from 6 projects: does brain find the 3 relevant ones?”	Retrieval under distractors · LongMemEval-S analog
B Three Sessions, One Decision	“Postgres Monday, gRPC rewrite Wednesday, new resource Friday: still Postgres?”	Multi-session continuity · pinned-tier ablation
C The Contradiction Test	“Tabs, then spaces, then tabs again: which version wins?”	Decay-weighted recency · contradiction handling
D Skill Progressive Disclosure	“Five skills indexed, one needed: does brain load just the one?”	Procedural skills (L0/L1/L2) token efficiency
E Continual Coding	“Five bugs in order: does bug 5 finish faster than bug 1?”	Forward transfer · agent writes its own memories
F Abstention	“No deployment target in memory: does the agent ask or invent?”	Confabulation resistance · stale-fact rejection

Scenario A — retrieval under a 1000-distractor haystack

arm	tokens / success	recall@5	pass rate
no-memory (floor)	—	—	0%
vector (embeddings)	—	0.00	0%
keyword (BM25)	—	0.33	0%
brain-full	3,199	0.67	100%
brain-no-pin	5,135	0.33	67%
oracle (ceiling)	3,572	1.00	100%
context-dump 8k	5,856	—	100%
context-dump 60k	20,689	—	100%

Brain is the only retriever whose memories let the model succeed under heavy noise — BM25 (R@5 0.33) and vector embeddings (R@5 0.00) both fail outright — and it passes at the lowest tokens-per-success of any passing arm. Turning the pinned tier off drops it to 67%.
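For readers unfamiliar with the metric, recall@5 is the fraction of relevant memories that appear among the top five retrieved. The sketch below assumes three relevant memories, as in the Scenario A description; under that assumption the reported 0.33 and 0.67 would correspond to finding one and two of the three, though the page does not state how the figures were aggregated. The memory ids are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant items that appear in the top k of a ranking."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

relevant = {"m1", "m2", "m3"}  # hypothetical ids of the relevant memories

two_of_three = recall_at_k(["m1", "x", "m2", "y", "z", "m3"], relevant)
one_of_three = recall_at_k(["x", "y", "m3", "z", "w"], relevant)
none_found = recall_at_k(["x", "y", "z", "w", "v"], relevant)
```

Note that m3 in the first ranking sits at position six, so it does not count toward recall@5.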

Agent under test: DeepSeek V4 Pro (single-shot) · graded by a cross-family panel — Gemini + Gemma-4 + Qwen-3.5, majority vote · 3 runs per arm. At n=3 the token gaps are directional, not yet statistically significant; the pass-rate gradient is the result.
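The grading scheme described above can be sketched in a few lines: each run is judged by three models, the majority verdict decides the run, and the pass rate is the share of passing runs. The verdict values below are invented for illustration and are not the benchmark's data.

```python
def majority_vote(verdicts):
    """Return True when more than half of the judge verdicts are passes."""
    return sum(verdicts) * 2 > len(verdicts)

def pass_rate(runs):
    """Share of runs whose majority verdict is a pass."""
    passed = sum(majority_vote(v) for v in runs)
    return passed / len(runs)

# Three runs, three judges each (illustrative verdicts only).
runs = [
    [True, True, False],   # passes 2 to 1
    [True, True, True],    # passes unanimously
    [False, False, True],  # fails 1 to 2
]
rate = pass_rate(runs)
```

With three runs per arm the only possible pass rates are 0%, 33%, 67%, and 100%, which is why the page treats the results as directional.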

04 Neuroscience foundations

Every mechanism maps to a published model.

Mechanism	Implementation in Brain Memory
Spreading activation	Recalling memory A automatically surfaces its linked neighbours B and C along weighted edges.
Hebbian learning	Memories recalled together strengthen their mutual link — neurons that fire together, wire together.
Context-dependent recall	Memories encoded in a similar context score higher at retrieval time.
Spacing effect	Longer recall intervals produce larger, longer-lasting strength boosts.
Ebbinghaus decay	Exponential forgetting with per-memory decay rates set by cognitive type.
Synaptic homeostasis	Global strength down-scaling during sleep prevents runaway inflation.
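Three of the mechanisms in the table lend themselves to a compact sketch over a weighted graph. The attenuation factor, depth limit, learning rate, and down-scaling factor are all assumed values, and the function names are hypothetical; the page does not give the engine's actual parameters.

```python
def spread_activation(graph, source, attenuation=0.5, depth=2):
    """Propagate activation outward from source along weighted edges."""
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(depth):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                passed = energy * weight * attenuation
                # Keep only the strongest activation seen for each node.
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return activation

def hebbian_update(graph, a, b, rate=0.1):
    """Strengthen the link between two co-recalled memories, capped at 1.0."""
    for x, y in ((a, b), (b, a)):
        edges = graph.setdefault(x, {})
        edges[y] = min(1.0, edges.get(y, 0.0) + rate)

def downscale(strengths, factor=0.9):
    """Synaptic homeostasis: scale every strength down by the same factor."""
    return {name: value * factor for name, value in strengths.items()}

graph = {"A": {"B": 0.8, "C": 0.4}, "B": {"D": 0.5}}
act = spread_activation(graph, "A")
hebbian_update(graph, "A", "C")
scaled = downscale({"A": 1.0, "B": 0.5})
```

Recalling A here activates B and C directly and D at one remove, each with less energy than the last; recalling A and C together then raises the weight of the edge between them.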

05 Research & references

Grounded in the literature.

Brain Memory's architecture and benchmark methodology draw on the academic literature on language-agent memory and evaluation. Selected citations below.

Foundations

Cognitive Architectures for Language Agents (CoALA) · arXiv 2309.02427

Sumers, Yao, Narasimhan, Griffiths — the agent-memory model Brain Memory implements. Pinned tier, procedural skills, and budget-aware working memory map to CoALA's semantic / procedural / episodic decomposition.

MemGPT: Towards LLMs as Operating Systems · arXiv 2310.08560

Packer et al. — paging-style memory management that motivated budget-bounded working memory.

Generative Agents: Interactive Simulacra of Human Behavior · arXiv 2304.03442

Park et al. — recency · importance · relevance retrieval that inspired the recall scoring weights.

Ebbinghaus, Über das Gedächtnis (1885) · foundational

The original forgetting curve. Brain Memory's exponential decay and spaced-reinforcement boosts follow it directly.

Memory benchmarks

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory · arXiv 2410.10813

Distractor-haystack design (Scenario A) and the rubric-based LLM judge.

MemoryAgentBench: A Unified Evaluation for Long-Term Memory Agents · arXiv 2507.05257

Four competencies — accurate retrieval, test-time learning, long-range understanding, selective forgetting. FactConsolidation inspired Scenario C (The Contradiction Test).

SWE-Bench-CL: Continual Learning for Coding Agents · arXiv 2507.00014

Forward transfer in continual coding — basis for Scenario E.

Mem0: Scalable Long-Term Memory for Production AI Agents · arXiv 2504.19413

Tokens-per-query co-reported with accuracy — source of the tokens-per-successful-task metric.

BEAM: Benchmarking Long-Term Memory up to 10M Tokens · arXiv 2510.27246

Structured memory beats a 10M-token context window, and the gap widens with scale — memory still matters past long context.

Methodology

Preference Leakage: A Pitfall in LLM-as-a-Judge · arXiv 2502.01534

Same-family judging inflates scores (~24% leakage vs ~3% cross-family). The benchmark judges DeepSeek with a panel of non-DeepSeek families.

Replacing Judges with Juries: a Panel of LLM Evaluators (PoLL) · arXiv 2404.18796

A panel of smaller, disjoint-family judges beats a single large judge on human agreement, with less intra-model bias — the basis for the Gemini + Gemma-4 + Qwen-3.5 panel.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena · arXiv 2306.05685

Documents position, verbosity, and self-enhancement bias; origin of the swap-consistency check the benchmark applies to pairwise judgments.

When Judgment Becomes Noise · arXiv 2509.20293

Judge verdicts carry large unexplained variance — report uncertainty rather than aggregate it away. Why results are published with n=3 error bars, not bare point estimates.

LastingBench: Defending Benchmarks Against Data Leakage · arXiv 2506.21614

Synthetic, decay-driven scenarios guard against memorised public-set answers.

06 Compatibility

One memory store. Every agent.

The universal path is the hosted MCP connector: one streamable-HTTP + OAuth endpoint that reaches every host below. Prefer to stay local? The free native plugin writes to ~/.brain/ for the current CLIs.

MCP: https://mcp.brainmemory.ai/mcp

Host	Vendor	Setup
Claude Code	Anthropic	claude mcp add --transport http brain …/mcp
Codex CLI	OpenAI	codex mcp add brain --url …/mcp
OpenCode	Any model	opencode.json → mcp remote url
Copilot CLI	GitHub	mcp-config.json → type http + url
Kilo	Any model	kilo.jsonc → mcp remote url
Claude apps	Anthropic	Settings → Connectors → Add (URL + OAuth)
ChatGPT	OpenAI	Settings → Connectors → custom (paid / Dev Mode)
Antigravity	Google	mcp_config.json → serverUrl

Or run it local-first.

Running npm i -g brain-memory@beta && brain installs the free native plugin: slash commands and prompt sections wired into the agent's own config, all pointing at a single ~/.brain/.

Host	Vendor	Command
Claude Code	Anthropic	brain --claude --global
Codex CLI	OpenAI	brain --codex --global
OpenCode	Any model	brain --opencode --global
Copilot CLI	GitHub	brain --copilot --global
Kilo	Any model	brain --kilo --global
Antigravity *	Google	brain --antigravity --global

* Antigravity native support is experimental. For everything else — the Claude.ai apps and ChatGPT included — use the MCP connector above.

07 Quick start

Install globally, then wire your runtime.

$ npm install -g brain-memory@beta

Then run brain --claude (or --codex / --opencode / --antigravity, or --all) to configure your runtime(s). For the Claude and ChatGPT apps, add the MCP connector instead. One store, deterministic recall, every agent.