Knowledge & Memory

Memory System

A 35-tool MCP server that gives AI persistent memory through cognitive layers, adaptive decay, outcome-driven learning, and proactive context injection at session start.

MCP · Python · SQLite · Ollama · Obsidian · Embeddings

In Plain English

AI assistants (like ChatGPT or Claude) normally forget everything between conversations. This system gives them a real memory: it watches coding sessions, remembers what worked and what failed, and automatically brings up relevant past knowledge whenever a new session starts. The AI actually gets smarter the more you use it.

Problem

The fundamental limitation of large language models is that they have no persistent memory. Every conversation starts from zero. You can spend an hour debugging a complex issue, discover the fix, and the next day the AI has no recollection it ever happened. It will suggest the same failed approaches, miss the same gotchas, and ignore the preferences you have expressed dozens of times before. For a single conversation, this is a minor inconvenience. For someone who uses AI as a core part of their daily workflow across months and years, it is a crippling limitation that compounds over time.

The Memory System was built to solve this problem at its root. Rather than patching around stateless conversations with manual context injection, it creates a genuine cognitive architecture: six distinct memory layers modeled on how human memory actually works. Working memory holds the current session state. Short-term memory retains the last 24 to 48 hours with configurable decay rates. Long-term memory stores compressed patterns and principles that have been validated through repeated success. Episodic memory preserves rich narratives of specific events ("the day we broke production"). Semantic memory holds facts about the codebase, tools, and people. And procedural memory records how to do things, capturing workflows and sequences that can be replayed later.

On top of this cognitive foundation sits a learning system that tracks every approach the AI takes, records whether it succeeded or failed, adjusts confidence scores accordingly, and automatically promotes proven strategies to "principles" while flagging repeated failures as "anti-patterns." The result is not just memory, but genuine learning: an AI that gets measurably better at its job over time, because it can recall not only what happened, but what worked and what did not.

Architecture

[Architecture diagram: Claude Code and Gemini CLI session logs flow through an extraction pipeline (log watcher with JSONL parsing and segmentation, hermes3 LLM extractor with JSON validation, quality scorer, entity resolver) into six cognitive memory layers plus an anti-pattern store. An outcome-driven learning system (event detector, outcome recorder, preference tracker, promotion engine) maintains an approach-confidence database, closing a feedback loop via validate_approach() before risky actions. Storage spans six SQLite databases and an Obsidian vault with validated backlinks and an error encyclopedia. Adaptive decay, multi-vector search (768-dim, three embedding types), and relevance scoring (recency 0.3 + recall count 0.3 + quality 0.2 + pin bonus, with a 30-day half-life and archival below threshold) manage retrieval. Everything is exposed through a 35-tool MCP server, and a proactive injector scores project, time-of-day, pinned-memory, and entity signals to inject context at session start.]

Features

Six-Layer Cognitive Architecture

6 memory types

Modeled on how human memory actually works, the system maintains six distinct memory layers. Working memory holds the current session state in roughly 500 tokens that are always present. Short-term memory retains the last 24 to 48 hours with a configurable decay rate (default 10% per day), so recent work fades naturally unless reinforced by use. Long-term memory stores compressed patterns and principles that have been validated through repeated success, with confidence scores and instance counts. Episodic memory preserves rich narratives of specific events, tagged with emotions (frustrating, breakthrough, routine) and outcomes. Semantic memory holds facts about the codebase, tools, and people. Procedural memory records workflows and sequences that can be replayed.
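
As a rough sketch of how these layers and their retention parameters could be represented, the dataclass below uses illustrative names alongside the figures quoted above; it is not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryLayer:
    name: str
    holds: str            # what this layer stores
    decay_per_day: float  # fraction of relevance lost daily (0 = no default decay)

# Illustrative parameters drawn from the description above.
LAYERS = [
    MemoryLayer("working",    "current session state (~500 tokens)",     0.00),
    MemoryLayer("short_term", "last 24-48 hours",                        0.10),
    MemoryLayer("long_term",  "validated patterns and principles",       0.00),
    MemoryLayer("episodic",   "rich narratives of specific events",      0.00),
    MemoryLayer("semantic",   "facts about codebase, tools, and people", 0.00),
    MemoryLayer("procedural", "replayable workflows and sequences",      0.00),
]
```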

Outcome-Driven Learning System

3-part pipeline

The learning system operates through three interconnected modules. The Event Detector monitors conversations using regex patterns with confidence weights across 9 event types (errors, decisions, approaches, confusion, code writing, command running, context switches, successes, failures). The Outcome Recorder tracks every approach the AI takes: successes boost confidence by +0.1, failures penalize by -0.15, creating an asymmetric incentive that requires demonstrated reliability. The Promotion Engine automatically upgrades approaches with 3+ successes to "principles" in long-term memory and flags approaches with 2+ failures as "anti-patterns" that get surfaced as warnings. User preferences are tracked separately with accumulating evidence and occurrence counts.
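
The asymmetric scoring and promotion thresholds reduce to a few lines of code. The following is a minimal sketch of the logic described above; the function and field names are hypothetical.

```python
SUCCESS_DELTA = 0.10    # confidence boost on success
FAILURE_DELTA = -0.15   # larger penalty on failure (asymmetric by design)
PROMOTE_AT = 3          # successes required to become a principle
FLAG_AT = 2             # failures required to become an anti-pattern

def record_outcome(approach: dict, success: bool) -> dict:
    """Adjust an approach's confidence and check the promotion thresholds."""
    delta = SUCCESS_DELTA if success else FAILURE_DELTA
    approach["confidence"] = min(1.0, max(0.0, approach["confidence"] + delta))
    approach["successes" if success else "failures"] += 1
    if approach["successes"] >= PROMOTE_AT:
        approach["status"] = "principle"     # promoted to long-term memory
    elif approach["failures"] >= FLAG_AT:
        approach["status"] = "anti-pattern"  # surfaced later as a warning
    return approach
```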

Adaptive Decay and Relevance

30-day half-life

Not all memories deserve equal weight. The adaptive decay system uses feedback-driven rates: memories that have been recalled and marked helpful decay at just 0.5% per day, while unhelpful memories decay at 5% per day. The base rate is 2% for unaccessed memories. Relevance scoring combines four weighted factors: recency (30%, exponential decay with 30-day half-life), recall count (30%, logarithmic scaling capped at 5 recalls), extraction quality (20%), and a pin bonus (30% for manually pinned memories). Memories that fall below the relevance threshold are archived and pruned from the search index, keeping the active memory lean and relevant without permanently deleting anything.
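
The relevance formula translates directly into code. A minimal sketch, assuming extraction quality is normalized to [0, 1] and the capped logarithmic recall scaling is normalized against its 5-recall ceiling; the real implementation may differ in those details.

```python
import math

HALF_LIFE_DAYS = 30.0
DECAY_PER_DAY = {"helpful": 0.005, "base": 0.02, "unhelpful": 0.05}

def relevance(age_days: float, recall_count: int, quality: float,
              pinned: bool) -> float:
    """Weighted relevance: recency + recall count + quality + pin bonus."""
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)                # 30-day half-life
    recalls = math.log1p(min(recall_count, 5)) / math.log1p(5)  # log scale, capped
    score = 0.3 * recency + 0.3 * recalls + 0.2 * quality
    return score + (0.3 if pinned else 0.0)                     # pin bonus
```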

Proactive Session Injection

session-aware

Rather than waiting for the AI to search its memory, the injector proactively surfaces relevant context at session start. It detects the current project from the working directory using pattern matching against known project signatures. It analyzes time-of-day patterns to predict what the user typically works on at this hour. It checks for risky operations in the first message and boosts anti-pattern recall when it detects deployment, deletion, or refactoring keywords. Pinned memories are always included. Entity matches pull in context about specific tools, services, or people mentioned in the opening message. The output is compressed for token efficiency, so the AI starts every session already primed with the most relevant accumulated knowledge.
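
A simplified sketch of the injection flow is shown below; the keyword list and function signature are illustrative, and the real injector also weighs project signatures and time-of-day patterns before compressing the output.

```python
import re

# Hypothetical keywords for risky-operation detection.
RISKY = re.compile(r"\b(deploy|delete|drop|rm -rf|refactor|migrate)\b", re.I)

def build_injection(first_message: str, pinned: list[str],
                    project_memories: list[str],
                    anti_patterns: list[str]) -> list[str]:
    """Assemble session-start context from the signals described above."""
    context = list(pinned)            # pinned memories are always included
    context += project_memories[:5]   # top memories for the detected project
    if RISKY.search(first_message):
        context += anti_patterns      # boost anti-pattern recall on risky ops
    return context
```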

How It Works

01

Session Monitoring and Segmentation

The watcher processes continuously monitor JSONL log files from Claude Code and Gemini CLI sessions, detecting new entries as they are written. Raw log entries are parsed and grouped into coherent conversation segments using topic detection. The system identifies turn boundaries, extracts metadata (timestamps, tool usage, file paths), and produces structured segments ready for extraction. A Gemini-specific watcher handles the different log format from Google's CLI, normalizing everything into a common representation. The watchdog loop ensures monitoring runs continuously and restarts automatically if interrupted.
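
The core of a watcher is a tail-and-parse loop over growing JSONL files. A minimal sketch, assuming plain JSONL logs; the real watcher adds topic detection, segmentation, and the watchdog restart logic.

```python
import json
from pathlib import Path

def read_new_entries(log_path: Path, offset: int) -> tuple[list[dict], int]:
    """Return JSONL entries appended since `offset`, plus the new offset."""
    entries: list[dict] = []
    with log_path.open("rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break             # partial line still being written; retry later
            offset += len(raw)
            if raw.strip():
                try:
                    entries.append(json.loads(raw))
                except json.JSONDecodeError:
                    pass          # malformed entry; skip but keep the offset
    return entries, offset
```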

02

LLM Extraction and Quality Scoring

Each conversation segment is sent to a local LLM (hermes3) for structured extraction. The extractor pulls out entities (tools, files, services, people), decisions made and their rationale, errors encountered and their fixes, patterns observed, and any other knowledge worth preserving. The output is validated against a JSON schema to ensure structural correctness. A quality scorer then classifies and ranks each extraction, filtering out low-value or redundant information. The entity resolver normalizes names (ensuring "memory-watcher," "memory_watcher," and "MemoryWatcher" all map to the same entity) to prevent vault fragmentation.
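
One way to implement that normalization is sketched below; the regex approach is an assumption, but it shows how the naming variants collapse to a single canonical key.

```python
import re

def normalize_entity(name: str) -> str:
    """Map variants like "memory-watcher", "memory_watcher", and
    "MemoryWatcher" to the same canonical key ("memory_watcher")."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)   # split CamelCase
    return re.sub(r"[\s\-]+", "_", name).lower()          # unify separators, case
```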

03

Multi-Layer Storage

Extracted knowledge flows into multiple storage layers simultaneously. The embedding engine generates 768-dimensional vectors using nomic-embed-text and stores them in the embeddings database with three vector types: topical (what it is about), entity-based (who and what is involved), and error-based (what went wrong). The vault writer creates Obsidian markdown notes with validated backlinks, ensuring every wikilink points to an existing note. Knowledge is organized into topic hubs, entity pages, and an error encyclopedia. Six SQLite databases handle different concerns: embeddings for vector search, analytics for usage metrics and recall tracking, sources for provenance, temporal for timeline queries, memory for core storage, and cognitive for the six-layer memory architecture.
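
Generating the three vector types might look like the sketch below, which calls Ollama's standard embeddings endpoint from Python; the memory field names are hypothetical.

```python
import requests

def embed(text: str) -> list[float]:
    """Fetch a 768-dim vector from a local nomic-embed-text model via Ollama."""
    resp = requests.post("http://localhost:11434/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def embed_memory(memory: dict) -> dict[str, list[float]]:
    """Produce the three vector types described above (fields hypothetical)."""
    return {
        "topical": embed(memory["summary"]),             # what it is about
        "entity":  embed(" ".join(memory["entities"])),  # who/what is involved
        "error":   embed(memory.get("error_text") or memory["summary"]),
    }
```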

04

Learning Loop and Confidence Tracking

As the AI works, the event detector monitors conversations for 9 types of significant events using weighted regex patterns (for example, "TypeError" matches at 0.95 confidence, while generic "error" matches at 0.7). When an approach succeeds or fails, the outcome recorder adjusts confidence scores in the approaches database. The asymmetric scoring (successes add 0.1, failures subtract 0.15) means a single failure outweighs a single success, requiring genuine reliability for high confidence. The promotion engine checks whether approaches have crossed the promotion thresholds: three successes elevate an approach to a principle, while two failures create an anti-pattern. Promoted patterns are stored in the cognitive memory system and surfaced during future sessions.
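
A sketch of confidence-weighted event detection follows. The 0.95 and 0.7 weights for "TypeError" and generic "error" come from the description above; the remaining patterns are illustrative stand-ins.

```python
import re

# Illustrative subset of the 9 event types, each with a confidence weight.
EVENT_PATTERNS = [
    ("error",    re.compile(r"\bTypeError\b"),                         0.95),
    ("error",    re.compile(r"\berror\b", re.I),                       0.70),
    ("success",  re.compile(r"\b(tests? pass(ed)?|fixed it)\b", re.I), 0.80),
    ("decision", re.compile(r"\b(decided to|let's go with)\b", re.I),  0.60),
]

def detect_events(text: str) -> list[tuple[str, float]]:
    """Return (event_type, confidence), keeping the strongest match per type."""
    best: dict[str, float] = {}
    for kind, pattern, weight in EVENT_PATTERNS:
        if pattern.search(text):
            best[kind] = max(best.get(kind, 0.0), weight)
    return sorted(best.items(), key=lambda kv: -kv[1])
```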

05

Search, Decay, and Retrieval

When the AI needs to recall past knowledge, the 35-tool MCP server provides specialized search tools. memory_search performs semantic similarity matching against the embedding index with relevance weighting. memory_errors searches specifically for past error fixes. memory_live_error provides instant matching against known issues before the AI starts debugging manually. memory_context pulls relevant knowledge for specified topics or files. The adaptive decay system runs daily, reducing relevance scores for unaccessed memories while preserving frequently recalled ones. Memories that fall below threshold are archived and their embeddings pruned from the index. The validate_approach tool checks proposed strategies against anti-pattern records before the AI commits to a risky action, closing the feedback loop between learning and execution.
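
As a final illustration, here is a minimal sketch of what the validate_approach check could look like against the approaches database; the table schema and substring matching are assumptions, and the real tool presumably matches semantically.

```python
import sqlite3

def validate_approach(db: sqlite3.Connection, description: str) -> dict:
    """Warn if a proposed approach matches a recorded anti-pattern."""
    row = db.execute(
        "SELECT name, failures FROM approaches "
        "WHERE status = 'anti-pattern' AND ? LIKE '%' || name || '%'",
        (description.lower(),),  # assumes anti-pattern names stored lowercase
    ).fetchone()
    if row:
        return {"ok": False,
                "warning": f"'{row[0]}' has failed {row[1]} time(s) before"}
    return {"ok": True}
```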

Tech Stack

Protocol

MCP (Model Context Protocol) with 35 tools, one of the largest custom MCP implementations

Extraction

Ollama hermes3 for structured extraction, llama3.2:1b for lightweight tasks, JSON schema validation

Embeddings

nomic-embed-text producing 768-dimensional vectors, with three embedding types (topical, entity, error)

Storage

6 SQLite databases (embeddings, analytics, sources, temporal, memory, cognitive) plus learning.db for outcomes

Output

Obsidian markdown with validated backlinks, topic hubs, entity pages, and an error encyclopedia

Modes

Live watch, backfill, consolidation, deduplication, decay, archival, and bridge sync to local-brain