memex8 is a self-hosted memory system for AI agents. Ingest your notes, docs, and skills into organized knowledge realms. Semantic search, TurboQuant compression, and MCP integration, all from one Docker command.
Built for agents that need real context, not just a few-shot prompt.
Every chunk of knowledge is embedded and stored in Qdrant. Search by meaning, not keyword: queries are ranked by cosine similarity across your entire knowledge base.
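To make "search by meaning" concrete, here is a minimal sketch of cosine-similarity ranking in plain Rust. It is a brute-force illustration only: in memex8 the vectors live in Qdrant, which does the scoring itself, and the function names `cosine_similarity` and `top_k` are made up for this example.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force top-k search (illustrative, not how Qdrant indexes vectors):
/// score every stored vector against the query and return
/// (index, score) pairs, best match first.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort descending by score.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Because cosine similarity ignores vector length, two chunks that point in the same direction in embedding space score 1.0 regardless of magnitude.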
Memories self-organize into knowledge clusters via cosine similarity. Realms grow, split, and merge organically as you ingest more context.
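One way to picture realms growing is nearest-centroid assignment with a similarity threshold. The sketch below is an assumption about how such clustering could work, not memex8's actual algorithm: the `Realm` struct, `assign_to_realm`, the running-mean centroid update, and the threshold value are all hypothetical, and splitting and merging are left out.

```rust
/// Hypothetical realm: just a centroid and a member count.
struct Realm {
    centroid: Vec<f32>,
    members: usize,
}

/// Cosine similarity; returns 0.0 for zero-norm input.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Assign `embedding` to the most similar realm, or open a new realm when
/// no centroid reaches `threshold`. Returns the index of the realm used.
fn assign_to_realm(realms: &mut Vec<Realm>, embedding: &[f32], threshold: f32) -> usize {
    // Find the best-matching existing realm, if any.
    let best = realms
        .iter()
        .enumerate()
        .map(|(i, r)| (i, cosine(&r.centroid, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));

    match best {
        Some((i, score)) if score >= threshold => {
            // Grow the realm: fold the new vector into the running mean.
            let realm = &mut realms[i];
            let n = realm.members as f32;
            for (c, x) in realm.centroid.iter_mut().zip(embedding) {
                *c = (*c * n + x) / (n + 1.0);
            }
            realm.members += 1;
            i
        }
        _ => {
            // Nothing is close enough: this memory seeds a new realm.
            realms.push(Realm { centroid: embedding.to_vec(), members: 1 });
            realms.len() - 1
        }
    }
}
```

A lower threshold yields fewer, broader realms; a higher one yields many narrow realms.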
TurboQuant applies near-optimal vector quantization (2.5–4 bits per channel) to keep memory lean without losing recall quality. Inspired by arXiv:2504.19874.
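To get a feel for what 2.5–4 bits per channel buys, the sketch below computes packed storage per vector and shows a plain uniform 4-bit scalar quantizer. This is deliberately not TurboQuant, only a simple stand-in for the idea of trading precision for space. The function names are hypothetical, and the comparison assumes full-resolution vectors are stored as 32-bit floats.

```rust
/// Bytes needed to store one vector at `bits` bits per channel, rounded up.
fn packed_bytes(dims: usize, bits: f32) -> usize {
    ((dims as f32 * bits) / 8.0).ceil() as usize
}

/// Uniform scalar quantization to 4-bit codes (0..=15) over [min, max].
/// Returns (codes, min, scale). Codes are kept one per byte for clarity;
/// a real store would pack two codes into each byte.
fn quantize_4bit(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid a zero step when every channel holds the same value.
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
    (codes, min, scale)
}

/// Map 4-bit codes back to approximate float values.
fn dequantize_4bit(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}
```

Under these assumptions a 768d vector takes 3072 bytes as f32, 384 bytes at 4 bits per channel, and 240 bytes at 2.5 bits per channel. The uniform quantizer's reconstruction error per channel is at most half a step (`scale / 2`).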
Idle-time maintenance: re-cluster, summarize, compress, and prune stale memories automatically. Your agent wakes up with clean context.
The MCP server speaks JSON-RPC 2.0 over stdio. It works with any MCP-compatible agent (OpenClaw, Hermes, pi.dev, or one you roll yourself) and ships 11 built-in memory tools.
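For readers rolling their own client, this sketch shows what JSON-RPC 2.0 requests over stdio look like on the wire. `tools/list` and `tools/call` are standard MCP methods; the tool names memex8 exposes are not listed here, so the tool name and arguments are left as caller-supplied parameters. The helper names are hypothetical, and a real client would first complete the MCP `initialize` handshake and would use a JSON library rather than string formatting.

```rust
use std::io::{self, Write};

/// Build a JSON-RPC 2.0 request for MCP's `tools/list` method,
/// which asks the server to describe the tools it offers.
fn tools_list_request(id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/list"}}"#)
}

/// Build a `tools/call` request. `tool` must be a name returned by
/// `tools/list`; `arguments_json` must already be a valid JSON object.
/// No escaping is done here, so this is for illustration only.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    )
}

/// Write one request per line; the MCP stdio transport delimits
/// messages with newlines.
fn send<W: Write>(out: &mut W, request: &str) -> io::Result<()> {
    writeln!(out, "{request}")?;
    out.flush()
}
```

In practice `out` would be the stdin pipe of the spawned server process, with responses read line by line from its stdout and matched to requests by `id`.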
The REST API offers full CRUD, semantic search, realm management, and real-time updates via WebSocket. Build custom UIs or integrate it into any workflow.
No cloud dependency. Runs entirely in Docker on your machine or server. Ollama keeps embeddings local. Your data stays yours.
memex8 writes MEMEX8.md context files back to your project directories so models pick up context seamlessly. It plugs into your existing workflow.
Written in Rust for performance and correctness: a small binary, fast queries, and no language runtime to ship.
┌────────────────────────────────────────────────────────────────────────┐
│  DOCKER COMPOSE ← one command: docker compose up -d                    │
│                                                                        │
│  ┌──────────────┐   ┌──────────────┐   ┌────────────────────────────┐  │
│  │ Qdrant       │   │ memex8       │   │ Web UI (planned)           │  │
│  │ (6333)       │   │ Core         │   │ (8080)                     │  │
│  │ Collections: │   │              │   │ Reddit-like cards          │  │
│  │ memories     │◄──┤ REST API     ├──►│ Realm Browser              │  │
│  │ realms       │◄──┤ MCP Server   ├──►│ 3D Force Graph (planned)   │  │
│  │ quantized    │◄──┤ Slumber      ├──►│ Admin Dashboard            │  │
│  │              │   │ Ingester     ├──►│                            │  │
│  └──────────────┘   └──────┬───────┘   └────────────────────────────┘  │
│                            │                                           │
│                     ┌──────────────┐  ┌───────────────────┐            │
│                     │ OpenAI       │  │ Ollama (optional) │            │
│                     │ text-emb...  │  │ nomic-embed-text  │            │
│                     └──────────────┘  └───────────────────┘            │
│                            │                                           │
│  ┌─────────────┐  ┌────────────┐  ┌───────────┐                        │
│  │ OpenClaw    │  │ Hermes     │  │ pi.dev    │                        │
│  │ (webhooks)  │  │ (MCP)      │  │ (ext)     │                        │
│  └─────────────┘  └────────────┘  └───────────┘                        │
└────────────────────────────────────────────────────────────────────────┘
Qdrant holds three collections: memories (full-resolution vectors), realms (cluster centroids), and quantized (TurboQuant-compressed vectors). Vectors are 768-dimensional with Ollama or 1536-dimensional with OpenAI.
The embedder is trait-based: use text-embedding-3-small via OpenAI (1536d) or nomic-embed-text locally via Ollama (768d), and swap providers without touching the engine.
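The sketch below illustrates what a trait-based embedder design can look like in Rust. It is not memex8's actual code: the `Embedder` trait, its method names, the two provider structs, and `Engine` are all hypothetical, and the providers return zero vectors instead of calling OpenAI or Ollama. Only the model names and dimensions come from the text above.

```rust
/// Anything that can turn text into a fixed-size vector.
trait Embedder {
    /// Dimensionality of the vectors this provider returns.
    fn dimensions(&self) -> usize;
    /// Embed one piece of text.
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Stand-in for an OpenAI-backed provider (text-embedding-3-small, 1536d).
struct OpenAiEmbedder;

/// Stand-in for an Ollama-backed provider (nomic-embed-text, 768d).
struct OllamaEmbedder;

impl Embedder for OpenAiEmbedder {
    fn dimensions(&self) -> usize {
        1536
    }
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        // A real provider would make an HTTP call; this returns zeros.
        Ok(vec![0.0; self.dimensions()])
    }
}

impl Embedder for OllamaEmbedder {
    fn dimensions(&self) -> usize {
        768
    }
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        // A real provider would call a local Ollama server; this returns zeros.
        Ok(vec![0.0; self.dimensions()])
    }
}

/// The engine depends only on the trait, so providers are interchangeable.
struct Engine {
    embedder: Box<dyn Embedder>,
}

impl Engine {
    /// Embed `text` and return the vector length after a sanity check.
    fn ingest(&self, text: &str) -> Result<usize, String> {
        let vector = self.embedder.embed(text)?;
        if vector.len() != self.embedder.dimensions() {
            return Err("provider returned a vector of unexpected size".to_string());
        }
        Ok(vector.len())
    }
}
```

Switching providers is then a one-line change where the engine is constructed, for example `Engine { embedder: Box::new(OllamaEmbedder) }`. Because the two providers produce vectors of different sizes, existing embeddings would presumably need re-embedding after a switch.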
The Rust binary ships as a single ~8.4 MB executable with no JVM and no Python runtime. It starts in milliseconds. Configure once, run forever.
MCP-native by design. Any MCP-compatible agent can connect.
Clone the repo, run docker compose up -d, and give your agent a memory that persists across sessions. Use OpenAI or Ollama for embeddings; your call.