Open Source — Rust

Give your AI agents
a memory that lasts

memex8 is a self-hosted memory system for AI agents. Ingest your notes, docs, and skills into organized knowledge realms. Semantic search, TurboQuant compression, and MCP integration — one Docker command.

OpenAI: text-embedding-3-small, 1536 dimensions
Ollama: Zero cost per query, Fully private, No API key needed, No external calls
memex8 — OpenAI embeddings
# 1. Clone & configure
git clone https://github.com/Ex8-ca/memex8.git
cd memex8 && cp .env.example .env
# 2. Add your OpenAI key
echo "OPENAI_API_KEY=sk-..." >> .env
# 3. One command — everything starts (Qdrant + memex8)
docker compose up -d
# 4. Ingest your notes
docker compose exec memex8 ./memex8 ingest ./my-notes/
# 5. Search from any agent via MCP
docker compose exec memex8 ./memex8 mcp
# Or use the REST API directly
curl http://localhost:8080/search \
  -H "Authorization: Bearer $MEMEX8_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"async Rust patterns"}'
memex8 — Ollama local embeddings
# 1. Clone & configure
git clone https://github.com/Ex8-ca/memex8.git
cd memex8 && cp config.example.toml config.toml
# 2. Confirm Ollama is the embedding provider in config.toml
# provider = "ollama" (already the default, so no change is needed)
# 3. One command — Qdrant + memex8 + Ollama
docker compose --profile local-embeddings up -d
# 4. Ingest your notes
docker compose exec memex8 ./memex8 ingest ./my-notes/
# 5. Start the MCP server
docker compose exec memex8 ./memex8 mcp
# Zero API costs — embeddings stay on your machine
Core Features

Everything agents need to remember

Built for agents that need real context, not just a few-shot prompt.

Semantic Vector Memory

Every chunk of knowledge is embedded and stored in Qdrant. Search by meaning, not keyword. Cosine similarity across your entire knowledge base.
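
To make "search by meaning" concrete, here is a minimal sketch of cosine-similarity ranking in plain Rust. It is a brute-force illustration only: in memex8 the search itself is delegated to Qdrant, and the function names `cosine_similarity` and `top_k` are hypothetical, not part of the project's API.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force ranking: indices of the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Toy 2-d "embeddings"; real ones have 768 or 1536 dimensions.
    let stored = vec![vec![1.0_f32, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = top_k(&[1.0, 0.1], &stored, 2);
    println!("closest stored vectors: {:?}", hits);
}
```

A vector database replaces the linear scan with an approximate nearest-neighbor index, but the scoring idea is the same.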

Auto-Discovered Realms

Memories self-organize into knowledge clusters via cosine similarity. Realms grow, split, and merge organically as you ingest more context.
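
One way to picture realm growth is nearest-centroid assignment with a similarity threshold. The sketch below is an assumption about how such clustering could work, not memex8's actual algorithm: the `Realm` struct, the `assign` function, and the 0.8 threshold are all hypothetical, and splitting and merging are not shown.

```rust
/// A realm, represented here by a running-mean centroid and a member count.
struct Realm {
    centroid: Vec<f32>,
    members: usize,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Assign `embedding` to the most similar realm if it clears `threshold`;
/// otherwise start a new realm. Returns the index of the realm used.
fn assign(realms: &mut Vec<Realm>, embedding: &[f32], threshold: f32) -> usize {
    let best = realms
        .iter()
        .enumerate()
        .map(|(i, r)| (i, cosine(&r.centroid, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));

    match best {
        Some((i, sim)) if sim >= threshold => {
            let realm = &mut realms[i];
            realm.members += 1;
            let n = realm.members as f32;
            // Incremental mean update: c += (x - c) / n
            for (c, x) in realm.centroid.iter_mut().zip(embedding) {
                *c += (x - *c) / n;
            }
            i
        }
        _ => {
            realms.push(Realm { centroid: embedding.to_vec(), members: 1 });
            realms.len() - 1
        }
    }
}

fn main() {
    let mut realms: Vec<Realm> = Vec::new();
    for e in [[1.0_f32, 0.0], [0.9, 0.1], [0.0, 1.0]] {
        let idx = assign(&mut realms, &e, 0.8);
        println!("embedding {:?} -> realm {}", e, idx);
    }
}
```

The first two toy vectors point in nearly the same direction and share a realm; the third is nearly orthogonal and starts a new one.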

TurboQuant Compression

Near-optimal vector quantization (2.5–4 bits/channel) keeps memory lean with minimal loss of recall quality. Inspired by arXiv:2504.19874.
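
To show what "bits per channel" means, here is a sketch of plain uniform scalar quantization at 4 bits per value. This is deliberately not TurboQuant, which is a more sophisticated scheme; the `Quantized` struct and the `quantize4` and `dequantize` functions are hypothetical names used only for illustration.

```rust
/// A vector quantized to 4-bit codes over its own [min, max] range.
struct Quantized {
    min: f32,
    step: f32,
    codes: Vec<u8>, // one code in 0..=15 per channel, stored unpacked for clarity
}

fn quantize4(v: &[f32]) -> Quantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let levels = 15.0; // 2^4 codes span 15 intervals
    // Guard against a zero-width range (constant or empty input).
    let step = if max > min { (max - min) / levels } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / step).round() as u8).collect();
    Quantized { min, step, codes }
}

fn dequantize(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.step).collect()
}

fn main() {
    let v = [0.12_f32, -0.8, 0.33, 0.9, -0.05];
    let q = quantize4(&v);
    let back = dequantize(&q);
    println!("codes:         {:?}", q.codes);
    println!("reconstructed: {:?}", back);
}
```

Each channel shrinks from 32 bits to 4 (once the codes are bit-packed), and the reconstruction error per value is at most half a step.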

Slumber Mode

Idle-time maintenance: re-cluster, summarize, compress, and prune stale memories automatically. Your agent wakes up with clean context.
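
As a rough sketch of the pruning step only, assume "stale" means "not accessed within some maximum age". That criterion, the `Memory` struct, and the `prune_stale` function are assumptions made for illustration; the source does not say how memex8 decides what is stale.

```rust
/// A stored memory with the time it was last read, in seconds since some epoch.
struct Memory {
    id: u32,
    last_access_secs: u64,
}

/// Drop memories not accessed within `max_age_secs` of `now_secs`.
/// Returns how many were removed.
fn prune_stale(memories: &mut Vec<Memory>, now_secs: u64, max_age_secs: u64) -> usize {
    let before = memories.len();
    memories.retain(|m| now_secs.saturating_sub(m.last_access_secs) <= max_age_secs);
    before - memories.len()
}

fn main() {
    let mut memories = vec![
        Memory { id: 1, last_access_secs: 100 },
        Memory { id: 2, last_access_secs: 900 },
        Memory { id: 3, last_access_secs: 1_000 },
    ];
    let removed = prune_stale(&mut memories, 1_000, 200);
    let kept: Vec<u32> = memories.iter().map(|m| m.id).collect();
    println!("removed {}, kept {:?}", removed, kept);
}
```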

MCP Native

JSON-RPC 2.0 over stdio. Works with any MCP-compatible agent — OpenClaw, Hermes, pi.dev, or roll your own. 11 built-in memory tools.
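
For anyone rolling their own client, the sketch below shows what one JSON-RPC 2.0 request looks like on the wire: a single line of JSON written to the server's stdin. `tools/list` is a standard MCP method; the helper `jsonrpc_request` is a hypothetical name. A real client must also complete the MCP `initialize` handshake first, which is omitted here, and the names of the 11 memex8 tools are not assumed.

```rust
use std::io::{self, Write};

/// Build a JSON-RPC 2.0 request with no params as a single line of JSON.
/// `method` must not contain characters that need JSON escaping.
fn jsonrpc_request(id: u64, method: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{},"method":"{}"}}"#, id, method)
}

fn main() -> io::Result<()> {
    // In a real client this would be the child process's stdin,
    // not our own stdout.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // The stdio transport is newline-delimited: one message per line.
    writeln!(out, "{}", jsonrpc_request(1, "tools/list"))?;
    out.flush()
}
```

The server replies on its stdout with a JSON-RPC response carrying the same `id`.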

REST + WebSocket API

Full CRUD, semantic search, realm management, and real-time updates via WebSocket. Build custom UIs or integrate into any workflow.

Self-Hosted, Private

No cloud dependency. Runs entirely in Docker on your machine or server. Ollama keeps embeddings local. Your data stays yours.

Augment, Don't Replace

Writes MEMEX8.md context files back to your project directories for seamless model context pickup. Plugs into your existing workflow.

Under the Hood

Rust-powered architecture

Built for performance and correctness. Small binary, fast queries, and no garbage collector or interpreter at runtime.

┌──────────────────────────────────────────────────────────────────────┐
│ DOCKER COMPOSE  ← one command: docker compose up -d                  │
│                                                                      │
│ ┌──────────────┐   ┌──────────────┐   ┌────────────────────────────┐ │
│ │ Qdrant       │   │ memex8       │   │ Web UI (planned)           │ │
│ │ (6333)       │   │ Core         │   │ (8080)                     │ │
│ │ Collections: │   │              │   │ Reddit-like cards          │ │
│ │ memories     │◄──┤ REST API     ├──►│ Realm Browser              │ │
│ │ realms       │◄──┤ MCP Server   ├──►│ 3D Force Graph (planned)   │ │
│ │ quantized    │◄──┤ Slumber      ├──►│ Admin Dashboard            │ │
│ │              │   │ Ingester     ├──►│                            │ │
│ └──────────────┘   └──────┬───────┘   └────────────────────────────┘ │
│                           ├──────────────────┐                       │
│                   ┌───────┴──────┐  ┌────────┴─────────┐             │
│                   │ OpenAI       │  │ Ollama (optional)│             │
│                   │ text-emb...  │  │ nomic-embed-text │             │
│                   └──────────────┘  └──────────────────┘             │
│                                                                      │
│    ┌─────────────┐   ┌────────────┐   ┌───────────┐                  │
│    │ OpenClaw    │   │ Hermes     │   │ pi.dev    │                  │
│    │ (webhooks)  │   │ (MCP)      │   │ (ext)     │                  │
│    └─────────────┘   └────────────┘   └───────────┘                  │
└──────────────────────────────────────────────────────────────────────┘

Qdrant Vector Storage

Three collections: memories (full-precision vectors), realms (cluster centroids), and quantized (TurboQuant-compressed vectors). Vectors are 768-dimensional with Ollama or 1536-dimensional with OpenAI.

OpenAI or Ollama — Your Choice

Trait-based embedder design. text-embedding-3-small via OpenAI (1536d) or nomic-embed-text locally via Ollama (768d). Swap providers without touching the engine.
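
A trait-based design of this kind might look like the sketch below. The trait name `Embedder`, its methods, the stub types, and the `"openai"` config value are assumptions for illustration; only `provider = "ollama"` and the two dimension counts come from the text above. The stubs return placeholder vectors instead of calling a model, so the example runs without a network.

```rust
/// Hypothetical embedder interface; the real memex8 trait may differ.
trait Embedder {
    fn name(&self) -> &str;
    fn dimensions(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Stand-in for a remote provider (1536-d, like text-embedding-3-small).
struct StubOpenAi;
/// Stand-in for a local provider (768-d, like nomic-embed-text).
struct StubOllama;

/// Deterministic placeholder vector; a real embedder would call a model here.
fn placeholder(text: &str, dims: usize) -> Vec<f32> {
    let seed = text.bytes().map(|b| b as f32).sum::<f32>();
    (0..dims).map(|i| ((seed + i as f32) % 7.0) / 7.0).collect()
}

impl Embedder for StubOpenAi {
    fn name(&self) -> &str { "openai" }
    fn dimensions(&self) -> usize { 1536 }
    fn embed(&self, text: &str) -> Vec<f32> { placeholder(text, 1536) }
}

impl Embedder for StubOllama {
    fn name(&self) -> &str { "ollama" }
    fn dimensions(&self) -> usize { 768 }
    fn embed(&self, text: &str) -> Vec<f32> { placeholder(text, 768) }
}

/// Pick a provider from a config string; anything else falls back to Ollama.
fn from_config(provider: &str) -> Box<dyn Embedder> {
    match provider {
        "openai" => Box::new(StubOpenAi),
        _ => Box::new(StubOllama),
    }
}

/// Engine code depends only on the trait, so providers can be swapped freely.
fn index(embedder: &dyn Embedder, text: &str) -> Vec<f32> {
    let v = embedder.embed(text);
    assert_eq!(v.len(), embedder.dimensions());
    v
}

fn main() {
    for provider in ["openai", "ollama"] {
        let embedder = from_config(provider);
        let v = index(embedder.as_ref(), "async Rust patterns");
        println!("{} -> {} dimensions", embedder.name(), v.len());
    }
}
```

Because the two providers produce vectors of different sizes, switching providers on an existing knowledge base would presumably require re-embedding the stored content.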

8.4MB Stripped Binary

The stripped Rust binary ships as a single ~8.4 MB executable. No JVM, no Python runtime. Starts in milliseconds. Configure once, run forever.

Integrations

Works with your agent stack

MCP-native by design. Any MCP-compatible agent can connect.

OpenClaw

Webhooks + REST CLI

Hermes Agent

MCP server — 11 memory tools

pi.dev

TypeScript extension + skill files

Any MCP Agent

JSON-RPC 2.0 stdio — universal

Start building agent memory

Clone the repo, run docker compose up, and give your agent a memory that persists across sessions. OpenAI or Ollama — your call.

MIT License — Built with Rust