Open Source — Rust

Give your AI agents
a memory that lasts

memex8 is a self-hosted memory system for AI agents. Ingest your notes, docs, and skills into organized knowledge realms. Semantic search, TurboQuant compression, and MCP integration — one Docker command.


Zero cost per query · 1536 dimensions · Fully private · text-embedding-3-small · No API key needed · No external calls

memex8 — OpenAI embeddings

# 1. Clone & configure

git clone https://github.com/Ex8-ca/memex8.git

cd memex8 && cp .env.example .env

# 2. Add your OpenAI key

echo "OPENAI_API_KEY=sk-..." >> .env

# 3. One command — everything starts (Qdrant + memex8)

docker compose up -d

# 4. Ingest your notes

docker compose exec memex8 ./memex8 ingest ./my-notes/

# 5. Search from any agent via MCP

docker compose exec memex8 ./memex8 mcp

# Or use the REST API directly

curl http://localhost:8080/search -H "Authorization: Bearer $MEMEX8_API_KEY" -H "Content-Type: application/json" -d '{"query":"async Rust patterns"}'

memex8 — Ollama local embeddings

# 1. Clone & configure

git clone https://github.com/Ex8-ca/memex8.git

cd memex8 && cp config.example.toml config.toml

# 2. Enable Ollama in config.toml

# provider = "ollama" (already the default)

# 3. One command — Qdrant + memex8 + Ollama

docker compose --profile local-embeddings up -d

# 4. Ingest your notes

docker compose exec memex8 ./memex8 ingest ./my-notes/

# 5. Start the MCP server

docker compose exec memex8 ./memex8 mcp

# Zero API costs — embeddings stay on your machine

Core Features

Everything agents need to remember

Built for agents that need real context, not just a few-shot prompt.

Semantic Vector Memory

Every chunk of knowledge is embedded and stored in Qdrant. Search by meaning, not keyword. Cosine similarity across your entire knowledge base.
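
To make "search by meaning" concrete, here is a minimal sketch of cosine-similarity ranking in plain Rust. It is an illustration only: the function names are hypothetical, and a real deployment would let Qdrant do the ranking with its own index rather than a brute-force scan like this one.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force search: indices of the `k` stored vectors most similar to `query`,
/// best match first. (Hypothetical helper, for illustration only.)
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Calling `top_k(&query_embedding, &stored_embeddings, 5)` would return the positions of the five closest chunks. Two chunks with no words in common can still rank highly if their embeddings point in similar directions.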

Auto-Discovered Realms

Memories self-organize into knowledge clusters via cosine similarity. Realms grow, split, and merge organically as you ingest more context.
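
One simple way to picture similarity-based clustering is nearest-centroid assignment with a threshold. The sketch below is an assumption about how such a scheme could work, not memex8's actual algorithm: the `Realm` struct, `assign_to_realm`, and the threshold parameter are all hypothetical, and splitting and merging are left out.

```rust
/// A realm reduced to its centroid and member count (hypothetical layout).
struct Realm {
    centroid: Vec<f32>,
    members: usize,
}

/// Cosine similarity; returns 0.0 for zero-magnitude input.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Put `embedding` into the most similar realm if it clears `threshold`;
/// otherwise start a new realm. Returns the index of the chosen realm.
fn assign_to_realm(realms: &mut Vec<Realm>, embedding: &[f32], threshold: f32) -> usize {
    let best = realms
        .iter()
        .enumerate()
        .map(|(i, r)| (i, cosine(&r.centroid, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));

    match best {
        Some((i, sim)) if sim >= threshold => {
            // Fold the new vector into the centroid's running mean.
            let realm = &mut realms[i];
            let n = realm.members as f32;
            for (c, x) in realm.centroid.iter_mut().zip(embedding) {
                *c = (*c * n + x) / (n + 1.0);
            }
            realm.members += 1;
            i
        }
        _ => {
            realms.push(Realm { centroid: embedding.to_vec(), members: 1 });
            realms.len() - 1
        }
    }
}
```

With a threshold of 0.8, the vectors `[1.0, 0.0]` and `[0.9, 0.1]` land in the same realm, while `[0.0, 1.0]` opens a new one.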

TurboQuant Compression

Near-optimal vector quantization (2.5–4 bits/channel) keeps memory lean without losing recall quality. Inspired by arXiv:2504.19874.
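
For intuition about what "bits per channel" means, the sketch below uses plain uniform scalar quantization. This is deliberately not TurboQuant: it is a much simpler stand-in, it only handles whole-number bit widths, and it stores one code per byte instead of packing bits. All names are hypothetical.

```rust
/// A vector compressed with uniform scalar quantization (illustrative only).
struct Quantized {
    codes: Vec<u8>, // one code per channel, each in 0..2^bits
    min: f32,       // value that code 0 maps back to
    step: f32,      // distance between adjacent code values
}

/// Quantize `v` to `bits` bits per channel (1 to 8).
fn quantize(v: &[f32], bits: u32) -> Quantized {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let levels = (1u32 << bits) - 1; // highest code value
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid a zero step when every channel holds the same value.
    let step = if max > min { (max - min) / levels as f32 } else { 1.0 };
    let codes = v
        .iter()
        .map(|x| (((x - min) / step).round() as u32).min(levels) as u8)
        .collect();
    Quantized { codes, min, step }
}

/// Reconstruct an approximation of the original vector.
fn dequantize(q: &Quantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.step).collect()
}
```

Each reconstructed channel differs from the original by at most half a step, so fewer bits mean a coarser grid and a larger worst-case error. Schemes like the one in the cited paper aim to get more accuracy out of the same bit budget than this naive grid does.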

Slumber Mode

Idle-time maintenance: re-cluster, summarize, compress, and prune stale memories automatically. Your agent wakes up with clean context.
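
As a rough illustration of the pruning step only, the sketch below drops memories that have not been accessed within a maximum age. The `Memory` fields, the `prune_stale` function, and the age-based rule are assumptions made for this example; the criteria memex8 actually uses to judge staleness are not described here.

```rust
use std::time::{Duration, SystemTime};

/// Minimal stand-in for a stored memory (hypothetical fields).
struct Memory {
    id: u64,
    last_accessed: SystemTime,
}

/// Remove memories older than `max_age` and return the ids that were pruned.
fn prune_stale(memories: &mut Vec<Memory>, now: SystemTime, max_age: Duration) -> Vec<u64> {
    let mut removed = Vec::new();
    memories.retain(|m| {
        // duration_since fails if last_accessed lies in the future;
        // treat that case as age zero so the memory is kept.
        let age = now.duration_since(m.last_accessed).unwrap_or(Duration::ZERO);
        if age > max_age {
            removed.push(m.id);
            false
        } else {
            true
        }
    });
    removed
}
```

A maintenance pass could call `prune_stale(&mut memories, SystemTime::now(), Duration::from_secs(30 * 86_400))` to drop anything untouched for 30 days, then re-cluster what remains.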

MCP Native

JSON-RPC 2.0 over stdio. Works with any MCP-compatible agent — OpenClaw, Hermes, pi.dev, or roll your own. 11 built-in memory tools.
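
If you roll your own client, each message sent to the server's stdin is a single line of JSON. The sketch below builds an MCP `tools/call` request by hand using only the standard library. The tool name `search_memories` and the `query` argument are placeholders chosen for this example, not confirmed memex8 tool names; a `tools/list` request should tell you what the server actually exposes.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one newline-terminated JSON-RPC 2.0 request that calls an MCP tool
/// with a single string argument named "query" (argument name is an assumption).
fn tool_call_request(id: u64, tool: &str, query: &str) -> String {
    let mut line = format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(query)
    );
    line.push('\n');
    line
}
```

In practice a JSON library such as serde_json is a better fit than hand-built strings; the manual version is shown only to keep the example dependency-free.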

REST + WebSocket API

Full CRUD, semantic search, realm management, and real-time updates via WebSocket. Build custom UIs or integrate into any workflow.

Self-Hosted, Private

No cloud dependency. Runs entirely in Docker on your machine or server. Ollama keeps embeddings local. Your data stays yours.

Augment, Don't Replace

Writes MEMEX8.md context files back to your project directories for seamless model context pickup. Plugs into your existing workflow.

Under the Hood

Rust-powered architecture

Built for performance and correctness: a small binary, fast queries, and no garbage collector or interpreter at run time.

┌─────────────────────────────────────────────────────────────────────────┐
│  DOCKER COMPOSE  ← one command: docker compose up -d                    │
│                                                                         │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────────┐   │
│  │ Qdrant       │    │ memex8       │    │ Web UI (planned)         │   │
│  │ (6333)       │    │ Core         │    │ (8080)                   │   │
│  │ Collections: │    │              │    │ Reddit-like cards        │   │
│  │ memories     │◄───┤ REST API     ├───►│ Realm Browser            │   │
│  │ realms       │◄───┤ MCP Server   ├───►│ 3D Force Graph (planned) │   │
│  │ quantized    │◄───┤ Slumber      ├───►│ Admin Dashboard          │   │
│  │              │    │ Ingester     ├───►│                          │   │
│  └──────────────┘    └──────┬───────┘    └──────────────────────────┘   │
│                             │                                           │
│                      ┌──────┴───────┐  ┌───────────────────┐            │
│                      │ OpenAI       │  │ Ollama (optional) │            │
│                      │ text-emb...  │  │ nomic-embed-text  │            │
│                      └──────┬───────┘  └───────────────────┘            │
│                             │                                           │
│      ┌────────────┐   ┌─────┴──────┐   ┌────────────┐                   │
│      │ OpenClaw   │   │ Hermes     │   │ pi.dev     │                   │
│      │ (webhooks) │   │ (MCP)      │   │ (ext)      │                   │
│      └────────────┘   └────────────┘   └────────────┘                   │
└─────────────────────────────────────────────────────────────────────────┘

Qdrant Vector Storage

Three collections: memories (full-res), realms (centroids), and quantized (TurboQuant). 768d from Ollama or 1536d from OpenAI.
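
A quick back-of-the-envelope calculation shows why a quantized collection is worth keeping alongside the full-resolution one. The helper below is a hypothetical sketch that assumes full-resolution vectors are stored as 32-bit floats and ignores per-vector metadata and index overhead.

```rust
/// Bytes needed for the raw channel data of one vector, rounded up.
/// `bits_per_channel` is 32.0 for f32 storage, or a quantized rate such as 2.5 or 4.0.
fn bytes_per_vector(dim: usize, bits_per_channel: f64) -> usize {
    ((dim as f64 * bits_per_channel) / 8.0).ceil() as usize
}
```

Under those assumptions, a 1536d vector takes 6144 bytes at 32 bits per channel, 768 bytes at 4 bits, and 480 bytes at 2.5 bits. A 768d vector takes 3072, 384, and 240 bytes respectively.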

OpenAI or Ollama — Your Choice

Trait-based embedder design. text-embedding-3-small via OpenAI (1536d) or nomic-embed-text locally via Ollama (768d). Swap providers without touching the engine.
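
A trait-based design of this kind might look roughly like the sketch below. The trait name, method names, and stub types are assumptions for illustration, not memex8's real definitions, and the stubs return deterministic placeholder vectors instead of calling OpenAI or Ollama.

```rust
/// What the engine needs from any embedding provider (hypothetical trait).
trait Embedder {
    /// Length of the vectors this provider produces.
    fn dimensions(&self) -> usize;
    /// Turn a chunk of text into a vector.
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Placeholder "embedding": NOT a real model, just bytes folded into `dim` slots
/// so the example runs without network access.
fn placeholder_embedding(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0_f32; dim];
    for (i, b) in text.bytes().enumerate() {
        v[i % dim] += b as f32 / 255.0;
    }
    v
}

/// Stand-in for an OpenAI-backed provider (1536d).
struct OpenAiStub;
/// Stand-in for an Ollama-backed provider (768d).
struct OllamaStub;

impl Embedder for OpenAiStub {
    fn dimensions(&self) -> usize { 1536 }
    fn embed(&self, text: &str) -> Vec<f32> {
        placeholder_embedding(text, self.dimensions())
    }
}

impl Embedder for OllamaStub {
    fn dimensions(&self) -> usize { 768 }
    fn embed(&self, text: &str) -> Vec<f32> {
        placeholder_embedding(text, self.dimensions())
    }
}

/// Engine code depends only on the trait, so the provider can be swapped freely.
fn ingest(embedder: &dyn Embedder, chunks: &[&str]) -> Vec<Vec<f32>> {
    chunks.iter().map(|c| embedder.embed(c)).collect()
}
```

Here `ingest(&OpenAiStub, &chunks)` and `ingest(&OllamaStub, &chunks)` run the same engine code and differ only in vector length. Because vectors of different dimensions cannot be compared with each other, switching providers on an existing knowledge base would likely mean re-embedding what is already stored.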

8.4MB Stripped Binary

Rust binary ships as a single ~8.4MB executable. No JVM, no Python runtime. Starts in milliseconds. Configure once, run forever.


Works with your agent stack

OpenClaw

Hermes Agent

pi.dev

Any MCP Agent

Start building agent memory
