feat(memory): add livekit-memory — sub-10ms in-process semantic memory#725

Open
Piyussh01 wants to merge 2 commits into livekit:main from Piyussh01:feat/livekit-memory

@Piyussh01

Problem

Voice agents need a user's context mid-turn, under a tight latency budget (~10ms). A remote vector-DB round trip, or a transformer query embedding on CPU (~10ms median, ~50ms p99), is enough on its own to break a live conversation loop.

Approach

A new, self-contained workspace package, livekit-memory (module livekit.memory), that provides in-process semantic memory:

  • Static embedder (Model2VecEmbedder): token lookup plus mean pooling, with no transformer forward pass; ~0.03ms per short query.
  • Swappable index: exact BruteForceIndex (a single matmul over normalized vectors; sub-ms up to ~100k vectors), auto-upgrading to UsearchIndex (HNSW) beyond ~100k vectors.
  • Facts vs. collection split: a small, always-scanned facts namespace (the pinned user profile) plus an ANN-indexed semantic collection, with a context() helper that returns one prompt-ready string per turn.
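
The three pieces can be illustrated with a minimal, numpy-only sketch. It assumes a toy vocabulary with random weights in place of real Model2Vec weights, and the names `embed`, `BruteForceSketch`, and `context` are hypothetical stand-ins, not the package's API. Embedding is a per-token row lookup followed by a mean pool; exact search is one matrix-vector product over L2-normalized rows (cosine similarity); the context helper always includes the pinned facts and appends the top-k collection hits.

```python
import numpy as np

# Hypothetical toy vocabulary and random weights standing in for a learned
# static-embedding table (a real model ships one row per vocabulary token).
VOCAB = {"user": 0, "likes": 1, "jazz": 2, "lives": 3, "in": 4, "berlin": 5}
DIMS = 8
TABLE = np.random.default_rng(0).standard_normal((len(VOCAB), DIMS)).astype(np.float32)


def embed(text: str) -> np.ndarray:
    """Static embedding: per-token row lookup plus mean pooling, then L2-normalize."""
    ids = [VOCAB[t] for t in text.lower().split() if t in VOCAB]
    if not ids:
        return np.zeros(DIMS, dtype=np.float32)
    vec = TABLE[ids].mean(axis=0)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class BruteForceSketch:
    """Exact search: one matmul of the query against all unit-norm rows."""

    def __init__(self) -> None:
        self._texts: list[str] = []
        self._rows: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self._texts.append(text)
        self._rows.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        if not self._rows:
            return []
        matrix = np.stack(self._rows)  # (n, DIMS); rows are already unit-norm
        scores = matrix @ embed(query)  # cosine similarity per stored row
        top = np.argsort(-scores)[:k]
        return [(self._texts[i], float(scores[i])) for i in top]


def context(facts: list[str], collection: BruteForceSketch, query: str, k: int = 2) -> str:
    """Pinned facts are always included; collection hits are ranked by similarity."""
    hits = [text for text, _ in collection.search(query, k)]
    return "\n".join(["Facts:", *facts, "Relevant memories:", *hits])


memories = BruteForceSketch()
memories.add("user likes jazz")
memories.add("user lives in berlin")
print(context(["name: Ada"], memories, "user likes jazz", k=1))
```

Normalizing at insert time is what lets exact search collapse to a single matmul: the dot product of unit vectors is their cosine similarity, so no per-query division is needed.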

Evidence (measured on Apple M4 Pro, 384-dimensional vectors)

| Stage | N = 1M |
|---|---|
| Model2Vec embed | 0.028ms median |
| usearch HNSW search (~0.95 recall) | 0.27ms median / 0.54ms p99 |
| End-to-end (embed + search) | 0.17ms median / 0.31ms p99 |

At a million vectors, end-to-end p99 latency is ~30x under the 10ms budget. Transformer embedders (fastembed MiniLM: ~9.4ms median, ~53ms p99) would consume the whole budget on their own, which is why the static embedder is required.

Scope & safety

  • New isolated package; no changes to rtc / api / protocol.
  • Pure-Python core; numpy is the only required dependency. model2vec, usearch, and fastembed are optional extras, restricted to Python >= 3.10 via environment markers.
  • The default is the dependency-free HashingEmbedder with the brute-force index, so tests run offline with no model downloads.
  • Passes make check (ruff format, ruff lint, mypy --strict) and adds 11 tests.
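
A dependency-free hashing embedder can be sketched with signed feature hashing: each token hashes to a bucket and a ±1 sign, so tokens that collide in a bucket can partially cancel instead of always piling up. This is a minimal sketch assuming whitespace tokenization; the class name `HashingEmbedderSketch` is hypothetical, and it mirrors the blake2b scheme quoted in the review but takes the sign from a high bit.

```python
import hashlib

import numpy as np


class HashingEmbedderSketch:
    """Hypothetical signed feature-hashing embedder: no model files, no downloads."""

    def __init__(self, dims: int = 256) -> None:
        self._dims = dims

    def _token_index_sign(self, token: str) -> tuple[int, float]:
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        idx = h % self._dims
        # Bit 32 is not among the bits that select the bucket for any
        # power-of-two dims up to 2**32, so sign and bucket stay independent.
        sign = 1.0 if (h >> 32) & 1 else -1.0
        return idx, sign

    def embed(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dims, dtype=np.float32)
        for token in text.lower().split():
            idx, sign = self._token_index_sign(token)
            vec[idx] += sign
        norm = float(np.linalg.norm(vec))
        return vec / norm if norm > 0 else vec
```

blake2b is used rather than the built-in `hash()` because Python salts `str` hashes per process, which would make the same text embed differently across runs and break offline test fixtures.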

Caveats

  • Static-embedding retrieval quality is ~82% of MiniLM's; the onnx extra (CallableEmbedder + fastembed) is the opt-in escape hatch for higher recall at higher latency.
  • HNSW recall degrades under heavy churn (frequent deletes and overwrites), so long-lived indices need periodic compaction.
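
Many HNSW implementations delete by marking nodes rather than unlinking them from the graph, so dead nodes accumulate. One way to schedule compaction is a churn counter that triggers a rebuild from the live vectors once dead entries reach a fraction of live ones. This is a minimal, hypothetical policy: `CompactionPolicy`, `max_dead_ratio`, and the rebuild callback are illustrative names, and an overwrite is modeled as delete plus re-insert.

```python
from collections.abc import Callable


class CompactionPolicy:
    """Hypothetical churn tracker: signals a rebuild once dead entries
    (deletes plus overwrites) reach `max_dead_ratio` of the live entries."""

    def __init__(self, max_dead_ratio: float = 0.25) -> None:
        self.max_dead_ratio = max_dead_ratio
        self.live = 0
        self.dead = 0

    def on_insert(self, overwrite: bool = False) -> None:
        if overwrite:
            self.dead += 1  # modeled as delete + re-insert: old node becomes dead
        else:
            self.live += 1

    def on_delete(self) -> None:
        self.live -= 1
        self.dead += 1

    def should_compact(self) -> bool:
        return self.dead > 0 and self.dead >= self.max_dead_ratio * max(self.live, 1)

    def maybe_compact(self, rebuild: Callable[[], None]) -> bool:
        """Run `rebuild` (e.g. re-insert live vectors into a fresh index) when due."""
        if not self.should_compact():
            return False
        rebuild()
        self.dead = 0
        return True


policy = CompactionPolicy(max_dead_ratio=0.25)
for _ in range(8):
    policy.on_insert()
policy.on_delete()  # live=7, dead=1: below 0.25 * 7
policy.on_insert(overwrite=True)  # live=7, dead=2: rebuild is due
policy.maybe_compact(lambda: print("rebuilding from", policy.live, "live vectors"))
```

A ratio trigger keeps the rebuild cost proportional to the churn that caused it, instead of compacting on a fixed timer.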

Happy to discuss whether this belongs here vs. livekit/agents, and to trim to the pure-numpy core if preferred.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration (bot) left a review comment

Devin Review found 2 new potential issues.

digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
h = int.from_bytes(digest, "little")
idx = h % self._dims
sign = 1.0 if (h >> 1) & 1 else -1.0

🟡 HashingEmbedder sign bit is fully determined by bucket index, defeating signed feature hashing

In HashingEmbedder._token_index_sign, the sign is derived from bit 1 of the hash ((h >> 1) & 1), while the bucket index is h % self._dims. When dims is a power of 2 (which the default 256 is, and all test values like 64 are), h % dims equals h & (dims-1), meaning bit 1 is part of the bits that determine the bucket. This makes the sign a deterministic function of the bucket — tokens that collide into the same bucket always get the same sign, so collision contributions never cancel. Empirically verified: with 10,000 test tokens, 0 out of 256 (or 64) buckets ever see mixed signs. The fix is to use a high bit for the sign (e.g., (h >> 32) & 1), which is independent of the low bits used for the bucket.

Suggested change
sign = 1.0 if (h >> 1) & 1 else -1.0
sign = 1.0 if (h >> 32) & 1 else -1.0
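
The reviewer's argument can be checked directly. For a power-of-two `dims`, the bucket `h % dims` is exactly the low log2(dims) bits of `h`, which include bit 1, so taking the sign from bit 1 always yields zero mixed-sign buckets. The sketch below (helper names are hypothetical) counts buckets that receive both signs under each choice of sign bit, reusing the blake2b hashing from the excerpt above.

```python
import hashlib
from collections import defaultdict


def token_hash(token: str) -> int:
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def mixed_sign_buckets(tokens: list[str], dims: int, sign_bit: int) -> int:
    """Count buckets that receive both +1 and -1 tokens for a given sign bit."""
    signs_per_bucket: dict[int, set[int]] = defaultdict(set)
    for token in tokens:
        h = token_hash(token)
        signs_per_bucket[h % dims].add((h >> sign_bit) & 1)
    return sum(1 for signs in signs_per_bucket.values() if len(signs) == 2)


tokens = [f"tok{i}" for i in range(10_000)]
for dims in (64, 256):
    old = mixed_sign_buckets(tokens, dims, sign_bit=1)
    new = mixed_sign_buckets(tokens, dims, sign_bit=32)
    print(f"dims={dims}: bit 1 -> {old} mixed buckets, bit 32 -> {new} mixed buckets")
```

With ~40 tokens per bucket at 256 dims, nearly every bucket should see both signs once the sign comes from bit 32, which is what lets collision noise cancel on average.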

Comment on lines +201 to +203 of livekit-memory/livekit/memory/_index.py
def add(self, key: int, vector: np.ndarray) -> None:
    # usearch upserts when a key already exists.
    self._index.add(key, np.ascontiguousarray(vector, dtype=np.float32))

🚩 UsearchIndex.add() claims upsert semantics that may not hold in all usearch versions

The comment at livekit-memory/livekit/memory/_index.py:202 says "usearch upserts when a key already exists," but in many usearch v2.x versions, Index.add() allows duplicate keys by default rather than replacing. If this assumption is wrong, _put() at livekit-memory/livekit/memory/store.py:181 would create duplicate vectors in the HNSW index when updating an existing item's text/embedding within the same namespace (since the old vector is only removed when the namespace changes, per store.py:165-167). This would cause search to potentially return stale vectors. Could not verify against the actual usearch version since it wasn't installed in the test environment. Worth confirming with a test that exercises upsert on UsearchIndex specifically.
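
One way to make the upsert assumption explicit, independent of which usearch version is installed, is to remove an existing key before adding it, and to pin that behavior with a test. The sketch below is hypothetical: `FakeMultiIndex` stands in for an index that keeps duplicate keys (the behavior the reviewer describes for some usearch v2.x builds), and `upsert` is an illustrative helper, not part of livekit-memory or usearch. Against a real UsearchIndex, the key check and removal would use the installed usearch version's own calls, which should be confirmed in its documentation.

```python
import numpy as np


class FakeMultiIndex:
    """Hypothetical stand-in for an index whose add() keeps duplicate keys."""

    def __init__(self) -> None:
        self._entries: list[tuple[int, np.ndarray]] = []

    def add(self, key: int, vector: np.ndarray) -> None:
        self._entries.append((key, np.ascontiguousarray(vector, dtype=np.float32)))

    def remove(self, key: int) -> None:
        self._entries = [(k, v) for k, v in self._entries if k != key]

    def vectors(self, key: int) -> list[np.ndarray]:
        return [v for k, v in self._entries if k == key]


def upsert(index: FakeMultiIndex, key: int, vector: np.ndarray) -> None:
    """Hypothetical helper: explicit replace semantics, whatever the backend default."""
    if index.vectors(key):
        index.remove(key)
    index.add(key, vector)


def test_upsert_replaces_existing_key() -> None:
    index = FakeMultiIndex()
    upsert(index, 7, np.array([1.0, 0.0]))
    upsert(index, 7, np.array([0.0, 1.0]))
    stored = index.vectors(7)
    assert len(stored) == 1  # two plain add() calls would leave two entries
    assert np.array_equal(stored[0], np.array([0.0, 1.0], dtype=np.float32))


test_upsert_replaces_existing_key()
```

Pointing the same test at UsearchIndex, rather than at a fake, is what would settle whether the stale-vector scenario in store.py can occur.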
