Synapse is the managed vector database built for RAG. Store a billion embeddings, run hybrid search in single-digit milliseconds, and ship answers grounded in fresh data — without operating a search cluster or tuning an index at 2 a.m.
from synapse import Synapse

db = Synapse("prod-knowledge-base")

hits = db.search(
    query="how do I rotate an API key?",
    top_k=8,
    hybrid=True,  # dense + keyword
    rerank="synapse-rerank-v2",
    filter={"workspace": "acme", "acl": user.groups},
)
# → 8 chunks · recall@8 0.97 · 19ms

Retrieval infrastructure for teams shipping AI in production
Most vector stores demo beautifully and fall over in production — stale data, recall cliffs, runaway bills. Synapse is engineered for the failure modes you actually hit.
Dense vectors find meaning; sparse keyword search finds the exact part number, error code, or name. Synapse fuses both and reranks in one call, so retrieval stops missing the obvious.
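To make the idea of fusing dense and keyword results concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked lists. It is an illustration only: this page does not say which fusion method Synapse uses, and the function name, document IDs, and the `k=60` default are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query.
dense_hits = ["doc-overview", "doc-rotate-key", "doc-billing"]       # semantic matches
keyword_hits = ["doc-rotate-key", "doc-error-4012", "doc-overview"]  # exact-term matches

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
# ['doc-rotate-key', 'doc-overview', 'doc-error-4012', 'doc-billing']
```

The document that ranks well in both lists rises to the top, while a keyword-only hit such as an exact error code still makes the fused list instead of being dropped.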
New documents are queryable in under a second. No nightly reindex, no waiting on a rebuild — your assistant answers from what shipped this morning, not last week.
A tuned HNSW graph holds 95%+ recall even at a billion vectors, and every response reports its own recall — so you measure retrieval quality instead of hoping for it.
Pre-filter by tenant, ACL, language, or recency without wrecking latency. Permissions are enforced inside the query, so users never retrieve a chunk they shouldn't see.
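The difference between filtering before ranking and filtering after it can be shown with a small in-memory sketch. This is not Synapse's implementation; the chunk layout, field names, and `filtered_search` helper are hypothetical, and a brute-force cosine scan stands in for a real index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(chunks, query_vec, workspace, user_groups, top_k=2):
    """Apply tenant and ACL filters first, then rank only the permitted chunks."""
    groups = set(user_groups)
    allowed = [
        c for c in chunks
        if c["workspace"] == workspace and groups & set(c["acl"])
    ]
    allowed.sort(key=lambda c: cosine(c["vector"], query_vec), reverse=True)
    return [c["id"] for c in allowed[:top_k]]


# Hypothetical corpus: "b" is a close match but belongs to a group the user is not in.
chunks = [
    {"id": "a", "workspace": "acme", "acl": ["support"], "vector": [1.0, 0.0]},
    {"id": "b", "workspace": "acme", "acl": ["finance"], "vector": [0.9, 0.1]},
    {"id": "c", "workspace": "globex", "acl": ["support"], "vector": [1.0, 0.0]},
    {"id": "d", "workspace": "acme", "acl": ["support", "eng"], "vector": [0.0, 1.0]},
]

result = filtered_search(chunks, [1.0, 0.0], workspace="acme", user_groups=["support"])
print(result)
# ['a', 'd']
```

Because the filter runs before ranking, the caller still gets a full `top_k` of permitted chunks. Filtering after ranking could return fewer results, or none, when the closest matches are out of scope.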
Plug in OpenAI, Cohere, Voyage, or your own model. Synapse stores any dimensionality and lets you re-embed a collection in the background without downtime.
Quantization and tiered storage cut memory use by up to 75%, and per-collection spend caps mean a traffic spike shows up on your dashboard, not on a surprise invoice.
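As a rough illustration of where a saving of that size can come from, the sketch below applies symmetric scalar quantization, storing each 32-bit float as an 8-bit integer plus one scale factor per vector. This is an assumption made for the example, not a description of Synapse's quantizer, and the helper names are hypothetical.

```python
from array import array


def quantize_int8(vector):
    """Symmetric scalar quantization: map floats to int8 codes with one scale per vector."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid a zero scale for all-zero vectors
    scale = max_abs / 127.0
    codes = array("b", (round(x / scale) for x in vector))
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


vector = array("f", [0.12, -0.5, 0.33, 0.9])  # float32 storage, 4 bytes per value
codes, scale = quantize_int8(vector)

float_bytes = vector.itemsize * len(vector)
int8_bytes = codes.itemsize * len(codes)      # 1 byte per value
saving = 1 - int8_bytes / float_bytes
print(f"{saving:.0%} smaller")                # 75% smaller on platforms with 4-byte floats

# Each reconstructed value is within half a quantization step of the original.
max_error = max(abs(x - y) for x, y in zip(vector, dequantize(codes, scale)))
```

The figure ignores the per-vector scale factor, which is negligible at typical embedding dimensionalities. The trade-off is a small reconstruction error, which is why quantized indexes are usually paired with recall measurement.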
The numbers behind the index
From the first embedding to a billion-scale, multi-tenant index — chunking, storage, search, and reranking unified under a single typed API and one billing line.
Stream documents in and Synapse chunks, embeds, and indexes them for you — with deduplication, versioning, and automatic backfill when you change your embedding model. The pipeline you'd otherwise build and maintain is just an upload.
A cross-encoder reranker lifts the most relevant chunks to the top of every result set, turning good recall into precise context windows.
Isolate thousands of customers in one cluster with namespace-level routing and quotas — no noisy-neighbor latency.
Replicas and shards rebalance themselves as your corpus and traffic grow. Scale to a billion vectors without a migration.
Continuous snapshots let you roll any collection back to a moment before a bad ingest — your index is as recoverable as your database.
Idempotent writes, typed clients, and a local emulator that matches prod. Wire up retrieval in an afternoon and retire the search cluster for good.
First-class SDKs for TypeScript, Python, and Go — upsert, search, and filter through a single consistent, idempotent surface.
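One way an idempotent write surface can behave is sketched below: if chunk IDs are derived from content, a retried batch overwrites the same rows instead of creating duplicates. The `InMemoryIndex` class and `chunk_id` helper are hypothetical stand-ins written for this example and are not part of the Synapse SDK.

```python
import hashlib


class InMemoryIndex:
    """Toy stand-in for a vector index: upserts are keyed by chunk ID."""

    def __init__(self):
        self._rows = {}

    def upsert(self, chunk_id, text, vector):
        # Writing the same ID again replaces the row, so retries are harmless.
        self._rows[chunk_id] = {"text": text, "vector": vector}

    def __len__(self):
        return len(self._rows)


def chunk_id(source, text):
    """Deterministic ID: the same source and text always produce the same key."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


index = InMemoryIndex()
text = "Rotate keys under Settings."

# Simulate a client that retries the same write three times after timeouts.
for _ in range(3):
    index.upsert(chunk_id("help/rotate-keys", text), text, [0.1, 0.2])

print(len(index))
# 1
```

Deriving the ID from the content means the client never has to track which writes succeeded. It can resend the whole batch safely.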
A Docker emulator with full search parity runs in CI and on your laptop, so retrieval tests don't depend on a live cluster.
Bulk-load millions of vectors with backpressure handling and get a guaranteed-delivery webhook the moment indexing completes.
Every response returns latency, recall, and which stage fired, so you debug retrieval quality with evidence rather than guesswork.
One hybrid search and reranking pipeline powers support bots, internal copilots, and agents. Each is tuned with filters and namespaces, and none needs a separate stack.
Ground answers in current help center articles and ticket history. Sub-second freshness means a docs update is in the bot's replies within seconds, not after the next reindex.
Search wikis, runbooks, and Slack with ACL filters applied in the query, so each employee retrieves only the documents they're cleared to see.
Hybrid search pairs semantic intent with exact symbol matches, so the assistant surfaces the precise function signature, not a near-match that merely sounds related.
Give long-running agents durable, queryable memory and namespace-isolated tool corpora that stay fast as the number of agents climbs.
“We ripped out a self-hosted cluster two engineers babysat full-time. Synapse cut our p99 from 140ms to 18ms and gave that headcount back to the product.”
“Hybrid search plus reranking took our answer accuracy from 'usually right' to 'right enough to ship support deflection.' Our containment rate jumped eleven points.”
“Sub-second freshness is the feature. Docs published at 9am are in the assistant's answers by 9:01 — no reindex job, no cron, no pager.”
Usage-based pricing with no per-seat fees and no annual lock-in. Start on a free index, and costs scale linearly as your corpus grows.
For prototypes and side projects.
For products in production.
For billion-scale, regulated workloads.
Synapse gives you what a self-hosted vector search cluster does, without the operations: no shard rebalancing, no index tuning, no memory firefights. Hybrid search, reranking, freshness, and multi-tenancy come built in, backed by an SLA, so your engineers ship features instead of nursing a cluster.
You can bring your own embeddings: vectors from OpenAI, Cohere, Voyage, or any in-house model, at any dimensionality. Synapse can also call your embedding endpoint during managed ingestion and re-embed a collection in the background when you upgrade models.
A tuned HNSW graph with optional scalar and product quantization keeps memory low and recall high, while auto-scaling shards and replicas distribute query load. p99 stays under 25ms at a billion vectors in our published benchmarks.
Filters and ACLs are enforced inside the query path, so a user can never retrieve a chunk outside their scope. Namespaces isolate tenants with their own quotas, and Enterprise adds VPC peering and data residency.
Continuous snapshots give every collection point-in-time recovery. Roll back to the moment before a bad load, or replay a versioned ingest — your index is as recoverable as a managed database.
Spin up a free index, upsert your first vectors, and run a query in minutes. No cluster to provision, no sales call to start.