Synapse — Managed vector database for RAG pipelines

Synapse

Vector search, fully managed

Synapse is the managed vector database built for RAG. Store a billion embeddings, run hybrid search with p99 latency under 25 ms, and ship answers grounded in fresh data, without operating a search cluster or tuning an index at 2 a.m.

p99 under 25ms
Sub-second freshness
99.99% uptime SLA

query.py

from synapse import Synapse

db = Synapse("prod-knowledge-base")

hits = db.search(
    query="how do I rotate an API key?",
    top_k=8,
    hybrid=True,            # dense + keyword
    rerank="synapse-rerank-v2",
    # `user` is the signed-in user object from your own app (not defined here)
    filter={"workspace": "acme", "acl": user.groups},
)

# → 8 chunks · recall@8 0.97 · 19ms

Retrieval infrastructure for teams shipping AI in production

Lattice AI · Cortexa · Northbridge · Hel/io · Parsec · Verda · Sundial · Meridian

Why teams move to Synapse

A search layer built for the way RAG breaks.

Most vector stores demo beautifully and fall over in production — stale data, recall cliffs, runaway bills. Synapse is engineered for the failure modes you actually hit.

Hybrid by default

Dense vectors find meaning; sparse keyword search finds the exact part number, error code, or name. Synapse fuses both and reranks in one call, so retrieval stops missing the obvious.
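To make "fuses both" concrete, here is a minimal sketch of one widely used fusion method, reciprocal rank fusion (RRF). This page does not say which fusion method Synapse uses internally, so treat this as an illustration of the idea rather than the product's algorithm. The function name and the sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    Assumption: this is generic RRF, not Synapse's documented method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers for the same query.
dense_hits = ["doc-a", "doc-b", "doc-c"]    # semantic neighbours
keyword_hits = ["doc-c", "doc-a", "doc-d"]  # exact term matches

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Documents that both retrievers agree on (doc-a, doc-c) rise above documents found by only one of them, which is the behaviour hybrid search is after.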

Sub-second freshness

New documents are queryable in under a second. No nightly reindex, no waiting on a rebuild — your assistant answers from what shipped this morning, not last week.

Recall you can trust

A tuned HNSW graph holds 95%+ recall even at a billion vectors, and every response reports its own recall — so you measure retrieval quality instead of hoping for it.
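If you want to check a recall figure yourself, recall@k is simple to compute offline: compare the approximate top-k against the exact top-k from a brute-force search over the same data. The sketch below assumes you already have both ID lists; how Synapse derives the recall it reports per response is not described on this page, and the chunk IDs are made up.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


# Hypothetical ground truth: exact nearest neighbours from brute-force search.
exact_top8 = [f"chunk-{i}" for i in range(8)]
# Hypothetical ANN result: 7 of the 8 true neighbours plus one miss.
approx_top8 = exact_top8[:7] + ["chunk-42"]

print(recall_at_k(approx_top8, exact_top8, k=8))  # 0.875
```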

Metadata filters that scale

Pre-filter by tenant, ACL, language, or recency without wrecking latency. Permissions are enforced inside the query, so users never retrieve a chunk they shouldn't see.
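Why pre-filtering matters is easiest to see next to the alternative. The sketch below is a brute-force toy, not Synapse's implementation: it assumes each chunk carries an `acl` set of group names, and all function names and data are hypothetical. Filtering after the top-k cut can leave a user with fewer results than requested (or none), while filtering before scoring always ranks only chunks the user may see.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_prefiltered(chunks, query_vec, user_groups, top_k):
    """Drop chunks the user cannot see, then rank what remains."""
    allowed = (c for c in chunks if c["acl"] & user_groups)
    return heapq.nlargest(top_k, allowed, key=lambda c: dot(c["vector"], query_vec))


def search_postfiltered(chunks, query_vec, user_groups, top_k):
    """Rank everything, then drop forbidden chunks from the top-k."""
    best = heapq.nlargest(top_k, chunks, key=lambda c: dot(c["vector"], query_vec))
    return [c for c in best if c["acl"] & user_groups]


# Hypothetical corpus: the two best matches are finance-only.
chunks = [
    {"id": "fin-1", "vector": [1.0, 0.0], "acl": {"finance"}},
    {"id": "fin-2", "vector": [0.9, 0.1], "acl": {"finance"}},
    {"id": "sup-1", "vector": [0.6, 0.4], "acl": {"support"}},
    {"id": "sup-2", "vector": [0.2, 0.8], "acl": {"support", "finance"}},
]
query = [1.0, 0.0]
groups = {"support"}

pre = search_prefiltered(chunks, query, groups, top_k=2)
post = search_postfiltered(chunks, query, groups, top_k=2)
print([c["id"] for c in pre])   # ['sup-1', 'sup-2']
print([c["id"] for c in post])  # []
```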

Bring your own embeddings

Plug in OpenAI, Cohere, Voyage, or your own model. Synapse stores any dimensionality and lets you re-embed a collection in the background without downtime.

Costs that stay flat

Quantization and tiered storage cut memory up to 75%, and per-collection spend caps mean a traffic spike shows up on your dashboard, not on a surprise invoice.
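The 75% figure matches what you would expect from scalar quantization of float32 vectors to int8: four bytes per component become one. The sketch below shows that arithmetic plus a simple min/max int8 quantizer. It is an assumption-laden illustration: the 768-dimension size is hypothetical, the byte counts cover raw vector payload only (no graph links or metadata), and the quantizer is a textbook scheme, not necessarily the one Synapse ships.

```python
def vector_bytes(num_vectors, dims, bytes_per_component):
    """Raw vector payload only; ignores index structures and metadata."""
    return num_vectors * dims * bytes_per_component


def quantize_int8(vector):
    """Map floats onto the int8 range [-128, 127] using min/max scaling."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximate the original floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


n, dims = 1_000_000_000, 768  # hypothetical collection shape
float32_total = vector_bytes(n, dims, 4)
int8_total = vector_bytes(n, dims, 1)
saving = 1 - int8_total / float32_total
print(saving)  # 0.75

codes, lo, scale = quantize_int8([-1.0, 0.0, 0.5, 1.0])
restored = dequantize_int8(codes, lo, scale)
```

Each restored component lands within about half a quantization step (`scale / 2`) of the original, which is the precision traded for the memory saving.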

The numbers behind the index

19ms

Median query at 1B vectors

97%

Typical recall@10

<1s

Document-to-queryable

99.99%

Uptime SLA

The platform

Everything a RAG pipeline needs, behind one endpoint.

From the first embedding to a billion-scale, multi-tenant index — chunking, storage, search, and reranking unified under a single typed API and one billing line.

Managed ingestion

Stream documents in and Synapse chunks, embeds, and indexes them for you — with deduplication, versioning, and automatic backfill when you change your embedding model. The pipeline you'd otherwise build and maintain is just an upload.

Built-in reranking

A cross-encoder reranker lifts the most relevant chunks to the top of every result set, turning good recall into precise context windows.

Namespaces & multi-tenancy

Isolate thousands of customers in one cluster with namespace-level routing and quotas — no noisy-neighbor latency.

Auto-scaling shards

Replicas and shards rebalance themselves as your corpus and traffic grow. Scale to a billion vectors without a migration.

Point-in-time recovery

Continuous snapshots let you roll any collection back to a moment before a bad ingest — your index is as recoverable as your database.

Built for developers

An API that feels like a database, not a science project.

Idempotent writes, typed clients, and a local emulator that matches prod. Wire up retrieval in an afternoon and retire the search cluster for good.

One typed client

First-class SDKs for TypeScript, Python, and Go — upsert, search, and filter through a single consistent, idempotent surface.
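Idempotent writes pay off when an ingest job retries. One common pattern, sketched here with a stand-in in-memory class rather than the real Synapse client, is to derive each chunk's ID from its content so that a replayed upsert overwrites the same row instead of creating a duplicate. `InMemoryCollection` and `chunk_id` are hypothetical names used only for this illustration.

```python
import hashlib


class InMemoryCollection:
    """Stand-in for a vector collection; NOT the Synapse SDK."""

    def __init__(self):
        self._rows = {}

    def upsert(self, doc_id, vector, metadata):
        # Same ID means overwrite, so repeating a write changes nothing.
        self._rows[doc_id] = (vector, metadata)

    def __len__(self):
        return len(self._rows)


def chunk_id(source, text):
    """Deterministic ID: the same source and text always hash to the same ID."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


collection = InMemoryCollection()
text = "Rotate an API key from Settings > Security."
doc_id = chunk_id("help/api-keys.md", text)

# Simulate a retry after a timeout: the same write is sent twice.
for _ in range(2):
    collection.upsert(doc_id, [0.1, 0.2, 0.3], {"workspace": "acme"})

print(len(collection))  # 1
```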

Local emulator

A Docker emulator with full search parity runs in CI and on your laptop, so retrieval tests don't depend on a live cluster.

Streaming & async ingest

Bulk-load millions of vectors with backpressure handling and get a guaranteed-delivery webhook the moment indexing completes.

Query insights

Every response returns latency, recall, and which stage fired, so you debug retrieval quality with evidence rather than guesswork.

Use cases

One retrieval layer, every place your model needs context.

The same hybrid search and reranking power support bots, internal copilots, and agents — each one tuned with filters and namespaces, none of them needing a separate stack.

Customer AI

Support deflection

Ground answers in current help center articles and ticket history. Sub-second freshness means a docs update is in the bot's replies moments later, not after the next reindex.

Productivity

Internal knowledge copilot

Search wikis, runbooks, and Slack with ACL filters applied in the query, so each employee retrieves only the documents they're cleared to see.

Developer tools

Code & API assistants

Hybrid search pairs semantic intent with exact symbol matches, so the assistant surfaces the precise function signature, not a near-match that merely sounds related.

Agents

Agent memory & tools

Give long-running agents durable, queryable memory and namespace-isolated tool corpora that stay fast as the number of agents climbs.

Builders

Teams stopped fighting their retrieval layer.

“We ripped out a self-hosted cluster two engineers babysat full-time. Synapse cut our p99 from 140ms to 18ms and gave that headcount back to the product.”

Priya Nair

Head of AI, Lattice AI

“Hybrid search plus reranking took our answer accuracy from 'usually right' to 'right enough to ship support deflection.' Our containment rate jumped eleven points.”

Marcus Vey

Founding Engineer, Cortexa

“Sub-second freshness is the feature. Docs published at 9am are in the assistant's answers by 9:01 — no reindex job, no cron, no pager.”

Dana Okonkwo

Staff Engineer, Northbridge

Pricing

Pay for vectors and queries, nothing else.

Usage-based pricing with no per-seat fees and no annual lock-in. Start on a free index and scale linearly as your corpus grows.

Free

For prototypes and side projects.

$0/mo

Up to 1M vectors
1 namespace
Hybrid search
Local emulator
Community support

Scale

For products in production.

$199/mo

Up to 250M vectors
Unlimited namespaces
Built-in reranking
Sub-second freshness
99.99% uptime SLA
Priority support

Enterprise

For billion-scale, regulated workloads.

Custom

Unlimited vectors
Dedicated clusters
VPC peering & data residency
SSO + audit logs
Point-in-time recovery
Named solutions engineer

Questions, answered.

How is this different from running an open-source vector store myself?

Synapse is the same idea without the operations: no shard rebalancing, no index tuning, no memory firefights. Hybrid search, reranking, freshness, and multi-tenancy come built in, backed by an SLA — so your engineers ship features instead of nursing a cluster.

Can I use my own embedding model?

Yes. Bring vectors from OpenAI, Cohere, Voyage, or any in-house model at any dimensionality. Synapse can also call your embedding endpoint during managed ingestion and re-embed a collection in the background when you upgrade models.

How do you keep retrieval fast at a billion vectors?

A tuned HNSW graph with optional scalar and product quantization keeps memory low and recall high, while auto-scaling shards and replicas distribute query load. p99 stays under 25ms at a billion vectors in our published benchmarks.

How are permissions and tenant isolation handled?

Filters and ACLs are enforced inside the query path, so a user can never retrieve a chunk outside their scope. Namespaces isolate tenants with their own quotas, and Enterprise adds VPC peering and data residency.

What happens to a bad ingest?

Continuous snapshots give every collection point-in-time recovery. Roll back to the moment before a bad load, or replay a versioned ingest — your index is as recoverable as a managed database.

Ground your AI in your own data.

Spin up a free index, upsert your first vectors, and run a query in minutes. No cluster to provision, no sales call to start.

Retrieval that keeps up with your model.