Quarry — Open Lakehouse & Columnar Analytics

Quarry

Open lakehouse · columnar speed

Quarry is a vectorized SQL engine that reads Iceberg and Delta tables straight out of your S3, GCS, or Azure bucket. No nightly load, no proprietary copy, no warehouse to feed and babysit. Point it at object storage and get warehouse latency on petabytes you never moved.

Open table formats, zero lock-in
Storage and compute, fully separate
Bills to zero the second a query ends

quarry · explain analyze

SELECT region, sum(net_revenue)
FROM iceberg.sales.fact_orders
WHERE order_date >= DATE '2026-01-01'
GROUP BY region ORDER BY 2 DESC;

scanned  2.7 TB  →  read 41 GB  (98.5% pruned)
vectorized · 18 nodes · 0 bytes copied

8 rows · 612 ms · $0.004 compute

Data platform teams run their lakehouse on Quarry

Northvault · Greypine · Cobalt Data · Helio Labs · Tessera · Sandgrove · Driftwell

The engine

Warehouse-grade speed that never owns your data.

Quarry splits the fast part from the stored part. The engine is ours and tuned to the metal; the tables stay open, in your bucket, in formats any tool can still read in five years.

Vectorized columnar scans

Quarry reads Parquet in batches of thousands of values and pushes predicates and projections down into the scan, checking filters against the min/max statistics in each file footer. It touches only the columns and row groups a query needs, so a filter on one day of a year-long table reads gigabytes, not terabytes, and returns before the dashboard finishes painting.
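To make the row-group idea concrete, here is a minimal Python sketch of pruning by min/max statistics. It is not Quarry's implementation: the `RowGroupStats` shape, the `prune_row_groups` helper, and the sizes are hypothetical, chosen only to show why a one-day filter can skip most of a year-long table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RowGroupStats:
    """Min/max statistics for the date column of one row group (hypothetical shape)."""
    min_date: date
    max_date: date
    compressed_bytes: int


def prune_row_groups(groups, lo, hi):
    """Keep only row groups whose [min, max] range overlaps the filter range [lo, hi]."""
    return [g for g in groups if g.max_date >= lo and g.min_date <= hi]


# One row group per month of 2025, 10 GB each (illustrative numbers only).
groups = [
    RowGroupStats(date(2025, m, 1), date(2025, m, 28), 10 * 10**9)
    for m in range(1, 13)
]

# A filter on a single day overlaps only the March row group.
day = date(2025, 3, 14)
kept = prune_row_groups(groups, day, day)

total = sum(g.compressed_bytes for g in groups)
read = sum(g.compressed_bytes for g in kept)
print(f"row groups kept: {len(kept)} of {len(groups)}")
print(f"bytes read: {read / total:.1%} of the table")
```

Projection pushdown narrows the read further by fetching only the columns the query names, which this sketch leaves out.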

Open table formats, natively

First-class Apache Iceberg and Delta Lake: snapshots, schema evolution, hidden partitioning, and time-travel, all read and written the open way.

Storage you already pay for

Your data stays in S3, GCS, or Azure Blob. Quarry brings the compute to it. No ingestion bill, no second copy, no vendor sitting between you and your own bytes.

Elastic compute, by the second

Spin up sixty-four nodes for a backfill, drop to zero when idle. You pay for scan-seconds, never for a warehouse left running overnight.

ANSI SQL, no dialect tax

Window functions, CTEs, lateral joins, and JSON, all standard SQL your team already writes. Existing dashboards and dbt models connect unchanged.

What separating storage from compute actually buys you

612 ms

Median interactive query

98.5%

Bytes pruned before they're read

0

Copies of your data we keep

64→0

Nodes, scaled per query
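The 98.5% figure can be checked against the sample query plan near the top of the page, which reports a 2.7 TB table and 41 GB actually read. A short Python check, assuming decimal units (1 TB = 1000 GB):

```python
# Figures from the sample plan: 2.7 TB in the table, 41 GB actually read.
# Assumption: decimal units (1 TB = 1000 GB).
table_gb = 2.7 * 1000
read_gb = 41

pruned_fraction = 1 - read_gb / table_gb
print(f"{pruned_fraction:.1%} pruned")
```

Binary units (1 TB = 1024 GB) round to the same 98.5%.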

Built for data engineers

The engine layer for the data stack you have.

Quarry slots underneath the tools you already run. It is the compute layer, not another silo you have to sync, govern, and pay for twice.

dbt and BI, connected

A Postgres-wire and Arrow Flight SQL interface means dbt, Tableau, Superset, and your notebooks connect with the drivers they ship with today.

Bring your own catalog

Point Quarry at Glue, Unity, Polaris, or Hive, or use Quarry's REST catalog. One source of truth for every engine that reads the lake.

Streaming and batch, one table

Land Kafka and CDC streams into Iceberg with exactly-once commits, then query the same table the moment a row arrives.
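One common way to get exactly-once results from a stream that may replay records is to record the highest committed stream offset in the same atomic commit as the data. The Python sketch below illustrates that general pattern only. The `Table` class is an in-memory stand-in, not Quarry's or Iceberg's API, and the offsets and records are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """In-memory stand-in for a table whose commits are atomic (hypothetical)."""
    rows: list = field(default_factory=list)
    committed_offset: int = -1  # highest stream offset already in the table

    def commit(self, batch, first_offset):
        """Append a batch and advance the offset together; ignore replayed records."""
        last_offset = first_offset + len(batch) - 1
        if last_offset <= self.committed_offset:
            return False  # whole batch was already committed (e.g. a retry)
        # Skip any leading records that an earlier commit already covered.
        skip = max(0, self.committed_offset + 1 - first_offset)
        self.rows.extend(batch[skip:])
        self.committed_offset = last_offset
        return True


table = Table()
table.commit(["r0", "r1", "r2"], first_offset=0)
table.commit(["r0", "r1", "r2"], first_offset=0)  # replayed batch, ignored
table.commit(["r2", "r3"], first_offset=2)        # overlaps; only r3 is new
print(table.rows)
```

Because data and offset advance in one step, a writer that crashes and retries cannot add the same record twice.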

Governance without a copy

Row filters, column masking, and table grants enforced in the engine and audited per query, applied before a single byte leaves your bucket.
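As a rough illustration of what enforcing policy "in the engine" means, the Python sketch below applies a row filter and a column mask to result rows before they are returned. The `apply_policy` helper, the `"***"` mask value, and the sample rows are hypothetical. Quarry's actual policy syntax is not shown on this page.

```python
from typing import Callable, Iterable

Row = dict[str, object]


def apply_policy(
    rows: Iterable[Row],
    row_filter: Callable[[Row], bool],
    masked_columns: set[str],
) -> list[Row]:
    """Drop rows the caller may not see, then mask restricted columns."""
    visible = (r for r in rows if row_filter(r))
    return [
        {k: ("***" if k in masked_columns else v) for k, v in r.items()}
        for r in visible
    ]


rows = [
    {"tenant": "a", "email": "x@example.com", "net_revenue": 120},
    {"tenant": "b", "email": "y@example.com", "net_revenue": 75},
]

# Tenant "a" sees only its own rows, with the email column masked.
result = apply_policy(rows, lambda r: r["tenant"] == "a", {"email"})
print(result)
```

The filter and mask run on the engine side of the boundary, so the caller never receives the unfiltered rows.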

Workloads

One engine, every layer of the lake.

From a 9am dashboard refresh to a six-hour reprocessing job, Quarry sizes the cluster to the question and tears it down the moment the answer lands.

Sub-second

Interactive BI

Dashboards and ad-hoc exploration straight on Iceberg. No extract, no cube, no pre-aggregation to keep in sync.

dbt-native

ELT transformations

Run thousand-model dbt projects against the lake with incremental merges and snapshot isolation built in.

Elastic

Petabyte backfills

Burst to hundreds of nodes for a one-time reprocess, then scale to zero. Pay for the scan-seconds you used and nothing else.

ML-ready

Feature pipelines

Materialize training sets and serve features over Arrow Flight straight into your notebooks and model store.

Embedded

Customer-facing analytics

Serve tenant-isolated queries behind your product with row-level filters enforced inside the engine.

Lift-free

Lakehouse migration

From the data platform teams

They stopped paying to store the same data twice.

“We retired a six-figure warehouse contract and pointed Quarry at the Iceberg tables already sitting in S3. Same dashboards, sub-second now, and the ingestion pipeline we babysat for two years is simply gone.”

Priya Nandakumar

Head of Data Platform, Northvault

“A year-long fact table that used to take ninety seconds returns in under one. Pushdown reads about two percent of the bytes. Our analysts noticed the speed; finance noticed the bill first.”

Felix Marchetti

Staff Data Engineer, Greypine

“Storage and compute being genuinely separate changed how we plan capacity. We burst to two hundred nodes for the nightly backfill and pay nothing while we sleep. No warehouse we priced could do both.”

Adaeze Okonkwo

Director of Analytics Engineering, Tessera

Pricing

Pay for scan-seconds, not for a parked warehouse.

Compute is metered while a query runs and bills to zero when it doesn't. You bring the storage; you already pay your cloud for that.

Developer

For prototypes and a single analyst.

$0/mo

Up to 1 TB scanned / mo
Iceberg + Delta read & write
Postgres-wire + Arrow Flight
Community Slack support

Team

For data teams running the lakehouse in production.

$0.008 / scan-GB

Pay per scan, billed to zero idle
Elastic clusters to 128 nodes
Bring-your-own catalog
Row & column governance
99.9% SLA, priority support

Enterprise

For multi-account estates and regulated data.

Custom

Reserved + spot compute pools
VPC-isolated or self-hosted engine
SSO, SCIM & query audit logs
Data residency + private catalog
Named solutions architect

Questions, answered.

Do I have to load data into Quarry first?

No, and that's the whole point. Quarry reads Iceberg, Delta, and raw Parquet directly from your S3, GCS, or Azure bucket. Register a table, or convert existing Parquet in place with one statement, and query it immediately. There is no ingestion step and no second copy of your data.

How does Quarry hit warehouse latency on object storage?

A vectorized columnar engine plus aggressive predicate and projection pushdown. Quarry reads only the columns and row groups a query touches, prunes partitions straight from table metadata, and caches hot footers locally, so most queries scan a small fraction of the table and return in well under a second.
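The footer cache can be pictured as a small least-recently-used cache in front of object storage. This Python sketch uses the standard library's `functools.lru_cache`. The `load_footer` helper, the bucket paths, and the cache size are assumptions for illustration, not Quarry internals.

```python
from functools import lru_cache

fetches = []  # records every simulated object-storage read


@lru_cache(maxsize=1024)
def load_footer(path: str) -> dict:
    """Simulate fetching and parsing a Parquet footer (hypothetical helper)."""
    fetches.append(path)
    return {"path": path, "row_groups": 12}


# Three queries touch the same file; a fourth touches a different one.
for _ in range(3):
    load_footer("s3://bucket/fact_orders/part-0001.parquet")
load_footer("s3://bucket/fact_orders/part-0002.parquet")

print(len(fetches))                    # storage reads actually issued
print(load_footer.cache_info().hits)   # lookups served from the cache
```

Only the first lookup of each file reaches storage, so repeated queries over hot tables plan without another round trip for metadata.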

Will my existing tools work?

Yes. Quarry speaks the Postgres wire protocol and Arrow Flight SQL, so dbt, Tableau, Superset, DBeaver, and Python notebooks connect with the drivers they already ship. Standard ANSI SQL means most dashboards and dbt models run unchanged.

What happens to my data if I leave?

Nothing, because it never left. Your tables sit in your bucket in open Iceberg or Delta format the entire time. Quarry is the engine, not the vault. Any other lakehouse engine can read the same tables tomorrow with zero migration.

How is this different from a cloud data warehouse?

A warehouse owns a proprietary copy of your data and bills you to store and compute on it together. Quarry leaves the data open in storage you already pay for and meters only the compute a query consumes, scaling to zero when idle. Open formats, separated billing, nothing to lock you in.

Point Quarry at your bucket. Run the first query.

Connect a catalog, register a table, and watch a petabyte scan come back in seconds. No data to move, no credit card, no sales call to get started.

Query the data where it already lives.