Quarry is a vectorized SQL engine that reads Iceberg and Delta tables straight out of your S3, GCS, or Azure bucket. No nightly load, no proprietary copy, no warehouse to feed and babysit. Point it at object storage and get warehouse latency on petabytes you never moved.
SELECT region, sum(net_revenue)
FROM iceberg.sales.fact_orders
WHERE order_date >= DATE '2026-01-01'
GROUP BY region ORDER BY 2 DESC;
scanned 2.7 TB → read 41 GB (98.5% pruned)
vectorized · 18 nodes · 0 bytes copied
8 rows · 612 ms · $0.004 compute
Data platform teams run their lakehouse on Quarry
Quarry splits the fast part from the stored part. The engine is ours and tuned to the metal; the tables stay open, in your bucket, in formats any tool can still read in five years.
Quarry reads Parquet in batches of thousands of values at once and pushes predicates and projections down into the scan, using the statistics in each file footer. It touches only the columns and row groups a query needs, so a filter on one day of a year-long table reads gigabytes, not terabytes, and returns before the dashboard finishes painting.
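To make the pruning concrete, here is a minimal sketch of the idea in plain Python. It is an illustration of min/max row-group pruning in general, not Quarry's implementation: the `RowGroup` class, the `plan_scan` function, the byte sizes, and the `customer_notes` column are all hypothetical, and only `region`, `net_revenue`, and `order_date` come from the sample query.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RowGroup:
    """Simplified footer metadata for one row group (hypothetical shape)."""
    min_order_date: date
    max_order_date: date
    column_bytes: dict  # column name -> stored bytes for that column


def plan_scan(row_groups, columns, date_from):
    """Return (bytes_to_read, bytes_total) for `order_date >= date_from`."""
    total = sum(sum(rg.column_bytes.values()) for rg in row_groups)
    to_read = 0
    for rg in row_groups:
        # Predicate pushdown: if the newest row in the group is older than
        # the cutoff, no row can match, so the group is skipped unread.
        if rg.max_order_date < date_from:
            continue
        # Projection pushdown: read only the columns the query names.
        to_read += sum(rg.column_bytes[c] for c in columns)
    return to_read, total


# Made-up sizes: one wide text column dominates each row group.
SIZES = {"region": 10, "net_revenue": 20, "order_date": 10, "customer_notes": 160}


def month_group(year, month):
    """Build one row group covering a single calendar month."""
    first = date(year, month, 1)
    next_first = date(year + month // 12, month % 12 + 1, 1)
    return RowGroup(first, next_first - timedelta(days=1), dict(SIZES))


# Twelve months of 2025 plus January 2026, one row group per month.
groups = [month_group(2025, m) for m in range(1, 13)] + [month_group(2026, 1)]

read, total = plan_scan(
    groups, ["region", "net_revenue", "order_date"], date(2026, 1, 1)
)
print(f"read {read} of {total} bytes")  # read 40 of 2600 bytes
```

In this toy layout only the January 2026 group survives the date filter, and within it the wide `customer_notes` column is never touched. A real engine would read these statistics from the Parquet footer and the table metadata rather than from an in-memory list.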
First-class Apache Iceberg and Delta Lake: snapshots, schema evolution, hidden partitioning, and time-travel, all read and written the open way.
Your data stays in S3, GCS, or Azure Blob. Quarry brings the compute to it. No ingestion bill, no second copy, no vendor sitting between you and your own bytes.
Spin up sixty-four nodes for a backfill, drop to zero when idle. You pay for scan-seconds, never for a warehouse left running overnight.
Window functions, CTEs, lateral joins, and JSON, all standard SQL your team already writes. Existing dashboards and dbt models connect unchanged.
What separating storage from compute actually buys you
Quarry slots underneath the tools you already run. It is the compute layer, not another silo you have to sync, govern, and pay for twice.
A Postgres-wire and Arrow Flight SQL interface means dbt, Tableau, Superset, and your notebooks connect with the drivers they ship with today.
Point Quarry at Glue, Unity, Polaris, or Hive, or use Quarry's REST catalog. One source of truth for every engine that reads the lake.
Land Kafka and CDC streams into Iceberg with exactly-once commits, then query the same table the moment a row arrives.
Row filters, column masking, and table grants enforced in the engine and audited per query, applied before a single byte leaves your bucket.
From a 9am dashboard refresh to a six-hour reprocessing job, Quarry sizes the cluster to the question and tears it down the moment the answer lands.
Dashboards and ad-hoc exploration straight on Iceberg. No extract, no cube, no pre-aggregation to keep in sync.
Run thousand-model dbt projects against the lake with incremental merges and snapshot isolation built in.
Burst to hundreds of nodes for a one-time reprocess, then scale to zero. Pay for the scan time you used and nothing else.
Materialize training sets and serve features over Arrow Flight straight into your notebooks and model store.
Serve tenant-isolated queries behind your product with row-level filters enforced inside the engine.
Register existing Parquet in place and convert to Iceberg with one statement. No re-ingest, no downtime.
“We retired a six-figure warehouse contract and pointed Quarry at the Iceberg tables already sitting in S3. Same dashboards, sub-second now, and the ingestion pipeline we babysat for two years is simply gone.”
“A year-long fact table that used to take ninety seconds returns in under one. Pushdown reads about two percent of the bytes. Our analysts noticed the speed; finance noticed the bill first.”
“Storage and compute being genuinely separate changed how we plan capacity. We burst to two hundred nodes for the nightly backfill and pay nothing while we sleep. No warehouse we priced could do both.”
Compute is metered while a query runs and bills to zero when it doesn't. You bring the storage; you already pay your cloud for that.
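As a rough sketch of what metering only while a query runs means, the arithmetic looks like this. The node-second billing unit, the `QueryRun` and `metered_cost` names, and the rate are assumptions made for illustration; none of them is a published Quarry price or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRun:
    """One finished query: cluster size and wall-clock runtime."""
    nodes: int
    seconds: float


def metered_cost(runs, rate_per_node_second):
    """Sum node-seconds across queries and price them at a flat rate.

    The rate is a required argument because no real price is assumed here.
    """
    node_seconds = sum(run.nodes * run.seconds for run in runs)
    return node_seconds * rate_per_node_second


PLACEHOLDER_RATE = 0.001  # hypothetical rate per node-second, not a real price

# One query shaped like the sample above: 18 nodes for 612 ms.
busy = metered_cost([QueryRun(nodes=18, seconds=0.612)], PLACEHOLDER_RATE)

# No queries in the billing window means nothing is metered.
idle = metered_cost([], PLACEHOLDER_RATE)

print(f"busy window: {busy:.6f}, idle window: {idle}")
```

The point of the sketch is the shape of the bill, not the number: cost tracks the work queries actually did, and an idle window contributes nothing.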
For prototypes and a single analyst.
For data teams running the lakehouse in production.
For multi-account estates and regulated data.
No, and that's the whole point. Quarry reads Iceberg, Delta, and raw Parquet directly from your S3, GCS, or Azure bucket. Register a table, or convert existing Parquet in place with one statement, and query it immediately. There is no ingestion step and no second copy of your data.
A vectorized columnar engine plus aggressive predicate and projection pushdown. Quarry reads only the columns and row groups a query touches, prunes partitions straight from table metadata, and caches hot footers locally, so most queries scan a small fraction of the table and return in well under a second.
Yes. Quarry speaks the Postgres wire protocol and Arrow Flight SQL, so dbt, Tableau, Superset, DBeaver, and Python notebooks connect with the drivers they already ship. Standard ANSI SQL means most dashboards and dbt models run unchanged.
Nothing, because it never left. Your tables sit in your bucket in open Iceberg or Delta format the entire time. Quarry is the engine, not the vault. Any other lakehouse engine can read the same tables tomorrow with zero migration.
A warehouse owns a proprietary copy of your data and bills you to store and compute on it together. Quarry leaves the data open in storage you already pay for and meters only the compute a query consumes, scaling to zero when idle. Open formats, separated billing, nothing to lock you in.
Connect a catalog, register a table, and watch a petabyte scan come back in seconds. No data to move, no credit card, no sales call to get started.