Spiral

The data warehouse for pre-training.

Maximize Model FLOPs Utilization
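
For readers new to the term, Model FLOPs Utilization (MFU) is commonly defined as the FLOPs a training run actually achieves per second divided by the hardware's theoretical peak. A minimal sketch, assuming the widely used approximation of about 6 FLOPs per parameter per token for dense transformer training and an illustrative peak of 989 TFLOPS (roughly the dense BF16 figure quoted for an H100 SXM). The function name and all numbers are hypothetical, not Spiral measurements:

```python
def model_flops_utilization(tokens_per_second, n_params, peak_flops_per_second):
    """Return MFU as a fraction of peak for a single accelerator."""
    # Assumption: ~6 FLOPs per parameter per token (forward + backward pass),
    # ignoring attention FLOPs.
    achieved_flops_per_second = 6 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Illustrative numbers only: a 7B-parameter model at 10,000 tokens/s per GPU.
mfu = model_flops_utilization(
    tokens_per_second=10_000,
    n_params=7e9,
    peak_flops_per_second=989e12,  # assumed peak; substitute your hardware's figure
)
print(f"MFU = {mfu:.1%}")
```

If the input pipeline cannot deliver tokens fast enough, tokens_per_second drops and MFU drops with it, which is why data loading throughput matters here.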

Multimodal data

Quickly ingest any data, any size.

Process and Enrich

Append columns (and rows) without rewriting existing data.
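
To make the idea concrete, here is a toy illustration of why a columnar layout can append columns and rows without touching what is already stored. This is a conceptual sketch only: `ColumnStore` and its methods are hypothetical names invented for this example and are not Spiral's API.

```python
class ColumnStore:
    """Toy columnar table: every column is a list of immutable chunks."""

    def __init__(self):
        self.columns = {}  # column name -> list of chunks (tuples)
        self.writes = []   # log of (column name, rows written), for illustration

    def num_rows(self):
        if not self.columns:
            return 0
        first = next(iter(self.columns.values()))
        return sum(len(chunk) for chunk in first)

    def _write_chunk(self, name, values):
        self.columns.setdefault(name, []).append(tuple(values))
        self.writes.append((name, len(values)))

    def append_column(self, name, values):
        """Add a new column; only the new column's data is written."""
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        if self.columns and len(values) != self.num_rows():
            raise ValueError("new column must cover every existing row")
        self._write_chunk(name, values)

    def append_rows(self, batch):
        """Add rows as one new chunk per column; old chunks stay untouched."""
        if set(batch) != set(self.columns):
            raise ValueError("batch must provide every existing column")
        if len({len(values) for values in batch.values()}) != 1:
            raise ValueError("all columns in a batch must have the same length")
        for name, values in batch.items():
            self._write_chunk(name, values)

    def read(self, name):
        return [value for chunk in self.columns[name] for value in chunk]


store = ColumnStore()
store.append_column("image_uri", ["a.png", "b.png"])
original_chunk = store.columns["image_uri"][0]

store.append_column("caption", ["a cat", "a dog"])  # writes only "caption"
store.append_rows({"image_uri": ["c.png"], "caption": ["a bird"]})

# The first chunk of "image_uri" is still the very same object.
print(store.columns["image_uri"][0] is original_chunk)
print(store.writes)
```

The write log shows that each append touches only the new chunks, which is the property the claim above relies on.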

Process and Enrich

Enrich objects with properties.

Scale without worry

Scale to millions of columns without upfront schema design.

Avoid I/O bottlenecks

Saturate your GPUs

Run an interactive query that loads more bytes per second into an H100 than reading the same result precomputed and saved as Parquet on local disk.

No more custom data access layers

All the flexibility you need, out of the box.

Selective read
Parameterized push-down read
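
As a rough illustration of what a selective, parameterized push-down read means in general, the sketch below projects only the requested columns and uses per-group min/max statistics to skip data that cannot match the filter. It is a toy in pure Python; `RowGroup` and `scan` are hypothetical names for this example and do not represent Spiral's or Vortex's interface.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    columns: dict  # column name -> list of values

    def stats(self, name):
        # Computed on the fly here; a real format would store these at write time.
        values = self.columns[name]
        return min(values), max(values)


def scan(row_groups, select, where_column, lo, hi):
    """Return (rows, groups_read) for rows with lo <= where_column <= hi.

    `select` is the column projection; `lo` and `hi` are the query parameters.
    """
    rows, groups_read = [], 0
    for group in row_groups:
        group_min, group_max = group.stats(where_column)
        if group_max < lo or group_min > hi:
            continue  # push-down: statistics prove no row in this group matches
        groups_read += 1
        keep = [i for i, v in enumerate(group.columns[where_column]) if lo <= v <= hi]
        # Only the selected columns are touched; "pixels" is never read below.
        rows.extend(tuple(group.columns[c][i] for c in select) for i in keep)
    return rows, groups_read


groups = [
    RowGroup({"id": [1, 2, 3], "label": ["cat", "dog", "cat"], "pixels": [b"", b"", b""]}),
    RowGroup({"id": [4, 5, 6], "label": ["dog", "cat", "bird"], "pixels": [b"", b"", b""]}),
    RowGroup({"id": [7, 8, 9], "label": ["bird", "cat", "dog"], "pixels": [b"", b"", b""]}),
]

rows, groups_read = scan(groups, select=["id", "label"], where_column="id", lo=5, hi=7)
print(rows)
print(f"row groups read: {groups_read} of {len(groups)}")
```

Because the filter bounds are ordinary arguments, the same scan can be reissued with different parameters without precomputing a result for each one.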

Integrates with tools you love

Familiar, interoperable standards.

Spark, Dask, Modal, DuckDB, Polars, PyTorch, Pandas, Arrow, Iceberg, Ray

FOR COMPLEX DATA AT MACHINE SCALE

Spiral is built on Vortex

Based on decades of database research, Vortex is the next-generation, open-source columnar format. We designed it to bridge storage & compute for the next generation of data processing, and were honored to donate it to the Linux Foundation to maximize its impact.

  • Pareto-optimal performance: Faster than Apache Parquet for virtually any workload.
  • Future-proof: Intentionally extensible to stay on the bleeding edge.
  • Interoperable: Works seamlessly with existing data ecosystems and tools.