Encoding geospatial data in Apache Parquet.
Apache Parquet is a powerful column-oriented data format, built from the ground up as a modern alternative to CSV files. GeoParquet is an incubating Open Geospatial Consortium (OGC) standard that adds interoperable geospatial types (Point, Line, Polygon) to Parquet.
Read the specification for the v1.0.0-beta.1 release (or see the metadata schema). Find links to older releases on the release page.
For more information, see the goals and features section of the readme in the GeoParquet repository. There is also a deep dive on Parquet and GeoParquet in this blog post: Introducing the GeoParquet data format. We'll soon be expanding this website with more details.
Following GeoParquet's structure enables interoperability between any systems that read or write spatial data in Parquet.
Data science workflows benefit from columnar data formats, and geospatial analysis can tap into the same innovations.
Snowflake, BigQuery, Redshift, and Databricks can all work together seamlessly with the same geospatial data format.
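To make the shared structure more concrete, here is a minimal sketch in standard-library Python of the two pieces a GeoParquet file combines: geometries stored as Well-Known Binary (WKB) and a JSON document stored under the `geo` key of the Parquet file metadata. The helper names (`point_to_wkb`, `wkb_to_point`) and the column name `geometry` are hypothetical, and the metadata fields reflect our reading of the v1.0.0-beta.1 metadata schema, so check them against the specification. The sketch does not write a Parquet file; a Parquet library would be needed for that.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # a uint32 geometry type (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> tuple:
    """Decode a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("expected a little-endian WKB Point")
    return (x, y)


# Assumed shape of the file-level "geo" metadata for one geometry column.
geo_metadata = {
    "version": "1.0.0-beta.1",
    "primary_column": "geometry",  # hypothetical column name
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
        }
    },
}

# The metadata is serialized as a JSON string before it is attached
# to the Parquet file's key-value metadata under the key "geo".
geo_json = json.dumps(geo_metadata)

wkb = point_to_wkb(-122.4, 37.8)
print(len(wkb), wkb_to_point(wkb))
```

Any reader that understands this layout can locate the geometry column from the metadata and decode its values, which is what lets different systems exchange the same file.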
GeoParquet is rapidly maturing, with a number of new software libraries and tools coming online.
There are many sources of GeoParquet data, with more coming online all the time. If you have or know of a good source of GeoParquet data, please let us know!