Quick Start
Get started with geoparquet-io in 5 minutes.
Installation
uv pip install geoparquet-io
See the Installation Guide for more options.
Basic Workflow
1. Convert to GeoParquet (Optional)
If you're starting with Shapefile, GeoJSON, GeoPackage, or CSV/TSV, convert to optimized GeoParquet:
gpio convert input.shp output.parquet
This automatically applies all best practices:
- ZSTD compression
- Row groups of 100k rows
- Bbox column with metadata
- Hilbert spatial ordering
- GeoParquet 1.1.0 metadata
Skip Hilbert ordering for faster conversion of large files:
gpio convert large.gpkg output.parquet --skip-hilbert
2. Inspect Your File
Take a look at what's in your GeoParquet file:
gpio inspect myfile.parquet
This shows you:
- File size and row count
- Coordinate reference system (CRS)
- Bounding box
- Column schema with types
Add --head 10 to preview the first 10 rows, or --stats for column statistics.
3. Check Quality
Validate your file against GeoParquet best practices:
gpio check all myfile.parquet
This checks:
- Spatial ordering
- Compression settings
- Bbox metadata structure
- Row group optimization
4. Optimize Existing Files
If you already have GeoParquet files, optimize them for spatial queries.
Add a bounding box column for faster spatial queries:
gpio add bbox input.parquet output.parquet
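To see why a bbox column speeds up spatial queries, here is a minimal pure-Python sketch. It is not geoparquet-io's implementation: the `features` data and the helper names `bbox_of` and `intersects` are hypothetical. The idea is that each row's bounding box is four plain numbers, so a query can discard non-overlapping rows with cheap comparisons before any geometry is decoded.

```python
# Illustrative only: plain Python, not geoparquet-io's implementation.

def bbox_of(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical features: name -> polygon vertices as (lon, lat)
features = {
    "san_francisco": [(-122.52, 37.70), (-122.35, 37.70),
                      (-122.35, 37.83), (-122.52, 37.83)],
    "new_york": [(-74.26, 40.49), (-73.70, 40.49),
                 (-73.70, 40.92), (-74.26, 40.92)],
}

# Computed once per row; this is the role a bbox column plays.
bboxes = {name: bbox_of(coords) for name, coords in features.items()}

# Query window as (xmin, ymin, xmax, ymax)
query = (-122.5, 37.5, -122.0, 38.0)
hits = [name for name, box in bboxes.items() if intersects(box, query)]
print(hits)
```

Only rows whose box passes this test need their full geometry read and checked.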
Sort data using a Hilbert curve for better spatial locality:
gpio sort hilbert input.parquet sorted.parquet
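For intuition about Hilbert ordering, the sketch below uses the textbook Hilbert-curve mapping from a grid cell to its distance along the curve, then sorts points by that distance. It is an assumption-laden illustration, not gpio's code: `hilbert_index`, `hilbert_sort`, the grid `order`, and the sample points are all hypothetical, and gpio may quantize coordinates differently.

```python
# Illustrative only: the classic Hilbert curve mapping, not gpio's implementation.

def hilbert_index(x, y, order):
    """Distance along the Hilbert curve of cell (x, y) in a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def hilbert_sort(points, bounds, order=16):
    """Sort (x, y) points by Hilbert distance after snapping them to a grid."""
    xmin, ymin, xmax, ymax = bounds
    n = 1 << order

    def key(p):
        gx = min(n - 1, int((p[0] - xmin) / (xmax - xmin) * n))
        gy = min(n - 1, int((p[1] - ymin) / (ymax - ymin) * n))
        return hilbert_index(gx, gy, order)

    return sorted(points, key=key)

# Hypothetical (lon, lat) points
points = [(-122.4, 37.8), (2.35, 48.86), (-122.3, 37.9), (2.29, 48.85)]
ordered = hilbert_sort(points, bounds=(-180, -90, 180, 90))
print(ordered)
```

Consecutive cells along the curve are always grid neighbours, which is why rows that are close on the map tend to end up close together in the file.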
5. Add Spatial Indices
Enhance your data with additional spatial indexing:
# Add H3 hexagonal cell IDs (resolution 9 ≈ 0.1 km² cells)
gpio add h3 input.parquet output_h3.parquet --resolution 9
# Add KD-tree partition IDs (auto-selects optimal partition count)
gpio add kdtree input.parquet output_kdtree.parquet
# Add country codes via spatial join
gpio add admin-divisions buildings.parquet buildings_with_countries.parquet
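As a rough picture of what a KD-tree partition ID represents, the sketch below splits points at the median, alternating between x and y, and labels each leaf with an integer. This is a generic illustration under stated assumptions: the function name `kdtree_partition` and the explicit `depth` parameter are hypothetical, whereas gpio chooses the partition count itself and its algorithm may differ.

```python
# Illustrative only: a generic median-split KD-tree, not gpio's implementation.
from collections import Counter

def kdtree_partition(points, depth):
    """Return one partition ID per point, producing up to 2**depth balanced partitions."""
    ids = [0] * len(points)

    def split(indices, level, prefix):
        if level == depth:
            for i in indices:
                ids[i] = prefix
            return
        axis = level % 2  # alternate between x (0) and y (1)
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        split(ordered[:mid], level + 1, prefix * 2)
        split(ordered[mid:], level + 1, prefix * 2 + 1)

    split(list(range(len(points))), 0, 0)
    return ids

# Hypothetical (x, y) points
points = [(0, 0), (1, 5), (2, 1), (3, 4), (4, 2), (5, 3), (6, 7), (7, 6)]
ids = kdtree_partition(points, depth=2)
print(Counter(ids))
```

Because every split is at the median, each partition holds roughly the same number of rows regardless of how unevenly the data is spread.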
6. Partition Large Datasets
Split large files into manageable partitions:
# Preview what partitions would be created
gpio partition admin buildings.parquet --preview
# Partition by country code
gpio partition admin buildings.parquet output_dir/
# Partition by H3 cells at resolution 7 (~5km² cells)
gpio partition h3 points.parquet output_dir/ --resolution 7
# Partition by KD-tree (auto-balanced spatial partitions)
gpio partition kdtree large_file.parquet output_dir/
7. Chain Commands with Pipes
gpio commands can be chained together using Unix pipes. This avoids writing intermediate files to disk, which can significantly improve performance:
# Extract, add bbox, and sort in one pipeline
gpio extract --limit 10000 input.parquet | gpio add bbox - | gpio sort hilbert - output.parquet
# Spatial filter, add indices, then partition
gpio extract --bbox "-122.5,37.5,-122.0,38.0" input.parquet | \
  gpio add quadkey - | \
  gpio partition string --column quadkey --chars 4 - output_dir/
# Add multiple indices in sequence
gpio add bbox input.parquet | gpio add h3 --resolution 9 - | gpio add quadkey - output.parquet
Use - as the input path to read from stdin. When a command's output is piped, gpio detects this automatically, so no output path is needed. See the Piping Guide for details.
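The second pipeline above partitions on the first 4 characters of a quadkey. The sketch below shows why a prefix works as a partition key, assuming quadkeys follow the common Web Mercator tile scheme (an assumption about gpio, not something this guide states). The function names are hypothetical. Each quadkey digit selects one of four child tiles, so the first 4 characters identify the enclosing zoom-4 tile.

```python
# Illustrative only: the widely used tile-quadkey scheme, assumed here for gpio.
import math

def tile_to_quadkey(tx, ty, zoom):
    """Interleave the bits of tile column tx and row ty into a base-4 string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tx & mask:
            digit += 1
        if ty & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def lonlat_to_quadkey(lon, lat, zoom):
    """Quadkey of the Web Mercator tile containing (lon, lat) at the given zoom."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limit
    n = 1 << zoom
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tx = min(n - 1, max(0, int(x * n)))
    ty = min(n - 1, max(0, int(y * n)))
    return tile_to_quadkey(tx, ty, zoom)

full = lonlat_to_quadkey(-122.4, 37.8, 12)  # hypothetical point and zoom
prefix = full[:4]                           # key that a 4-character partition would use
print(full, prefix)
```

All rows that share a prefix fall inside the same coarse tile, so each output partition covers one contiguous area.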
Common Patterns
Convert and Validate
# 1. Convert from Shapefile/GeoJSON/GeoPackage/CSV
gpio convert input.shp output.parquet
# 2. Verify it meets best practices
gpio check all output.parquet
# 3. Inspect the results
gpio inspect output.parquet
Convert, Fix, and Upload
# 1. Convert to GeoParquet
gpio convert input.shp data.parquet
# 2. Validate and fix issues
gpio check all data.parquet --fix --fix-output data_fixed.parquet
# 3. Upload to cloud storage
gpio publish upload data_fixed.parquet s3://bucket/data.parquet --profile prod
Quality Check → Optimize → Validate
For existing GeoParquet files:
# 1. Check current state
gpio check all input.parquet
# 2. Optimize (using pipes - no intermediate files)
gpio add bbox input.parquet | gpio sort hilbert - optimized.parquet
# 3. Verify improvements
gpio check all optimized.parquet
Inspect → Enhance → Partition
# 1. Understand your data
gpio inspect buildings.parquet --stats
# 2. Add country codes
gpio add admin-divisions buildings.parquet buildings_enhanced.parquet
# 3. Split by country
gpio partition admin buildings_enhanced.parquet by_country/
Preview Before Processing
Use --preview before a partitioning run to see which partitions would be created:
# Preview partitioning strategy
gpio partition string input.parquet --column region --preview
# Preview with analysis
gpio partition h3 input.parquet --resolution 8 --preview
# If satisfied, run without --preview
gpio partition h3 input.parquet output/ --resolution 8
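Conceptually, a partition preview boils down to counting rows per partition key before anything is written. The sketch below mimics that idea in plain Python. The `preview_partitions` helper, its `chars` parameter, and the sample values are hypothetical, and gpio's actual preview output is not shown in this guide.

```python
# Illustrative only: counts rows per key; not gpio's preview logic or output format.
from collections import Counter

def preview_partitions(values, chars=None):
    """Count rows per partition key, optionally using only the first `chars` characters."""
    keys = (v[:chars] if chars else v for v in values)
    return Counter(keys)

# Hypothetical values of a 'region' column
regions = ["europe", "asia", "europe", "africa", "asia", "europe"]
for key, rows in preview_partitions(regions).most_common():
    print(f"{key}: {rows} rows")
```

A quick count like this reveals skew, such as one partition holding most of the rows, before committing to a full run.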
Using the Python API
The Python API provides the best performance by keeping data in memory:
import geoparquet_io as gpio
# Read and inspect a file
table = gpio.read('data.parquet')
table.info() # Print summary
# Transform and write
gpio.read('input.parquet') \
    .add_bbox() \
    .sort_hilbert() \
    .write('optimized.parquet')
# Chain multiple operations
gpio.read('input.parquet') \
    .extract(limit=10000) \
    .add_bbox() \
    .add_quadkey(resolution=12) \
    .sort_hilbert() \
    .write('output.parquet')
# Convert from other formats
gpio.convert('data.gpkg') \
    .add_bbox() \
    .sort_hilbert() \
    .write('output.parquet')
# Upload to cloud storage
gpio.read('data.parquet') \
    .add_bbox() \
    .upload('s3://bucket/data.parquet')
The Python API is up to 5x faster than CLI operations because data stays in memory. See the Python API documentation for the full reference.
Getting Help
Every command has detailed help:
# General help
gpio --help
# Command group help
gpio add --help
gpio partition --help
# Specific command help
gpio add bbox --help
gpio partition h3 --help
Next Steps
Now that you know the basics, explore:
- Python API Reference - Full Python API documentation
- User Guide - Detailed documentation for all features
- CLI Reference - Complete command reference
- Examples - Real-world usage patterns
- Spatial Performance - Understanding bbox, sorting, and partitioning