Converting to GeoParquet¶
The convert command transforms vector formats into optimized GeoParquet files with all best practices applied automatically.
CLI vs Python Behavior
The CLI gpio convert applies Hilbert sorting by default for optimal spatial queries.
The Python gpio.convert() does NOT sort by default - chain .sort_hilbert() explicitly if needed.
Basic Usage¶
gpio convert input.shp output.parquet
Automatically applies:
- ZSTD compression (level 15)
- 100,000 row groups
- Bbox column with proper metadata
- Hilbert spatial ordering
- GeoParquet 1.1.0 metadata
import geoparquet_io as gpio
# Convert with Hilbert sorting (recommended)
gpio.convert('input.shp').sort_hilbert().write('output.parquet')
# Or without sorting (faster but less optimal for spatial queries)
gpio.convert('input.shp').write('output.parquet')
Supported Input Formats¶
Auto-detected by file extension:
- Shapefile (.shp)
- GeoJSON (.geojson, .json)
- GeoPackage (.gpkg)
- File Geodatabase (.gdb)
- CSV/TSV (.csv, .tsv, .txt) - See CSV/TSV Support below
Any format supported by DuckDB's spatial extension can be read.
Remote Files¶
Read from cloud storage or HTTPS:
# Convert remote file
gpio convert https://example.com/data.geojson local.parquet
# Convert from S3
gpio convert s3://bucket/input.parquet local-optimized.parquet
See Remote Files Guide for authentication setup.
Options¶
Skip Hilbert Ordering¶
For faster conversion when spatial ordering isn't critical:
gpio convert large.gpkg output.parquet --skip-hilbert
Trade-off: Faster conversion but less optimal for spatial queries.
Custom Compression¶
Control compression type and level:
# GZIP compression
gpio convert input.shp output.parquet --compression GZIP --compression-level 6
# Uncompressed (not recommended)
gpio convert input.geojson output.parquet --compression UNCOMPRESSED
Available compression types:
- ZSTD (default, level 15) - Best compression + speed balance
- GZIP (level 1-9) - Wide compatibility
- BROTLI (level 1-11) - High compression
- LZ4 - Fastest decompression
- SNAPPY - Fast compression
- UNCOMPRESSED - No compression
Verbose Output¶
Track progress and see detailed information:
gpio convert input.gpkg output.parquet --verbose
Shows: - Geometry column detection - Dataset bounds calculation - Bbox column creation - Hilbert ordering progress - File size and validation
Examples¶
Basic Shapefile Conversion¶
gpio convert buildings.shp buildings.parquet
Output:
Converting buildings.shp...
Done in 2.3s
Output: buildings.parquet (4.2 MB)
✓ Output passes GeoParquet validation
import geoparquet_io as gpio
gpio.convert('buildings.shp').sort_hilbert().write('buildings.parquet')
Large Dataset Without Hilbert¶
gpio convert large_dataset.gpkg output.parquet --skip-hilbert
import geoparquet_io as gpio
# Python doesn't sort by default, so just skip sort_hilbert()
gpio.convert('large_dataset.gpkg').write('output.parquet')
Skips Hilbert ordering for faster processing on large files.
Custom Compression Settings¶
gpio convert roads.geojson roads.parquet \
--compression ZSTD \
--compression-level 22 \
--verbose
Maximum ZSTD compression with progress tracking.
Convert and Inspect¶
# Convert
gpio convert input.shp output.parquet
# Verify
gpio inspect output.parquet
# Validate
gpio check all output.parquet
CSV/TSV Support¶
Auto-detects geometry columns. WKT columns (wkt, geometry, geom) checked first, then lat/lon pairs (lat/lon, latitude/longitude).
# Auto-detect WKT or lat/lon
gpio convert points.csv points.parquet
# Explicit columns
gpio convert data.csv out.parquet --wkt-column geom_wkt
gpio convert data.csv out.parquet --lat-column lat --lon-column lng
# Custom delimiter
gpio convert data.txt out.parquet --delimiter "|"
CRS and Validation¶
Default: WGS84 (EPSG:4326). Override with --crs for WKT data:
gpio convert projected.csv out.parquet --crs EPSG:3857
Validates lat/lon ranges (-90 to 90, -180 to 180). Warns on large coordinates suggesting projected CRS.
Invalid Geometries¶
Fails on invalid WKT by default. Skip with --skip-invalid:
gpio convert messy.csv out.parquet --skip-invalid
Skips invalid rows, disables Hilbert ordering. Mixed geometry types supported.
Delimiters¶
Auto-detects comma and tab. Override with --delimiter for semicolon, pipe, or any single character.
gpio convert data.csv out.parquet --delimiter ";"
Performance¶
The convert command uses DuckDB's spatial extension - the fastest option for GeoParquet conversion, especially for large files.
Benchmarks on representative datasets:
| Dataset | Size | Features | DuckDB | PyOGRIO | ogr2ogr | Fiona |
|---|---|---|---|---|---|---|
| GAUL L2 Shapefile | 739 MB | 45k | 4.6s | 5.9s | 4.1s | 187s |
| Argentina Roads | 1.1 GB | 3.5M | 30s | 66s | 117s | 349s |
DuckDB also uses significantly less memory than alternatives (near-zero vs 600MB-2GB for GeoPandas).
To run your own benchmarks:
gpio benchmark input.geojson --iterations 3
See gpio benchmark for details.
See Also¶
- CLI Reference: convert
- benchmark command - Compare conversion performance
- add command - Add indices to existing GeoParquet
- sort command - Sort existing GeoParquet spatially
- check command - Validate and fix GeoParquet files