Checking Best Practices¶
CLI Only
The check commands are currently available only via the CLI. See issue #151 for the Python API roadmap.
The check commands validate GeoParquet files against best practices.
Run All Checks¶
gpio check all myfile.parquet
Runs all validation checks:
- Spatial ordering
- Compression settings
- Bbox structure and metadata
- Row group optimization
Individual Checks¶
Spatial Ordering¶
gpio check spatial myfile.parquet
Checks whether the data is spatially ordered by sampling rows at random. Spatially ordered data improves:
- Query performance
- Compression ratios
- Cloud access patterns
Compression¶
gpio check compression myfile.parquet
Validates geometry column compression settings.
Bbox Structure¶
gpio check bbox myfile.parquet
Verifies:
- Bbox column structure
- GeoParquet metadata version
- Bbox covering metadata
Row Groups¶
gpio check row-group myfile.parquet
Checks whether row group sizes are well suited to cloud-native access.
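The four individual checks above can also be run one after another from a small shell function, so that each report appears under its own heading. This is only a sketch: the function name run_individual_checks is hypothetical and not part of gpio, and it uses only the subcommands documented in this section.

```shell
# Hypothetical helper (not part of gpio): run each individual check in turn
# and print a heading before each report.
run_individual_checks() {
  file=$1
  for check in spatial compression bbox row-group; do
    echo "== gpio check $check =="
    # Subcommand names are the ones documented in this section.
    gpio check "$check" "$file"
  done
}

# Usage: run_individual_checks myfile.parquet
```

The loop continues even if one check reports a problem, so all four reports are always produced.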
STAC Validation¶
gpio check stac output.json
Validates a STAC Item or Collection JSON file:
- STAC spec compliance
- Required fields
- Asset href resolution (local files)
- Best practices
Options¶
# Verbose output with details
gpio check all myfile.parquet --verbose
# Custom sampling for spatial check
gpio check spatial myfile.parquet --random-sample-size 200 --limit-rows 1000000
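Because the spatial check works on a random sample, its result may depend on how many rows are sampled. One way to see whether the result is stable is to repeat the check with several sample sizes. The sketch below assumes nothing beyond the --random-sample-size option shown above; the function name spatial_sweep and the sizes in the usage line are hypothetical.

```shell
# Hypothetical helper (not part of gpio): repeat the spatial check with
# each sample size given after the file name.
spatial_sweep() {
  file=$1
  shift
  for size in "$@"; do
    echo "== --random-sample-size $size =="
    gpio check spatial "$file" --random-sample-size "$size"
  done
}

# Usage: spatial_sweep myfile.parquet 100 200 500
```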
Checking Partitioned Data¶
When checking a directory containing partitioned data, you can control how many files are checked:
# By default, checks only the first file
gpio check all partitions/
# Output: Checking first file (of 4 total). Use --check-all or --check-sample N for more.
# Check all files in the partition
gpio check all partitions/ --check-all
# Check a sample of files (first N files)
gpio check all partitions/ --check-sample 3
--fix not available for partitions
The --fix option works only with single files. To fix issues in partitioned data, first consolidate the files with gpio extract, apply the fixes to the consolidated file, then re-partition if needed.
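Before consolidating, it can help to see which files in a partition actually report problems. The sketch below runs gpio check all on every Parquet file under a directory, one file at a time, and prints each report under the file's path. It is an illustration only: the function name check_partition_files is hypothetical, the .parquet file extension is an assumption about how the partition files are named, and the exit status is printed as-is without assuming what a nonzero value means in gpio.

```shell
# Hypothetical helper (not part of gpio): check every Parquet file under a
# partition directory individually and label each report with its path.
check_partition_files() {
  dir=$1
  # sort gives a stable order; file names are assumed not to contain newlines.
  find "$dir" -type f -name '*.parquet' | sort | while IFS= read -r file; do
    echo "== $file =="
    # Redirect stdin so the command cannot consume the list of file names.
    gpio check all "$file" < /dev/null
    echo "exit status: $?"
  done
}

# Usage: check_partition_files partitions/
```

Unlike --check-all, this keeps each file's report separate, which makes it easier to note which files need attention.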
See Also¶
- CLI Reference: check
- add command - Add spatial indices
- sort command
- stac command - Generate STAC metadata