Skip to content

extract Command

For detailed usage and examples, see the Extract User Guide.

Quick Reference

gpio extract --help

This will show all available options for the extract command.

Options

Column Selection

  • --include-cols COLS - Comma-separated columns to include (geometry and bbox auto-added unless in --exclude-cols)
  • --exclude-cols COLS - Comma-separated columns to exclude (can be used with --include-cols to exclude geometry/bbox)

Spatial Filtering

  • --bbox BBOX - Bounding box filter: xmin,ymin,xmax,ymax
  • --geometry GEOM - Geometry filter: GeoJSON, WKT, @filepath, or - for stdin
  • --use-first-geometry - Use first geometry if FeatureCollection contains multiple

SQL Filtering

  • --where CLAUSE - DuckDB WHERE clause for filtering rows
  • --limit N - Maximum number of rows to extract

Output Options

# Compression settings
--compression [ZSTD|GZIP|BROTLI|LZ4|SNAPPY|UNCOMPRESSED]
--compression-level [1-22]

# Row group sizing
--row-group-size [exact row count]
--row-group-size-mb [target size like '256MB' or '1GB']

# Workflow options
--dry-run          # Preview SQL without executing
--verbose          # Detailed output
--preview          # Preview results (partition commands)
--hive             # Use Hive-style partitioning
--overwrite        # Overwrite existing files
--profile NAME     # AWS profile for S3 operations

Examples

# Extract all data
gpio extract input.parquet output.parquet

# Extract specific columns
gpio extract data.parquet output.parquet --include-cols id,name,area

# Exclude columns
gpio extract data.parquet output.parquet --exclude-cols internal_id,temp

# Filter by bounding box
gpio extract data.parquet output.parquet --bbox -122.5,37.5,-122.0,38.0

# Filter by geometry from file
gpio extract data.parquet output.parquet --geometry @boundary.geojson

# SQL WHERE filter
gpio extract data.parquet output.parquet --where "population > 10000"

# Combined filters
gpio extract data.parquet output.parquet \
  --include-cols id,name \
  --bbox -122.5,37.5,-122.0,38.0 \
  --where "status = 'active'"

# Extract from remote file
gpio extract s3://bucket/data.parquet output.parquet --bbox 0,0,10,10

# Preview query with dry run
gpio extract data.parquet output.parquet \
  --where "name LIKE '%Hotel%'" \
  --dry-run

Column Selection Behavior

  • include-cols only: Select specified columns + geometry + bbox (if exists)
  • exclude-cols only: Select all columns except specified
  • Both: Select include-cols, but exclude-cols can remove geometry/bbox
  • Geometry and bbox always included unless explicitly excluded

Spatial Filtering Details

  • --bbox: Uses bbox column for fast filtering when available (bbox covering), otherwise calculates from geometry
  • --geometry: Supports inline GeoJSON/WKT, file reference (@filepath), or stdin (-)
  • CRS warning: Tool warns if bbox looks like lat/long but data uses projected CRS

WHERE Clause Notes

  • Accepts any valid DuckDB SQL WHERE expression
  • Column names with special characters need double quotes in SQL: "crop:name"
  • Shell escaping varies by platform - see User Guide for examples
  • Dangerous SQL keywords (DROP, DELETE, etc.) are blocked for safety