Python API Overview

geoparquet-io provides a Python API for programmatic access to all of its functionality. Because data stays in memory as Arrow tables between operations instead of being written to intermediate files, the API is the fastest way to use geoparquet-io.

Quick Example

import geoparquet_io as gpio

# Read, transform, and write in a fluent chain
(
    gpio.read('input.parquet')
    .add_bbox()
    .sort_hilbert()
    .write('output.parquet')
)

API Options

The Python API offers three ways to work with GeoParquet data:

1. Fluent Table API (Recommended)

This is the primary API for most users. It provides chainable methods on a Table object, so each operation is called on the result of the previous one:

import geoparquet_io as gpio

# Chain operations fluently
result = (
    gpio.read('input.parquet')
    .extract(limit=10000)
    .add_bbox()
    .add_h3(resolution=9)
    .sort_hilbert()
)

result.write('output.parquet')

See Python API Reference for full documentation.
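Chaining works because each method returns a Table, which is why `result.write(...)` can be called at the end of the chain. The class below is a hypothetical stand-in (`ToyTable` is not part of geoparquet-io) that sketches the pattern with plain Python tuples. It assumes each step returns a new object and leaves the previous one unchanged; the real Table may handle this differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTable:
    """Hypothetical stand-in for a fluent table API (not geoparquet-io code)."""

    rows: tuple[tuple[float, float], ...]  # (x, y) points
    columns: tuple[str, ...] = ("geometry",)

    def extract(self, limit: int) -> ToyTable:
        # Keep the first `limit` rows; return a new object, leave self untouched.
        return ToyTable(self.rows[:limit], self.columns)

    def add_bbox(self) -> ToyTable:
        # Record that a bbox column now exists (values omitted in this toy).
        return ToyTable(self.rows, self.columns + ("bbox",))

    def sort_by_x(self) -> ToyTable:
        # Stand-in for a spatial sort: order rows by x (then y).
        return ToyTable(tuple(sorted(self.rows)), self.columns)


source = ToyTable(((3.0, 1.0), (1.0, 2.0), (2.0, 0.0)))
result = source.extract(limit=2).add_bbox().sort_by_x()
print(result.rows)     # ((1.0, 2.0), (3.0, 1.0))
print(result.columns)  # ('geometry', 'bbox')
print(source.columns)  # ('geometry',)
```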

2. Pure Functions (ops module)

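The ops module exposes operations such as add_bbox and sort_hilbert as plain functions that take a PyArrow table and return a transformed table, for integration with existing PyArrow workflows: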

import pyarrow.parquet as pq
from geoparquet_io.api import ops

table = pq.read_table('input.parquet')
table = ops.add_bbox(table)
table = ops.sort_hilbert(table)

See Python API Reference - ops module for details.
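Pure functions return a result without modifying their input. The sketch below illustrates that idea, and what a bbox column holds, using plain Python lists of coordinates instead of PyArrow tables. `add_bbox_rows` and `bbox_of` are hypothetical helpers, not the library's implementation; the field names follow the GeoParquet 1.1 bbox covering convention (`xmin`, `ymin`, `xmax`, `ymax`).

```python
from typing import Any, Dict, List, Sequence, Tuple

Coord = Tuple[float, float]
Row = Dict[str, Any]


def bbox_of(coords: Sequence[Coord]) -> Dict[str, float]:
    """Return the axis-aligned bounding box of a geometry's coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def add_bbox_rows(rows: List[Row], bbox_name: str = "bbox") -> List[Row]:
    """Hypothetical pure function: return new rows with a bbox field added.

    Neither the input list nor its dicts are modified.
    """
    return [{**row, bbox_name: bbox_of(row["geometry"])} for row in rows]


rows = [
    {"id": 1, "geometry": [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]},
    {"id": 2, "geometry": [(-1.5, 4.0)]},  # a point: zero-size box
]
with_bbox = add_bbox_rows(rows)
print(with_bbox[0]["bbox"])  # {'xmin': 0.0, 'ymin': 0.0, 'xmax': 2.0, 'ymax': 3.0}
print("bbox" in rows[0])     # False: the input is unchanged
```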

3. Pipeline Composition

Use pipe() to combine several steps into a reusable transformation that can be applied to any Table:

from geoparquet_io.api import pipe, read

preprocess = pipe(
    lambda t: t.add_bbox(),
    lambda t: t.add_h3(resolution=9),
    lambda t: t.sort_hilbert(),
)

result = preprocess(read('input.parquet'))
result.write('output.parquet')

See Python API Reference - Pipeline Composition for details.
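Conceptually, pipe() turns a list of functions into one function. A minimal sketch of that composition, assuming the functions are applied left to right in the order they are listed; `compose_left` is a hypothetical name, and this is not geoparquet-io's own code:

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def compose_left(*funcs: Callable[[T], T]) -> Callable[[T], T]:
    """Return a function that applies `funcs` in order, first to last."""

    def pipeline(value: T) -> T:
        # reduce feeds each function's output into the next one.
        return reduce(lambda acc, f: f(acc), funcs, value)

    return pipeline


# Works on any value type; strings stand in for tables here.
preprocess = compose_left(
    lambda t: t + "+bbox",
    lambda t: t + "+h3",
    lambda t: t + "+hilbert",
)
print(preprocess("table"))  # table+bbox+h3+hilbert
```

Because the composed pipeline is an ordinary function, the same preprocess object can be reused across many inputs, for example in a loop over files.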

Key Functions

| Function | Description |
|---|---|
| `gpio.read(path)` | Read a GeoParquet file into a Table |
| `gpio.read_partition(path)` | Read a Hive-partitioned dataset |
| `gpio.convert(path)` | Convert Shapefile/GeoJSON/GeoPackage/CSV to Table |
| `gpio.pipe(*funcs)` | Create a reusable transformation pipeline |

Table Methods

| Method | Description |
|---|---|
| `.add_bbox()` | Add a bounding box column |
| `.add_h3(resolution)` | Add an H3 hexagonal cell column |
| `.add_quadkey(resolution)` | Add a quadkey tile column |
| `.add_kdtree()` | Add a KD-tree partition column |
| `.sort_hilbert()` | Sort by Hilbert space-filling curve |
| `.sort_quadkey()` | Sort by quadkey |
| `.sort_column(name)` | Sort by any column |
| `.extract(...)` | Filter columns and rows |
| `.reproject(target_crs)` | Reproject to a different CRS |
| `.write(path)` | Write to a GeoParquet file |
| `.upload(url)` | Upload to cloud storage |
| `.partition_by_h3()` | Partition into H3-based files |
| `.partition_by_quadkey()` | Partition into quadkey-based files |
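Hilbert sorting orders rows along a space-filling curve so that features close together in space tend to end up close together in the file. The sketch below shows the general technique, not geoparquet-io's implementation: it scales each point into a 2^order by 2^order grid over the data's extent, computes the Hilbert index with the classic iterative xy-to-distance algorithm, and sorts by that index. Using raw point coordinates (rather than, say, bbox centers) and a default order of 16 are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of grid cell (x, y) along a Hilbert curve filling an n x n grid.

    n must be a power of two. Classic iterative xy-to-d algorithm.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def _cell(value: float, lo: float, hi: float, n: int) -> int:
    """Scale a coordinate in [lo, hi] to an integer cell in [0, n - 1]."""
    if hi == lo:
        return 0
    return min(n - 1, int((value - lo) / (hi - lo) * n))


def sort_hilbert_points(points: Sequence[Point], order: int = 16) -> List[Point]:
    """Return points sorted by Hilbert index within their own extent."""
    if not points:
        return []
    n = 1 << order  # 2**order cells per side
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def key(p: Point) -> int:
        return hilbert_index(n, _cell(p[0], xmin, xmax, n), _cell(p[1], ymin, ymax, n))

    return sorted(points, key=key)


pts = [(10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0), (1.0, 1.0)]
print(sort_hilbert_points(pts))
# [(0.0, 0.0), (1.0, 1.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
```

The curve visits the lower-left quadrant first, then upper-left, upper-right, and lower-right, which is why the two points near the origin stay adjacent in the output.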

Performance

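Because intermediate results stay in memory, the Python API is the fastest way to run a multi-step workflow. Benchmark timings for a 75 MB file with 400K rows:

| Approach | Time (75 MB, 400K rows) | Notes |
|---|---|---|
| CLI (file-based) | 34 s | Each command writes an intermediate file |
| CLI (piped) | 16 s | Arrow IPC streaming between commands |
| Python API | 7 s | In memory; no intermediate files or serialization |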
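Expressed as speedups, the timings in the table work out as follows. The numbers are copied from the table; the snippet only does the division.

```python
# Timings from the Performance table, in seconds (75 MB file, 400K rows).
timings_s = {"CLI (file-based)": 34, "CLI (piped)": 16, "Python API": 7}


def speedup(slower: str, faster: str) -> float:
    """How many times faster `faster` is than `slower`."""
    return timings_s[slower] / timings_s[faster]


print(f"{speedup('CLI (file-based)', 'Python API'):.1f}x")  # 4.9x
print(f"{speedup('CLI (piped)', 'Python API'):.1f}x")       # 2.3x
print(f"{speedup('CLI (file-based)', 'CLI (piped)'):.1f}x")  # 2.1x
```

So on this benchmark the Python API is roughly 4.9x faster than file-based CLI commands and 2.3x faster than piped ones.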

Advanced: Core Module Access

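Power users can call the file-based functions in geoparquet_io.core directly. Unlike the Table and ops APIs, these read from an input file and write to an output file rather than working on in-memory tables: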

from geoparquet_io.core.add_bbox_column import add_bbox_column
from geoparquet_io.core.hilbert_order import hilbert_order

add_bbox_column(
    input_parquet="input.parquet",
    output_parquet="output.parquet",
    bbox_name="bbox",
    verbose=True
)

See Core Functions Reference for all available functions.

Next Steps

Python API Reference - Complete method documentation
Examples - Usage patterns and examples
Spatial Performance Guide - Understanding bbox, sorting, and partitioning