
PyCanopy

A declarative spatial query layer for Polars. Rust core, Python API.

What is PyCanopy

PyCanopy brings spatial queries (range, kNN, spatial joins, and polygon containment) into the Polars ecosystem without leaving Python. You declare operations in any order; the query planner reorders, fuses, and pushes them down before execution. The index type (KD-tree, R-tree, grid, or brute-force scan) is selected automatically by a cost model calibrated to your hardware.
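The planner's internals are not described on this page. As a rough illustration of two of the rewrites mentioned, fusion and pushdown, here is a toy planner over a list of operations: adjacent bounding-box filters are fused into a single box (their intersection), and a leading box filter is pushed into the scan, where an index could answer it. Everything here (`RangeFilter`, `Knn`, `Plan`, `optimize`, `execute`) is hypothetical and is not PyCanopy's API. A kNN step does not commute with a filter, so the sketch never fuses or reorders across one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]


@dataclass(frozen=True)
class RangeFilter:
    """Keep points inside the closed box [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersect(self, other: "RangeFilter") -> "RangeFilter":
        # Applying two box filters equals applying one filter on their intersection.
        return RangeFilter(max(self.xmin, other.xmin), max(self.ymin, other.ymin),
                           min(self.xmax, other.xmax), min(self.ymax, other.ymax))

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


@dataclass(frozen=True)
class Knn:
    """Keep the k points nearest to (x, y). Does not commute with filters."""
    x: float
    y: float
    k: int


Op = Union[RangeFilter, Knn]


@dataclass
class Plan:
    scan_box: Optional[RangeFilter]  # filter pushed down into the scan, if any
    ops: List[Op]                    # remaining operations, in declared order


def optimize(ops: List[Op]) -> Plan:
    fused: List[Op] = []
    for op in ops:
        # Fuse runs of adjacent box filters; never fuse across a kNN step.
        if isinstance(op, RangeFilter) and fused and isinstance(fused[-1], RangeFilter):
            fused[-1] = fused[-1].intersect(op)
        else:
            fused.append(op)
    # Push a leading box filter down into the scan, where an index could serve it.
    if fused and isinstance(fused[0], RangeFilter):
        return Plan(scan_box=fused[0], ops=fused[1:])
    return Plan(scan_box=None, ops=fused)


def execute(plan: Plan, points: List[Point]) -> List[Point]:
    rows = [p for p in points if plan.scan_box is None or plan.scan_box.contains(p)]
    for op in plan.ops:
        if isinstance(op, RangeFilter):
            rows = [p for p in rows if op.contains(p)]
        else:
            # Brute-force kNN by squared Euclidean distance (stable sort on ties).
            rows = sorted(rows, key=lambda p: (p[0] - op.x) ** 2 + (p[1] - op.y) ** 2)[: op.k]
    return rows
```

For example, `optimize([RangeFilter(0, 0, 4, 4), RangeFilter(1, 1, 10, 10), Knn(0, 0, 1)])` yields a plan whose scan box is `RangeFilter(1, 1, 4, 4)`, followed by the kNN step. Comparing `execute(optimize(ops), pts)` against `execute(Plan(None, ops), pts)`, which runs the operations exactly as declared, is a cheap way to check that a rewrite rule preserves results.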

Why PyCanopy

PyCanopy GeoPandas DuckDB SedonaDB Spatial Polars
Polars-native, no SQL or conversion ✗ (SQL) ✗ (SQL)
Spatial query planner (reorder, fuse, pushdown) ✓ (SQL)
Index vs scan decided by cost model
Adaptive index (KD-tree / R-tree / grid)
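The last two rows of the table refer to choosing between an index and a plain scan. Below is a minimal sketch of that kind of decision for a batch of rectangular range queries. It assumes a uniform grid as the only index and uses arbitrary cost constants; PyCanopy's actual cost model, its KD-tree and R-tree indexes, and its hardware calibration are not shown. `choose_strategy`, `GridIndex`, `scan_range`, and the constants are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Hypothetical unit costs. A calibrated model would measure constants like
# these with micro-benchmarks on the target machine instead of hard-coding them.
SCAN_COST_PER_POINT = 1.0
BUILD_COST_PER_POINT = 3.0
PROBE_COST_PER_CELL = 5.0


def choose_strategy(n_points: int, n_queries: int,
                    selectivity: float, cells_per_query: float) -> str:
    """Return "scan" or "grid", whichever is estimated to be cheaper overall."""
    # Full scan: every query tests every point.
    scan = n_queries * n_points * SCAN_COST_PER_POINT
    # Grid: one build pass, then per query visit some cells and test
    # roughly the selected fraction of the points.
    grid = n_points * BUILD_COST_PER_POINT + n_queries * (
        cells_per_query * PROBE_COST_PER_CELL
        + selectivity * n_points * SCAN_COST_PER_POINT
    )
    return "scan" if scan <= grid else "grid"


def scan_range(points: List[Point], box: Box) -> List[int]:
    """Brute-force range query: indices of points inside the closed box."""
    xmin, ymin, xmax, ymax = box
    return [i for i, (x, y) in enumerate(points)
            if xmin <= x <= xmax and ymin <= y <= ymax]


class GridIndex:
    """Uniform grid: bucket point indices by the square cell they fall in."""

    def __init__(self, points: List[Point], cell_size: float):
        self.points = points
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.cells[self._cell(x, y)].append(i)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def query(self, box: Box) -> List[int]:
        """Visit only the cells overlapping the box, then filter exactly."""
        xmin, ymin, xmax, ymax = box
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for i in self.cells.get((cx, cy), []):
                    x, y = self.points[i]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(i)
        return sorted(hits)
```

With these made-up constants, `choose_strategy(1_000, 1, 0.01, 4)` returns `"scan"` (a single query does not repay the build), while `choose_strategy(1_000, 100, 0.01, 4)` returns `"grid"` (the build is amortized over 100 queries). That is the trade-off in miniature: an index costs roughly one extra pass over the data, so it pays off only when enough queries, or selective enough ones, amortize it.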

Benchmarks

Apache SpatialBench is the industry-standard single-node spatial query benchmark, maintained by the Apache Sedona project. Results below were measured on a single AWS m7i.2xlarge instance (8 vCPU, 32 GB RAM), the same instance type used for the published baseline.

SF1 (scale factor 1, ~6M trips): PyCanopy wins 7 of the 12 queries.

Apache SpatialBench SF1

SF10 (~60M trips): PyCanopy wins 5/12 queries.

Apache SpatialBench SF10

Full results tables with per-query times are on the Benchmarks page.

Accepted input formats

Format                                      Example
NumPy (N, 2) array                          np.array([[x, y], ...])
GeoArrow PyArrow array                      pa.StructArray or FixedSizeList<2>
GeoPandas GeoSeries                         gdf.geometry
Shapely Points / Polygons / MultiPolygons   [Point(x, y), ...]
List of (x, y) tuples                       [(x, y), ...]
Separate coordinate sequences               Engine.from_coords(xs, ys)
WKB point column (Binary)                   SpatialFrame.from_wkb_points(df, "geom")
WKB polygon column (Binary)                 SpatialFrame.from_wkb_polygons(df, "geom")
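Whatever the input format, the engine ultimately works on (x, y) coordinates. As a rough illustration of what three of the listed formats involve, here is a stdlib-only sketch that normalizes (x, y) tuples, separate coordinate sequences, and WKB point blobs into a list of (x, y) tuples. It decodes only plain 2D WKB points using the OGC layout (a 1-byte byte-order flag, a uint32 geometry type code, then two float64 values) and rejects Z/M and EWKB/SRID variants. The function names are hypothetical helpers, not PyCanopy's `Engine.from_coords` or `SpatialFrame.from_wkb_points`.

```python
import struct
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]

WKB_POINT = 1  # OGC Simple Features type code for a 2D Point


def parse_wkb_point(blob: bytes) -> Point:
    """Decode a 2D WKB Point: byte-order flag, uint32 type code, two float64s."""
    if len(blob) != 21:
        raise ValueError(f"expected 21 bytes for a 2D WKB point, got {len(blob)}")
    if blob[0] not in (0, 1):
        raise ValueError(f"invalid byte-order flag {blob[0]}")
    order = "<" if blob[0] == 1 else ">"  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    (geom_type,) = struct.unpack_from(order + "I", blob, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point (geometry type code {geom_type})")
    x, y = struct.unpack_from(order + "dd", blob, 5)
    return (x, y)


def from_wkb_points(blobs: Iterable[bytes]) -> List[Point]:
    """Normalize a column of WKB point blobs."""
    return [parse_wkb_point(b) for b in blobs]


def from_coords(xs: Sequence[float], ys: Sequence[float]) -> List[Point]:
    """Normalize separate x and y sequences, which must have equal length."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return [(float(x), float(y)) for x, y in zip(xs, ys)]


def from_tuples(pairs: Iterable[Tuple[float, float]]) -> List[Point]:
    """Normalize a list of (x, y) tuples."""
    return [(float(x), float(y)) for x, y in pairs]
```

For example, `parse_wkb_point(struct.pack("<BIdd", 1, 1, 3.0, 4.0))` returns `(3.0, 4.0)`. The explicit length and type checks are a deliberate choice: a malformed blob fails loudly instead of being decoded into garbage coordinates.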