Aggregations

Aggregation specs are passed to .group_by(...).agg(...). Each spec reduces over a streamed spatial join without materialising the full pair frame.

import pycanopy as pc

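# zones and trips are assumed to be defined already.
result = (
    zones.lazy()
    .within_join(trips, x_col="lon", y_col="lat")
    .group_by(["zone_id"])
    .agg(
        n=pc.agg.count(),               # joined pairs per group
        total_fare=pc.agg.sum("fare"),  # sum of fare per group
        avg_fare=pc.agg.mean("fare"),   # mean of fare, ignoring nulls
        min_fare=pc.agg.min("fare"),    # smallest fare per group
        max_fare=pc.agg.max("fare"),    # largest fare per group
    )
)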

pycanopy.agg

Aggregation specs for the fused aggregate-join (SpatialGroupBy.agg). Every spec is associative, so per-morsel partials can be folded over the streamed join without materialising the full pair frame.
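To make the associativity claim concrete, here is a minimal pure-Python sketch. It does not use pycanopy; the helper names `morsel_partial` and `merge` and the sample fares are hypothetical. It shows that folding per-morsel partials gives the same answer as reducing all pairs at once.

```python
from functools import reduce

def morsel_partial(fares):
    """Reduce one morsel to (count, sum, min, max). Hypothetical helper."""
    return (len(fares), sum(fares), min(fares), max(fares))

def merge(a, b):
    """Merge two partials. Each component is associative, so the
    grouping of morsels does not change the result."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

# Three morsels of fares for a single group (made-up values).
morsels = [[4.0, 6.0], [10.0], [2.0, 8.0]]

# Streamed: reduce each morsel, then fold the partials together.
folded = reduce(merge, (morsel_partial(m) for m in morsels))

# Materialised: reduce every value in one pass.
one_shot = morsel_partial([f for m in morsels for f in m])

assert folded == one_shot
```

Only the small partial tuples are held between morsels, which is why the full pair frame never needs to exist in memory.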

AggSpec dataclass

One associative aggregation: a kind and the column it reads (None for count).

inputs property

Source columns this spec reads, for the join keep-set.

Returns:

Type      Description
set[str]  The set of source column names, empty for count.
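As an illustration of how a keep-set could be derived from `inputs`, the sketch below uses a hypothetical stand-in class `Spec` rather than the real AggSpec, and a hypothetical helper `keep_set`. Including the group keys in the keep-set is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for AggSpec: a kind and an optional column."""
    kind: str
    column: Optional[str] = None

    @property
    def inputs(self) -> set:
        # count reads no source column, so its input set is empty.
        return set() if self.column is None else {self.column}

def keep_set(group_keys, specs):
    """Union of the group keys and every column some spec reads (assumed rule)."""
    cols = set(group_keys)
    for spec in specs.values():
        cols |= spec.inputs
    return cols

specs = {
    "n": Spec("count"),
    "total_fare": Spec("sum", "fare"),
    "max_tip": Spec("max", "tip"),
}
kept = keep_set(["zone_id"], specs)
```

Columns outside `kept` would not need to be carried through the join in this sketch.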

combine(name)

Build the cross-morsel exprs that re-aggregate this spec's partials.

Parameters:

Name  Type  Description                                    Default
name  str   Output column name this aggregation produces.  required

Returns:

Type        Description
list[Expr]  Exprs re-aggregating this spec's prefixed intermediate columns.

finalize(name)

Build the expr producing the named output from the combined partials.

Parameters:

Name  Type  Description                                    Default
name  str   Output column name this aggregation produces.  required

Returns:

Type  Description
Expr  Expr yielding the named output column.

partial(name)

Build the per-morsel aggregation exprs for this spec.

Parameters:

Name  Type  Description                                    Default
name  str   Output column name this aggregation produces.  required

Returns:

Type        Description
list[Expr]  Exprs producing this spec's prefixed intermediate columns.
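The three methods can be read as one pipeline: partial runs per morsel, combine merges the partials across morsels, and finalize produces the named output. The sketch below models that flow for a mean in plain Python, with dicts standing in for columns. The `__` prefix, the `_sum`/`_count` suffixes, and the function bodies are assumptions for illustration, not pycanopy's actual exprs.

```python
def partial(name, values):
    """Per-morsel step: emit prefixed intermediate columns (assumed naming)."""
    present = [v for v in values if v is not None]
    return {f"__{name}_sum": sum(present), f"__{name}_count": len(present)}

def combine(name, partials):
    """Cross-morsel step: re-aggregate each intermediate column by summing."""
    keys = (f"__{name}_sum", f"__{name}_count")
    return {k: sum(p[k] for p in partials) for k in keys}

def finalize(name, combined):
    """Final step: derive the named output from the combined partials."""
    count = combined[f"__{name}_count"]
    return {name: combined[f"__{name}_sum"] / count if count else None}

# Two morsels of fares for one group; None marks a null value.
morsels = [[10.0, None, 20.0], [30.0]]
parts = [partial("avg_fare", m) for m in morsels]
result = finalize("avg_fare", combine("avg_fare", parts))
```

Prefixing the intermediates by output name keeps two specs on the same source column from colliding, which is one plausible reason each method takes `name`.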

count()

Count rows (pairs) per group, like Polars pl.len().

Returns:

Type     Description
AggSpec  An AggSpec for the count aggregation.
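Because count follows pl.len() semantics, it counts joined pairs rather than non-null values of any column. A small pure-Python sketch of that distinction, using made-up pairs and no pycanopy calls:

```python
from collections import Counter

# Hypothetical joined pairs as (zone_id, fare); fare may be null (None).
pairs = [("A", 4.0), ("A", None), ("B", 7.5), ("A", 6.0)]

# Row count per group: every pair counts, nulls included.
pair_count = Counter(zone for zone, _ in pairs)

# For contrast: counting only the non-null fares gives a smaller number.
non_null_fares = Counter(zone for zone, fare in pairs if fare is not None)
```

Here zone "A" has three pairs but only two non-null fares, so the two counts differ.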

max(column)

Maximum of a column per group.

Parameters:

Name    Type  Description                    Default
column  str   Name of the column to reduce.  required

Returns:

Type     Description
AggSpec  An AggSpec for the max aggregation.

mean(column)

Mean of a column per group, ignoring nulls.

Parameters:

Name    Type  Description                     Default
column  str   Name of the column to average.  required

Returns:

Type     Description
AggSpec  An AggSpec for the mean aggregation.
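A mean is not associative on its own: averaging per-morsel means weights each morsel equally regardless of its size. The sketch below, in plain Python with made-up values, shows the discrepancy and how carrying a sum and a non-null count avoids it. That pycanopy decomposes the mean this way is an assumption.

```python
def mean_ignoring_nulls(values):
    """Mean of the non-null values, or None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Two morsels of unequal size for one group; None marks a null.
morsels = [[10.0, None, 20.0], [60.0]]

# Reference answer over all values at once.
true_mean = mean_ignoring_nulls([v for m in morsels for v in m])

# Naive streaming: average the per-morsel means (biased).
mean_of_means = mean_ignoring_nulls([mean_ignoring_nulls(m) for m in morsels])

# Associative alternative: add up sums and non-null counts, divide at the end.
total = sum(v for m in morsels for v in m if v is not None)
count = sum(1 for m in morsels for v in m if v is not None)
folded_mean = total / count
```

The naive version overweights the single-value morsel, while the sum-and-count version matches the reference.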

min(column)

Minimum of a column per group.

Parameters:

Name    Type  Description                    Default
column  str   Name of the column to reduce.  required

Returns:

Type     Description
AggSpec  An AggSpec for the min aggregation.

sum(column)

Sum a column per group.

Parameters:

Name    Type  Description                 Default
column  str   Name of the column to sum.  required

Returns:

Type     Description
AggSpec  An AggSpec for the sum aggregation.