render_shapes silently drops unannotated shapes when coloring by a table column (inconsistent with points/labels, which use na_color) #710

@timtreis

Summary

When shapes are colored by a column from an annotating table, shapes whose instance has no row in the table are silently dropped from the plot (not rendered at all). This is inconsistent with the points and labels paths, which keep unannotated elements and render them with na_color. It also contradicts the ecosystem convention (scanpy/squidpy grey missing values rather than deleting them) and looks like silent data loss.

Current behavior by render type

| render type | table-color code path | unannotated element |
| --- | --- | --- |
| shapes | _join_table_for_element(..., how="inner") (pl/utils.py) | dropped (not drawn) |
| points | points.merge(color_values, how="left") (pl/render.py) | kept → na_color |
| labels | raster mask + instance→color map | kept → na_color |

Even within shapes it is inconsistent: coloring by a column on the element's own dataframe (no table join) draws all shapes, while coloring by a table column drops the unannotated ones.
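
The difference between the two join strategies can be shown with plain pandas. This is only a minimal sketch, not the spatialdata-plot code path: the frames `shapes` and `table` and the column names are hypothetical stand-ins for the element and its annotating table.

```python
import pandas as pd

# Hypothetical stand-ins: 5 shapes, of which only instances 0..2 have a table row.
shapes = pd.DataFrame({"instance_id": range(5)})
table = pd.DataFrame({"instance_id": [0, 1, 2], "val": [0.1, 0.5, 0.9]})

# Inner join: rows without a match are removed, so those shapes would never be drawn.
inner = shapes.merge(table, on="instance_id", how="inner")

# Left join: every shape is kept; unmatched rows get NaN, which can map to na_color.
left = shapes.merge(table, on="instance_id", how="left")

print(len(inner))                      # 3
print(len(left))                       # 5
print(int(left["val"].isna().sum()))   # 2
```

With the left join, the two unannotated rows survive with a NaN value, which is the state the points path relies on to apply na_color.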

Reproduction

import numpy as np, pandas as pd, geopandas as gpd
from shapely.geometry import Point
from anndata import AnnData
from spatialdata import SpatialData
from spatialdata.models import ShapesModel, TableModel
import spatialdata_plot  # noqa
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 20
geom = gpd.GeoDataFrame(
    {"geometry": [Point(*xy) for xy in rng.random((n, 2)) * 100], "radius": np.ones(n) * 2},
    index=pd.Index(range(n)),
)
inst = np.arange(12)  # table annotates only 12 of the 20 shapes
ad = AnnData(
    X=rng.random((12, 3)).astype("float32"),
    obs=pd.DataFrame({"region": pd.Categorical(["shapes"] * 12), "instance_id": inst, "val": rng.random(12)}),
)
ad.var_names = [f"g{i}" for i in range(3)]
sdata = SpatialData(
    shapes={"shapes": ShapesModel.parse(geom)},
    tables={"t": TableModel.parse(ad, region="shapes", region_key="region", instance_key="instance_id")},
)
sdata.pl.render_shapes("shapes", color="val").pl.show()
# -> only 12 of 20 shapes are drawn; the 8 unannotated shapes silently disappear

Expected behavior

Unannotated shapes should be kept and rendered with na_color, matching the points/labels paths and the scanpy/squidpy convention. Dropping (if ever desired) should be opt-in, not the silent default — mirroring how groups already works (non-matching are filtered, but na_color=... keeps them visible).

Why it matters

  • Silent data loss / surprise: coloring spots by a sparsely-detected gene makes spots vanish with no warning.
  • Breaks the labels ↔ shapes interchangeability premise (notebooks/examples/labels_shapes_interchangeability.ipynb): the same cells render differently as labels (greyed) vs shapes (deleted) under partial annotation.
  • No contract pins the drop — no test asserts it; it looks incidental (a side effect of using an inner join for row alignment), not a deliberate design.

Notes / links

  • The drop comes from the how="inner" join in _join_table_for_element.
  • Performance tie-in: PR #709 (perf(color): extract one aligned column instead of copying the whole table), which adds fast single-column color extraction, deliberately preserves the current drop behavior (verified pixel-identical), so this issue is independent of it. The cleanest place to fix the behavior is a follow-up to #709 that removes the structural join: extracting the column directly instead of inner-joining makes unannotated instances reindex to NaN → na_color and removes the remaining per-render table copy, i.e. correctness plus the rest of the speedup in one change. This will change visual baselines for partially annotated data and warrants a changelog note.
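
The "extract the column and reindex" idea from the last note could look roughly like the following. This is a hedged sketch in plain pandas under assumed names (`obs`, `shape_index`, `color`); it does not reproduce the actual internals of spatialdata-plot or of #709.

```python
import numpy as np
import pandas as pd

# Hypothetical table obs: annotates instances 0..11 out of 20 shapes.
obs = pd.DataFrame({"instance_id": np.arange(12), "val": np.linspace(0.0, 1.0, 12)})

# Hypothetical element index: all 20 shapes.
shape_index = pd.Index(range(20))

# Pull out only the requested column, keyed by instance, then align it to the
# element's index. Instances without a table row become NaN instead of vanishing.
color = obs.set_index("instance_id")["val"].reindex(shape_index)

# Mask of shapes that a renderer would paint with na_color.
is_na = color.isna()

print(len(color))          # 20
print(int(is_na.sum()))    # 8
```

Only one column is copied here and no row of the element is lost, which is the combination of behavior fix and speedup the note describes.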

Possibly related (separate)

Rendering partially-annotated points appears to raise ValueError: Observations annot. 'obs' must have as many rows as X has rows (20), but has 12 rows from the points AnnData construction (pl/render.py, where X=points[["x","y"]] has all rows but obs=matched_table.obs has only the annotated rows). This may be a separate latent bug in the points partial-annotation path — worth confirming and possibly its own issue.
