
perf(color): extract one aligned column instead of copying the whole table #709

Merged
timtreis merged 1 commit into main from feat/fast-color-extraction
Jun 10, 2026
Conversation

@timtreis

Member

What

Coloring shapes by a table column resolved the value through get_values(sdata, table_name), which joins the annotating table to the element. That join's table[joined_indices, :].copy() does an out-of-order sparse CSR row-gather to reorder rows to element order — which dominates large renders (~370 ms on Visium/Xenium-width tables). We only need one column aligned to the element, not a full-table copy.
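As a rough illustration of why one column is enough, the sketch below uses a synthetic SciPy CSR matrix as a stand-in for the table's X. The shape, density, the `joined_indices` permutation, and the column position `gene` are assumptions made for the example, not values from this PR. It checks that reading one column and reordering that 1-D vector gives the same result as gathering every row first.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical stand-in for table.X: 1,000 observations x 50 variables in CSR format.
X = sparse.random(1000, 50, density=0.1, format="csr", random_state=0)
joined_indices = rng.permutation(X.shape[0])  # element order differs from table order
gene = 7  # illustrative column position

# Full-table approach: gather every row in the new order, then read one column.
full_copy = X[joined_indices, :].copy()
via_full_copy = full_copy[:, gene].toarray().ravel()

# Single-column approach: read the column once, then reorder a dense 1-D vector.
column = X[:, gene].toarray().ravel()
via_single_column = column[joined_indices]

same = np.array_equal(via_full_copy, via_single_column)
print(same)
```

The second path touches one column of the sparse matrix and permutes a dense vector, so the cost of reordering no longer scales with the table width.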

This adds _extract_color_column and uses it in _set_color_source_vec for the shapes + table-origin case.

How

def _extract_color_column(table, value_key, *, origin, element, element_name, table_layer=None):
    attrs = table.uns["spatialdata_attrs"]
    region_key, instance_key = attrs["region_key"], attrs["instance_key"]
    mask = table.obs[region_key].to_numpy() == element_name
    inst = table.obs[instance_key].to_numpy()[mask]
    # col (selection elided here): var -> X/layers column; obs -> obs column (keeps Categorical for the legend path)
    return pd.Series(col[mask], index=inst).reindex(element.index)   # element order + NaN-fill

Gated to isinstance(element, GeoDataFrame) + table origin (obs/var). Points already use the preloaded_color_data shortcut; labels (raster element, no instance-id index) keep get_values.
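The mask-then-reindex step can be tried in isolation with plain pandas objects. This is a minimal sketch, not the merged implementation: `extract_aligned_column` is a hypothetical name, `obs` stands in for `table.obs`, `values` for the already-selected column, and `element_index` for `element.index`.

```python
import numpy as np
import pandas as pd


def extract_aligned_column(obs, values, *, region_key, instance_key, element_name, element_index):
    """Hypothetical stand-in: keep rows annotating one element, align to its index."""
    mask = obs[region_key].to_numpy() == element_name      # rows that annotate this element
    inst = obs[instance_key].to_numpy()[mask]              # their instance ids
    return pd.Series(values[mask], index=inst).reindex(element_index)


# Toy table: three rows annotate "cells" (in shuffled order), one annotates "other".
obs = pd.DataFrame(
    {
        "region": ["cells", "cells", "other", "cells"],
        "instance_id": [3, 1, 1, 2],
    }
)
values = np.array([30.0, 10.0, 99.0, 20.0])
element_index = pd.Index([1, 2, 3, 4])  # instance 4 has no table row

aligned = extract_aligned_column(
    obs,
    values,
    region_key="region",
    instance_key="instance_id",
    element_name="cells",
    element_index=element_index,
)
print(aligned)
```

Rows belonging to other regions are dropped by the mask, the remaining values are keyed by instance id, and `reindex` puts them in element order while filling instance 4 with NaN.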

Correctness

Bit-identical to get_values, verified:

  • real data (visium_hne genes + obs, curio categorical cluster): identical values, index alignment, dtype
  • 6 unit tests (TestExtractColorColumn): var (X), obs numeric, obs categorical (dtype preserved), shuffled table order (realigns), missing instances (→ NaN)
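Two of the listed properties, categorical dtype preservation and NaN for missing instances, follow from how pandas reindexes a categorical Series. The snippet below is an illustrative check on toy data, not one of the six tests in the PR; the cluster labels and instance ids are made up.

```python
import pandas as pd

# Toy categorical column in table order: instances 3, 1, 2.
clusters = pd.Categorical(["b", "a", "c"], categories=["a", "b", "c"])
col = pd.Series(clusters, index=[3, 1, 2])

# Element order; instance 4 has no annotation in the table.
aligned = col.reindex([1, 2, 3, 4])

print(aligned.dtype)
print(aligned.tolist())
```

The result keeps the original categories, so code that builds a legend from `aligned.cat.categories` sees the same labels as before.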

Existing test_plot_* shapes baselines are unaffected (output identical). Also fixes a latent bug: the previous element=sdata[table_name] fast shortcut did not realign rows, so it silently mis-colored when the table's instance order ≠ the element's geometry order.
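The kind of mis-coloring described here can be reproduced with a toy example. This sketch is illustrative only and does not use the library's code path; the instance ids and values are invented. It contrasts using table values positionally with aligning them by instance id.

```python
import numpy as np
import pandas as pd

element_index = pd.Index(["a", "b", "c"])    # geometry order of the element
table_instances = np.array(["c", "a", "b"])  # table rows in a different order
table_values = np.array([3.0, 1.0, 2.0])     # values for c, a, b respectively

# Positional use: assumes row i of the table describes geometry i.
positional = pd.Series(table_values, index=element_index)

# Index-aligned use: look up each geometry's instance id in the table.
aligned = pd.Series(table_values, index=table_instances).reindex(element_index)

print(positional.tolist())  # geometry "a" receives the value that belongs to "c"
print(aligned.tolist())     # each geometry receives its own value
```

No error is raised in the positional case because the lengths match, which is why such a mismatch can go unnoticed.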

Speedup (70k shapes / real data)

| column | get_values | _extract_color_column |
| --- | --- | --- |
| gene (sparse X) | 189 ms | 13.8 ms |
| obs column | 222 ms | 0.2 ms |

~370 ms saved per colored shapes render on Visium/Xenium-scale tables.

Scope

Phase 1 of the table-copy investigation (plans/investigation-table-join-copy.md). It removes the color re-join. The structural _join_table_for_element (render.py:618) still runs and is a separate, independently-scoped follow-up (it's entangled with element alignment, the outline path, and the #1099 workaround). An upstream get_values single-column fast-path / copy=False join in spatialdata would help all consumers but isn't required here.

…table

Coloring shapes by a table column resolved the value via get_values(sdata, table_name), which joins
the table to the element. That join's table[indices, :].copy() does an out-of-order sparse CSR
row-gather that dominates large renders (~370 ms on Visium/Xenium-width tables).

Add _extract_color_column: region-mask the annotating table, read the single column (var from X /
layers, or obs preserving categorical dtype) and reindex to the element's instance order (NaN for
unannotated instances). Wire it into _set_color_source_vec for the shapes (GeoDataFrame) +
table-origin (obs/var) case; points already use the preloaded shortcut, labels keep get_values.

Bit-identical to get_values (verified on visium_hne + curio; 6 unit tests across var / obs /
categorical / shuffled-order / missing-instances), 14-1000x faster on the extraction itself. Also
fixes a latent bug: the previous element=sdata[table_name] shortcut did not realign rows, silently
mis-coloring when table order != element order.

Phase 1 of the table-copy investigation; slimming the structural _join_table_for_element is a
separate follow-up.
@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.42%. Comparing base (cb91f41) to head (053262a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #709      +/-   ##
==========================================
+ Coverage   76.36%   76.42%   +0.05%     
==========================================
  Files          14       14              
  Lines        4316     4330      +14     
  Branches     1004     1006       +2     
==========================================
+ Hits         3296     3309      +13     
- Misses        663      664       +1     
  Partials      357      357              
| Files with missing lines | Coverage Δ |
| --- | --- |
| src/spatialdata_plot/pl/utils.py | 69.12% <100.00%> (+0.15%) ⬆️ |
