perf(color): extract one aligned column instead of copying the whole table#709
Merged
Conversation
…table Coloring shapes by a table column resolved the value via get_values(sdata, table_name), which joins the table to the element. That join's table[indices, :].copy() does an out-of-order sparse CSR row-gather that dominates large renders (~370 ms on Visium/Xenium-width tables). Add _extract_color_column: region-mask the annotating table, read the single column (var from X / layers, or obs preserving categorical dtype) and reindex to the element's instance order (NaN for unannotated instances). Wire it into _set_color_source_vec for the shapes (GeoDataFrame) + table-origin (obs/var) case; points already use the preloaded shortcut, labels keep get_values. Bit-identical to get_values (verified on visium_hne + curio; 6 unit tests across var / obs / categorical / shuffled-order / missing-instances), 14-1000x faster on the extraction itself. Also fixes a latent bug: the previous element=sdata[table_name] shortcut did not realign rows, silently mis-coloring when table order != element order. Phase 1 of the table-copy investigation; slimming the structural _join_table_for_element is a separate follow-up.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #709 +/- ##
==========================================
+ Coverage 76.36% 76.42% +0.05%
==========================================
Files 14 14
Lines 4316 4330 +14
Branches 1004 1006 +2
==========================================
+ Hits 3296 3309 +13
- Misses 663 664 +1
Partials 357 357
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Coloring shapes by a table column resolved the value through
get_values(sdata, table_name), which joins the annotating table to the element. That join'stable[joined_indices, :].copy()does an out-of-order sparse CSR row-gather to reorder rows to element order — which dominates large renders (~370 ms on Visium/Xenium-width tables). We only need one column aligned to the element, not a full-table copy.This adds
_extract_color_columnand uses it in_set_color_source_vecfor the shapes + table-origin case.How
Gated to
isinstance(element, GeoDataFrame)+ table origin (obs/var). Points already use thepreloaded_color_datashortcut; labels (raster element, no instance-id index) keepget_values.Correctness
Bit-identical to
get_values, verified:cluster): identical values, index alignment, dtypeTestExtractColorColumn): var (X), obs numeric, obs categorical (dtype preserved), shuffled table order (realigns), missing instances (→ NaN)Existing
test_plot_*shapes baselines are unaffected (output identical). Also fixes a latent bug: the previouselement=sdata[table_name]fast shortcut did not realign rows, so it silently mis-colored when the table's instance order ≠ the element's geometry order.Speedup (70k shapes / real data)
get_values_extract_color_column~370 ms saved per colored shapes render on Visium/Xenium-scale tables.
Scope
Phase 1 of the table-copy investigation (
plans/investigation-table-join-copy.md). It removes the color re-join. The structural_join_table_for_element(render.py:618) still runs and is a separate, independently-scoped follow-up (it's entangled with element alignment, the outline path, and the #1099 workaround). An upstreamget_valuessingle-column fast-path /copy=Falsejoin in spatialdata would help all consumers but isn't required here.