X shape check slices to first 2 shape values#2432

Open
eroell wants to merge 7 commits into
scverse:mainfrom
eroell:x-shape-slice-ndim

@eroell eroell commented May 10, 2026

Contributor
  • Release note not necessary because:

Before

Higher-than-2D X was rejected at construction, so a higher-dimensional X could not be held in memory:

import numpy as np
import anndata as ad
ad.AnnData(X=np.zeros((3, 4, 5)))
# ValueError: too many values to unpack (expected 2)
#   (from `n_obs, n_vars = X.shape`)

adata = ad.AnnData(X=np.zeros((3, 4)))
adata.X = np.zeros((3, 4, 5))
# FutureWarning: Automatic reshaping when setting X will be removed ...
# -> then ValueError, since (3, 4, 5) can't be reshaped to (3, 4)

Higher-than-2D layers slipped through the in-memory shape check (only axes 0 and 1 are validated), and the writer wrote them to disk without complaint, silently violating the spec.
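
The construction failure above comes from plain tuple unpacking, which is why slicing the shape to its first two entries removes it. A minimal sketch using a bare shape tuple rather than a real array (no anndata or NumPy involved; `shape` and `unpack_error` are illustrative variables only):

```python
# A 3-D shape, as in the examples above.
shape = (3, 4, 5)

# Old behaviour: unpacking into exactly two names fails for ndim > 2.
try:
    n_obs, n_vars = shape
except ValueError as err:
    unpack_error = str(err)  # begins with "too many values to unpack"

# New behaviour: only the first two entries are inspected,
# so any trailing axes are ignored by the shape check.
n_obs, n_vars = shape[:2]
print(n_obs, n_vars)
```

This also shows why layers were never caught: a check that only compares the first two axes has nothing to say about a third one.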

After

In-memory: higher-than-2D X and layers now warn but still succeed (the shape check uses X.shape[:2]).
Writing: a non-2D X or layer now fails with a ValueError.
Reading: a non-conforming file warns but still succeeds:

adata = ad.AnnData(X=np.zeros((3, 4, 5)))   # now OK in memory
# UserWarning: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.X = np.zeros((3, 4, 5))               # also OK
# UserWarning: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.layers["L"] = np.zeros((3, 4, 5))     # also OK
# UserWarning: Layer 'L' must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.write_h5ad("out.h5ad")
# ValueError: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

ad.read_h5ad("legacy_non_conforming.h5ad")
# UserWarning: X must be 2-dimensional, but got an array with shape ...
# -> still returns the AnnData

The same applies to layers["L"] (the error/warning message says Layer 'L' instead of X) and to the zarr IO path.
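
The policy described above (warn in memory and on read, raise on write) can be sketched as one helper with two modes. This is an illustration only: `check_2d` and its `strict` parameter are hypothetical names, not the functions added in this PR, and the message is shortened from the one quoted above.

```python
import warnings


def check_2d(name: str, shape: tuple[int, ...], *, strict: bool) -> None:
    """Warn (strict=False) or raise (strict=True) if `shape` is not 2-D.

    Hypothetical helper for illustration; not anndata's actual API.
    """
    ndim = len(shape)
    if ndim == 2:
        return
    msg = (
        f"{name} must be 2-dimensional, but got an array "
        f"with shape {shape} (ndim={ndim})."
    )
    if strict:
        # Write path: the on-disk spec is enforced.
        raise ValueError(msg)
    # In-memory / read path: tell the user, but carry on.
    warnings.warn(msg, UserWarning, stacklevel=2)


# In-memory assignment: warns only.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_2d("X", (3, 4, 5), strict=False)

# Writing: hard failure.
try:
    check_2d("Layer 'L'", (3, 4, 5), strict=True)
except ValueError as err:
    write_error = str(err)
```

Sharing one message between the warning and the error keeps the in-memory hint and the write-time failure recognisably about the same problem.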

codecov Bot commented May 10, 2026

Codecov Report

❌ Patch coverage is 94.91525% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.69%. Comparing base (159f859) to head (cc29324).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/anndata/_io/zarr.py 50.00% 2 Missing ⚠️
src/anndata/_core/storage.py 95.23% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2432      +/-   ##
==========================================
+ Coverage   85.62%   85.69%   +0.06%     
==========================================
  Files          49       49              
  Lines        7680     7730      +50     
==========================================
+ Hits         6576     6624      +48     
- Misses       1104     1106       +2     
Files with missing lines Coverage Δ
src/anndata/_core/aligned_mapping.py 94.49% <100.00%> (+0.30%) ⬆️
src/anndata/_core/anndata.py 86.90% <100.00%> (+0.08%) ⬆️
src/anndata/_io/h5ad.py 93.36% <100.00%> (+0.12%) ⬆️
src/anndata/_io/specs/methods.py 91.39% <100.00%> (+0.02%) ⬆️
src/anndata/_core/storage.py 95.31% <95.23%> (-0.04%) ⬇️
src/anndata/_io/zarr.py 80.80% <50.00%> (-0.64%) ⬇️

@eroell eroell marked this pull request as ready for review May 11, 2026 12:57
@flying-sheep flying-sheep added this to the 0.12.15 milestone May 18, 2026
@ilan-gold ilan-gold modified the milestones: 0.12.15, 0.12.17 May 18, 2026
shape = getattr(value, "shape", None)
if shape is None:
    return None
ndim = len(shape)

Contributor Author

This also means that an array of shape (m, n, 1) can no longer be written to disk; previously this was allowed.

Do you want to allow this or not?
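
To make the trade-off concrete, here is a sketch of the two options for a shape like (m, n, 1). Both function names are hypothetical: the first mirrors the `ndim = len(shape)` check shown above, the second is one possible relaxation that ignores trailing axes of length 1. Neither is claimed to be what the PR finally implements.

```python
def is_strictly_2d(shape: tuple[int, ...]) -> bool:
    """Accept only shapes with exactly two axes."""
    return len(shape) == 2


def is_2d_up_to_singletons(shape: tuple[int, ...]) -> bool:
    """Accept 2-D shapes plus any number of trailing length-1 axes."""
    return len(shape) >= 2 and all(dim == 1 for dim in shape[2:])


# Compare both checks on a 2-D shape, a (m, n, 1) shape, and a true 3-D shape.
for shape in [(3, 4), (3, 4, 1), (3, 4, 5)]:
    print(shape, is_strictly_2d(shape), is_2d_up_to_singletons(shape))
```

The two checks only disagree on shapes such as (3, 4, 1), which is exactly the case the question is about.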

eroell commented May 19, 2026

Contributor Author

Seems I can't request reviews, but I'd be interested in your comments at this stage :)

Comment thread src/anndata/_io/specs/methods.py Outdated
# Older / non-conforming files may contain higher-dimensional `X` or
# `layers`. The on-disk spec forbids that; surface it as a warning so
# the user knows, but still construct the AnnData with what's there.
_warn_if_x_or_layers_3d_kwargs(d)

@ilan-gold ilan-gold May 19, 2026

Contributor

I would move this warning into the construction of the AnnData object instead of reading (also in the other spots), so that people who do this in memory will know their data is technically unwritable.

Contributor Author

Adjusted accordingly, and I updated the first PR comment to reflect this behaviour. Assignment to .X and .layers[<key>] now raises this warning, too.

@flying-sheep flying-sheep left a comment

Member

Not much to add to what Ilan said. Thank you, this looks clean!

Comment on lines +606 to 609
if hasattr(value, "shape") and value.shape[:2] != self.shape:
    msg = "Automatic reshaping when setting X will be removed in the future."
    warn(msg, FutureWarning)
    value = value.reshape(self.shape)

@flying-sheep flying-sheep May 19, 2026

Member

I feel like the error thrown by value.reshape(self.shape) is probably ugly/misleading if value.ndim > self.ndim, no? Please check, and if the error is confusing, manually throw a clearer one in that case.

Contributor Author

Adjusted accordingly. The previous reshaping behaviour is maintained:

adata = ad.AnnData(X=np.zeros((2, 4)))
adata.X = np.zeros((4, 2))
# FutureWarning: Automatic reshaping when setting X will be removed in the future.

while higher-than-2D input is handled more strictly and throws a clearer error:

adata = ad.AnnData(X=np.zeros((2, 4)))
adata.X = np.zeros((4, 2, 1))
# ValueError: Cannot set `X` from an array of shape (4, 2, 1): its leading two dimensions (4, 2) do not match the AnnData shape (2, 4). Automatic reshaping is only supported for 2-D inputs.
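
The two cases above can be summarised as a decision on shapes alone. The sketch below is an assumption about the control flow, not the PR's code: `plan_x_assignment` is a hypothetical function that works on shape tuples, reports whether a reshape would be attempted, and reuses the error text quoted above. Whether the reshape itself succeeds (for example when the element counts differ) is outside its scope.

```python
import warnings


def plan_x_assignment(
    value_shape: tuple[int, ...], adata_shape: tuple[int, int]
) -> str:
    """Return "keep" or "reshape" for a value assigned to X, or raise.

    Hypothetical helper operating on shapes only.
    """
    if tuple(value_shape[:2]) == tuple(adata_shape):
        # Leading two axes match; extra axes are tolerated in memory.
        return "keep"
    if len(value_shape) > 2:
        msg = (
            f"Cannot set `X` from an array of shape {value_shape}: "
            f"its leading two dimensions {value_shape[:2]} do not match "
            f"the AnnData shape {adata_shape}. "
            "Automatic reshaping is only supported for 2-D inputs."
        )
        raise ValueError(msg)
    # Legacy path for inputs with at most two axes: reshape with a warning.
    warnings.warn(
        "Automatic reshaping when setting X will be removed in the future.",
        FutureWarning,
        stacklevel=2,
    )
    return "reshape"


# (4, 2) into a (2, 4) AnnData: the legacy reshape path, with a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    decision = plan_x_assignment((4, 2), (2, 4))
```

Checking `ndim` before calling `reshape` is what lets the error name the mismatching leading dimensions instead of surfacing a generic reshape failure.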

Comment thread tests/test_x_layers_2d.py Outdated
Comment on lines +27 to +34
DISK_FORMATS = [
    pytest.param("h5ad", id="h5ad"),
    pytest.param("zarr", id="zarr"),
]
WHICH_ATTRS = [
    pytest.param("X", id="X"),
    pytest.param("layers", id="layers"),
]

Member

These can also just be (parametrized) fixtures.
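
For reference, the suggestion could look roughly like the following. The fixture names `disk_format` and `which_attr` and the test function are made up for this sketch; only `pytest.fixture(params=...)` and `request.param` are standard pytest.

```python
import pytest


@pytest.fixture(params=["h5ad", "zarr"])
def disk_format(request) -> str:
    # pytest runs each dependent test once per entry in `params`.
    return request.param


@pytest.fixture(params=["X", "layers"])
def which_attr(request) -> str:
    return request.param


def test_combination_is_known(disk_format: str, which_attr: str) -> None:
    # Requesting both fixtures yields every (format, attribute) pair.
    assert disk_format in {"h5ad", "zarr"}
    assert which_attr in {"X", "layers"}
```

With fixtures, tests no longer need a `@pytest.mark.parametrize` decorator each; naming the fixture as an argument is enough, and string params are used as test ids by default.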

@eroell eroell requested review from flying-sheep and ilan-gold June 4, 2026 10:38

@ilan-gold ilan-gold left a comment

Contributor

Please add a release note and semantic commit title (see failing CI) :)


def __setitem__(self, key: str | None, value: Value) -> None:
    super().__setitem__(key, value)
    if key in self._data:

Contributor

Why do we need this check? Could you add a comment as to why?

def __bool__(self) -> bool:
    return not self.keys() <= {None}

def _warn_if_spec_violation(self, key: str | None, val: Value) -> None:

Contributor

I think it's weird to make this a general method when it only does one specific thing. There are lots of ways to violate the spec

Contributor

Maybe it makes sense to move this to LayersBase and rename it and _spec_violation_message to be a bit more clear
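
One way to read this suggestion is to keep the hook on the layers mapping and name it after the single thing it checks. The sketch below uses a plain `dict` subclass; `LayersSketch` and `_warn_if_not_2d` are hypothetical names chosen for illustration, and values are assumed to expose a `.shape` tuple.

```python
import warnings
from types import SimpleNamespace


class LayersSketch(dict):
    """Toy stand-in for a layers mapping; not anndata's class."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._warn_if_not_2d(key, value)

    @staticmethod
    def _warn_if_not_2d(key, value):
        # Values without a shape cannot be checked here.
        shape = getattr(value, "shape", None)
        if shape is None or len(shape) == 2:
            return
        warnings.warn(
            f"Layer {key!r} must be 2-dimensional, but got shape {shape}.",
            UserWarning,
            stacklevel=3,
        )


layers = LayersSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    layers["ok"] = SimpleNamespace(shape=(3, 4))    # no warning
    layers["L"] = SimpleNamespace(shape=(3, 4, 5))  # warns, still stored
```

A name like `_warn_if_not_2d` states the one rule being enforced, which leaves room for other spec checks to get their own methods later.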

Development

Successfully merging this pull request may close these issues.

layers can be 3D on-disk

3 participants