X shape check slices to first 2 shape values#2432
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2432 +/- ##
==========================================
+ Coverage 85.62% 85.69% +0.06%
==========================================
Files 49 49
Lines 7680 7730 +50
==========================================
+ Hits 6576 6624 +48
- Misses 1104 1106 +2
|
| shape = getattr(value, "shape", None) | ||
| if shape is None: | ||
| return None | ||
| ndim = len(shape) |
There was a problem hiding this comment.
this also means that an array of shape (m, n, 1) can't be written to disk anymore, before this was ok.
Do you want to allow this or not?
|
Seems I can't request reviews, but I'd be interested in your comments at this stage :) |
| # Older / non-conforming files may contain higher-dimensional `X` or | ||
| # `layers`. The on-disk spec forbids that; surface it as a warning so | ||
| # the user knows, but still construct the AnnData with what's there. | ||
| _warn_if_x_or_layers_3d_kwargs(d) |
There was a problem hiding this comment.
I would move this warning into the construction of the AnnData object instead of reading (also in the other spots) so that people who do this in-memory will know their data is technically unwritable
There was a problem hiding this comment.
Adjusted to this, I updated the first PR comment accordingly to this behaviour. Assignment to .X and .layers[<key>] raise this warning now, too.
flying-sheep
left a comment
There was a problem hiding this comment.
Not much to add to what Ilan said, thank you this looks clean!
| if hasattr(value, "shape") and value.shape[:2] != self.shape: | ||
| msg = "Automatic reshaping when setting X will be removed in the future." | ||
| warn(msg, FutureWarning) | ||
| value = value.reshape(self.shape) |
There was a problem hiding this comment.
I feel like the error thrown by value.reshape(self.shape) is probably ugly/misleading if value.ndim > self.ndim, no? Please check and if the error is confusing, and manually throw a clearer one in that case.
There was a problem hiding this comment.
Adjusted to this. The reshaping behaviour from before is maintained:
adata = ad.AnnData(X=np.zeros((2, 4)))
adata.X = np.zeros((4, 2))
# FutureWarning: Automatic reshaping when setting X will be removed in the future.while higher-than-2-dim is more strict about it, and throws a clearer error
adata = ad.AnnData(X=np.zeros((2, 4)))
adata.X = np.zeros((4, 2, 1))
# ValueError: Cannot set `X` from an array of shape (4, 2, 1): its leading two dimensions (4, 2) do not match the AnnData shape (2, 4). Automatic reshaping is only supported for 2-D inputs.| DISK_FORMATS = [ | ||
| pytest.param("h5ad", id="h5ad"), | ||
| pytest.param("zarr", id="zarr"), | ||
| ] | ||
| WHICH_ATTRS = [ | ||
| pytest.param("X", id="X"), | ||
| pytest.param("layers", id="layers"), | ||
| ] |
There was a problem hiding this comment.
these can also just be (parametrized) fixtures
ilan-gold
left a comment
There was a problem hiding this comment.
Please add a release note and semantic commit title (see failing CI) :)
|
|
||
| def __setitem__(self, key: str | None, value: Value) -> None: | ||
| super().__setitem__(key, value) | ||
| if key in self._data: |
There was a problem hiding this comment.
Why do we need this check? Could you add a comment as to why?
| def __bool__(self) -> bool: | ||
| return not self.keys() <= {None} | ||
|
|
||
| def _warn_if_spec_violation(self, key: str | None, val: Value) -> None: |
There was a problem hiding this comment.
I think it's weird to make this a general method when it only does one specific thing. There are lots of ways to violate the spec
There was a problem hiding this comment.
Maybe it makes sense to move this to LayersBase and rename it and _spec_violation_message to be a bit more clear
layerscan be 3D on-disk #2430Before
higher-than-2D
Xwas rejected at construction; you could not hold a higher-dimensionalXin memory:higher-than-2D
layersslipped through the in-memory shape check (only axes 0 and 1 are validated) and the writer happily wrote them to disk, silently violating the spec.After
In-memory: higher-than-2D
Xandlayerswarns but still succeeds (the shape check usesX.shape[:2]).Writing a non-2D
X/ layer now hard-fails.Reading a non-conforming file warns but still succeeds:
Same applies to
layers["L"](error/warning message saysLayer 'L'instead ofX), and to thezarrIO path.