Add support for DelayedArray reading #387

Open
lazappi wants to merge 31 commits into devel from feature/delayed-arrray

lazappi commented Dec 1, 2025

Related to: Fixes #128

Description

Add support for backed objects using DelayedArray matrices:

  • Add a HDF5File class to store file handles and paths with convenient methods for opening/closing them
  • Use the {HDF5Array} package for reading dense and sparse arrays from HDF5 files
  • Add a backed argument to HDF5AnnData
  • Add the backed option to InMemoryAnnData
  • Conversion of backed AnnData objects to SingleCellExperiment/Seurat

Other features that could be added:

  • Add support for writing DelayedArray matrices
  • Add SparseArray as a format for in-memory matrices

Checklist

Before review

  • Update and regenerate man pages
  • Add/update tests
  • Add/update examples in vignettes
  • Pass CI checks

Before merge

  • Update NEWS
  • Bump devel version

lazappi commented Dec 2, 2025

/style

lazappi commented Dec 3, 2025

/style

lazappi marked this pull request as ready for review December 3, 2025 11:55

Copilot AI left a comment

Pull request overview

This pull request adds DelayedArray support for backed HDF5 objects in the anndataR package. The implementation introduces a new HDF5File R6 class to manage HDF5 file handles and adds a backed argument throughout the codebase to enable disk-backed matrix operations using the HDF5Array package.

Key Changes

  • New HDF5File class: Manages HDF5 file handles with automatic open/close functionality using withr's deferred execution
  • Backed mode support: Adds backed parameter to read_h5ad(), HDF5AnnData$new(), and as_HDF5AnnData() to return DelayedArray matrices instead of loading data into memory
  • Refactored file handling: All HDF5 read/write operations now use the HDF5File class instead of raw file handles, improving resource management and enabling backed array support
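
The deferred open/close pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual HDF5File class: the class name `HDF5FileSketch`, the `with_handle()` method, and the choice of {rhdf5} as the HDF5 library are all assumptions made for the example.

```r
library(R6)

# Hypothetical stand-in for a file-handle manager; NOT the class from this PR.
HDF5FileSketch <- R6Class(
  "HDF5FileSketch",
  public = list(
    path = NULL,
    initialize = function(path) {
      self$path <- path
    },
    # Open the file read-only, pass the handle to `fun`, and make sure the
    # handle is closed when this method exits, even if `fun` throws an error.
    with_handle = function(fun) {
      handle <- rhdf5::H5Fopen(self$path, flags = "H5F_ACC_RDONLY")
      withr::defer(rhdf5::H5Fclose(handle))
      fun(handle)
    }
  )
)

# Create a small HDF5 file to read from
path <- tempfile(fileext = ".h5")
rhdf5::h5createFile(path)
rhdf5::h5write(1:10, path, "x")

file <- HDF5FileSketch$new(path)
x <- file$with_handle(function(h) rhdf5::h5read(h, "x"))
```

Because `withr::defer()` registers the close on the calling frame, the caller never has to remember to close the handle, which is the resource-management benefit the summary refers to.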

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| R/HDF5File.R | New R6 class for managing HDF5 file handles with deferred open/close operations |
| R/HDF5AnnData.R | Updated to use HDF5File class, added backed mode support, improved mode handling (read-only vs read-write) |
| R/read_h5ad_helpers.R | Added backed parameter to reading functions, implemented DelayedArray returns via HDF5Array package |
| R/write_h5ad_helpers.R | Refactored to use HDF5File objects instead of raw handles |
| R/write_hdf5_helpers.R | Refactored to use HDF5File objects, added helper functions for file operations |
| R/read_h5ad.R | Added backed parameter and explicit file open/close for conversions |
| R/as_SingleCellExperiment.R | Added .as_SCE_process_pairs_mapping() to convert DelayedArrays for SelfHits compatibility |
| R/as_Seurat.R | Updated to handle DelayedArray conversions for graphs |
| R/utils.R | Updated to_R_matrix() to handle DelayedArray inputs with allow_backed parameter |
| tests/testthat/test-*.R | Updated all tests to use HDF5File objects, added comprehensive tests for backed mode |
| man/*.Rd | Updated documentation for new backed parameter |
| DESCRIPTION | Added DelayedArray and HDF5Array to Suggests |

LouiseDck commented Jan 27, 2026

I tried this out and for reading backed H5AD files it works beautifully! Even converting them to SCE works nicely, but I couldn't get the reverse working. Is there something I'm missing or is this not implemented yet?

library(anndataR)
library(SingleCellExperiment)

test_backed <- read_h5ad(path = "output.h5ad", as = "HDF5AnnData", backed = TRUE)
sce_backed <- test_backed$as_SingleCellExperiment()
class(assay(sce_backed, "counts"))
#> [1] "DelayedMatrix"
#> attr(,"package")
#> [1] "DelayedArray"

back_ad <- as_AnnData(sce_backed)
#> Error in `private$.validate_aligned_mapping()`:
#> ! Unexpected shape for layers[['counts']]
#> ℹ Expected [50, 100], got [100, 50]
back_ad <- as_AnnData(sce_backed, output_class = "HDF5AnnData", file = "output.h5ad", backed = FALSE)
#> Warning: An non-empty file is opened in read/write mode. Use with caution, as this can
#> lead to data corruption.
#> Error in `private$.validate_aligned_mapping()`:
#> ! Unexpected shape for layers[['counts']]
#> ℹ Expected [50, 100], got [100, 50]

Created on 2026-01-27 with reprex v2.1.1

lazappi commented Feb 2, 2026

Nope, this PR only implements reading via DelayedArray. Writing is more complicated and I didn't have time to look into it properly.

lazappi changed the title from "Add DelayedArray support" to "Add supoort for DelayedArray reading" Feb 2, 2026
rcannood changed the title from "Add supoort for DelayedArray reading" to "Add support for DelayedArray reading" Feb 2, 2026

rcannood left a comment

LGTM, but the CI needs to be fixed.

rcannood commented Feb 11, 2026

@lazappi Could you take a look at merging devel into this PR? 🙇

lazappi commented Feb 11, 2026

@rcannood I've fixed the conflicts, but there may be some bugs: some of the changes touched the same code and were a bit tricky to resolve.

lazappi commented Feb 12, 2026

/style

@rcannood rcannood force-pushed the feature/delayed-arrray branch from bea756c to ec9ddfb Compare February 24, 2026 10:51

rcannood left a comment

> Add the backed option to InMemoryAnnData

Just so I understand, what do you mean by this?

> Add a backed argument to HDF5AnnData

I wonder whether "backed" is the right name for this. Maybe something about the DelayedArray or the data being loaded lazily. The HDF5AnnData backend is already backed, no?

Let me think about some of the changes in this PR for a sec...

lazappi commented Feb 25, 2026

> > Add the backed option to InMemoryAnnData
>
> Just so I understand, what do you mean by this?
>
> > Add a backed argument to HDF5AnnData
>
> I wonder whether "backed" is the right name for this. Maybe something about the DelayedArray or the data being loaded lazily. The HDF5AnnData backend is already backed, no?
>
> Let me think about some of the changes in this PR for a sec...

HDF5AnnData is backed in the sense that it is connected to a file, but when you request a matrix it is still pulled into memory. With backed = TRUE you get a DelayedArray instead. I didn't really think about the name, so I'm happy to consider alternatives, but I think that's what Python anndata uses.
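
The difference between the two behaviours can be illustrated with {HDF5Array} directly. This is only a sketch of the underlying idea and does not use the anndataR API; the file and the dataset name "X" are made up for the example.

```r
library(HDF5Array)

# Write a small dense matrix to a temporary HDF5 file
tmp <- tempfile(fileext = ".h5")
m <- matrix(runif(50 * 100), nrow = 50, ncol = 100)
writeHDF5Array(m, filepath = tmp, name = "X")

# Lazy: a DelayedArray that points at the dataset on disk
lazy <- HDF5Array(tmp, "X")

# Eager: realise the values as an ordinary in-memory matrix
in_mem <- as.matrix(lazy)

class(lazy)
class(in_mem)
```

Operations on `lazy` are recorded and only evaluated when the values are needed, whereas `in_mem` holds the full matrix in RAM.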

rcannood commented Feb 25, 2026

Alright, thanks for the clarification!

We need to briefly think about where it makes sense to use DelayedArrays. For instance, if we convert an HDF5AnnData (with DelayedArrays) to InMemoryAnnData, I feel like it'd make sense to convert all of the DelayedArrays to in-memory matrices, since at that stage it wouldn't make sense for the matrix to still be backed by an HDF5 file.
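
One way such a conversion step could look is sketched below. The helper name `to_in_memory()` is hypothetical and not part of anndataR, and the sketch only covers dense matrices; sparse DelayedArrays would likely need to be coerced to a sparse in-memory class instead.

```r
library(DelayedArray)

# Hypothetical helper: realise DelayedArrays in memory, leave everything else
# untouched. Dense case only.
to_in_memory <- function(x) {
  if (is(x, "DelayedArray")) {
    as.matrix(x)
  } else {
    x
  }
}

m <- matrix(1:6, nrow = 2)
delayed <- DelayedArray(m) # wrap an ordinary matrix for demonstration

realised <- to_in_memory(delayed)
unchanged <- to_in_memory(m)
```

Applying a helper like this to every matrix slot during conversion would guarantee that the resulting in-memory object no longer depends on the original file.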

@github-actions

🐰 Bencher Report

Branch: feature/delayed-arrray
Testbed: ubuntu-latest
