Description
Motivation
The sparse (also called PyData/Sparse) development team have been working on integration with the wider ecosystem, most notably SciPy and scikit-learn among others, with CuPy, PyTorch, JAX and TensorFlow also on the radar. One of the challenges we faced was the lack of (possibly zero-copy) interchange between the different sparse array implementations. We believe this may be a pain point for many sparse array implementations moving forward.
This mirrors an issue seen previously for dense arrays, where the DLPack protocol was one of the first things to be standardised. We're hoping to reach community consensus on a solution to the analogous problem for sparse arrays.
Luckily, sparse array formats (with the possible exception of DOK) are usually collections of dense arrays underneath. In addition, this problem has been solved before for on-disk arrays by the binsparse specification. @willow-ahrens is a co-author of that spec, and is also a collaborator on the sparse work.
Proposal
We propose introducing two new methods to array-API compliant sparse array objects (such as those in sparse), which are described below.
__binsparse_descriptor__
The returned item is a dict equivalent to a parsed JSON binsparse descriptor of an array.
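For illustration, a descriptor for a small CSR matrix might look like the following. This is only a sketch: the field names (`version`, `format`, `shape`, `number_of_stored_values`, `data_types`, `pointers_to_1`, `indices_1`, `values`) reflect our reading of the binsparse specification and should be checked against the spec version in use; the values are made up.

```python
import json

# JSON text as it might appear in a binsparse descriptor (illustrative values;
# field names are assumed from the binsparse spec and should be verified).
descriptor_json = """
{
  "binsparse": {
    "version": "0.1",
    "format": "CSR",
    "shape": [3, 4],
    "number_of_stored_values": 5,
    "data_types": {
      "pointers_to_1": "int64",
      "indices_1": "int64",
      "values": "float64"
    }
  }
}
"""

# The protocol method would return the parsed form, i.e. a plain dict.
descriptor = json.loads(descriptor_json)
```

The point is that the descriptor carries only metadata (format, shape, dtypes), never the array data itself.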
__binsparse__
The returned item is a dict[str, Array] of __dlpack__-compatible arrays, which are the constituent arrays of the sparse array. Each key matches the equivalent key in the descriptor.
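A producer implementing both methods could look roughly like this. `ToyCSR` is a hypothetical class written only for this example, NumPy arrays stand in for any `__dlpack__`-compatible array, and the assumption that the keys of `__binsparse__` mirror the keys under `data_types` in the descriptor is ours, not something the proposal pins down.

```python
import numpy as np


class ToyCSR:
    """Hypothetical CSR container used only to illustrate the two methods."""

    def __init__(self, pointers, indices, values, shape):
        self.pointers = np.asarray(pointers, dtype=np.int64)
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)
        self.shape = tuple(shape)

    def __binsparse_descriptor__(self) -> dict:
        # Field names assumed from the binsparse spec; verify before relying on them.
        return {
            "binsparse": {
                "version": "0.1",
                "format": "CSR",
                "shape": list(self.shape),
                "number_of_stored_values": int(self.values.size),
                "data_types": {
                    "pointers_to_1": str(self.pointers.dtype),
                    "indices_1": str(self.indices.dtype),
                    "values": str(self.values.dtype),
                },
            }
        }

    def __binsparse__(self) -> dict:
        # Assumption: keys mirror the keys under "data_types" in the descriptor.
        return {
            "pointers_to_1": self.pointers,
            "indices_1": self.indices,
            "values": self.values,
        }


# A 3x4 matrix with 5 stored values.
x = ToyCSR([0, 2, 3, 5], [0, 2, 1, 0, 3], [1.0, 2.0, 3.0, 4.0, 5.0], (3, 4))
descr = x.__binsparse_descriptor__()
arrays = x.__binsparse__()
```

Because the constituent arrays are returned as-is, a consumer can hand each of them to its own `from_dlpack` and avoid a copy where the device and dtype allow it.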
Introduction of from_binsparse function
If a library supports sparse arrays, its from_binsparse function should accept (zero-copy when possible) objects that follow this __binsparse__ protocol and have an equivalent sparse format within the library.
Pseudocode implementation
Here's a pseudocode example using two libraries, xp1 and xp2, both supporting sparse arrays:
# In library code:
xp2_sparray = xp2.from_binsparse(xp1_sparray, ...)

# This pseudocode impl is common between `xp1` and `xp2`
def from_binsparse(x: object, /, *, device: device | None = None, copy: bool | None = None) -> array:
    # Both protocol methods must be present on the producer object.
    binsparse_descr = getattr(x, "__binsparse_descriptor__", None)
    binsparse_impl = getattr(x, "__binsparse__", None)

    if binsparse_impl is None or binsparse_descr is None:
        raise TypeError(...)

    binsparse_descriptor = binsparse_descr()
    # Will raise an error if the format/descriptor is unsupported.
    sparse_type = _type_from_binsparse_descriptor(binsparse_descriptor)
    constituent_arrays = binsparse_impl()
    # Import each constituent dense array via DLPack (zero-copy when possible).
    my_constituent_arrays = {
        k: from_dlpack(arr, device=device, copy=copy) for k, arr in constituent_arrays.items()
    }
    return sparse_type.from_strided_arrays(my_constituent_arrays, shape=...)
Parallel implementation in sparse: pydata/sparse#764
Parallel implementation in SciPy: scipy/scipy#22553
Alternative solutions
There are formats for on-disk sparse-array interchange [1] [2], but none for in-memory interchange. binsparse is the one that comes closest to offering in-memory interchange.
Pinging possibly interested parties:
- @mtsokol @ivirshup (for scipy.sparse)
- @willow-ahrens @mtsokol (from binsparse and finch-tensor/sparse)
- @leofang (for cupyx.sparse)
- @pearu (for torch.sparse)
- @jakevdp (for JAX/TensorFlow)
Updated on 2024.10.09 as agreed in #840 (comment).