Description
Based on scipy/scipy#8162 (comment), I'm opening this issue to debate the merits and demerits of writing multiple implementations for every operation.
I'm specifically referring to the third bullet point in that comment, quoted below:
> It'll remain important to give users a choice of the sparse storage format for efficiency reasons, but we'll want a design that doesn't guarantee that all operations on a given format will produce results of the same format. This is especially relevant when considering operations that produce results with different dimensions than the input, like slicing or extracting a diagonal.
I'm taking this to mean that we don't want to implement every single operation for every single format; instead, we can convert to, and return, the most appropriate format.
I believe this is wise advice, on the whole. For example:
- Writing/modification is best done in DOK. We could require that users call `arr.asformat('dok')` before writing to an array.
- Element-wise operations are best done in CSD or COO (there are still some Numba kinks to figure out for CSD). We could automatically convert to and return objects of these formats.
- Reductions are best done in CSD (which basically makes them equivalent to `ufunc.reduceat`), but will usually return COO.
- Slicing is extremely inefficient in DOK, but can be done after conversion to COO, for example. CSD slicing is also easy.
- `triu`, `tril`, and `nonzero` are best in COO, and essentially require conversion before they're actually used.
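To illustrate the reduction point, here is a rough pure-Python sketch of a segmented sum over a compressed layout. It uses a CSR-style row pointer as a stand-in, since the exact CSD layout isn't spelled out here; `row_sums_compressed`, `indptr`, and `data` are hypothetical names chosen for the example, not part of any existing API.

```python
def row_sums_compressed(indptr, data):
    """Sum the stored values of each row, given a CSR-style row pointer.

    Row i owns data[indptr[i]:indptr[i + 1]], so each reduction is a sum
    over one contiguous segment, which is the idea behind reduceat.
    """
    return [sum(data[indptr[i]:indptr[i + 1]]) for i in range(len(indptr) - 1)]


# A 3x4 matrix with rows [1, 0, 2, 0], [0, 0, 0, 0] and [0, 3, 4, 5].
indptr = [0, 2, 2, 5]
data = [1, 2, 3, 4, 5]

sums = row_sums_compressed(indptr, data)

# The result can itself be sparse, so store it COO-style:
# keep only the nonzero sums together with their row indices.
coords = [i for i, s in enumerate(sums) if s != 0]
values = [s for s in sums if s != 0]
```

Note that empty rows sum to zero in this sketch, whereas `np.add.reduceat` treats empty segments differently, so a real implementation would likely need to handle that case separately.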
On the whole, I'd like for these conversions to be automatic. Writing is the only thing I can think of that will require explicit conversion to DOK.
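As a minimal sketch of what this could look like, the toy classes below convert automatically for slicing but require an explicit conversion for writing. `ToyDOK` and `ToyCOO` are hypothetical, 2-D only, and support only contiguous row slices; they are meant to show the dispatch idea, not how any real implementation works.

```python
class ToyCOO:
    """Read-only 2-D coordinate format: parallel lists of (row, col) and values."""

    def __init__(self, shape, coords, data):
        self.shape = shape
        self.coords = list(coords)
        self.data = list(data)

    def asformat(self, fmt):
        if fmt == "coo":
            return self
        if fmt == "dok":
            return ToyDOK(self.shape, dict(zip(self.coords, self.data)))
        raise ValueError(f"unknown format: {fmt!r}")

    def __getitem__(self, rows):
        # Only contiguous row slices are supported in this sketch.
        if not isinstance(rows, slice):
            raise TypeError("only row slices are supported")
        start, stop, step = rows.indices(self.shape[0])
        if step != 1:
            raise ValueError("only step 1 is supported")
        kept = [
            ((r - start, c), v)
            for (r, c), v in zip(self.coords, self.data)
            if start <= r < stop
        ]
        new_shape = (max(stop - start, 0), self.shape[1])
        return ToyCOO(new_shape, [k for k, _ in kept], [v for _, v in kept])

    def __setitem__(self, index, value):
        # Writing is the one operation that needs an explicit conversion.
        raise TypeError("ToyCOO is read-only; call asformat('dok') before writing")


class ToyDOK:
    """Writable 2-D dictionary-of-keys format (no bounds checking in this sketch)."""

    def __init__(self, shape, items=None):
        self.shape = shape
        self.items = dict(items or {})

    def asformat(self, fmt):
        if fmt == "dok":
            return self
        if fmt == "coo":
            ordered = sorted(self.items.items())
            return ToyCOO(self.shape, [k for k, _ in ordered], [v for _, v in ordered])
        raise ValueError(f"unknown format: {fmt!r}")

    def __setitem__(self, index, value):
        if value == 0:
            self.items.pop(index, None)  # never store explicit zeros
        else:
            self.items[index] = value

    def __getitem__(self, rows):
        # Slicing is slow in DOK, so convert automatically and return COO.
        return self.asformat("coo")[rows]


arr = ToyDOK((3, 3))
arr[0, 1] = 5
arr[2, 2] = 7

sliced = arr[1:3]                  # automatic conversion: the result is a ToyCOO
writable = sliced.asformat("dok")  # explicit conversion is required before writing
writable[0, 0] = 1
```

The design choice here is that read-style operations pick their own preferred format and make no promise about returning the input format, while writes fail loudly on a read-only format instead of converting silently.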
Edit: Link to parallel SciPy-dev thread: https://mail.python.org/pipermail/scipy-dev/2018-May/022834.html