Add CSD

`CSD` stands for Compressed Sparse Dimensions. It is a multidimensional generalization of `CSR`, `CSC` and `COO`.

`CSD` has a few "compressed" dimensions and a few "uncompressed" dimensions. The "compressed" dimensions are linearized and go into `indptr` similar to CSR/CSC. The "uncompressed" dimensions (corresponding to `indices` in CSR/CSC, but there can be multiple) will be stored in `coords`.

The plan is to:

- Write `CSD`.
- Test it.
- Make `COO` and others inherit from it.

Task checklist:

Implement CSD (parametrize tests over compressed axes).:

- [ ] Element-wise
- [ ] Reductions
- [ ] Slicing
- [ ] Change compressed dimensions

Post-Implementation (parametrize tests over types):

- [ ] Make COO inherit from CSD
- [ ] Make CSR inherit from CSD
- [ ] Make CSC inherit from CSD

#  Compressed Sparse Dimensions — Format Specification
Compressed Sparse Dimensions (CSD) is an multidimensional sparse array storage format.

Let’s say that the number of dimensions is `ndim` and `compressed_axes` represents the axes/dimensions that are compressed, as a tuple. 

It contains three main arrays, one of them two-dimensional:
* `data` (one dimensional, `(nnz,)`)
* `coords`  (two-dimensional, `(ndim - len(compressed_axes), nnz)`)
* `indptr` (one dimensional `(reduce(operator.mul, (shape[a] for a in compressed_axes), 1) + 1,)`)
## `data`/`coords`/`indptr` from an `ndarray`
Relating to a normal NumPy array, we can specify `data` and `coords` directly. Let’s say `arr` is the Numpy array to convert.

```python
non_compressed_axes = tuple(a for a in range(ndim) if a not in set(compressed_axis))
coo_coords = np.where(arr)
data = arr[coo_coords]
axes_order = compressed_axes + non_compressed_axes
if axes_order != tuple(range(ndim)):
    lin_coo_coords = linear_loc(coo_coords[np.asarray(axes_order)], tuple(self.shape[a] for a in axes_order))
    idx = np.argsort(lin_coo_coords)
    data = data[idx]
    coords = coords[:, idx]

coords = coo_coords[np.asarray(non_compressed_axes)].astype(np.min_scalar_type(max(non_compressed_axes)))
compressed_linear_shape = reduce(operator.mul, (shape[a] for a in compressed_axes), 1)
compressed_linearized_coords = linear_loc(coo_coords[np.array(compressed_axes)], compressed_axes)
indptr = np.empty(compressed_linear_shape + 1, dtype=np.min_scalar_type(compressed_linear_shape))
indptr[0] = 0
np.cumsum(np.bincount(compressed_linearized_coords, minlength=compressed_linear_shape), out=indptr[1:])
```
## Interpreting the arrays
For CSR, the interpretation is simple: `indices` represents the `columns`, `data` the respective data. `indptr[i]:indptr[i+1]` represents the slice corresponding to row `i` for which `columns` and `data`  are valid.

We can extend this definition to CSD as well.  `coords` represent the non-compressed axes, and `data` the respective data. `indptr` can be interpreted the same way, replacing the word “row” with “linearized compressed axes index”, but it turns out that there is a simpler way to work with things.
### Alternative representation for `indptr`
Consider the following two arrays produced from `indptr`:
```python
starts = indptr[:-1].reshape((shape[a] for a in compressed_axes))
stops indptr[1:].reshape((shape[a] for a in compressed_axes))
```
Note that these are views, no copies are made. This gives a simpler and more intuitive interpretation for `indptr`. For every possible index corresponding to the compressed axes: Indexing into `starts` gives the starts for that index and `stops` gives the stops.
## The “multiple-COO” view
For every value in `starts`/`stops`, `coords[:, start_val:stop_val]` and `data[start_val:stop_val]` can be seen as a mini-COO array, and can thus have any COO-related algorithms applied to it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add CSD #125

Compressed Sparse Dimensions — Format Specification

`data`/`coords`/`indptr` from an `ndarray`

Interpreting the arrays

Alternative representation for `indptr`

The “multiple-COO” view

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add CSD #125

Description

Compressed Sparse Dimensions — Format Specification

data/coords/indptr from an ndarray

Interpreting the arrays

Alternative representation for indptr

The “multiple-COO” view

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`data`/`coords`/`indptr` from an `ndarray`

Alternative representation for `indptr`