Preface
I do not think that I am the best person to champion this effort, as I am far from the most informed person here on lazy arrays. I'm probably missing important things, but I would like to start this discussion because I think it is an important topic.
The problem
The problem of mixing computation requiring data-dependent properties with lazy execution is discussed in detail elsewhere:
- https://data-apis.org/array-api/draft/design_topics/lazy_eager.html
- https://data-apis.org/array-api/draft/design_topics/data_dependent_output_shapes.html#data-dependent-output-shapes
- Handling materialization of lazy arrays #748
- Calculate number of unique values in a lazy array #834
A possible solution
Add the function materialize(x: Array) to the top level of the API. Behaviour:
- for eagerly-executed arrays, this would be a no-op
- for lazy arrays, this would force computation such that the data is available in the returned array (which is of the same array type?)
- for "100% lazy" arrays (see the comment on "Handling materialization of lazy arrays #748"), this would raise an exception
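A rough sketch of the three behaviours above, using toy stand-in classes rather than any real library. Everything here other than the name materialize is hypothetical: EagerArray, LazyArray, the fully_lazy flag, and the choice of RuntimeError are assumptions made for illustration, and returning the same array type is only one possible answer to the open question above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class EagerArray:
    """Toy eager array: the data is always available."""
    data: List[float]


@dataclass
class LazyArray:
    """Toy lazy array: the data is produced by a deferred computation."""
    thunk: Callable[[], List[float]]
    value: Optional[List[float]] = None  # filled in once materialized
    fully_lazy: bool = False             # models a "100% lazy" array

    @property
    def size(self) -> int:
        # A data-dependent property: unknown until the data exists.
        if self.value is None:
            raise ValueError("size is unknown until the array is materialized")
        return len(self.value)


Array = Union[EagerArray, LazyArray]


def materialize(x: Array) -> Array:
    """Hypothetical top-level function sketched from the proposal."""
    if isinstance(x, EagerArray):
        return x  # no-op for eagerly-executed arrays
    if x.fully_lazy:
        # Exception type is an assumption; the proposal does not name one.
        raise RuntimeError("this array type cannot be materialized")
    # Force the computation and return an array of the same type.
    return LazyArray(thunk=x.thunk, value=x.thunk())


# Usage: the number of positive values depends on the data.
lazy = LazyArray(thunk=lambda: [v for v in [3.0, -1.0, 4.0] if v > 0])
ready = materialize(lazy)
print(ready.size)  # 2
```

With something like this, array-agnostic code could call materialize(x) just before reading a data-dependent property, and eager libraries would pay nothing for the call.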
Prior art
- Dask:
  - https://docs.dask.org/en/stable/generated/dask.array.Array.compute_chunk_sizes.html computes chunk sizes / shape, working in-place and leaving the array as a Dask array.
  - https://docs.dask.org/en/stable/generated/dask.array.Array.compute.html materialises the in-memory equivalent of the Dask array, returning e.g. a NumPy array.
- JAX:
  - ?
- others?
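For reference, the two Dask methods linked above behave roughly as follows on a boolean-masked array, whose shape depends on the data. This is a small sketch that assumes dask and numpy are installed; the input values and chunk size are arbitrary illustration choices.

```python
import numpy as np
import dask.array as da

x = da.from_array(np.array([3, -1, 4, -1, 5, -9]), chunks=3)

# Boolean masking gives a data-dependent output shape, so Dask
# reports it as unknown (NaN) until something is computed.
y = x[x > 0]
shape_before = y.shape
print(shape_before)  # (nan,)

# compute_chunk_sizes() works in-place: y stays a Dask array,
# but its chunk sizes (and therefore its shape) are now known.
y.compute_chunk_sizes()
print(y.shape)  # (3,)

# compute() returns the in-memory equivalent, here a NumPy array.
z = y.compute()
print(type(z).__name__, z)  # ndarray [3 4 5]
```

The first method is closer to the "same array type" reading of the proposal, while the second changes the array type entirely.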
Concerns
- I think the main concern is whether eager-only libraries will agree to adding a no-op to the API. There is precedent for that type of change (e.g. device kwargs in NumPy), but perhaps this is too obtrusive?
- As far as I can tell, there isn't a standard way to do this across lazy libraries. Does JAX just do this automatically when it would be needed? Do other libraries have this capability?
Alternatives
- Do nothing. The easy option, but it leaves us unable to support lazy arrays when data-dependent properties are used in computation (maybe that is okay?)
- An alternative API. Maybe spelled like compute* or a method on the array object. Maybe with options for partial materialization (if that's a thing)?