The following kernel fails on GPUs in ClimaAtmos:
```julia
@. u³⁰_halflevel = divide_by_ρa(
    ρ_level * u³_halflevel - mapreduce(*, +, ρaʲs_level, u³ʲs_halflevel),
    ρ_level,
    ρ_level * u³_halflevel,
    ρ_level,
    turbconv_model,
)
```
Seen in this build: https://buildkite.com/clima/climaatmos-ci/builds/17608#018e5841-01f3-4410-a32d-a1276535c614
We could technically get this running by either:
- removing `always_inline` from the finite difference kernels in ClimaCore, or
- figuring out a way to get `mapreduce` to work on the GPU with `always_inline`.
Since inlining has proven to be a performance benefit (and may be necessary for broadcast fusion to pay off), we should opt for the second option.
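One possible direction for the second option is to replace the generic `mapreduce` over the tuple of subdomain quantities with a reduction that is unrolled at compile time through tuple recursion. Every step is `@inline`, so the compiler can expand the call into straight-line code. This is only a sketch under assumptions. `unrolled_mapreduce` is a hypothetical helper, not an existing ClimaCore or Base API. It assumes `ρaʲs_level` and `u³ʲs_halflevel` behave like non-empty tuples of equal length, and it has not been checked to avoid the GPU failure.

```julia
# Hypothetical helper: mapreduce over two equal-length, non-empty tuples,
# unrolled at compile time by recursing on the tuple tail.

# Base case: single-element tuples just apply `f`.
@inline unrolled_mapreduce(f, op, a::Tuple{Any}, b::Tuple{Any}) = f(a[1], b[1])

# Recursive case: combine the first pair with the reduction of the remaining
# elements (a right fold, so `op` is assumed to be associative, e.g. `+`).
@inline unrolled_mapreduce(f, op, a::Tuple, b::Tuple) =
    op(f(first(a), first(b)), unrolled_mapreduce(f, op, Base.tail(a), Base.tail(b)))

# Same result as Base's mapreduce on small tuples: 1*4 + 2*5 + 3*6 = 32
println(unrolled_mapreduce(*, +, (1, 2, 3), (4, 5, 6)))
println(mapreduce(*, +, (1, 2, 3), (4, 5, 6)))
```

In the kernel above, the `mapreduce(*, +, ρaʲs_level, u³ʲs_halflevel)` term could hypothetically be replaced by `unrolled_mapreduce(*, +, ρaʲs_level, u³ʲs_halflevel)`. The recursion depth equals the number of subdomains, which is known at compile time, so no loop or iterator state is left for the GPU compiler to handle.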