Implement a columnwise shmem operator #2328

Draft
wants to merge 1 commit into
base: main

charleskawczynski
Member

This PR abstracts all but the physics from CliMA/ClimaAtmos.jl#3655, and implements columnwise!.

This operator can be used to evaluate a large number of broadcast assignments within a single kernel.

The idea is that we can use this function:

  • once in implicit_tendency! (which will set all of the tendencies in a single kernel)
  • 2 or 3 times in remaining_tendency! (before / after dss, before / after horizontal terms)
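
To make the fusion idea concrete, here is a minimal conceptual sketch in plain Python. It is not the `columnwise!` implementation and uses no ClimaCore or ClimaAtmos API; the names `run_unfused`, `run_fused`, `vertical_diff`, and `scaled` are hypothetical, and a local list copy stands in for staging a column in shared memory. It only counts passes over the data ("launches") and column reads to show what fusing the broadcasts could save.

```python
# Conceptual sketch only: plain Python loops stand in for GPU kernels.
# All names below are hypothetical, not ClimaCore or ClimaAtmos API.

def run_unfused(state, ops):
    """One pass ("kernel launch") over all columns per operation."""
    n_cols = len(state)
    outputs = [[None] * n_cols for _ in ops]
    launches = 0
    global_reads = 0
    for k, op in enumerate(ops):
        launches += 1
        for c in range(n_cols):
            column = state[c]      # column is re-read on every pass
            global_reads += 1
            outputs[k][c] = op(column)
    return outputs, launches, global_reads


def run_fused(state, ops):
    """A single pass; each column is staged once and reused by every op."""
    n_cols = len(state)
    outputs = [[None] * n_cols for _ in ops]
    launches = 1
    global_reads = 0
    for c in range(n_cols):
        local = list(state[c])     # stand-in for a copy into shared memory
        global_reads += 1
        for k, op in enumerate(ops):
            outputs[k][c] = op(local)
    return outputs, launches, global_reads


# Toy "tendencies": a vertical difference and a pointwise scaling.
def vertical_diff(col):
    return [col[i + 1] - col[i] for i in range(len(col) - 1)]


def scaled(col):
    return [2.0 * x for x in col]


# 3 columns with 4 vertical levels each.
state = [[float(c + level) ** 2 for level in range(4)] for c in range(3)]
ops = [vertical_diff, scaled]

unfused_out, unfused_launches, unfused_reads = run_unfused(state, ops)
fused_out, fused_launches, fused_reads = run_fused(state, ops)
```

Both paths produce the same outputs; in this toy model the fused path uses one launch instead of one per operation, and reads each column once instead of once per operation.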

Some preliminary timings from the dry baroclinic wave case:

unfused & multiple broadcasts (set-precomputed + implicit): 4.31997 ms
single fused broadcast without shmem: 7.18 ms
single fused broadcast with shmem for state: 3.47 ms
single fused broadcast with shmem for state + LG: 4.73 ms
single fused broadcast with shmem for state + needed parts of LG: 3.93 ms

For ldiv! and wfact!, I think we may need to rethink their implementations.

The only optimization I've made in this operator is the use of shared memory. There is probably a lot of room for tuning and for improving memory access. A big benefit of this approach is that it will drastically reduce the number of kernel launches and sequential memory reads.

This approach also helps push us towards CliMA/ClimaAtmos.jl#3594.
