Description
I was investigating the cause of latency in ClimaAtmos by looking at config/model_configs/diagnostic_edmfx_test_box.yml. The cache for this configuration takes 70 seconds to compile on my computer. I looked into this and found that the implicit cache takes 16 seconds. Of these 16 seconds, 10 are spent compiling a single function, compute_kinetic.
So, I made a ClimaCore reproducer:
using ClimaCore.CommonSpaces
import ClimaCore
import ClimaCore: Fields, Geometry, Operators, Spaces
import LinearAlgebra: dot
space = ExtrudedCubedSphereSpace(;
    z_elem = 10,
    z_min = 0,
    z_max = 1,
    radius = 10,
    h_elem = 10,
    n_quad_points = 4,
    staggering = CellCenter(),
)
ᶜuₕ = similar(zeros(space), Geometry.Covariant12Vector{Float64})
ᶠu₃ = similar(zeros(Spaces.face_space(space)), Geometry.Covariant3Vector{Float64})
ᶠu³ = similar(zeros(Spaces.face_space(space)), Geometry.Contravariant3Vector{Float64})
ᶠuₕ³ = similar(zeros(Spaces.face_space(space)), Geometry.Contravariant3Vector{Float64})
ᶜu = similar(zeros(space), Geometry.Covariant123Vector{Float64})
fill!(parent(ᶜuₕ), 0)
fill!(parent(ᶠu₃), 0)
fill!(parent(ᶠuₕ³), 0)
fill!(parent(ᶜu), 0)
fill!(parent(ᶠu³), 0)
ᶜK = zeros(space)
# Warm InterpolateF2C up
_ = @. Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃))
function mytest(ᶜu, ᶠuₕ³, ᶠu³, ᶜuₕ)
    @time @. ᶜu = Geometry.Covariant123Vector(ᶜuₕ) + Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃))
    @time @. ᶠu³ = ᶠuₕ³ + Geometry.Contravariant3Vector(ᶠu₃)
    @time @. ᶜK = 1 / 2 * (
        dot(Geometry.Covariant123Vector(ᶜuₕ), Geometry.Contravariant123Vector(ᶜuₕ)) +
        Operators.InterpolateF2C()(dot(Geometry.Covariant123Vector(ᶠu₃), Geometry.Contravariant123Vector(ᶠu₃))) +
        2 * dot(Geometry.Contravariant123Vector(ᶜuₕ), Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃)))
    )
end
@time mytest(ᶜu, ᶠuₕ³, ᶠu³, ᶜuₕ)
This results in:
1.578459 seconds (18.89 M allocations: 908.468 MiB, 33.42% gc time, 99.98% compilation time)
1.419649 seconds (11.95 M allocations: 617.694 MiB, 21.31% gc time, 99.97% compilation time)
6.432258 seconds (86.22 M allocations: 3.801 GiB, 27.18% gc time, 99.96% compilation time)
This
ᶜu = Geometry.Covariant123Vector(ᶜuₕ) + Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃))
takes about 1.6 seconds to compile and leads to roughly 0.9 GiB of inference allocations. Note that I had already called the interpolation routine in the warm-up step before the function, so the second term should already be compiled. If I substitute the interpolate call with its precomputed result, I get
1.030550 seconds (11.70 M allocations: 605.150 MiB, 29.07% gc time, 99.95% compilation time)
This tells me that having to infer the additional operator costs about 50% more time and inference allocations.
Compiling the full expression for the kinetic energy takes about 6.4 seconds and allocates almost 4 GiB during inference.
This seems excessive for these relatively simple operations.
Note also this difference:
stored = @. Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃))
stored2 = @. Geometry.Covariant123Vector(ᶜuₕ)
@time @. ᶜu = stored2 + stored
The result is
0.180751 seconds (1.30 M allocations: 71.706 MiB, 99.73% compilation time)
But
stored = @. Operators.InterpolateF2C()(Geometry.Covariant123Vector(ᶠu₃))
stored2 = @. Geometry.Covariant123Vector(ᶜuₕ)
@time @. ᶜu = Geometry.Covariant123Vector(ᶜuₕ) + stored
is
0.915434 seconds (10.90 M allocations: 566.447 MiB, 26.39% gc time, 99.95% compilation time)
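To double-check that numbers like these are dominated by compilation rather than run time, one option is to time the same call twice and compare. The following is a minimal sketch using only Base Julia; `first_vs_second_call` is a hypothetical helper name (not part of ClimaCore or ClimaAtmos), and the difference between the two timings is only a rough estimate of the compilation cost.

```julia
# Hypothetical helper (not part of ClimaCore): time the first and second
# call of `f` with the same arguments.
function first_vs_second_call(f, args...)
    # The first call includes inference/compilation if `f` has not yet been
    # compiled for these argument types.
    t_first = @elapsed f(args...)
    # The second call reuses the compiled code, so it is mostly run time.
    t_second = @elapsed f(args...)
    return (; t_first, t_second, t_compile_estimate = t_first - t_second)
end

# Usage with a small stand-in function; in the reproducer one would pass a
# function wrapping one of the broadcast expressions instead.
r = first_vs_second_call(x -> sum(abs2, x), rand(10))
println("first call:       ", r.t_first, " s")
println("second call:      ", r.t_second, " s")
println("compile estimate: ", r.t_compile_estimate, " s")
```

This has to be run in a fresh session (or with a function that has not been called yet) for the first timing to include compilation.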