sum(f, A) performs significantly worse than sum(f.(A)) for integer inputs to certain transcendental functions on x86 (maybe specific to AMD?).
julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 47 on 32 virtual cores
Environment:
LD_PRELOAD = /lib/x86_64-linux-gnu/libc_malloc_debug.so.0
JULIA_NUM_THREADS = 32
JULIA_EDITOR = vim
julia> a=collect(1:1000000);
julia> @btime sum(sin.(a))
8.574 ms (4 allocations: 7.63 MiB)
-0.11710952409815278
julia> @btime sum(sin,a)
12.908 ms (1 allocation: 16 bytes)
-0.11710952409817987
julia> @btime sum(log.(a))
8.066 ms (4 allocations: 7.63 MiB)
1.2815518384658169e7
julia> @btime sum(log,a)
6.302 ms (1 allocation: 16 bytes)
1.281551838465817e7
It is a different story on Apple Silicon, where sum(sin, a) is slightly faster than sum(sin.(a)):
julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin23.0.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
Environment:
JULIA_EDITOR = vim
julia> a=collect(1:1000000);
julia> @btime sum(sin.(a))
6.621 ms (4 allocations: 7.63 MiB)
-0.11710952409819408
julia> @btime sum(sin,a)
5.972 ms (1 allocation: 16 bytes)
-0.11710952409817987
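
The timings above can be reproduced, and the role of the integer input narrowed down, with a script along these lines. This is only a sketch, assuming BenchmarkTools is installed. The pre-converted array `af` and the explicit-conversion closure are additions for comparison and are not part of the original measurements. The `$` interpolation is also an addition; it keeps global-variable access out of the timings.

```julia
using BenchmarkTools

a  = collect(1:1_000_000)   # Vector{Int}, as in the report
af = Float64.(a)            # hypothetical: same values, pre-converted to Float64

# The two forms from the report, on integer input
@btime sum(sin.($a))
@btime sum(sin, $a)

# Same comparison on Float64 input, to see whether the gap
# depends on the Int -> Float64 conversion
@btime sum(sin.($af))
@btime sum(sin, $af)

# Explicit conversion inside the mapped function
@btime sum(x -> sin(Float64(x)), $a)

# Sanity check: both forms should agree up to floating-point summation order
@assert isapprox(sum(sin, a), sum(sin.(a)); rtol = 1e-8)
```

If the gap disappears for `af`, that would point at the integer conversion in the reduction loop rather than at `sin` itself, though this is a guess and not something the measurements above establish.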