x86 mapreduce performance anomaly #50827

Open
@chrstphrbrns

sum(f, A) is significantly slower than sum(f.(A)) for integer inputs to certain transcendental functions on x86 (possibly specific to AMD). In the timings below, sin shows the slowdown while log does not.

julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 47 on 32 virtual cores
Environment:
  LD_PRELOAD = /lib/x86_64-linux-gnu/libc_malloc_debug.so.0
  JULIA_NUM_THREADS = 32
  JULIA_EDITOR = vim

julia> a=collect(1:1000000);

julia> @btime sum(sin.(a))
  8.574 ms (4 allocations: 7.63 MiB)
-0.11710952409815278

julia> @btime sum(sin,a)
  12.908 ms (1 allocation: 16 bytes)
-0.11710952409817987

julia> @btime sum(log.(a))
  8.066 ms (4 allocations: 7.63 MiB)
1.2815518384658169e7

julia> @btime sum(log,a)
  6.302 ms (1 allocation: 16 bytes)
1.281551838465817e7

On Apple Silicon the result is different: sum(sin, a) is slightly faster than sum(sin.(a)).

julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.0.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores
Environment:
  JULIA_EDITOR = vim

julia> a=collect(1:1000000);

julia> @btime sum(sin.(a))
  6.621 ms (4 allocations: 7.63 MiB)
-0.11710952409819408

julia> @btime sum(sin,a)
  5.972 ms (1 allocation: 16 bytes)
-0.11710952409817987
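
To narrow down where the time goes, it may help to time a few variants of the same reduction side by side. The following is only a sketch: `best_time`, `loop_sum`, and `compare_variants` are hypothetical helper names introduced here, and it uses Base's `@elapsed` rather than `@btime` so that it runs without extra packages (BenchmarkTools would give steadier numbers). It assumes, without having confirmed it, that comparing an `Int` input against a pre-converted `Float64` input and a hand-written loop is a useful way to separate the cost of the conversion from the cost of the reduction itself.

```julia
# Hypothetical helper: minimum wall-clock time (seconds) over a few runs.
function best_time(f; reps=5)
    f()  # warm-up call so compilation is not included in the timings
    return minimum([@elapsed(f()) for _ in 1:reps])
end

# Hand-written sequential reduction, for comparison against sum(f, A).
function loop_sum(f, A)
    s = 0.0
    for x in A
        s += f(x)
    end
    return s
end

# Time several ways of computing the same sum of sin over 1:n.
function compare_variants(n=1_000_000)
    a  = collect(1:n)   # Vector{Int}, as in the report
    af = float.(a)      # same values, converted to Float64 up front
    variants = [
        "sum(sin.(a))"     => () -> sum(sin.(a)),
        "sum(sin, a)"      => () -> sum(sin, a),
        "sum(sin, af)"     => () -> sum(sin, af),
        "loop_sum(sin, a)" => () -> loop_sum(sin, a),
    ]
    return [label => best_time(f) for (label, f) in variants]
end

for (label, t) in compare_variants()
    println(rpad(label, 18), round(t * 1e3; digits=3), " ms")
end
```

If `sum(sin, af)` is fast while `sum(sin, a)` is slow, that would point toward the integer input path; if `loop_sum(sin, a)` is also fast, the reduction machinery would be the more likely suspect. Either reading is a hypothesis to check, not a conclusion.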

Labels: fold (sum, maximum, reduce, foldl, etc.), maths (Mathematical functions), performance (Must go faster)