Skip to content

unify reduce/reducedim empty case behaviors #55628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Aug 29, 2024

This PR implements a very simple rule for empty dimensional reductions: they should behave exactly the same as their whole-array counterparts. If the whole-array reduction errors, the dimensional flavor should also error. If the whole-array reduction gives a value, then the dimensional version should just return the appropriately-shaped array filled with it.

Previously, Julia tried to guess at initial values for empty dimensional reductions in a slightly different way than whole-array reductions. This change unifies those behaviors, such that mapreduce_empty is called for empty dimensional reductions, just like it is called for empty whole-array reductions.

Beyond the unification (which is valuable in its own right), this change enables some downstream simplifications to dimensional reductions and is likely to be the "most breaking" public behavior in a refactoring targeting more correctness improvements (#55318).

It's a little convoluted to understand the exact motivation, utility, and net effect of this change. Here are two case studies, demonstrating the behaviors prior to this change:

julia> minimum(zeros(0,42))
ERROR: ArgumentError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer

julia> minimum(zeros(0,42), dims=1)
ERROR: ArgumentError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer

julia> minimum(zeros(0,1), dims=2)
0×1 Matrix{Float64}
julia> sum(x->pi, zeros(0,1))
ERROR: ArgumentError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer

julia> sum(x->pi, zeros(0,1), dims=1)
1×1 Matrix{Int64}:
 0

julia> sum(x->pi, zeros(0,1), dims=2)
0×1 Matrix{Int64}

This PR makes all six cases above errors. This is a useful and important change because it's precisely this guessing of eltype and initial array values that lead to the many correctness errors that #55318 fixes. It's also worth noting that the incremental widening approach in #55318 could support the empty-return case, but it would do so at the expense of type stability because the "reducing over an empty collection" error branch adds a possible Array{Union{}} return value.

@mbauman mbauman added breaking This change will break code fold sum, maximum, reduce, foldl, etc. labels Aug 29, 2024
@mbauman
Copy link
Member Author

mbauman commented Aug 29, 2024

Once CI passes (modulo a SparseArrays test), let's run Nanosoldier.

@mbauman mbauman added the needs pkgeval Tests for all registered packages should be run with this change label Aug 29, 2024
@mbauman
Copy link
Member Author

mbauman commented Aug 29, 2024

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@mbauman
Copy link
Member Author

mbauman commented Sep 3, 2024

OK, that went better than I expected. There are three root failures as far as I can see:

@mbauman
Copy link
Member Author

mbauman commented Sep 3, 2024

OK, that's fascinating. All of those examples would be fixed by #52004, which was only blocked by precisely the same sparse array failures that this PR faces. I'm not a huge fan of #52004, but perhaps there's a middle ground by just preserving the eltype for minimum/maximum/extrema for the empty return array case.

@mbauman mbauman added the triage This should be discussed on a triage call label Sep 3, 2024
@JeffBezanson
Copy link
Member

Triage is ok with making all of these errors. The result for minimum(zeros(0,1), dims=2) is arguably correct, but that only applies to functions that pick an element from the array rather than applying a more general function, so I see we may not be able to accommodate that.

@LilithHafner LilithHafner removed the triage This should be discussed on a triage call label Sep 26, 2024
@mbauman mbauman force-pushed the mb/mapreducedim_empty branch 2 times, most recently from e92a570 to 53f3328 Compare April 21, 2025 14:44
@mbauman
Copy link
Member Author

mbauman commented Apr 21, 2025

OK, let's see what PkgEval sees here now.

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

❗ Packages that crashed

1 packages crashed only on the current version.

  • An internal error was encountered: 1 packages

3 packages crashed on the previous version too.

✖ Packages that failed

31 packages failed only on the current version.

  • Package fails to precompile: 6 packages
  • Package has test failures: 4 packages
  • Package tests unexpectedly errored: 2 packages
  • Test duration exceeded the time limit: 19 packages

1438 packages failed on the previous version too.

✔ Packages that passed tests

21 packages passed tests only on the current version.

  • Other: 21 packages

5240 packages passed tests on the previous version too.

~ Packages that at least loaded

11 packages successfully loaded only on the current version.

  • Other: 11 packages

2835 packages successfully loaded on the previous version too.

➖ Packages that were skipped altogether

918 packages were skipped on the previous version too.

@mbauman
Copy link
Member Author

mbauman commented Apr 24, 2025

Well that new report looks awfully consistent with what we saw last fall — those three packages are still failing. Looks like we additionally snagged C3D this time, with an empty reduction error on (effectively) mapreduce(==, &, Float64[], Float64[]). That was actually an issue in the (now separated) #58185, and that PR additionally has commit 69e4dee which addresses (and tests) it.

I think the periodic graphs example is a perfect motivation. It calls both maximum(x; init) and maximum(y; dims) on sequential lines! Why should they require init for the former but not the latter? From PeriodicGraphs.jl/src/algorithms/dimensionality.jl#L310-L311:

    maxden = maximum(denominator.(_invmat); dims=2)
    invmat::Matrix{Int} = maximum(maxden; init=0) > 1 ? (maxden .* _invmat) : _invmat

@mbauman
Copy link
Member Author

mbauman commented Apr 24, 2025

Given that triage gave their 👍 and the limited ecosystem impact here I'm going to swap the labels here to "minor change".

But this isn't ready yet; we need to merge #58185 ✅ and JuliaSparse/SparseArrays.jl#615 first.

@mbauman mbauman added minor change Marginal behavior change acceptable for a minor release DO NOT MERGE Do not merge this PR! and removed breaking This change will break code needs pkgeval Tests for all registered packages should be run with this change labels Apr 24, 2025
mbauman added a commit to mbauman/GMT.jl that referenced this pull request Apr 29, 2025
There's a proposed "minor change" to Julia v1.13 that would make `extrema([], dims=2)` (and other such reductions) an error, akin to how `extrema([])` is an error when `init` is not specified.  This package was flagged as having (at least) one such usage here; it errored in [PkgEval](https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/813bcf3_vs_c3e7b1b/report.html).  See JuliaLang/julia#55628 for more details.

This addresses that first such usage; there may be [other places](https://github.com/search?q=repo%3AGenericMappingTools%2FGMT.jl+%2F%28%3F%3Aextrema%7Cminimum%7Cmaximum%29.*dims%2F&type=code) that similarly need to change by either supplying `init` or skipping the empty case entirely.
mbauman added a commit to mbauman/PeriodicGraphs.jl that referenced this pull request Apr 29, 2025
There's a proposed "minor change" to Julia v1.13 that would make `maximum([], dims=2)` (and other such reductions) an error, akin to how `maximum([])` is an error when `init` is not specified.  This package was flagged as having (at least) one such usage here; it errored in [PkgEval](https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/813bcf3_vs_c3e7b1b/report.html).  See JuliaLang/julia#55628 for more details.
mbauman added a commit to mbauman/OnlinePortfolioSelection.jl that referenced this pull request Apr 29, 2025
There's a proposed "minor change" to Julia v1.13 that would make `maximum([], dims=2)` (and other such reductions) an error, akin to how `maximum([])` is an error when `init` is not specified.  This package was flagged as having (at least) one such usage here; it errored in [PkgEval](https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/813bcf3_vs_c3e7b1b/report.html).  See JuliaLang/julia#55628 for more details.
Liozou pushed a commit to Liozou/PeriodicGraphs.jl that referenced this pull request Apr 30, 2025
There's a proposed "minor change" to Julia v1.13 that would make `maximum([], dims=2)` (and other such reductions) an error, akin to how `maximum([])` is an error when `init` is not specified.  This package was flagged as having (at least) one such usage here; it errored in [PkgEval](https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/813bcf3_vs_c3e7b1b/report.html).  See JuliaLang/julia#55628 for more details.
joa-quim pushed a commit to GenericMappingTools/GMT.jl that referenced this pull request May 1, 2025
There's a proposed "minor change" to Julia v1.13 that would make `extrema([], dims=2)` (and other such reductions) an error, akin to how `extrema([])` is an error when `init` is not specified.  This package was flagged as having (at least) one such usage here; it errored in [PkgEval](https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/813bcf3_vs_c3e7b1b/report.html).  See JuliaLang/julia#55628 for more details.

This addresses that first such usage; there may be [other places](https://github.com/search?q=repo%3AGenericMappingTools%2FGMT.jl+%2F%28%3F%3Aextrema%7Cminimum%7Cmaximum%29.*dims%2F&type=code) that similarly need to change by either supplying `init` or skipping the empty case entirely.
mbauman added 2 commits May 9, 2025 14:38
Previously, Julia tried to guess at initial values for empty dimensional reductions in a slightly different way than whole-array reductions. This change unifies those behaviors, such that `mapreduce_empty` is called for empty dimensional reductions, just like it is called for empty whole-array reductions.

Beyond the unification (which is valuable in its own right), this change enables some downstream simplifications to dimensional reductions and is likely to be the "most breaking" public behavior in a refactoring targeting more correctness improvements.
@mbauman mbauman force-pushed the mb/mapreducedim_empty branch from adfe6f8 to 0e924cf Compare May 9, 2025 18:39
@mbauman
Copy link
Member Author

mbauman commented May 10, 2025

Despite the label here, this is ready to go. It just requires coordination with SparseArrays. I think the best approach would be to merge JuliaSparse/SparseArrays.jl#615 first. It’s failing over there, but those exact same tests pass in this PR’s CI. Then once that’s merged, we can adjust the second commit here to just bump the stdlib normally.

It’d be good to get a final review here in any case, and I don’t have commit rights to SparseArrays myself.

@oscardssmith oscardssmith removed the DO NOT MERGE Do not merge this PR! label May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fold sum, maximum, reduce, foldl, etc. minor change Marginal behavior change acceptable for a minor release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants