Skip to content

Refactor adjsort into adjsortperm #1183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
May 21, 2025

Conversation

halleysfifthinc
Copy link
Contributor

@halleysfifthinc halleysfifthinc commented Apr 7, 2025

Benchmarks have been updated!

Using BenchmarkTools, I measure the following speed and memory improvements in adjsortperm over adjsort.

Benchmarking setup

Using the two meshes from this gist.

quads = GeoIO.load("quads.obj").geometry;
mixed = GeoIO.load("quads_tris.obj").geometry;
quads_connec = collect(faces(topology(quads), 2))
mixed_connec = collect(faces(topology(mixed), 2))

Comparison of adjsortperm (PR) vs adjsort (master) for a 72 tri, 36 quad torus:

BenchmarkTools.TrialJudgement: 
  time:   -97.80% => improvement (5.00% tolerance)
  memory: -98.38% => improvement (1.00% tolerance)
Details

adjsort master

julia> @benchmark adjsort(faces) setup=(faces=shuffle(mixed_connec)) seconds=15 evals=1 samples=15000
BenchmarkTools.Trial: 15000 samples with 1 evaluation per sample.
 Range (min … max):  788.378 μs …  21.293 ms  ┊ GC (min … max): 0.00% … 95.07%
 Time  (median):     868.029 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   926.702 μs ± 501.305 μs  ┊ GC (mean ± σ):  4.58% ±  7.58%

   ▁▂▃▄▆▇██▇▆▅▄▃▃▂▂▂▁▁▁▁▁▁▁     ▁▂▁                             ▃
  ▇█████████████████████████████████▆▆▆▆▅▅▆▅▆▇▆▆▆▆▅▆▆▄▄▅▄▄▁▅▁▁▃ █
  788 μs        Histogram: log(frequency) by time       1.38 ms <

 Memory estimate: 451.41 KiB, allocs estimate: 8673.

adjsortperm PR

julia> @benchmark adjsortperm(faces) setup=(faces=shuffle(mixed_connec)) seconds=15 evals=1 samples=15000
BenchmarkTools.Trial: 15000 samples with 1 evaluation per sample.
 Range (min … max):  14.086 μs …  9.780 ms  ┊ GC (min … max): 0.00% … 99.40%
 Time  (median):     19.307 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.389 μs ± 79.752 μs  ┊ GC (mean ± σ):  3.18% ±  0.81%

                ▁▄▄▅▇███▇▆▅▅▄▃▁▁                               
  ▁▁▁▁▁▁▂▂▂▃▃▅▆██████████████████▆▆▅▅▄▄▄▄▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁ ▄
  14.1 μs         Histogram: frequency by time        27.7 μs <

 Memory estimate: 7.32 KiB, allocs estimate: 25.

Comparison of adjsortperm (PR) vs adjsort (master) for a 72 quad torus:

BenchmarkTools.TrialJudgement: 
  time:   -91.12% => improvement (5.00% tolerance)
  memory: -95.41% => improvement (1.00% tolerance)
Details

adjsort master

julia> @benchmark adjsort(faces) setup=(faces=shuffle(quads_connec)) seconds=15 evals=1 samples=15000
BenchmarkTools.Trial: 15000 samples with 1 evaluation per sample.
 Range (min … max):  117.852 μs …  13.738 ms  ┊ GC (min … max): 0.00% … 97.99%
 Time  (median):     125.267 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   144.715 μs ± 270.531 μs  ┊ GC (mean ± σ):  8.43% ±  4.73%

  ▂▇██▇▆▅▄▄▄▃▃▂▁           ▁▂▂▁▁▁ ▁▂▂▂▂▁▁                       ▃
  ████████████████▇▇▇▇▆▅▇▇███████████████▇▇▇▇▆▅▅▄▅▅▃▁▃▃▁▃▅▆▆▅▆▆ █
  118 μs        Histogram: log(frequency) by time        242 μs <

 Memory estimate: 147.16 KiB, allocs estimate: 2691.

adjsortperm PR

julia> @benchmark adjsortperm(faces) setup=(faces=shuffle(quads_connec)) seconds=15 evals=1 samples=15000
BenchmarkTools.Trial: 15000 samples with 1 evaluation per sample.
 Range (min … max):   9.238 μs …  6.570 ms  ┊ GC (min … max): 0.00% … 99.14%
 Time  (median):     12.273 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.852 μs ± 53.564 μs  ┊ GC (mean ± σ):  3.38% ±  0.81%

                ▁▂▅▆▆▇▆█▆▆▅▅▃▂▁                                
  ▁▁▁▁▁▁▁▂▂▃▃▄▆▇███████████████▇▇▅▅▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  9.24 μs         Histogram: frequency by time        17.5 μs <

 Memory estimate: 6.76 KiB, allocs estimate: 24.

Copy link

codecov bot commented Apr 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.43%. Comparing base (6f1cd35) to head (b4e2e15).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1183      +/-   ##
==========================================
+ Coverage   88.39%   88.43%   +0.04%     
==========================================
  Files         196      196              
  Lines        6133     6155      +22     
==========================================
+ Hits         5421     5443      +22     
  Misses        712      712              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@juliohm juliohm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @halleysfifthinc ! Thank you for trying to improve the performance of this constructor. That is super welcome.

Could you please comment on the need to enforce type annotations? Do they really contribute to the performance gains you are seeing? If we could avoid type annotations as much as possible, that would be ideal.

I left some review comments in a first round of review. Happy to review the other parts of the code later.

@halleysfifthinc
Copy link
Contributor Author

halleysfifthinc commented Apr 8, 2025

The type-annotations were added based on investigation with Cthulhu.jl. The type-annotations benefit the later code by allowing the compiler to work with more specific types1. Justification for each annotation given in the review comments.

In Blender, I created a test mesh (torus, 72 quads), and the benchmarks vs no type annotations is:

All quads, no type-annots benchmark

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  626.260 μs   32.464 ms  ┊ GC (min  max): 0.00%  97.20%
 Time  (median):     674.720 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   762.418 μs ± 731.223 μs  ┊ GC (mean ± σ):  7.88% ±  8.53%

   ▃▅▇██▇▆▅▅▄▄▃▂▂▂▁▁▁      ▁▃▄▃▂▁▁▁                             ▂
  ███████████████████████▇████████████▇▆▄▆▅▅▅▆▅▆▄▆▄▅▃▅▃▅▅▅▃▆▅▆▃ █
  626 μs        Histogram: log(frequency) by time       1.11 ms <

 Memory estimate: 573.52 KiB, allocs estimate: 14665.

BenchmarkTools.TrialJudgement: 
  time:   +323.22% => regression (5.00% tolerance)
  memory: +271.06% => regression (1.00% tolerance)

I then split half the quads into tris (36 quads, 72 tris) and got:

Quads and tris; no type-annots benchmark

BenchmarkTools.Trial: 7528 samples with 1 evaluation per sample.
 Range (min  max):  3.641 ms   35.721 ms  ┊ GC (min  max): 0.00%  87.22%
 Time  (median):     3.837 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.971 ms ± 970.236 μs  ┊ GC (mean ± σ):  2.81% ±  7.91%

  ▆█▆▂  ▂▁                                                    ▁
  █████▇██▆▅▆▄▅▅▅▄▄▁▄▃▄▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆▇ █
  3.64 ms      Histogram: log(frequency) by time      10.7 ms <

 Memory estimate: 1.32 MiB, allocs estimate: 39578.

BenchmarkTools.TrialJudgement: 
  time:   +348.65% => regression (5.00% tolerance)
  memory: +216.15% => regression (1.00% tolerance)

Footnotes

  1. It seems that the type annotations alone (without the other changes) might actually make things slower, but when combined, enhances the benefit of the other changes.

@juliohm
Copy link
Member

juliohm commented Apr 8, 2025

That is very nice. Could you please share a MWE so that I can reproduce benchmark results and test the PR locally? :)

@halleysfifthinc
Copy link
Contributor Author

Quads and mixed faces meshes here.

using Meshes, GeoIO, BenchmarkTools

quads = GeoIO.load("quads.obj").geometry
quads_tris = GeoIO.load("quads_tris.obj").geometry

@benchmark topoconvert(HalfEdgeTopology, quads)
@benchmark topoconvert(HalfEdgeTopology, quads_tris)

adjsort is responsible for ~85% of the runtime and ~50% of allocations (count), but the HalfEdgeTopology(::AbstractVector{Tuple{HalfEdge,HalfEdge}}) constructor contributes ~80% of total allocated amount (from the Dicts). So although 47fa3f4 only removes a few individual allocations, they are larger allocations and save a decent amount of memory.

@halleysfifthinc halleysfifthinc force-pushed the halfedge-perf branch 2 times, most recently from 1d9cf83 to 71a2312 Compare April 14, 2025 23:40
@halleysfifthinc
Copy link
Contributor Author

I reimplemented adjsort as adjsortperm, which now returns a sort permutation (faster, and removes the need for the indexin call, which was a non-negligible runtime contributor). Before/after benchmarks have been updated in the top comment (its quite a bit faster).

Along the way, I found a weird mesh/topo that is manifold, however, the old adjsort led to one face being missing from the final converted topology. Despite my best efforts, I could not determine what/where/why the problem was. This gist contains two meshes which are identical save for the triangle orders and orientations (same vertices). I am happy to include the bad mesh as a test if you wish.

@halleysfifthinc
Copy link
Contributor Author

Failed CI doesn't seem related.

@juliohm
Copy link
Member

juliohm commented Apr 15, 2025

@halleysfifthinc thank you for refactoring the code.

I refactored it a bit more to make sure the new algorithm is easy to read. Do we still need to reintroduce type annotations to improve performance? Appreciate it if you could take another look. If type annotations are necessary, we could start with annotating the return type of adjsortperm, by typing einds::Vector{Vector{Int}} at the last line of the function.

If we could encapsulate these type annotations in the adjsortperm function, that would decrease our chances to break type instability in the future.

@juliohm
Copy link
Member

juliohm commented Apr 15, 2025

Along the way, I found a weird mesh/topo that is manifold, however, the old adjsort led to one face being missing from the final converted topology.

Do you mean that the old implementation had a bug? Is it present in the new implementation in this PR?

@halleysfifthinc halleysfifthinc marked this pull request as draft April 17, 2025 02:05
@halleysfifthinc
Copy link
Contributor Author

Just pushed yet another refactor, this version is much faster still (and as an unintentional bonus, now actually properly separates connected components, which as previously discussed, is something I need).

I converted to a draft, as I am still unsatisfied with some behaviors, but can't continue spending time on this for the moment.

I will update the benchmarks when I get a chance.

@juliohm
Copy link
Member

juliohm commented Apr 17, 2025

That is really amazing @halleysfifthinc. The new connected components function is quite useful in other contexts. We should probably convert it to a utility function after this PR is merged. Looking forward to it!

@halleysfifthinc
Copy link
Contributor Author

Benchmarks updated (now specifically for adjsort/adjsortperm)!

Re: connected_components utility, I agree. I'd like to see it exported, or public at least, so I can use it and avoid the Graphs.jl dependency.

Do you mean that the old implementation had a bug? Is it present in the new implementation in this PR?

Yes, the old adjsort was leading to one face not being added/recognized in the final HalfEdgeTopology. That specific problem is no longer present with the new adjsortperm implementation.

However, the old adjsort was only part of the problem, I believe the main HalfEdgeTopology(::Vector{<:Connectivity} function is incorrect/unreliable for certain mixes of face orientations. I have a mesh1 which has inconsistencies after being converted to a HalfEdgeTopology (next or prev edges not matching the current elem). I have been able to determine that the bug is related to inconsistent orientations among the faces, but I am struggling to find any logical errors in the existing orientation handling code here and above

if !CCW[e]
# reinsert pairs in CCW orientation
for i in 1:n
half4pair[(v[i + 1], v[i])] = HalfEdge(v[i + 1], eleminds[e])
end
end

Footnotes

  1. I can't share the mesh, and I don't understand the cause of the bug well enough to simplify the mesh like I did for the meshes attached to my earlier comment)

@juliohm
Copy link
Member

juliohm commented Apr 21, 2025

Benchmarks updated (now specifically for adjsort/adjsortperm)!

🚀

Please let us know when the PR is ready for review @halleysfifthinc 🙂

@halleysfifthinc
Copy link
Contributor Author

tl;dr: The face ordering with the new connected_components/adjsortperm isn't adjacently sorted under the strictest definition (subsequent faces share at least one already existing edge). Obviously, it is your call whether this is acceptable, but the updated HalfEdgeTopology(::Vector{<:Connectivity}) function handles the current (and previous) behavior. (Although nearly all of the performance gains are from the refactored adjacency sort.)

I determined that the bug I was hunting is because adjacency is now functionally defined as any face that has >2 previously observed vertices. This allows a face (triangle 4 in the photo) to be added, even if all of its edges are new/previously unobserved. The orientation of this triangle cannot be checked against existing faces, and we can end up with irreparable inconsistencies when the "hole" (triangle 5 in the photo) is filled (if the orientation of triangle 4 is ultimately inconsistent with the correct orientation of triangle 5).

That bug is not present in the original code on master.

@juliohm
Copy link
Member

juliohm commented Apr 29, 2025

Thank you for the update @halleysfifthinc. Before I spend time reviewing the minor details in the diff, could you please confirm the nature of the changes in this PR?

Is it correct to say that there are

  1. Changes in the HalfEdgeTopology constructor with vector of Connectivity to improve allocations and type stability
  2. Changes in the adjsortperm to improve runtime in terms of a new connected_components function

and nothing else?

What about the behavior change you mentioned? Where is it coming from? You mean that the new adjsortperm implementation is not equivalent to the current implementation in master? What is the new definition of adjacency if not >2 vertices in common?

I wonder if we could split this PR into smaller PRs that are easier to review and test. The HalfEdgeTopology is used in various topological relations and algorithms, and we should do our best to avoid introducing new bugs.

@halleysfifthinc
Copy link
Contributor Author

Is it correct to say that there are

Yes that is a correct summary. I'm happy to split this up for easier review if you would like.

You mean that the new adjsortperm implementation is not equivalent to the current implementation in master?

Yes, the new adjsortperm does not produce identical sorts to the current implementation in master. This is partially by design (the current adjsort is not stable due to the reverse iteration order, which means that sorting a sorted Vector{<:Connectivity} still changes the face order).

What is the new definition of adjacency if not >2 vertices in common?

The old definition was theoretically defined by >2 previously observed vertices, but due to the way adjsort was implemented, the actual functioning definition was >1 previously observed edge (defined by a pair of previously observed vertices). The distinction is subtle (and therefore took me a while to realize), but you can have a face that shares 2 previously observed vertices without those vertices having been previously observed as an edge/pair (e.g. triangle 4 in the above photo diagram).

I won't have the time to investigate the new test failures for a few days.

@juliohm
Copy link
Member

juliohm commented Apr 29, 2025

I'm happy to split this up for easier review if you would like.

That would be super appreciated. Smaller PRs with all tests passing can be quickly merged and shared with all users of the package via patch releases. If you can identify subsets of changes that just improve performance, without behavior change, we can quickly review and approve.

@juliohm
Copy link
Member

juliohm commented May 9, 2025

@halleysfifthinc did you have a chance to split this PR into smaller ones? 🙂

@juliohm
Copy link
Member

juliohm commented May 12, 2025

#1193 merged 💯

@halleysfifthinc
Copy link
Contributor Author

Would it be correct to assume that the new algorithm to convert vector of connectivity into vector of half-edges is robust to any order of elements?

As far as I can logically determine, that assumption is correct.

If we can have an idea of average runtime and correctness with and without adjsort, that would be awesome.

Definitely. There is still a benefit to pre-sorting the elements. I compared a pre-shuffled connectivity vector to a pre-sorted connectivity vector (using sort=false for both cases), and I found:

# all quad torus
BenchmarkTools.TrialJudgement: 
  time:   -18.08% => improvement (5.00% tolerance)
  memory: -6.09% => improvement (1.00% tolerance)

# quads and tris torus
BenchmarkTools.TrialJudgement: 
  time:   -40.78% => improvement (5.00% tolerance)
  memory: -36.61% => improvement (1.00% tolerance)

As far as "correctness", the only differences between sorted or not will be the order of the edges. I don't have a strong opinion, but I think the possibility of multiple equally valid sort permutations (e.g. for connect.([(1, 2, 3), (3, 4, 2), (2, 1, 5), (6, 3, 1)])) makes a reasonable case against making a strong (or any) commitment for edge index stability.

@halleysfifthinc halleysfifthinc marked this pull request as ready for review May 21, 2025 01:15
@halleysfifthinc halleysfifthinc requested a review from juliohm May 21, 2025 01:15
comps = [Int[firstindex(elems)]]

# initialize list of seen vertices
seen = Set(indices(first(elems)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to simplify the code in a few places without looking into Cthulhu.jl outputs. Please feel free to comment if you see opportunities to improve performance further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous manual seen initialization is faster vs seen = Set(indices(first(elems))):

BenchmarkTools.TrialJudgement: 
  time:   -17.02% => improvement (5.00% tolerance)
  memory: -0.21% => invariant (1.00% tolerance)

@juliohm
Copy link
Member

juliohm commented May 21, 2025

All test changes are correct ✅

I will benchmark the PR against master in a few minutes.

comps = [Int[firstindex(elems)]]

# initialize list of seen vertices
seen = Set(indices(first(elems)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous manual seen initialization is faster vs seen = Set(indices(first(elems))):

BenchmarkTools.TrialJudgement: 
  time:   -17.02% => improvement (5.00% tolerance)
  memory: -0.21% => invariant (1.00% tolerance)

Copy link
Member

@juliohm juliohm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another amazing speedup 💯 🚀

@juliohm juliohm merged commit 841cbf0 into JuliaGeometry:master May 21, 2025
16 checks passed
@juliohm
Copy link
Member

juliohm commented May 21, 2025

Patch released: JuliaRegistries/General#131456

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants