Replace FixedBitSet with SortedVecSet in Access #18955

Open · wants to merge 2 commits into main

Conversation

Contributor

@Trashtalk217 Trashtalk217 commented Apr 26, 2025

Third time's the charm: adapted from #16784

Objective

The access bitsets are currently efficient because ComponentIds are dense: they are allocated incrementally from 0 and should remain small. To handle large numbers of components more efficiently, we should switch to a solution that scales with the number of components a query matches against, rather than the total number of components in the world. This is also a more intuitive performance profile.

Solution

We replace FixedBitSets with sorted vectors. The vectors should remain relatively small since queries usually don't involve hundreds of components.
These allow us to do union, difference, and intersection operations in O(n), where n is the number of components the query matches against, instead of O(N), where N is the total number of components in the world.
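To illustrate the idea (this is a minimal sketch, not the PR's actual `SortedVecSet` code; the function name is illustrative): set operations on two sorted vectors can be done with a single linear merge-style pass.

```rust
use std::cmp::Ordering;

// Intersection of two sorted, deduplicated slices in O(n + m):
// advance whichever cursor points at the smaller element, and
// emit an element only when both cursors agree.
fn intersection(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```

The cost depends only on the lengths of the two vectors, i.e. on how many components the accesses actually mention, never on the highest ComponentId in the world.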

Benchmarking

Using `cargo run --profile stress-test --example many_components <entities> <components> <systems>`, I got the following results:

| entities | components | systems | a-c ids | pr | main | change |
|---------:|-----------:|--------:|--------:|---:|-----:|-------:|
| 50000 | 100 | 100 | 240029 | 64.1 (4.4%) | 148.0 (12.0%) | -56.7% |
| 50000 | 100 | 1600 | 240017 | 0.81 (0.4%) | 8.99 (8.5%) | -91.0% |
| 50000 | 2000 | 100 | 246550 | 3834.2 (5.5%) | 2427.2 (11.1%) | 58.0% |
| 50000 | 2000 | 1600 | 246733 | 29.8 (3.5%) | 141.0 (9.2%) | -78.9% |
| 1000000 | 100 | 100 | 4576063 | 0.22 (0.6%) | 8.31 (2.7%) | -97.3% |
| 500000 | 100 | 400 | 2313963 | 0.034 (0.5%) | 2.92 (6.0%) | -98.8% |
| 500000 | 2000 | 100 | 2445538 | 282.6 (8.8%) | 187.3 (6.0%) | 50.9% |
| 500000 | 2000 | 400 | 2445738 | 6.46 (5.9%) | 47.5 (7.8%) | -86.2% |

The numbers for pr and main are average FPS. The percentages next to them are the standard deviations expressed as a percentage of the average. The change column is (pr - main) / main, expressed as a percentage; a negative percentage means a loss of performance compared to main.

Interpretation

The numbers are dramatic enough to make me doubt some of the tests I've done. Assuming the tests are good enough, however, we see a large difference between main and this PR. Before testing, I predicted that this implementation would perform better with a larger number of components. I was correct here, but it takes a very large number of components before this effect is noticeable. Meanwhile, the performance loss at a low number of components is horrific, and it also scales poorly with added systems.

It should be noted that many_components is specifically designed to stress test this part of the code, and it will have less of an impact on normal applications.

Contributor

@therealbnut therealbnut left a comment


It’d be good to write a benchmark to verify that it does improve performance. A bitset should be able to stream set operations for 64 indices at a time, which is probably slower in many cases, but is still pretty fast. We may want some future work with a hybrid or new structure that’s better than both, but we’ll want benchmarks to know for sure.
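The "stream set operations for 64 indices at a time" point can be illustrated with a minimal sketch (illustrative only, not FixedBitSet's actual implementation):

```rust
// Union of two bitsets stored as slices of u64 words. Each `|` combines
// 64 set memberships in one instruction, so the cost is proportional to
// (highest index / 64), regardless of how many bits are actually set.
// Assumes both slices have the same length for simplicity.
fn union_words(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b.iter()).map(|(x, y)| x | y).collect()
}
```

That per-word streaming is why a bitset can stay competitive even when it nominally does O(N) work over all ids.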

@Trashtalk217
Contributor Author

I've done the benchmarking and I'm open to suggestions for improvements. I know that Flecs has reserved components (or entities, since components and entities share the same id space), and we could potentially go for a hybrid approach as @therealbnut suggested where, for example, the first 256 component ids are reserved using bit sets and the rest are stored via sorted vecs.

I'm sceptical about doing that in this PR, however, as it would require reserved components first (and maybe components as entities).

Although now I'm thinking about it, it might be possible.

@Trashtalk217 Trashtalk217 added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Apr 27, 2025
@therealbnut
Contributor

therealbnut commented Apr 28, 2025

Sorry for the confusion, I actually meant for you to add benchmark code, for reproducibility. It turns out the original PR #14385 also has a lot of similar benchmarking suggestions from Cart.

@therealbnut
Contributor

The performance testing you have run is interesting, though: it seems to indicate that performance is much worse until you're dealing with large numbers, which I guess confirms the original hypothesis that high bits or very sparse sets cause issues with the current code. Although the current approach in this PR also causes issues at lower numbers.

I don't know if this is the best approach, but for the hybrid I was imagining something more conceptually like Map<HighBits, Set<LowBits>>, where Map is something sorted to allow for cheap intersections (BTreeMap, SortedVec<(Key, Value)>, etc.), and Set is something with packed bits (BitSet, etc.). That way for several thousand contiguous indices near zero Map will only have a single entry and will just have a small indirection cost on top of what we have on main.

This is a crate with the sort of thing I was thinking, I haven't used it, but it may help explain: https://crates.io/crates/hibitset
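A minimal sketch of the Map<HighBits, Set<LowBits>> idea described above, using a BTreeMap keyed on the high bits and a u64 word for the low 6 bits (names and representation are illustrative, not a proposed API):

```rust
use std::collections::BTreeMap;

// Hybrid set: indices are split into (index >> 6, index & 63).
// For a few thousand contiguous ids near zero, the map stays tiny
// (ids 0..64 share one entry), approximating a plain bitset with
// only a small indirection cost; sparse high ids add one entry each.
struct HybridSet(BTreeMap<u32, u64>);

impl HybridSet {
    fn new() -> Self {
        HybridSet(BTreeMap::new())
    }

    fn insert(&mut self, index: u32) {
        *self.0.entry(index >> 6).or_insert(0) |= 1 << (index & 63);
    }

    fn contains(&self, index: u32) -> bool {
        self.0
            .get(&(index >> 6))
            .map_or(false, |word| word & (1 << (index & 63)) != 0)
    }
}
```

Intersections would walk both sorted maps in one pass and AND the words of matching keys, which keeps the cheap-intersection property of the sorted representations.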

@bushrat011899
Contributor

It'd be interesting to see what the break-even point is for this approach. It might make sense to investigate a hybrid technique, where the first 512 or so ComponentIds are stored in the existing bitset, and anything higher than that gets stored in the vec-set. That 512 is just a guess at the break-even point.

Conceptually though, I agree that Bevy should more gracefully handle cases with high component counts. It's not hard to encounter these performance issues, especially with dynamic components.

@chescock
Contributor

I don't know if this is the best approach, but for the hybrid I was imagining something more conceptually like Map<HighBits, Set<LowBits>>, where Map is something sorted to allow for cheap intersections (BTreeMap, SortedVec<(Key, Value)>, etc.), and Set is something with packed bits (BitSet, etc.). That way for several thousand contiguous indices near zero Map will only have a single entry and will just have a small indirection cost on top of what we have on main.

I wonder whether we could make that work using a simple usize bitmask for Set<LowBits>, and the same SortedVecSet as this PR for the Map. usize would only pack 32 or 64 indexes at a time, but it would limit the worst case increase in size to 2x, both compared to FixedBitSet and to SortedVecSet!
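A sketch of what that variant's intersection could look like, with the set stored as a sorted vec of (high_bits, word) pairs (illustrative names, not the PR's API):

```rust
use std::cmp::Ordering;

// Intersect two sorted vecs of (high_bits, u64 word) entries in one pass.
// Matching keys AND their words; an entry is kept only if any bit survives.
// Worst case this stores one pair per set bit, a 2x size overhead compared
// to a bare sorted vec of indices.
fn intersect_words(a: &[(u32, u64)], b: &[(u32, u64)]) -> Vec<(u32, u64)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                let word = a[i].1 & b[j].1;
                if word != 0 {
                    out.push((a[i].0, word));
                }
                i += 1;
                j += 1;
            }
        }
    }
    out
}
```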

@@ -111,7 +74,7 @@ impl<T: SparseSetIndex> Clone for Access<T> {
self.archetypal.clone_from(&source.archetypal);
}
}

/*
Contributor


Did you mean to delete these lines instead?

resource_read_and_writes: FixedBitSet::new(),
resource_writes: FixedBitSet::new(),
archetypal: FixedBitSet::new(),
component_read_and_writes: SortedVecSet::<ACCESS_SMALL_VEC_SIZE>::new(),
Contributor


I think you can leave off the ::<ACCESS_SMALL_VEC_SIZE> in most places and let type inference figure it out.

Suggested change

```diff
-component_read_and_writes: SortedVecSet::<ACCESS_SMALL_VEC_SIZE>::new(),
+component_read_and_writes: SortedVecSet::new(),
```

}
}

impl<T: SparseSetIndex> From<Vec<T>> for AccessConflicts {
fn from(value: Vec<T>) -> Self {
Self::Individual(value.iter().map(T::sparse_set_index).collect())
let mut conflicts = SortedVecSet::new();
Contributor


Would it make sense to impl FromIterator for SortedVecSet so we can keep using collect here?

This is also O(N^2), since the insert is O(N). You could resolve that by collecting into a Vec and calling SortedVecSet::from_vec.
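A sketch of the suggested FromIterator approach (a stand-in `SortedVecSet` type, not the PR's actual one): collect into a Vec, then sort and dedup once, which is O(N log N) instead of O(N^2) repeated sorted insertion.

```rust
// Stand-in for the PR's SortedVecSet (const-generic parameter omitted
// for brevity): a sorted, deduplicated Vec.
struct SortedVecSet(Vec<usize>);

impl FromIterator<usize> for SortedVecSet {
    fn from_iter<I: IntoIterator<Item = usize>>(iter: I) -> Self {
        // Collect unsorted, then establish the invariant in one pass:
        // sort_unstable is O(N log N); dedup removes adjacent duplicates.
        let mut v: Vec<usize> = iter.into_iter().collect();
        v.sort_unstable();
        v.dedup();
        SortedVecSet(v)
    }
}
```

With such an impl, the `value.iter().map(T::sparse_set_index).collect()` call site could stay exactly as it is on main.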

}
}

impl<const N: usize> Default for SortedVecSet<N> {
Contributor


Could this be a #[derive(Default)]?

match self.0[i].cmp(&other.0[j]) {
Ordering::Less => i += 1,
Ordering::Greater => {
self.0.insert(i, other.0[j]);
Contributor


I noted this on the previous version, but this is an O(N^2) algorithm. I sketched out an O(N) algorithm in that comment that might work.
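The O(N^2) comes from calling `Vec::insert` inside the merge loop: each insert shifts the tail of the vector, which is itself O(N). An O(N) union instead merges into a fresh output vector (a sketch of the general technique, not the exact algorithm from the earlier review comment):

```rust
use std::cmp::Ordering;

// Union of two sorted, deduplicated slices in O(n + m): every element is
// pushed onto the end of `out` exactly once, so no mid-vector shifting.
fn union_sorted(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => {
                out.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}
```

The result can then replace `self.0` wholesale, trading a temporary allocation for the quadratic shifting.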

@therealbnut
Contributor

therealbnut commented Apr 29, 2025

@Trashtalk217 @chescock a proof of concept of the SortedVecMap<usize> collection type, if you're interested in trying it: rust playground.

@alice-i-cecile
Member

For the record, I'm broadly on board with this but don't have the bandwidth to properly review. Ping me / label this PR when y'all are convinced that this is ready.
