-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
implement UniqueEntityVec #17549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement UniqueEntityVec #17549
Conversation
I'd like to add support for this as a relations storage (for e.g. Children) too. I think that's a really powerful and sensible invariant to uphold there (and something we should probably swap to internally in a follow-up). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent stuff: this hits all the notes I was hoping for here. Great docs, clear and unsurprising API, easy conversion back into a normal vec, simple construction from trusted set-like data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I left a few questions and suggestions, but nothing blocking.
/// This type is best obtained by its `FromEntitySetIterator` impl, via either | ||
/// `EntityIterator::collect_set` or `UniqueEntityVec::from_entity_iter`. | ||
/// | ||
/// While this type can be constructed via `Iterator::collect`, doing so is inefficient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The O(N^2) FromIterator
and Extend
implementations seem like a bit of a performance footgun. Would it make sense to only offer them as named functions so that folks don't use them accidentally?
... Oh, I see, FromIterator
is a supertrait of FromEntitySetIterator
. Why is that a supertrait? UniqueEntityVec
seems like an example of a type that should be FromEntitySetIterator
but should not be FromIterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FomEntitySetIterator
is a subtrait of FromIterator
because it can't exist without it!
It represents "This can be constructed from an iterator, however we can do better by skipping validation".
Any call to collect
/from_iter
really should evaluate to the FromEntitySetIterator
version. And that is what supertrait item shadowing is for! That RFC has been accepted, and is in the process of moving through the rust dev pipeline.
The O(n^2) impls aren't necessarily wrong either, given that this struct is not bounded on either Ord
or Hash
. While rare, we want to leave the possibility for T
to only implement Eq
open.
Performance-wise, these impls can make sense for very low N, where the construction cost of things like a HashSet
outweigh the few direct comparisons.
I did document that normal collect
should be avoided on the struct itself, and the very concept of the struct should tell that extending this struct is not free, however I've added further documentation to these trait method impls.
It'd be nice if unintentional use of collect
over collect_set
could be linted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FomEntitySetIterator
is a subtrait ofFromIterator
because it can't exist without it!
It represents "This can be constructed from an iterator, however we can do better by skipping validation".
I don't follow. Why couldn't it instead represent "this can only be constructed from an iterator of unique values"?
Not every function that accepts an EntitySetIterator
can accept a plain Iterator
. That's why we have these traits in the first place! So it seems perfectly reasonable to have a collection that can be created from an EntitySetIterator
but not from a plain Iterator
. If you want to build one from a non-unique iterator then you should call a method to validate or de-duplicate it first.
given that this struct is not bounded on either
Ord
orHash
It's bounded on TrustedEntityBorrow
, though, so we can always get the Entity
and hash or sort that.
I did document that normal
collect
should be avoided on the struct itself
Yeah, but not everyone is going to read the documentation. :) They'll just write .collect()
to make the type checker happy and assume it's good enough. It would be good to push them into the pit of success instead!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FomEntitySetIterator
is a subtrait ofFromIterator
because it can't exist without it!
It represents "This can be constructed from an iterator, however we can do better by skipping validation".I don't follow. Why couldn't it instead represent "this can only be constructed from an iterator of unique values"?
Not every function that accepts an
EntitySetIterator
can accept a plainIterator
. That's why we have these traits in the first place! So it seems perfectly reasonable to have a collection that can be created from anEntitySetIterator
but not from a plainIterator
. If you want to build one from a non-unique iterator then you should call a method to validate or de-duplicate it first.
At first I also thought to exclude these FromIterator
and Extend
impls, however what made add them back is the interaction with the greater ecosystem: Rust generally structures things by composing them, and excluding these impls means we lose on composition:
What is a collection? Different crates and traits define this differently. Sometimes everything that impls FromIterator
, i.e. can be "collected" is a collection. Sometimes, if something can be made into an iterator (IntoIterator
), it is a collection.
Sometimes if can be extended (Extend
) it is a collection, and often, it is some composition of those three.
By excluding these trait impls, UniqueEntityVec
will not qualify as a "collection" in other places.
An important example is RelationshipSourceCollection
: This trait defines "collection" differently than the EntitySet
logic does. Whether that is the best way to do it remains to be seen! However right now, it has a safe add
method.
If UniqueEntityVec
does not have a safe way to add elements, then it should not be RelationshipSourceCollection
, because the safety contract cannot be propagated to a safe trait method.
The same goes for other traits: If a FromIterator
impl does not exist, then an unsatisfied bound might make it not at all be possible to use UniqueEntityVec
. In many cases a default/provided trait method impl can be overridden, collect
is just one of the unfortunate exceptions.
I agree that this a footgun, but it is a temporary, documented, and a lintable one! (I've added UniqueEntityVec::from_iter
and EntitySetIterator::collect::<UniqueEntityVec>
to disallowed-methods
in clippy-toml).
It is definitely a trade-off, but I lean towards not making useful things impossible, and linting/documenting sketchy uses instead.
/// This type is best obtained by its `FromEntitySetIterator` impl, via either | ||
/// `EntityIterator::collect_set` or `UniqueEntityVec::from_entity_iter`. | ||
/// | ||
/// While this type can be constructed via `Iterator::collect`, doing so is inefficient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would it be useful to have a method that does a safe O(N log N) collection using sort
and dedup
? It can't be the default since it reorders the input, but I think that will often be the most efficient approach for untrusted input, and I don't think the current API exposes a way to do it without unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have another trait I want to add, ToEntitySetIterator
. Since all iterators can be made unique, it is simplest to do so with a trait, it would do this with a HashSet
though, since that can be built iteratively, and you could define a set of entities to exclude (like parent).
That would be the main way to get a trusted EntitySet
from untrusted input. I think there are still specific cases where a sort
+ dedup
would be preferred, but I'd consider them rare enough that it is okay to leave them up to the user to then do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I bet what we'll actually want for sort
+ dedup
is something like SortedVecSet
from #16784. Then you could collect
into a SortedVecSet
want that kind of deduplication, and do a cheap trusted conversion to UniqueEntityVec
if you need that instead.
(That's all future stuff, obviously, and shouldn't affect this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think about it, "all elements are in sorted order" is a property a collection holds, just like "all elements are unique".
As long as a property belongs to a collection, it can be managed and preserved the same way we now do for uniqueness.
Although, for "sortedness" you'd additionally need to carry along "what it is sorted by".
Having that information would then be useful for insertion, modification and again, skipping validation.
You could even compose these properties, like in a SortedUniqueVec
.
Definitely useful, though whether that should be done can be left for the future!
…ds, clarify reason
# Objective In bevyengine#16547, we added `EntitySet`s/`EntitySetIterator`s. We can know whenever an iterator only contains unique entities, however we do not yet have the ability to collect and reuse these without either the unsafe `UniqueEntityIter::from_iterator_unchecked`, or the expensive `HashSet::from_iter`. An important piece for being able to do this is a `Vec` that maintains the uniqueness property, can be collected into, and is itself `EntitySet`. A lot of entity collections are already intended to be "unique", but have no way of expressing that when stored, other than using an aforementioned `HashSet`. Such a type helps by limiting or even removing the need for unsafe on the user side when not using a validated `Set` type, and makes it easier to interface with other infrastructure like f.e. `RelationshipSourceCollection`s. ## Solution We implement `UniqueEntityVec`. This is a wrapper around `Vec`, that only ever contains unique elements. It mirrors the API of `Vec`, however restricts any mutation as to not violate the uniqueness guarantee. Meaning: - Any inherent method which can introduce new elements or mutate existing ones is now unsafe, f.e.: `insert`, `retain_mut` - Methods that are impossible to use safely are omitted, f.e.: `fill`, `extend_from_within` A handful of the unsafe methods can do element-wise mutation (`retain_mut`, `dedup_by`), which can be an unwind safety hazard were the element-wise operation to panic. For those methods, we require that each individual execution of the operation upholds uniqueness, not just the entire method as a whole. To be safe for mutable usage, slicing and the associated slice methods require a matching `UniqueEntitySlice` type , which we leave for a follow-up PR. Because this type will deref into the `UniqueEntitySlice` type, we also offer the immutable `Vec` methods on this type (which only amount to a handful). "as inner" functionality is covered by additional `as_vec`/`as_mut_vec` methods + `AsRef`/`Borrow` trait impls. Like `UniqueEntityIter::from_iterator_unchecked`, this type has a `from_vec_unchecked` method as well. The canonical way to safely obtain this type however is via `EntitySetIterator::collect_set` or `UniqueEntityVec::from_entity_set_iter`. Like mentioned in bevyengine#17513, these are named suboptimally until supertrait item shadowing arrives, since a normal `collect` will still run equality checks.
# Objective Follow-up to #17549 and #16547. A large part of `Vec`s usefulness is behind its ability to be sliced, like sorting f.e., so we want the same to be possible for `UniqueEntityVec`. ## Solution Add a `UniqueEntitySlice` type. It is a wrapper around `[T]`, and itself a DST. Because `mem::swap` has a `Sized` bound, DSTs cannot be swapped, and we can freely hand out mutable subslices without worrying about the uniqueness invariant of the backing collection! `UniqueEntityVec` and the relevant `UniqueEntityIter`s now have methods and trait impls that return `UniqueEntitySlice`s. `UniqueEntitySlice` itself can deref into normal slices, which means we can avoid implementing the vast majority of immutable slice methods. Most of the remaining methods: - split a slice/collection in further unique subsections/slices - reorder the slice: `sort`, `rotate_*`, `swap` - construct/deconstruct/convert pointer-like types: `Box`, `Arc`, `Rc`, `Cow` - are comparison trait impls As this PR is already larger than I'd like, we leave several things to follow-ups: - `UniqueEntityArray` and the related slice methods that would return it - denoted by "chunk", "array_*" for iterators - Methods that return iterators with `UniqueEntitySlice` as their item - `windows`, `chunks` and `split` families - All methods that are capable of actively mutating individual elements. While they could be offered unsafely, subslicing makes their safety contract weird enough to warrant its own discussion. - `fill_with`, `swap_with_slice`, `iter_mut`, `split_first/last_mut`, `select_nth_unstable_*` Note that `Arc`, `Rc` and `Cow` are not fundamental types, so even if they contain `UniqueEntitySlice`, we cannot write direct trait impls for them. On top of that, `Cow` is not a receiver (like `self: Arc<Self>` is) so we cannot write inherent methods for it either.
# Objective Follow-up to bevyengine#17549 and bevyengine#16547. A large part of `Vec`s usefulness is behind its ability to be sliced, like sorting f.e., so we want the same to be possible for `UniqueEntityVec`. ## Solution Add a `UniqueEntitySlice` type. It is a wrapper around `[T]`, and itself a DST. Because `mem::swap` has a `Sized` bound, DSTs cannot be swapped, and we can freely hand out mutable subslices without worrying about the uniqueness invariant of the backing collection! `UniqueEntityVec` and the relevant `UniqueEntityIter`s now have methods and trait impls that return `UniqueEntitySlice`s. `UniqueEntitySlice` itself can deref into normal slices, which means we can avoid implementing the vast majority of immutable slice methods. Most of the remaining methods: - split a slice/collection in further unique subsections/slices - reorder the slice: `sort`, `rotate_*`, `swap` - construct/deconstruct/convert pointer-like types: `Box`, `Arc`, `Rc`, `Cow` - are comparison trait impls As this PR is already larger than I'd like, we leave several things to follow-ups: - `UniqueEntityArray` and the related slice methods that would return it - denoted by "chunk", "array_*" for iterators - Methods that return iterators with `UniqueEntitySlice` as their item - `windows`, `chunks` and `split` families - All methods that are capable of actively mutating individual elements. While they could be offered unsafely, subslicing makes their safety contract weird enough to warrant its own discussion. - `fill_with`, `swap_with_slice`, `iter_mut`, `split_first/last_mut`, `select_nth_unstable_*` Note that `Arc`, `Rc` and `Cow` are not fundamental types, so even if they contain `UniqueEntitySlice`, we cannot write direct trait impls for them. On top of that, `Cow` is not a receiver (like `self: Arc<Self>` is) so we cannot write inherent methods for it either.
Objective
In #16547, we added
EntitySet
s/EntitySetIterator
s. We can know whenever an iterator only contains unique entities, however we do not yet have the ability to collect and reuse these without either the unsafeUniqueEntityIter::from_iterator_unchecked
, or the expensiveHashSet::from_iter
.An important piece for being able to do this is a
Vec
that maintains the uniqueness property, can be collected into, and is itselfEntitySet
.A lot of entity collections are already intended to be "unique", but have no way of expressing that when stored, other than using an aforementioned
HashSet
. Such a type helps by limiting or even removing the need for unsafe on the user side when not using a validatedSet
type, and makes it easier to interface with other infrastructure like f.e.RelationshipSourceCollection
s.Solution
We implement
UniqueEntityVec
.This is a wrapper around
Vec
, that only ever contains unique elements. It mirrors the API ofVec
, however restricts any mutation as to not violate the uniqueness guarantee. Meaning:insert
,retain_mut
fill
,extend_from_within
A handful of the unsafe methods can do element-wise mutation (
retain_mut
,dedup_by
), which can be an unwind safety hazard were the element-wise operation to panic. For those methods, we require that each individual execution of the operation upholds uniqueness, not just the entire method as a whole.To be safe for mutable usage, slicing and the associated slice methods require a matching
UniqueEntitySlice
type , which we leave for a follow-up PR.Because this type will deref into the
UniqueEntitySlice
type, we also offer the immutableVec
methods on this type (which only amount to a handful). "as inner" functionality is covered by additionalas_vec
/as_mut_vec
methods +AsRef
/Borrow
trait impls.Like
UniqueEntityIter::from_iterator_unchecked
, this type has afrom_vec_unchecked
method as well.The canonical way to safely obtain this type however is via
EntitySetIterator::collect_set
orUniqueEntityVec::from_entity_set_iter
. Like mentioned in #17513, these are named suboptimally until supertrait item shadowing arrives, since a normalcollect
will still run equality checks.