Python: Remove imprecise container steps #17493
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
We used to have taint steps from any element of a collection to the entire collection (see here).
These are very imprecise, leading to false positives (e.g. seen here and here).
They are also at odds with how other languages treat collections, see our issue about this.
We wish to keep the semantics, that if a collection is tainted, then all elements are considered tainted. Therefor we now try to not taint collections, if we have precise information about which elements are tainted.
For a list, if an element is tainted, we do not know which one, so any read is potentially reading tainted information.
There is not much difference between the list having content and the list being tainted.
But for a dictionary, if an entry is tainted and we know which one, only reads of the appropriate key is reading tainted information. All other reads should ideally be considered safe (they used to not be). If we do not know that other keys are safe, e.g. if the collection came from an untrusted source, we can taint the collection itself, and all reads will be considered dangerous. So for collections with precise content, there is a big difference between having content and the collection being tainted.
Thus we wish to remove these imprecise taint steps for tuples and dictionaries, where we track content precisely (we keep them for lists and sets, where content is imprecise anyway).
Changes
In this PR we do the following:
tupleStoreStep
anddictStoreStep
fromcontainerStep
These are imprecise compared to the content being precise.AdditionalTaintStep
that now only covers when the full container is tainted.Status:
Potential confusions:
Improvements: