Skip to content

[3.13] GH-132380: Avoid locking in _PyType_LookupRef() #132381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: 3.13
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Apr 11, 2025

Change find_name_in_mro() so we don't need hold TYPE_LOCK. Change _PyType_LookupRef() so that TYPE_LOCK is acquired only if the type doesn't have a version tag already assigned. If the version is already assigned, we can avoid acquiring TYPE_LOCK.

Change find_name_in_mro() so we don't need hold TYPE_LOCK.  Change
_PyType_LookupRef() so that TYPE_LOCK is acquired only if the type
doesn't have a version tag already assigned.  If the version is already
assigned, we can avoid acquiring TYPE_LOCK.
@nascheme nascheme added performance Performance or resource usage 3.13 bugs and security fixes topic-free-threading labels Apr 11, 2025
@colesbury
Copy link
Contributor

I'm not sure I fully understand this, but a few preliminary thoughts:

  • I think this splits up the (slow path) lookup and update of the MRO cache into two separate lock acquisitions. That doesn't seem safe to me. I think that would allow interleavings in which we permanently replace a good cache entry with a stale entry.
  • Why is the PR targeting 3.13? Is the problem limited to 3.13?
  • If performance issue is just with non-interned unicode strings, I think we can address that by other strategies. For example, we could first looking up the canonical interned string (if it exists). Or we could modify the cache lookup to fall back to a string data comparison if the lookup key is not interned.

@nascheme
Copy link
Member Author

I'm not sure I fully understand this, but a few preliminary thoughts:

  • I think this splits up the (slow path) lookup and update of the MRO cache into two separate lock acquisitions. That doesn't seem safe to me. I think that would allow interleavings in which we permanently replace a good cache entry with a stale entry.

Yes, I think the replacement could happen. If I understand the interleaving you are thinking of, two lookups on the same type could be racing with each other, call them A and B. The A starts first and assigns a version tag that will later be out-of-date. The B lookup starts after but completes first, adding its valid entry to the cache. The A lookup completes after, overriding the good cache entry with a stale one.

I don't see this being a major problem. We validate the cache entry on lookup, comparing entry version and type version, and so it's not correctness issue. We would just have to do the slow lookup again and re-add it to the cache. Maybe I don't understand the interleaving you are thinking of. I don't think it's a permanent replacement.

Another scenario is that you replace a valid cache entry with a stale one due to hash table collision. Again, I don't think that's permanent. The next lookup for that previous entry will miss the cache, do the slow lookup and re-add the entry to the cache.

We could avoid adding the invalid cache entry by checking if assigned_version matches the current type version, after find_name_in_mro() completes. That test adds a few extra instructions but you are on the slow path and it probably doesn't matter. I figured adding a known-to-be stale cache entry is not that big a deal.

  • Why is the PR targeting 3.13? Is the problem limited to 3.13?

The main branch has the same contention problem. I have a similar fix as part of GH-131174. I can make a separate PR if we want to fix it separately. I'm hoping I'm close to making GH-131174 acceptable to merge.

  • If performance issue is just with non-interned unicode strings, I think we can address that by other strategies. For example, we could first looking up the canonical interned string (if it exists). Or we could modify the cache lookup to fall back to a string data comparison if the lookup key is not interned.

If my approach is safe, I think it provides benefits beyond just the non-interned string case. I would expect that most types don't get mutated after a certain point. With this change, once their type version tag is assigned, _PyType_LookupRef() no longer needs to acquire the global types lock.

@nascheme
Copy link
Member Author

Thinking more about this change: because the types lock is not held during find_name_in_mro(), modifying class attributes while doing a class lookup is not atomic. However, this matches the GIL enabled behavior as well. E.g. the attached script will occasionally raise AttributeError.

race_mro_lookup.py.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes performance Performance or resource usage topic-free-threading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants