Description
Bug report
Bug description:
There are two problems with importlib.reload
regarding thread safety, and both are race conditions.
I naturally assume the function should be thread-safe, consistently with other components of importlib
.
I never ran into someone reloading the same module from different threads, but I can imagine such a scenario happen in a large, self-aware codebase.
Problem 1: Race condition when the same module is reloaded from two threads simultaneously
This guard in importlib.reload
is probably supposed to do two things:
cpython/Lib/importlib/__init__.py
Lines 110 to 112 in 85036c8
- If a module tries to reload itself (before
_bootstrap._exec
finishes execution), return it immediately. - If two or more threads try to reload one module at a time, only let one of them through and return still-reloaded module to the other threads. In other words, make other threads access the not-yet-reloaded module eagerly, while it is being reloaded by the first thread to have initiated the reload.
Assuming (2) is what the snippet is supposed to do, I'm getting consistent failures to fulfill that goal.
To reproduce:
# ./repro.py
import importlib
import threading
import reloadee
counter = 0
reloading = True
threading.Thread(target=importlib.reload, args=(reloadee,)).start()
threading.Thread(target=importlib.reload, args=(reloadee,)).start()
# ./reloadee.py
import __main__
if getattr(__main__, "reloading", False):
count = __main__.counter = __main__.counter + 1
if count > 1:
print(f"race condition: reloaded again")
Place the above files in the current working directory.
If python repro.py
(test on whichever supported Python version you wish) outputs race condition: reloaded again
, the repro script hit a race condition, because the module reloadee
was reloaded more than once, by separate threads.
If it outputs nothing, it's a case where the race condition was missed.
Consequently, running
(for _ in {1..100}; do python repro.py; done) | wc -l
consistently yields numbers in range 80-90 on my personal machine.
Note
There is no case of module contents being executed in parallel, because _bootstrap._exec
executes only one module at a time (thanks to _ModuleLockManager
):
cpython/Lib/importlib/_bootstrap.py
Lines 845 to 848 in 3d9f9ae
Problem 2: Classic LBYL race condition
Because the current implementation doesn't synchronize _RELOADING
access, it is easy to see that between those lines
cpython/Lib/importlib/__init__.py
Lines 110 to 111 in 85036c8
the reloaded module (_RELOADING[name]
) can disappear from _RELOADING
, in case a different thread executes
cpython/Lib/importlib/__init__.py
Line 134 in 85036c8
I think you can imagine this being pretty likely to happen either :)
It is nicely explained in the glossary.
But let me know if we still need a repro for that one too.
Suggested solution
A simple RLock()
(not Lock()
, see point 1. in Problem 1 – same thread that invoked reload
can re-run it, and thus should re-enter the lock without being blocked) only around the critical section
cpython/Lib/importlib/__init__.py
Lines 110 to 112 in 85036c8
should suffice.
Please let me know what you think about that suggestion. I'm wondering how to test this future fix specifically – I tend to like threading.Barrier
and time.sleep
to ensure 100% reproducibility of cases, but maybe there's a better way?
Expect a PR from me as soon as the decision (fix / document thread-unsafety) is made!
CPython versions tested on:
3.9, 3.10, 3.11, 3.12, 3.13, 3.14, CPython main branch
Operating systems tested on:
Linux, macOS, Windows