threading.Thread.native_id for forking thread wrong after fork #132542

Open
@Oberon00

Bug report

Bug description:

On Linux, native thread IDs are unique across processes. This means that the native thread ID of a forking thread necessarily changes in the child when it forks. The threading module sets the native thread ID only once, when the thread is started. Hence, after forking, the old thread ID is retained, i.e. it is wrong in the forked process.

import threading, os

print(os.getpid(), threading.current_thread(), threading.current_thread().native_id,
    os.listdir(f"/proc/{os.getpid()}/task"), flush=True)
res = os.fork()
kind = "c" if res == 0 else "p"
print(os.getpid(), kind, threading.current_thread(), threading.current_thread().native_id,
    os.listdir(f"/proc/{os.getpid()}/task"), flush=True)

This prints for example:

$ python3.13 ~/p/fork.py 
227148 <_MainThread(MainThread, started 139735605239936)> 227148 ['227148']
227148 p <_MainThread(MainThread, started 139735605239936)> 227148 ['227148']
227150 c <_MainThread(MainThread, started 139735605239936)> 227148 ['227150']

Notice that the child (c) still prints the same native thread ID 227148 as the parent, while in fact the thread ID has changed to 227150 and the old native thread ID is not present at all in the child process.

This is not specific to the main thread though, as the following variant of the sample demonstrates:

import threading, os

def run():
    print(os.getpid(), "t", threading.current_thread(), threading.current_thread().native_id,
        os.listdir(f"/proc/{os.getpid()}/task"), flush=True)
    res = os.fork()
    kind = "c" if res == 0 else "p"
    print(os.getpid(), kind, threading.current_thread(), threading.current_thread().native_id,
        os.listdir(f"/proc/{os.getpid()}/task"), flush=True)

print(os.getpid(), "m", threading.current_thread(), threading.current_thread().native_id,
        os.listdir(f"/proc/{os.getpid()}/task"), flush=True)
th = threading.Thread(target=run)
th.start()
th.join()

This multi-threaded sample prints for example:

230968 m <_MainThread(MainThread, started 140596997587072)> 230968 ['230968']
230968 t <Thread(Thread-1 (run), started 140596994700992)> 230969 ['230968', '230969']
230970 c <Thread(Thread-1 (run), started 140596994700992)> 230969 ['230970']
/home/labuser/p/fork.py:6: DeprecationWarning: This process (pid=230968) is multi-threaded, use of fork() may lead to deadlocks in the child.
  res = os.fork()
230968 p <Thread(Thread-1 (run), started 140596994700992)> 230969 ['230968', '230969']
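
A compact way to check for the stale value without reading /proc is to compare the cached attribute with a fresh query inside the child. The following is only a sketch: the helper name native_ids_in_child is made up for illustration, and it assumes that threading.get_native_id() asks the OS on every call while Thread.native_id returns the value stored when the thread started.

```python
import os
import threading


def native_ids_in_child():
    """Fork once; return (child_pid, cached_id, fresh_id) as seen in the child.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the cached attribute and a fresh OS query via the pipe.
        try:
            os.close(read_fd)
            cached = threading.current_thread().native_id
            fresh = threading.get_native_id()
            os.write(write_fd, f"{os.getpid()} {cached} {fresh}".encode())
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup code
    # Parent: read the child's report until EOF, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    child_pid, cached, fresh = map(int, data.split())
    return child_pid, cached, fresh


if __name__ == "__main__":
    child_pid, cached, fresh = native_ids_in_child()
    print("child pid:", child_pid, "cached:", cached, "fresh:", fresh)
```

On an affected interpreter, cached should still be the parent's thread ID while fresh is the child's own thread ID. If the two agree, the interpreter presumably already refreshes the value after fork.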

This bug was previously seen in #82888 with the multiprocessing module, but it was only worked around in #17088 for the multiprocessing module, and it is still present when using any other way of forking.

CPython versions tested on:

3.13, 3.8

Operating systems tested on:

Linux

Proposed Solution

This could be solved with an atfork_child handler if you don't care about the ID being wrong in earlier atfork_child handlers. Safer but more complex would be marking the thread as in-forking in atfork-prepare and then re-querying the native thread ID every time until atfork-child/parent unmarks the thread.
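
As a user-level illustration of the first variant, something like the following might work as a stopgap. It is a rough sketch, not the proposed fix itself: refresh_native_id_after_fork and cached_and_fresh_in_child are made-up names, and the handler writes to the private attribute _native_id, which is an assumption about CPython's threading implementation and may change. Handlers registered earlier with os.register_at_fork would still see the old ID, as noted above.

```python
import os
import threading


def refresh_native_id_after_fork():
    # Hypothetical handler. `_native_id` is a private CPython attribute
    # (assumption); skip quietly if this interpreter does not have it.
    thread = threading.current_thread()
    if hasattr(thread, "_native_id"):
        thread._native_id = threading.get_native_id()


# Runs in the child after fork, after any handlers registered before it.
os.register_at_fork(after_in_child=refresh_native_id_after_fork)


def cached_and_fresh_in_child():
    """Fork once; return (cached_id, fresh_id) as observed in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            cached = threading.current_thread().native_id
            fresh = threading.get_native_id()
            os.write(write_fd, f"{cached} {fresh}".encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        cached, fresh = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return cached, fresh


if __name__ == "__main__":
    print(cached_and_fresh_in_child())
```

With the handler registered, both values reported by the child should be equal. The in-forking marker variant is not covered by this sketch.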

The native thread ID in the C API Thread state might also require a fix, I did not check it.

Metadata

Labels

extension-modules: C modules in the Modules dir
type-bug: An unexpected behavior, bug, or error
