Skip to content

Failed assertion in Python/legacy_tracing.c:431 on a free-threading build #127235

Open
@devdanzin

Description

@devdanzin

Crash report

What happened?

It's possible to cause an abort on !_PyMem_IsPtrFreed(tstate) while running with PYTHON_GIL=0 by calling the following code:

import threading

def errback(*args, **kw):
    raise ValueError('error')

for x in range(200):
    threading._start_joinable_thread(errback)
    try:
        threading.setprofile_all_threads("")
    except:
        pass

Abort message:

python: Python/legacy_tracing.c:431: is_tstate_valid: Assertion `!_PyMem_IsPtrFreed(tstate)' failed.
Aborted

Not sure this isn't an expected failure mode.
Found using fusil by @vstinner.

CPython versions tested on:

3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a2+ experimental free-threading build (heads/main-dirty:a13e94d84bf, Nov 23 2024, 07:16:19) [GCC 11.4.0]

Activity

added
type-crashA hard crash of the interpreter, possibly with a core dump
on Nov 24, 2024
ZeroIntensity

ZeroIntensity commented on Nov 25, 2024

@ZeroIntensity
Member

Does this occur without use of the private _start_joinable_thread?

colesbury

colesbury commented on Nov 25, 2024

@colesbury
Contributor

I'm pretty sure that this is not limited to free threading. It's not safe to access other PyThreadStates without holding the runtime lock because those threads may exit. I think that we release the lock here because of reentrancy/deadlock concerns, but it's still not safe.

The reproducers involving the GIL are slightly more complicated because PyThreadState_DeleteCurrent is called with the GIL, which prevents most of these crashes. However, PyThreadState_Delete is called without the GIL and can lead to the crash.

cpython/Python/ceval.c

Lines 2402 to 2421 in 3e7ce6e

void
PyEval_SetProfileAllThreads(Py_tracefunc func, PyObject *arg)
{
PyThreadState *this_tstate = _PyThreadState_GET();
PyInterpreterState* interp = this_tstate->interp;
_PyRuntimeState *runtime = &_PyRuntime;
HEAD_LOCK(runtime);
PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
HEAD_UNLOCK(runtime);
while (ts) {
if (_PyEval_SetProfile(ts, func, arg) < 0) {
PyErr_FormatUnraisable("Exception ignored in PyEval_SetProfileAllThreads");
}
HEAD_LOCK(runtime);
ts = PyThreadState_Next(ts);
HEAD_UNLOCK(runtime);
}
}

I think we have various reports of the same underlying issue. This issue is a nice reproducer for free threading, though.

For example:

ZeroIntensity

ZeroIntensity commented on Nov 27, 2024

@ZeroIntensity
Member

Ok, I've pondered this issue for a day or so and I think that the simplest solution for this specific case is to just wait for the per-interpreter thread state lock (GH-127115) to get merged. HEAD_LOCK tends to be annoyingly reentrant, so by just switching to that, we should be able to fix any unsafe usage because it shouldn't be nearly as reentrancy prone. (FWIW, I'm somewhat in favor of making the runtime lock recursive with how much of a PITA it is to use, but whatever.)

The much bigger issue at hand here is that PyThreadState_Next and friends are public APIs, so they aren't able to HEAD_LOCK. Somehow, we need to lock inside of those functions rather than relying on the caller to lock the runtime. I don't think there's anything we can do about PyThreadState_Next, because even if we somehow got thread states in a thread-safe way internally, and then magically made it safe to use the thread state without worrying about concurrent modifications (which would be impossible, unless we did something silly like copying the thread state and returning that), there's nothing stopping the list of thread states from being changed in between calls, so it could possibly skip over entries or return duplicates. Our best shot is probably to just deprecate PyThreadState_Next and invent something new and safer for users.

Some other functions we need to worry about in terms of thread safety:

  • PyInterpreterState_ThreadHead: Same issue as PyThreadState_Next, pretty much.
  • PyGILState_GetThisThreadState: I'm not sure how this will work with subinterpreters, because apparently PyGILState doesn't play nicely with them. I could imagine some interesting edge cases coming up related to locking there because apparently GILState checks are disabled if subinterpreters are active.
  • PyThreadState_GetDict: Returns an evil borrowed reference. If the thread state gets switched out, the user can still hold a reference to that dictionary. Even worse, on a GILful build, trying to use that dictionary without having the thread state attached is just not thread safe at all if you try to access it from a different GIL.
  • PyThreadState_GetFrame: Not borrowed, but still has the object thread safety issue with the cross-GIL that I mentioned above. I think for both of these functions it's probably fine to just mention in the docs that you can only use the references while the thread state is attached.
  • PyThreadState_Swap: Not inherently dangerous, but could lead to misue that we should document. This returns a not-attached thread state, which is ripe for abuse by any of the other PyThreadState APIs. (Also, what happens if e.g. the interpreter starts to exit while its detached?)

For the future, I think we should probably steer clear of any support for using a detached thread state. Perhaps we could invent some new API (PyThreadStateView?) that has more clear thread safety requirements and push that for users.

TL;DR we probably need to document and deprecate some non-thread-safe thread state APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @colesbury@ZeroIntensity@devdanzin

        Issue actions

          Failed assertion in `Python/legacy_tracing.c:431` on a free-threading build · Issue #127235 · python/cpython