Skip to content

ABA-problem with pointer provenance in lockless queues #121950

Open
@joboet

Description

@joboet

Both lockless queue implementations inside std for Once and RwLock suffer from the ABA problem, resulting in unsoundness.

In both cases, the enqueue operation roughly looks like the following:

struct Node {
    next: *mut Node,
}

struct Queue {
    state: AtomicPtr<Node>,
}

let mut state = queue.state.load();
loop {
    node.next = to_node(state); // Some pointer masking
    let next = to_state(node); // Some bittagging
    match state.compare_exchange(state, next) {
        Ok(_) => break,
        Err(new) => state = next,
    }
}

This code is problematic, because it allows the following to occur:

  1. Thread 1: Append node 1 to the queue.
  2. Thread 2: Load the state (a pointer to node 1) and store it as the next pointer of node 2.
  3. Thread 1: Remove node 1 from the queue and deallocate it/modify it.
  4. Thread 1: Create node 3 at the same address as node 1 and append it to the queue.
  5. Thread 2: Perform the CAS. Because the state has the same bits, this succeeds.

Now, any traversal operation on the queue that attempts to dereference the next pointer of node 2 exhibits UB, as the pointer, while having the right address, only has provenance to node 1, which is no longer valid, but may not access node 3.

This is of little practical concern, as it is extremely unlikely that LLVM will perform a problematic optimization; but nevertheless the code is unsound, both under stacked-borrows and LLVM's semantics.

This is the cause of #121626, where this exact situation led to a (correct) UB report from miri.

@rustbot label +T-libs +I-unsound

Possible solutions

The only solution I see at the moment is to use fuzzy provenance and use from_exposed_provenance inside the traversal operation to make miri/LLVM guess the right provenance.

Activity

added
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
I-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
T-libsRelevant to the library team, which will review and decide on the PR/issue.
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
on Mar 3, 2024
the8472

the8472 commented on Mar 3, 2024

@the8472
Member

Since this is mac-specific why not use a DWCAS? Afaik they don't have any hardware that doesn't have those.

joboet

joboet commented on Mar 3, 2024

@joboet
MemberAuthor

It's not mac-specific, though. Once gets used on all platforms except the futex ones, I believe even Windows. RwLock gets used on NetBSD and other UNIXes as well, and I'd like to use it to get rid of the unsound SGX lock implementations. So we need to find something portable here.

Also, this is a provenance-only problem, so I'd like to avoid anything that makes the machine code less efficient, because in terms of what is actually generated, the current implementation is perfectly bug-free. It "just" violates the memory model.

the8472

the8472 commented on Mar 3, 2024

@the8472
Member

And using the return value from the CAS (the Ok(_)) and working on that instead of state doesn't work either?

joboet

joboet commented on Mar 3, 2024

@joboet
MemberAuthor

No, because the old pointer is stored inside the node, and we can't retroactively update the pointer, because a traversal operation could access the old one immediately after the compare_exchange.

saethlin

saethlin commented on Mar 3, 2024

@saethlin
Member

to make miri/LLVM guess the right provenance.

I don't think LLVM implements expose semantics in any sense, so I suspect doing the expose dance here wouldn't change LLVM's view of the code. (Happy to be corrected on this)

removed
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on Mar 3, 2024
joboet

joboet commented on Mar 3, 2024

@joboet
MemberAuthor

LLVM's pointer aliasing rules mention that the pointer resulting from inttoptr is based on "all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value". IIUC, that means that because the traversing thread reads the current pointer from state and uses it to load the integer that is then passed to inttoptr, the resulting pointer will also be based on the current pointer, making the code sound.

added
A-atomicArea: Atomics, barriers, and sync primitives
on Mar 3, 2024
RalfJung

RalfJung commented on Mar 3, 2024

@RalfJung
Member

Also, this is a provenance-only problem, so I'd like to avoid anything that makes the machine code less efficient, because in terms of what is actually generated, the current implementation is perfectly bug-free. It "just" violates the memory model.

That is assuming that we don't have optimizations that make full use of the provenance model. Or have you inspected the generated assembly to be sure?

This looks extremely related to rust-lang/unsafe-code-guidelines#480 to me. I wonder if the CAS semantics I proposed here would help.

joboet

joboet commented on Mar 3, 2024

@joboet
MemberAuthor

I inspected the assembly of a simplified version (I removed the Thread handle and park operations). Also, the Once implementation has existed for ~9 years, with no issue encountered.

saethlin

saethlin commented on Mar 3, 2024

@saethlin
Member

The age of an implementation is not an indicator of its soundness with respect to provenance rules. If anything, perhaps the correlation runs the other direction.

I'm willing to review codegen tests for suspicious things like this, if someone can find a way to write them that's both effective and doesn't make them break often. I don't think we usually check in codegen tests for structures as complex as this.

26 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-atomicArea: Atomics, barriers, and sync primitivesI-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/SoundnessP-mediumMedium priorityT-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @RalfJung@the8472@apiraino@saethlin@joboet

        Issue actions

          ABA-problem with pointer provenance in lockless queues · Issue #121950 · rust-lang/rust