Description
Hi! So I noticed that TSAN complains about races inside concurrentqueue. I was able to reproduce it with the following minimal example:
#include "concurrentqueue.h"
#include <thread>
#include <vector>
int main() {
moodycamel::ConcurrentQueue<int> q;
std::vector<std::thread> threads;
for (size_t i = 0; i < 5; ++i) {
threads.emplace_back([&q]() {
while (true) {
int item;
q.try_dequeue(item);
}
});
}
for (size_t i = 0; i < 10; ++i) {
threads.emplace_back([&q]() {
while (true) {
q.try_enqueue(1);
}
});
}
for (auto& t : threads) {
t.join();
}
}
...
WARNING: ThreadSanitizer: data race (pid=1709853)
Write of size 4 at 0x728c00000718 by thread T14:
#0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:2544:4
#1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::inner_enqueue<(moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)1, int>(int&&) ... concurrentqueue.h:1376:94
#2 moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_enqueue(int&&) ... concurrentqueue.h:1074:15
#3 main::$_1::operator()() const ... main.cpp:23:19
...
Previous read of size 4 at 0x728c00000718 by thread T5:
#0 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<int>(int&) ... concurrentqueue.h:2596:17
#1 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<int>(int&) ... concurrentqueue.h:1720:50
#2 bool moodycamel::ConcurrentQueue<int, moodycamel::ConcurrentQueueDefaultTraits>::try_dequeue<int>(int&) ... concurrentqueue.h:1146:32
#3 main::$_0::operator()() const ... main.cpp:15:19
...
I believe this race (in terms of C++ memory model) happens like this:
T0: Thread A reads from memory slot S.
T1: Thread A releases memory slot S of block X.
T2: Thread B releases last slot of block X, pushes block into the freelist.
T3: Thread C acquires block X from the freelist, writes to slot S.
I guess release
in this and similar places should be acq_rel
(or release
+ acquire fence
on the last increment):
https://github.com/cameron314/concurrentqueue/blob/master/concurrentqueue.h#L1607
to build happens-before relation between the slot release (thread A) and the whole block release (thread B) and subsequent push to the freelist.
(Tested it, TSAN stopped complaining).
I am not so good with C++ memory model, so maybe I am missing something here?