Description
Issue by msimberg
Tuesday Sep 12, 2023 at 10:49 GMT
Originally opened as NVIDIA/stdexec#1076
TL;DR operation states currently have a lower bound on their lifetimes (at least until set_x is called on an associated receiver). Should they have an upper bound?
- Should some adaptors release operation states as soon as they can? Typical examples of adaptors that could do this are the "two-part" adaptors like
schedule_from
,let_value
,ensure_started
,split
,when_all
. - Should all adaptors release operation states as soon as they can?
- Is it always safe to release operation states early or does it lead to issues elsewhere? The order of destruction is changed, but I don't know if this can cause problems somewhere.
FWIW in HPX and pika we release some operation states early to avoid problems like below. We just rediscovered the problems when trying to use stdexec in place of the current implementation.
A few motivating examples below.
split
Once the the r
receiver of of split
has received a completion signal, the op_state2
does not necessarily need to be kept around because the values sent by the predecessor have been stored in the split
shared state (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html#spec-execution.senders.adapt.split). stdexec currently does keep op_state2
around until the shared state itself is released. This can lead to some, IMO, surprising behaviour.
The following example avoids deep recursion in the "forward" direction by explicitly transfering to new tasks. It creates a DAG that roughly looks like this:
o_1 --> o --> o --> o ... o_n
\ \ \ \
o o o o ...
The unintuitive thing is even if the work itself was done without recursing deeply, once all the work is done there will be recursion proportional to n
when the operation states of o_n
to o_1
are released. On regular OS threads this isn't a huge problem, but the stack is quite easy to exhaust on e.g. fibers where the stack is a lot smaller.
using any_sender_of_void =
any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
exec::static_thread_pool pool{4};
auto sched = pool.get_scheduler();
std::size_t const n = 10000;
std::vector<any_sender_of_void> senders;
senders.reserve(n);
any_sender_of_void cur = ex::just(1);
for (std::size_t i = 0; i < n; ++i) {
auto split = std::move(cur) | ex::split();
auto transfer1 = split | ex::transfer(sched) | ex::then([](int x) { std::cerr << x << '\n'; });
auto transfer2 = std::move(split) | ex::transfer(sched) | ex::then([](int x) { return x + 1; });
cur = any_sender_of_void(std::move(transfer2));
}
for (auto& s: senders) {
// The last sync_wait ends up freeing a whole stack of n split op states,
// even though they could be freed incrementally because the split op state
// takes ownership of values sent from predecessors
stdexec::sync_wait(std::move(s));
}
schedule_from
/transfer
One can trigger the same problem with a simpler example. The previous example was simplified from a real example, which is why it was a bit more complicated.
schedule_from
also does not release op_state2
as soon as it could release it (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html#spec-execution.senders.adaptors.schedule_from), leading to deep recursion in the destructors of the operation states, despite avoiding deep recursion in the forward direction with new tasks.
using any_sender_of_void =
any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
exec::static_thread_pool pool{4};
auto sched = pool.get_scheduler();
std::size_t const n = 10000;
any_sender_of_void cur = ex::just(1);
for (std::size_t i = 0; i < n; ++i) {
cur = any_sender_of_void(
std::move(cur) | ex::transfer(sched) | ex::then([](int x) { return x + 1; }));
}
stdexec::sync_wait(std::move(cur));
let_value
This example shows unintuitive (again, IMO) lifetimes of the values kept alive by the let_value
adaptor. (One of) the use cases of let_value
is to keep something alive for the duration an asynchronous operation returned by the callable passed to let_value
. It does that, but it keeps the values alive until the operation state of let_value
is released, which could be much later. The below example uses memory as the precious resource that's kept alive longer than necessary, but the same applies for anything else that you might want to release as soon as possible (file handles, asynchronous locks, etc.)
One can work around this by manually releasing the values, but that requires knowing that the values are kept alive longer to start with.
exec::static_thread_pool pool{4};
auto sched = pool.get_scheduler();
auto s = ex::schedule(sched) | ex::then([]() { return std::vector<int>(1000000, 42); })
| ex::let_value([](auto& v) {
return ex::just() | ex::then([&]() { /* do something with v */ });
// could do this here:
// | ex::then([&]() { auto v = std::move(v); });
})
| ex::then([]() {
// We are not using v anymore here, but it stays alive in the
// let_value op state
std::this_thread::sleep_for(std::chrono::seconds(5));
});
// The large allocation is freed only when the sync_wait op state is freed
stdexec::sync_wait(std::move(s));
Activity
lewissbaker commentedon Jul 24, 2024
This seems to be a duplicate of (or at least closely related to) #239.
lewissbaker commentedon Jul 24, 2024
@msimberg Would the adoption of the
statement()
algorithm listed in #239 help fix the use-cases listed above?The
split
example could be modified as follows:The alternative is to make
split
always apply logic equivalent tostatement()
around the wrapped operation.Similarly, the
transfer
example could be rewritten as follows:And the
let_value
example could be modified as follows:msimberg commentedon Jul 24, 2024
@lewissbaker yeah, absolutely. That's essentially what we're using at the moment when using stdexec, which doesn't release operation states as early as pika's implementation does.
This is of course only syntactic, but I'm slightly partial to a pipeable version, mostly because that's what I'm used to. I suppose when piped into it's more of
ex::semicolon
, but I doubt that name is going to work very well 🙃inbal2l commentedon Jul 31, 2024
@lewissbaker - do you happen to have an example of wording to reflect this option?
lewissbaker commentedon Jul 31, 2024
We would need to tweak the wording for
shared-state::notify()
in exec.split p16 to have it destroy theop_state
member before callingp->notify()
on the list of registered receivers.This would need a tweak to the data-member of
op_state
to make it anoptional
or some other type that allows the lifetime to be manually managed.inbal2l commentedon Oct 23, 2024
Paper published: "P3373: Of Operation States and Their Lifetimes" (Robert Leahy)
5 remaining items