
Upper bound on lifetime of operation states #70

Open

Description

@ericniebler

Issue by msimberg
Tuesday Sep 12, 2023 at 10:49 GMT
Originally opened as NVIDIA/stdexec#1076


TL;DR: operation states currently have a lower bound on their lifetime (they must stay alive at least until a completion function, set_value/set_error/set_stopped, is called on the associated receiver). Should they also have an upper bound?

  • Should some adaptors release operation states as soon as they can? Typical examples of adaptors that could do this are the "two-part" adaptors like schedule_from, let_value, ensure_started, split, when_all.
  • Should all adaptors release operation states as soon as they can?
  • Is it always safe to release operation states early, or can it lead to issues elsewhere? The order of destruction changes, but I don't know whether that can cause problems somewhere.

FWIW, in HPX and pika we release some operation states early to avoid problems like the ones below. We rediscovered these problems when trying to use stdexec in place of the current implementation.

A few motivating examples below.

split

Once the r receiver of split has received a completion signal, op_state2 does not necessarily need to be kept around, because the values sent by the predecessor have been stored in the split shared state (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html#spec-execution.senders.adapt.split). stdexec currently does keep op_state2 around until the shared state itself is released. This can lead to some, IMO, surprising behaviour.

The following example avoids deep recursion in the "forward" direction by explicitly transferring to new tasks. It creates a DAG that roughly looks like this:

o_1 --> o --> o --> o ... o_n
   \     \     \     \
    o     o     o     o ...

The unintuitive thing is that even though the work itself is done without recursing deeply, once all the work has finished there is recursion proportional to n when the operation states of o_n through o_1 are released. On regular OS threads this isn't a huge problem, but the stack is quite easy to exhaust on e.g. fibers, where the stack is a lot smaller.

  using any_sender_of_void =
    any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
  exec::static_thread_pool pool{4};
  auto sched = pool.get_scheduler();

  std::size_t const n = 10000;
  std::vector<any_sender_of_void> senders;
  senders.reserve(n);
  any_sender_of_void cur = ex::just(1);

  for (std::size_t i = 0; i < n; ++i) {
    auto split = std::move(cur) | ex::split();
    auto transfer1 = split | ex::transfer(sched) | ex::then([](int x) { std::cerr << x << '\n'; return x; });
    auto transfer2 = std::move(split) | ex::transfer(sched) | ex::then([](int x) { return x + 1; });
    // store the per-iteration branch so it can be waited on below
    senders.push_back(any_sender_of_void(std::move(transfer1)));
    cur = any_sender_of_void(std::move(transfer2));
  }

  for (auto& s: senders) {
    // The last sync_wait ends up freeing a whole stack of n split op states,
    // even though they could be freed incrementally because the split op state
    // takes ownership of values sent from predecessors
    stdexec::sync_wait(std::move(s));
  }
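
To make the recursion concrete, here is a minimal standalone sketch with plain unique_ptr ownership standing in for chained operation states (not stdexec code): each node owns its predecessor, so destroying the head of the chain unwinds all n nodes recursively, one stack frame per node, just like releasing the op states of o_n through o_1 above.

  #include <memory>

  struct node {
    // analogous to an op state owning the op state of the sender it was connected to
    std::unique_ptr<node> predecessor;
  };

  int main() {
    auto head = std::make_unique<node>();
    for (int i = 0; i < 1000000; ++i) {
      auto next = std::make_unique<node>();
      next->predecessor = std::move(head);
      head = std::move(next);
    }
    // Destroying head recurses: ~node() destroys predecessor, whose ~node()
    // destroys its predecessor, and so on. With a small (e.g. fiber) stack,
    // a chain this long overflows it.
    head.reset();
  }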

schedule_from/transfer

One can trigger the same problem with a simpler example. The previous example was reduced from a real use case, which is why it was a bit more involved.

schedule_from also does not release op_state2 as soon as it could (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html#spec-execution.senders.adaptors.schedule_from), leading to deep recursion in the destructors of the operation states, even though deep recursion in the forward direction is avoided by scheduling new tasks.

  using any_sender_of_void =
    any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
  exec::static_thread_pool pool{4};
  auto sched = pool.get_scheduler();

  std::size_t const n = 10000;
  any_sender_of_void cur = ex::just(1);

  for (std::size_t i = 0; i < n; ++i) {
    cur = any_sender_of_void(
      std::move(cur) | ex::transfer(sched) | ex::then([](int x) { return x + 1; }));
  }

  stdexec::sync_wait(std::move(cur));

let_value

This example shows the (again, IMO) unintuitive lifetimes of the values kept alive by the let_value adaptor. One of the use cases of let_value is to keep something alive for the duration of an asynchronous operation returned by the callable passed to let_value. It does that, but it keeps the values alive until the operation state of let_value is released, which can be much later. The example below uses memory as the precious resource that is kept alive longer than necessary, but the same applies to anything else you might want to release as soon as possible (file handles, asynchronous locks, etc.).

One can work around this by manually releasing the values (a sketch of that workaround follows the example), but that requires knowing in the first place that the values are kept alive that long.

  exec::static_thread_pool pool{4};
  auto sched = pool.get_scheduler();

  auto s = ex::schedule(sched) | ex::then([]() { return std::vector<int>(1000000, 42); })
         | ex::let_value([](auto& v) {
             return ex::just() | ex::then([&]() { /* do something with v */ });
             // could instead do this to release v early:
             // | ex::then([&]() { auto v2 = std::move(v); });
           })
         | ex::then([]() {
             // We are not using v anymore here, but it stays alive in the
             // let_value op state
             std::this_thread::sleep_for(std::chrono::seconds(5));
           });
  // The large allocation is freed only when the sync_wait op state is freed
  stdexec::sync_wait(std::move(s));
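
A sketch of that workaround applied to the example above (names reused from above; the only change is moving v into a scoped local once it is no longer needed, so the allocation is released before the trailing then runs):

  auto s2 = ex::schedule(sched) | ex::then([]() { return std::vector<int>(1000000, 42); })
          | ex::let_value([](auto& v) {
              return ex::just()
                   | ex::then([&]() { /* do something with v */ })
                   // move v into a local that is destroyed immediately, freeing the large buffer here
                   | ex::then([&]() { auto v2 = std::move(v); });
            })
          | ex::then([]() {
              // v's allocation is already released by the time this runs
              std::this_thread::sleep_for(std::chrono::seconds(5));
            });
  stdexec::sync_wait(std::move(s2));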

Activity

lewissbaker (Collaborator) commented on Jul 24, 2024

This seems to be a duplicate of (or at least closely related to) #239.

lewissbaker (Collaborator) commented on Jul 24, 2024

@msimberg Would the adoption of the statement() algorithm listed in #239 help fix the use-cases listed above?

The split example could be modified as follows:

   using any_sender_of_void =
     any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
   exec::static_thread_pool pool{4};
   auto sched = pool.get_scheduler();
 
   std::size_t const n = 10000;
   std::vector<any_sender_of_void> senders;
   senders.reserve(n);
   any_sender_of_void cur = ex::just(1);
 
   for (std::size_t i = 0; i < n; ++i) {
-    auto split = std::move(cur) | ex::split();
+    auto split = ex::statement(std::move(cur)) | ex::split();
     auto transfer1 = split | ex::transfer(sched) | ex::then([](int x) { std::cerr << x << '\n'; return x; });
     auto transfer2 = std::move(split) | ex::transfer(sched) | ex::then([](int x) { return x + 1; });
     senders.push_back(any_sender_of_void(std::move(transfer1)));
     cur = any_sender_of_void(std::move(transfer2));
   }

   for (auto& s: senders) {
     // The last sync_wait ends up freeing a whole stack of n split op states,
     // even though they could be freed incrementally because the split op state
     // takes ownership of values sent from predecessors
     stdexec::sync_wait(std::move(s));
   }

The alternative is to make split always apply logic equivalent to statement() around the wrapped operation.

Similarly, the transfer example could be rewritten as follows:

   using any_sender_of_void =
     any_sender_of<ex::set_value_t(int), ex::set_error_t(std::exception_ptr), ex::set_stopped_t()>;
   exec::static_thread_pool pool{4};
   auto sched = pool.get_scheduler();
 
   std::size_t const n = 10000;
   any_sender_of_void cur = ex::just(1);
 
   for (std::size_t i = 0; i < n; ++i) {
     cur = any_sender_of_void(
-      std::move(cur) | ex::transfer(sched) | ex::then([](int x) { return x + 1; }));
+      ex::statement(std::move(cur)) | ex::transfer(sched) | ex::then([](int x) { return x + 1; }));
   }
 
   stdexec::sync_wait(std::move(cur));

And the let_value example could be modified as follows:

   exec::static_thread_pool pool{4};
   auto sched = pool.get_scheduler();
 
-  auto s = ex::schedule(sched) | ex::then([]() { return std::vector<int>(1000000, 42); })
+  auto s = ex::statement(ex::schedule(sched) | ex::then([]() { return std::vector<int>(1000000, 42); })
          | ex::let_value([](auto& v) {
              return ex::just() | ex::then([&]() { /* do something with v */ });
              // could instead do this to release v early:
              // | ex::then([&]() { auto v2 = std::move(v); });
            })
+          )
          | ex::then([]() {
              // We are not using v anymore here, but it stays alive in the
              // let_value op state
              std::this_thread::sleep_for(std::chrono::seconds(5));
            });
   // The large allocation is freed only when the sync_wait op state is freed
   stdexec::sync_wait(std::move(s));

msimberg (Contributor) commented on Jul 24, 2024

@lewissbaker yeah, absolutely. That's essentially what we're using at the moment when using stdexec, which doesn't release operation states as early as pika's implementation does.

This is of course only syntactic, but I'm slightly partial to a pipeable version, mostly because that's what I'm used to. I suppose when piped into, it's more of an ex::semicolon, but I doubt that name is going to work very well 🙃

inbal2l (Member) commented on Jul 31, 2024

The alternative is to make split always apply logic equivalent to statement() around the wrapped operation.

@lewissbaker - do you happen to have an example of wording to reflect this option?

lewissbaker (Collaborator) commented on Jul 31, 2024

The alternative is to make split always apply logic equivalent to statement() around the wrapped operation.

@lewissbaker - do you happen to have an example of wording to reflect this option?

We would need to tweak the wording for shared-state::notify() in exec.split p16 to have it destroy the op_state member before calling p->notify() on the list of registered receivers.

This would also require changing the op_state data member to an optional or some other type that allows its lifetime to be managed manually.
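
Roughly, such a change could take the following shape (an illustration only, not proposed wording; notify_all is a hypothetical stand-in for the loop that calls p->notify() on each registered receiver in [exec.split] p16):

  #include <optional>

  template <class ChildOpState, class ReceiverList>
  struct shared_state {
    std::optional<ChildOpState> op_state;  // result of connecting the wrapped sender
    ReceiverList waiting;                  // receivers registered before completion
    // ... storage for the completion result ...

    void notify() noexcept {
      // The completion result has already been stored in the shared state, so the
      // child op state is no longer needed; destroy it before notifying waiters.
      op_state.reset();
      waiting.notify_all();  // hypothetical stand-in for calling p->notify() on each waiter
    }
  };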

inbal2l (Member) commented on Oct 23, 2024

Paper published: "P3373: Of Operation States and Their Lifetimes" (Robert Leahy)

