Description
Issue by LeeHowes
Tuesday Nov 30, 2021 at 01:21 GMT
Originally opened as NVIDIA/stdexec#289
We have currently defined the algorithms to call through on completion. This means that a `set_value` call to `then(f)`'s receiver will call `f` inline and then call `set_value` on the next receiver in the chain. In simple cases like `then` this works fine, but in structured cases there is danger here.
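A minimal sketch of that call-through shape, assuming a member-function receiver style and eliding the error and stopped channels; this illustrates the mechanism described above, not the specification's or any implementation's actual code:

```cpp
#include <functional>
#include <utility>

// Hedged sketch: then(f)'s receiver runs f inline on whatever execution
// context delivered set_value, and the next receiver in the chain runs
// there too. Error/stopped channels and exception handling are elided.
template <class NextReceiver, class F>
struct then_receiver {
  NextReceiver next;
  F f;

  template <class... Vs>
  void set_value(Vs&&... vs) && {
    // f executes right here, on the completing context...
    auto result = std::invoke(std::move(f), std::forward<Vs>(vs)...);
    // ...and the downstream receiver is invoked on that same context.
    std::move(next).set_value(std::move(result));
  }
};
```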
Given:
```cpp
some_sender()
  | let_value([] { return some_complex_work_sender_that_does_Somethign_on_a_random_scheduler(); })
  | bulk(/* complex stuff */)
```
`some_sender` will complete, say, on context `c1`. However, `let_value` triggers async work. This may run on a thread pool that is completely unpredictable given the written code. When that async work completes, its scheduler is the one `bulk` will now run on.
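To make that concrete, here is a minimal sketch, assuming the reference implementation's current names (`stdexec::just`, `stdexec::let_value`, `stdexec::then`, `stdexec::schedule`, `stdexec::sync_wait`, `exec::static_thread_pool`); it illustrates the behaviour described above rather than reproducing the pipeline verbatim:

```cpp
#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <iostream>
#include <thread>

int main() {
  exec::static_thread_pool pool{4};   // stands in for the "random" scheduler
  auto pool_sched = pool.get_scheduler();

  auto work =
      stdexec::just()                                // starts on the caller's context
    | stdexec::let_value([pool_sched] {
        // The inner sender hops onto the pool...
        return stdexec::schedule(pool_sched)
             | stdexec::then([] {
                 std::cout << "inner work on " << std::this_thread::get_id() << '\n';
               });
      })
    | stdexec::then([] {
        // ...so this continuation now runs on a pool thread, not on the
        // context the chain was started from.
        std::cout << "continuation on " << std::this_thread::get_id() << '\n';
      });

  auto result = stdexec::sync_wait(std::move(work));
  (void)result;
}
```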
In principle that is fine - `bulk` should customise on it or fail to do so - but reading that code it is surprising. It is also a potential source of very serious bugs. The reason we put a lot of effort into `SemiFuture` at Facebook was to disallow attaching continuations to work without calling `.via(executor)` first, thereby enforcing a transition. Similarly, `folly::coro` mandates that `co_await` transition back, such that the seemingly equivalent coroutine code:
```cpp
co_await some_sender();
co_await some_complex_work_sender_that_does_Somethign_on_a_random_scheduler();
co_await bulk(/* complex stuff */);
```
always explicitly resumes on the scheduler associated with the coroutine. This is an important safety feature: it prevents the async work from causing a lot of chained work to execute on that remote agent - an agent that is not visible to the author of the code and whose properties are unknown.
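For reference, the `SemiFuture` discipline looks roughly like this; a hedged sketch of folly's API from memory (`SemiFuture::via`, `Future::thenValue`), not code from this issue:

```cpp
#include <folly/futures/Future.h>
#include <utility>

// The pattern described above: .via(executor) converts the SemiFuture into a
// Future bound to an executor, so the execution context of any continuation
// attached afterwards is explicit to the reader.
folly::Future<int> attach_continuation(folly::SemiFuture<int> sf,
                                       folly::Executor* where) {
  return std::move(sf)
      .via(where)                      // explicit, mandatory transition
      .thenValue([](int v) {           // now known to run on `where`
        return v + 1;
      });
}
```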
The simplest solution to this would be to enforce that structured constructs that transition context should resume on a known context (see the sketch after this list):

- `let` would resume on either the context it started on, i.e. the `completion_scheduler` of the caller, or, preferably, the scheduler provided by `get_scheduler` on the passed receiver.
- `on(s2, on(s, f))` would `start()` `f` on `s` but would transition back to `s2` on return.
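A hedged sketch of what that default would buy, written in terms of today's explicit adaptors (names from the P2300 reference implementation; `continues_on` was called `transfer` in earlier revisions, and the schedulers and helper function here are assumptions for illustration):

```cpp
#include <stdexec/execution.hpp>

// Manually pin the continuation back onto a known scheduler after a
// context-hopping let_value. The proposal would make the equivalent of the
// continues_on(home) step the default behaviour of the structured construct.
auto pinned_pipeline(stdexec::scheduler auto home,      // where we want to stay
                     stdexec::scheduler auto elsewhere) // where inner work lands
{
  return stdexec::schedule(home)
       | stdexec::let_value([elsewhere] {
           // Inner work completes on `elsewhere`...
           return stdexec::schedule(elsewhere)
                | stdexec::then([] { return 42; });
         })
       | stdexec::continues_on(home)                    // ...explicitly hop back
       | stdexec::then([](int v) { return v + 1; });    // known to run on `home`
}
```

Under the proposed change, the explicit `continues_on(home)` step would be unnecessary for the default `let_value`, because resuming on a known context would be the algorithm's own behaviour.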
This would at least cover the default algorithms, making default uses safer. I believe it would also better match the reader's expectation of where code runs. Transitions are still explicit in the sense that adding a scoped construct enforces a transition.

In general, though, I would like us to find a way to be much more concrete throughout the model about precisely where work is running and when.