Skip to content

Avoiding inline execution in scoped algorithms #111

Open
@ericniebler

Description

@ericniebler

Issue by LeeHowes
Tuesday Nov 30, 2021 at 01:21 GMT
Originally opened as NVIDIA/stdexec#289


We have currently defined the algorithms to call-through on completion. This means that a set_value call to then(f)'s receiver will call f inline and then call set_value on the next receiver in the chain. In simple cases like then this works fine, in structured cases there is danger here.

Given:

some_sender() | let_value([](){return some_complex_work_sender_that_does_Somethign_on_a_random_scheduler()}) | bulk(complex stuff)

some_sender will complete, say on context c1. However, let_value triggers async work. This may run on a thread pool that is completely unpredictable given the written code. When that async scheduler completes, it is that that bulk will now run on.

In principle that is fine - bulk should customise on it or fail to do so - but reading that code it is surprising. It is also a potential source of very serious bugs. The reason we put a lot of effort into SemiFuture at Facebook was to disallow attaching continuations to work without calling .via(executor) first, thereby enforcing a transition. Similarly folly::coro mandates that co_await transition back such that the seemingly equivalent coroutine code:

co_await some_sender();
co_await some_complex_work_sender_that_does_Somethign_on_a_random_scheduler();
co_await bulk(complex stuff);

Always explicitly resumes on the scheduler associated with the coroutine. This is an important safety feature to avoid the async work causing a lot of chained work to execute on that remote agent - an agent that is not visible to the author of the code, with unknown properties.

The simplest solution to this would be to enforce that structured constructs that transition context should resume on a known context:

  • let would resume on either the context it started on, ie the completion_scheduler of the caller, or, preferably, the scheduler provided by get_scheduler on the passed receiver.
  • on(s2, on(s, f)) would start() f on s but would transition back to s2 on return.

This would at least cover the default algorithms, making default uses safer. I believe it would also better match the reader's expectation of where code runs. Transitions are still explicit in the sense that if we add a scoped construct we enforce a transition.

In general though, I would like us to find a way to be much more concrete throughout the model about precisely where work is running and when.

Metadata

Metadata

Assignees

No one assigned

    Labels

    discussionWe need to talk about this; there's nothing actionable here yet

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions