
Consolidate together Bevy's TaskPools [adopted] #18163


Draft: hymm wants to merge 31 commits into main from task-pool-consolidation
Conversation

@hymm (Contributor) commented Mar 5, 2025

This is an adoption of #12090

Changes from #12090

  • Added spawn_blocking and spawn_blocking_async to Scope. These let the glTF plugin avoid blocking the compute task pool when running long tasks (see the sketch below).
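
A minimal sketch of how a loader might use the new Scope methods. The helpers (decode_texture, fetch_bytes) are hypothetical, and the exact signatures are whatever this PR lands on:

```rust
ComputeTaskPool::get().scope(|scope| {
    // CPU-heavy, non-yielding work: hand it to the blocking pool so it
    // doesn't tie up a compute thread (decode_texture is hypothetical).
    scope.spawn_blocking(|| {
        decode_texture(&bytes);
    });

    // Work that mixes `.await` points with blocking stretches can use the
    // async variant (fetch_bytes is a hypothetical async helper).
    scope.spawn_blocking_async(async move {
        let raw = fetch_bytes().await;
        decode_texture(&raw);
    });
});
```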

Todo

  • Figure out what to do with the environment variable that limits the number of blocking threads. set_var is now unsafe, and the API in the old PR could potentially set the environment variable from multiple threads. I may just document setting the environment variable manually instead.
  • Test loading a large glTF.
  • Think a little more about the transmute in Scope::spawn_blocking and make sure that it is sound.
  • Review the migration guide and check for any changes needed after the merge with main.

Objective

Fixes #1907 (using ComputeTaskPool for a par_for_each query only uses half of the available logical cores). Spiritual successor to #4740.

Bevy currently creates a 50%/25%/25% split between the Compute, AsyncCompute, and IO task pools, meaning that any given operation can only be scheduled onto its subset of threads. For example, on a 16-core machine the Compute pool gets 8 threads and the other two pools get 4 each. This is suboptimal whenever an app is not using any IO or async compute (or vice versa): available parallelism goes underutilized because the split does not reflect the actual workload. This PR aims to fix that underutilization.

Solution

  • Do away with the IO and AsyncCompute task pools and allocate all of the threads to the ComputeTaskPool.
  • Move all of the non-blocking IO tasks to the Compute task pool.
  • Add TaskPool::spawn_blocking as an alternative for blocking IO operations and any task that would previously have been spawned on the AsyncCompute task pool. This is backed by blocking's dynamically scaled thread pool, which spins threads up and down on demand (see the sketch below).
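
For reference, a minimal sketch of the pattern spawn_blocking wraps, using the blocking crate directly (which Bevy already pulls in via async-fs). The load_config name and path are illustrative:

```rust
use blocking::unblock;

// Run a blocking filesystem read on blocking's dynamically scaled thread
// pool, returning a future a cooperative executor can await without stalling.
async fn load_config() -> std::io::Result<String> {
    unblock(|| std::fs::read_to_string("assets/config.toml")).await
}
```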

This allows ECS systems and parallel iterators to be scheduled onto all CPU cores instead of being artificially constrained to half of the available logical cores, and lets typical async IO tasks interleave onto any thread.
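
For instance, a parallel query like the sketch below (the Velocity and Position components are illustrative) can now fan out across every logical core rather than only the Compute slice:

```rust
use bevy::prelude::*;

#[derive(Component)]
struct Velocity(Vec3);

#[derive(Component)]
struct Position(Vec3);

// par_iter_mut splits the query across the ComputeTaskPool's threads,
// which after this PR means every available logical core.
fn integrate(mut query: Query<(&Velocity, &mut Position)>) {
    query.par_iter_mut().for_each(|(velocity, mut position)| {
        position.0 += velocity.0;
    });
}
```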

Using spawn_blocking for CPU-bound operations relies on the OS's preemptive scheduler to schedule the threads properly while the main task pool's threads sit idle. This comes with potential context-switching costs, but it is generally preferable to choking the entire app because a task does not cooperatively yield its thread, or to artificially limiting available parallelism.

Note: We're already using blocking through async-fs for loading assets from disk. This change primarily moves the remaining blocking IO tasks and the async compute tasks into blocking's thread pool.

Changelog

  • Added: TaskPool::spawn_blocking
  • Added: TaskPool::spawn_blocking_async
  • Removed: IoTaskPool
  • Removed: AsyncComputeTaskPool
  • Changed: ComputeTaskPool by default now spawns a thread for every available logical CPU core.

Migration Guide

IoTaskPool and AsyncComputeTaskPool have been removed and merged into ComputeTaskPool. Replace calls to IoTaskPool::get and AsyncComputeTaskPool::get with ComputeTaskPool::get.

If you were spawning futures that rely on non-blocking IO (i.e. the task spends most of its time yielding via await), just spawn the task onto the ComputeTaskPool:

// in 0.13
IoTaskPool::get().spawn(async move {
    while let Some(item) = stream.next().await {
        // process the item here
    }
});

// in 0.14
ComputeTaskPool::get().spawn(async move {
    while let Some(item) = stream.next().await {
        // process the item here
    }
});

If you were spawning futures that rely on blocking IO (e.g. std::fs::File::read_to_string), use ComputeTaskPool::spawn_blocking instead. This will spawn or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
use std::{fs::File, io::Read};

IoTaskPool::get().spawn(async move {
    let mut contents = String::new();
    File::open(path).unwrap().read_to_string(&mut contents).unwrap();
    // process the file contents here
});

// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
    let mut contents = String::new();
    File::open(path).unwrap().read_to_string(&mut contents).unwrap();
    // process the file contents here
});

If you were spawning futures for async compute, use ComputeTaskPool::spawn_blocking instead. This will likewise spawn or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
     solve_traveling_salesman();
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
     solve_traveling_salesman();
});

If you were spawning long-running futures that mix blocking and non-blocking work, use ComputeTaskPool::spawn_blocking_async instead.

// in 0.13
use futures_lite::AsyncReadExt;

AsyncComputeTaskPool::get().spawn(async move {
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});

// in 0.14
ComputeTaskPool::get().spawn_blocking_async(async move {
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});

hymm force-pushed the task-pool-consolidation branch from 22ec886 to d0a19e9 (March 5, 2025)
hymm force-pushed the task-pool-consolidation branch from 8baead2 to d3bb78e (March 5, 2025)
TimJentzsch added the C-Performance, A-Tasks, and S-Waiting-on-Author labels (March 5, 2025)
hymm force-pushed the task-pool-consolidation branch from 58725af to 78b78ec (March 6, 2025)
@hymm (Contributor, Author) commented Mar 8, 2025

So I looked into the regression with loading assets, and it seems the issue is that ImageLoader does long-running operations in load. Specifically, the call to Image::from_buffer takes a decent amount of time. On top of that, loading a large glTF queues up a large number of image loading tasks, so we end up starving the schedules of any threads to run tasks.

There are a few possible solutions here:

  1. Change all asset loaders to use blocking threads. Probably not a good idea, as the blocking threads don't have a proper cooperative async executor. Might be worth trying to see what the extra overhead is. We'd also be spawning a new thread for every task, up to the max thread limit, so it could be expensive memory-wise too.

  2. Keep the IoTaskPool, but change the TaskPoolBuilder so that it allows overprovisioning threads. We would set the ComputeTaskPool to provision threads for all the logical processors and set the IoTaskPool to some percentage of that (see the sketch below).

  3. Just fix the image loader to use blocking threads for its longer-running tasks. This requires creating an async scope, as we need to free up the thread that load is called on. This would put the burden on asset loader authors to make their loaders behave properly.

My inclination here is to go with (2) and rework this PR to remove the AsyncComputeTaskPool. Long-running async tasks should use separate threads from the task pool. (3) isn't great because it assumes loaders will always be written to behave, and (1) seems like a bad idea for the reasons listed above.

There might be a (4) with some type of task priority system, but that requires more extensive executor changes. I'll probably explore overprovisioning threads and put up a PR for it if it looks OK.
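
A minimal sketch of what (2) could look like with the existing TaskPoolBuilder API; the 25% figure and the overlap policy are illustrative, not decided:

```rust
use bevy_tasks::TaskPoolBuilder;

let logical = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

// Compute gets a thread per logical core...
let compute = TaskPoolBuilder::new()
    .thread_name("Compute Task Pool".to_string())
    .num_threads(logical)
    .build();

// ...while IO overprovisions some percentage on top of that, relying on
// the OS scheduler whenever both pools are busy at once.
let io = TaskPoolBuilder::new()
    .thread_name("IO Task Pool".to_string())
    .num_threads((logical / 4).max(1))
    .build();
```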

@Elabajaba (Contributor) commented Mar 9, 2025


Fresh trace of Bistro with compressed textures for this PR, for context on the asset loading regression.

edit: #17914 might improve the blocking in prepare assets after the images are loaded.

@cart (Member) commented Mar 12, 2025

Yeah the "separate pools" model was solving a real problem that I don't think we can just fully ignore. I like (2).

@cart (Member) commented Mar 12, 2025

Although I see no reason why IO is any different from "async compute". Seems like plenty of "async compute" cases could also benefit from a pooled model?
