Implement concurrent formatting

Hi, 

I've been rummaging around this repo for a bit lately, getting a bit more familiar with the code by looking at some bugs.

What I really like working with is performance though and I noticed that even though files are formatting independently of each other (in some cases), it's all run single-threaded.

In most cases that's entirely reasonable since `rustfmt` is using rustc-types that are inherently not thread-safe, but in some cases it's possible to run each format file in a thread. So I tested that out.

The performance benefits are pretty staggering, of course, the measurements are not especially scientific, and it's only measured on my machine, but we're not talking about percentage points here but 3x - 10x for big projects on my machine.

Running `rustfmt` on all files in `tokio` takes about 400ms while running multithreaded `rustfmt` takes about 50ms.
Running `rustfmt` on all files an unpublished large codebase takes about 5000ms, while multithreaded is about 800ms.

Running `cargo fmt --all` on `tokio` is around 265ms, while running `cargo fmt --all` multithreaded is around 80ms.
Running `cargo fmt --all` on the same unpublished large codebase is around 1800ms, multithreaded it's 570ms.

On a smaller repo with just a 7 files, multithreaded was about 2x faster.

## Downsides

Making `rustfmt` run concurrently will cause problems because of the good old `C`-ghost, shared globals, in this case it's `stdout` and `stderr`, any kind of interaction with those facilities will cause a mess.

Exemplified: 
File A and File B is processed concurrently, in verbose.
There's a diff in File B, and an error in file A.
You could get some output lite:
`"Using rustfmt config file rustfmt.cfg for filea.rs"`
`"Using rustfmt config file rustfmt.cfg for fileb.rs"`
`Diff at fileb.rs l 322`
`Error writing files {msg}` (Error in file A, but you won't know that).

This is a non-trivial problem since `stdout` and `stderr` while they are internally synced for a given `(e)println` are not synced on the file-parsing level. And locking `stdout` and/or `stderr` until a file-pass finishes would nullify most of the benefits of multithreading completely.

### Proposed solution

Reword formatting runs to take an output buffer and then print that chronologically based on which file is processed.  

This preserves previous printing behaviour completely, but requires a pretty sweeping change where these buffers have to be passed down anywhere printing is wanted.

In practice there's only one part of the code where this becomes particularly problematic, and that's the diff-writer which uses `term`, which prints colored messages to the terminal.

There's another potential issue, which is preserving ordering between writes to `stdout` and `stderr`. 
I.E. Buffers are passed down into the application, the code prints into those instead. 
Now the code first prints with an `eprintln` then a `println`. The calling code gets the buffers back and is responsible for flushing them to `stderr` and `stdout` respectively, in which order should the calling code flush them?

This isn't necessarily a huge problem. Ideally code deep down the call-stack shouldn't be printing, and refactoring that may be the best thing to do for clarity anyway, but it's something to consider.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement concurrent formatting #6091

Downsides

Proposed solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement concurrent formatting #6091

Description

Downsides

Proposed solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions