Description
This is a rough idea for a talk/tutorial. Critique welcome :).
Suppose you are a network engineer and you want to understand how modern x86 CPUs work under the hood. Cache misses, out-of-order execution, pipelined execution, etc. One approach is to read a big heavy book like Hennessy and Patterson. However, there is also a shortcut.
CPUs are basically networks these days (#15) and their mechanisms all have direct analogues in TCP. In fact, if you have spent time troubleshooting TCP performance problems in Wireshark, it's entirely likely that you have a more visceral intuition for CPU performance than most software people do.
Here is why CPUs are basically equivalent to TCP senders:
TCP | CPU |
---|---|
TCP sends a stream of packets. | CPU issues a stream of instructions. |
TCP packets are eventually acknowledged. | CPU instructions are eventually retired. |
TCP sends multiple packets in series, without waiting for the first to be acknowledged, up to the window size. | CPU issues multiple instructions in series, without waiting for the first to be retired, up to the reservation station size. |
TCP packets that are "in flight" all make progress towards their destination at the same time. | CPU instructions that are in flight all make progress towards completion at the same time in a pipelined architecture. |
TCP incurs packet loss when a packet reaches an overloaded router. The main consequence of a packet loss is more latency between initial transmission and ultimate acknowledgement. (There are also a lot of complex state transitions.) | CPU incurs cache misses when instructions refer to memory addresses that are not cached. The main consequence of a cache miss is more latency between the initial issue of an instruction and its ultimate retirement. |
The impact of a packet loss depends on the workload. Losing certain packets can cripple performance, for example a control packet like a TCP SYN or an HTTP GET, while losing certain other packets won't have a noticeable impact at all, like the 900th packet in an FTP transfer. The key is whether TCP can "keep the pipe full" with other data while it waits to recover the lost packet. | The impact of a cache miss depends on the workload. Certain cache misses can cripple performance, for example when fetching the next instruction to execute or chasing a long chain of pointer dereferences, while certain cache misses won't have a noticeable impact at all, like a long series of pipelined memory accesses that all go out to RAM in parallel. |
TCP can use Selective ACK to work around hazards like packet loss and continue sending new packets beyond the slow one without waiting for it to be recovered and ACKed first. | CPU can use out-of-order execution to work around hazards like cache misses and continue executing new instructions beyond the slow one without waiting for it to be completed and retired first. |
TCP can run multiple connections on the same link. This does not directly increase bandwidth, because they are sharing the same network resources, but it does improve robustness. If one connection is blocked by a hazard, such as a packet loss, the other can still make progress and so the link is less likely to become idle (which would waste bandwidth). | CPU can run multiple hyperthreads on the same core. This does not directly increase performance, because they are sharing the same computing resources, but it does improve robustness. If one hyperthread is blocked by a hazard, such as a cache miss, the other can still make progress and so the core is less likely to become idle (which would waste execution cycles). |
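The window row in the table can be made concrete with a toy cycle model. This is just a sketch I made up (the function and parameter names are mine, not from any real TCP or CPU tool): at most `window` operations may be in flight at once, one new operation can issue per cycle, and each takes `latency` cycles to complete.

```python
def completion_time(n, latency, window):
    """Toy cycle model: at most `window` operations in flight
    (TCP window / reservation-station size), one new operation
    may issue per cycle, each completes `latency` cycles after
    it issues. Returns the cycle at which the last one completes."""
    in_flight = []   # completion cycles of outstanding operations
    issued = 0
    cycle = 0
    last_done = 0
    while issued < n:
        # drop anything that has completed by this cycle
        in_flight = [t for t in in_flight if t > cycle]
        if len(in_flight) < window:
            done = cycle + latency
            in_flight.append(done)
            last_done = max(last_done, done)
            issued += 1
        cycle += 1
    return last_done

# 10 accesses, each with a 100-cycle miss latency (think: network RTT)
print(completion_time(10, 100, window=10))  # 109 -- the pipe stays full
print(completion_time(10, 100, window=1))   # 1000 -- everything serializes
```

With `window=1` nothing overlaps and the latencies simply add up, which is also what a dependent pointer chase looks like to the CPU; with a window on the order of the latency, the pipe stays full and the total time is close to a single round trip.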
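The Selective ACK / out-of-order row can be sketched the same way (again a made-up toy model, not a real implementation): one sender that stalls completely on the slow operation, and one that keeps issuing past it, one operation per cycle, while still retiring results in program order.

```python
def serialized(latencies):
    """No overlap: each operation waits for the previous one to
    finish before starting, so one slow operation delays everything
    behind it (a sender stalled on the lost packet)."""
    return sum(latencies)

def out_of_order(latencies):
    """Issue one operation per cycle; each completes independently
    of the others; results retire in program order (like cumulative
    ACKs), so the last retirement is the latest completion."""
    retire = 0
    for i, lat in enumerate(latencies):
        complete = i + lat          # issued at cycle i
        retire = max(retire, complete)
    return retire

# One 100-cycle miss followed by 50 cheap 1-cycle operations:
work = [100] + [1] * 50
print(serialized(work))    # 150 -- the miss blocks everything behind it
print(out_of_order(work))  # 100 -- the cheap ops hide in the miss's shadow
```

The 50 cheap operations execute entirely under the shadow of the slow one, which is exactly the "continue beyond the slow one without waiting" behaviour the table describes for both SACK and out-of-order execution.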
What do you think?
Have an idea for good analogues of branch prediction and dispatching instructions across multiple execution units?