TheTensorCoreProject

A microarchitecture implementation of my interpretation of Nvidia's SIMT CUDA cores and Hybrid-Precision Tensor Cores, and of Google's Systolic Array TPU MXU.

Tensor Core Versions

TensorCore v0: Volta Architecture [FP16MUL FP32ADD]

TensorCore v1: Ampere Architecture [TF32MUL FP32ADD / BF16MUL FP32ADD] + Fine-Grained Structured Sparsity

TensorCore v2: Hopper Architecture [FP8(E5M2/E4M3)MUL FP16ADD]
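
To make the [FP16MUL FP32ADD] notation of the v0 (Volta) entry concrete, here is a minimal software sketch of a hybrid-precision matrix multiply-accumulate, D = A x B + C. It is not this project's hardware description: the function names (`to_fp16`, `to_fp32`, `mma_fp16_fp32`) are hypothetical, and the sequential accumulation order with rounding to FP32 after every add is an assumption, since the real hardware's internal rounding behavior may differ.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (ties to even).

    struct raises OverflowError if x is too large for binary16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_fp16_fp32(a, b, c):
    """Compute D = A x B + C with FP16 multiply inputs and FP32 accumulation.

    a is n x k, b is k x m, c is n x m (lists of lists of floats).
    Assumption: products are added one at a time, rounding to FP32 each step.
    """
    n, k, m = len(a), len(b), len(b[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = to_fp32(c[i][j])  # accumulator starts from C in FP32
            for p in range(k):
                # Operands are quantized to FP16; the product of two FP16
                # values needs at most 22 significand bits, so it is exact.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                acc = to_fp32(acc + prod)  # FP32 add
            d[i][j] = acc
    return d


# Usage: accumulating in FP32 keeps a sum that FP16 cannot represent.
result = mma_fp16_fp32([[2048.0, 1.0]], [[1.0], [1.0]], [[0.0]])
print(result)            # [[2049.0]]
print(to_fp16(2049.0))   # 2048.0, because 2049 is not representable in FP16
```

The usage lines show why the accumulator is wider than the multiplier inputs: 2048 + 1 survives in FP32 but would round back to 2048 if the sum were stored in FP16.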