[WebGPU] Optimize GEMM with vec4 #24478

xiaofeihan1 · 2025-04-21T02:02:35Z

Description

In this PR, we use vec4 to optimize GEMM when the column counts of A and B are divisible by 4; otherwise we fall back to the previous shader.
I will add a u32/vec2 implementation in the future, at which point we will keep only one shader.
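The dispatch condition described above can be sketched as a small predicate. This is only an illustration, not code from the PR: `GemmShape`, `StoredColsA`, `StoredColsB`, and `CanUseVec4` are hypothetical names, and it assumes "columns" means the column count of each matrix as stored in memory (so transposition changes which dimension counts).

```cpp
#include <cstdint>

// Hypothetical description of a GEMM problem (names are not from the PR).
// A is M x K (or K x M when transA), B is K x N (or N x K when transB).
struct GemmShape {
  int64_t M;
  int64_t N;
  int64_t K;
  bool transA;
  bool transB;
};

// Column count of A as stored: K normally, M when A is stored transposed.
inline int64_t StoredColsA(const GemmShape& s) { return s.transA ? s.M : s.K; }

// Column count of B as stored: N normally, K when B is stored transposed.
inline int64_t StoredColsB(const GemmShape& s) { return s.transB ? s.K : s.N; }

// Assumed rule: take the vec4 path only when both stored column counts
// are multiples of 4, so each row can be read as whole vec4 elements.
inline bool CanUseVec4(const GemmShape& s) {
  return StoredColsA(s) % 4 == 0 && StoredColsB(s) % 4 == 0;
}
```

When `CanUseVec4` returns false, the caller would fall back to the previous scalar shader. The later commits ("check vec4 for C", "output is vec1 for some cases") suggest the real check also considers C and the output, which this sketch leaves out.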

Perf comparison

I ran a custom model that contains only a GEMM node (M = N = K = 1024) with Node.js on M2 and M3 Max. Roughly a 20% improvement (values below are before -> after).

| Device | !transA && !transB | transA | transB | transA && transB |
|---|---|---|---|---|
| M2 | 9.36 -> 7.41 | 9.45 -> 7.54 | 11.21 -> 8.19 | 9.66 -> 8.37 |
| M3 Max | 8.07 -> 6.99 | 7.54 -> 6.53 | 8.42 -> 5.89 | 5.47 -> 5.29 |

fs-eire · 2025-04-22T00:00:38Z

Is there a way to reuse the implementation of MatMul? My understanding is that there is some duplication between GEMM and MatMul, and it would be great if we could reuse the shared code.

xiaofeihan1 · 2025-04-22T07:26:52Z

> Is there a way to reuse the implementation of MatMul? My understanding is that there is some duplication between GEMM and MatMul, and it would be great if we could reuse the shared code.

Thanks for the callout. That's what I'm going to do next.
For the current PR, I want to push forward with vec4 support for GEMM. I will take on the refactoring in future PRs because it also needs some thought (e.g. GEMM and MatMul differ: the latter supports batch size, the former supports transpose, etc.). WDYT?
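For readers comparing the two operators, here is a scalar CPU reference for what GEMM computes, following the ONNX Gemm definition Y = alpha * A' * B' + beta * C, where A' and B' are optionally transposed. It is a sketch for illustration only, not the WebGPU shader from this PR: `ReferenceGemm` is a hypothetical name, the layout is assumed row-major, and C is assumed to be a full M x N matrix (no broadcasting).

```cpp
#include <cstddef>
#include <vector>

// Scalar reference GEMM: Y = alpha * op(A) * op(B) + beta * C.
// op(X) is X transposed when the matching flag is set.
// A is stored M x K (K x M when transA); B is stored K x N (N x K when transB).
// Assumption: row-major storage and a full M x N matrix C.
std::vector<float> ReferenceGemm(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 const std::vector<float>& C,
                                 std::size_t M, std::size_t N, std::size_t K,
                                 bool transA, bool transB,
                                 float alpha, float beta) {
  std::vector<float> Y(M * N);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        // Transposition only changes how the stored element is indexed.
        const float a = transA ? A[k * M + m] : A[m * K + k];
        const float b = transB ? B[n * K + k] : B[k * N + n];
        acc += a * b;
      }
      Y[m * N + n] = alpha * acc + beta * C[m * N + n];
    }
  }
  return Y;
}
```

MatMul has no transpose flags, alpha, beta, or C term, but it accepts leading batch dimensions, which is why a shared implementation needs some design work.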

fs-eire · 2025-04-22T18:44:03Z

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

azure-pipelines · 2025-04-22T18:44:24Z

Azure Pipelines successfully started running 5 pipeline(s).


fs-eire · 2025-04-24T21:10:02Z

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

azure-pipelines · 2025-04-24T21:10:24Z

Azure Pipelines successfully started running 5 pipeline(s).

xiaofeihan1 added 5 commits April 21, 2025 10:00

implement vec4

fff087e

fix compile error

2a07abf

delete extra param

8b76007

check vec4 for C

90d371a

cache key

2177b8a

extract functions

4d22064

qjia7 reviewed Apr 23, 2025

onnxruntime/core/providers/webgpu/math/gemm_vec4.cc (3 review threads, outdated and resolved)

resolve comments

8a3b67f

xiaofeihan1 requested a review from qjia7 April 24, 2025 03:00

xiaofeihan1 commented Apr 24, 2025

onnxruntime/core/providers/webgpu/math/gemm.h (1 review thread, outdated and resolved)

qjia7 reviewed Apr 24, 2025

onnxruntime/core/providers/webgpu/math/gemm_vec4.cc (1 review thread, outdated and resolved)

qjia7 reviewed Apr 24, 2025

onnxruntime/core/providers/webgpu/math/gemm_vec4.cc (1 review thread, outdated and resolved)

xiaofeihan1 added 2 commits April 24, 2025 18:06

output is vec1 for some cases

f8d710c

remove unnecessry variable

0735137

update comments

8f8a202

xiaofeihan1 requested a review from qjia7 April 25, 2025 08:48
