
Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HLSL #7574

Open

robamler wants to merge 4 commits into trunk from the packed-vector-format branch
Conversation

@robamler (Contributor) commented Apr 18, 2025

Description

Replace the polyfills for dot4I8Packed and dot4U8Packed from #7494 with specialized instructions where they are available. More precisely:

  • For HLSL, we check if Shader Model >= 6.4, in which case we use dot4add_{i,u}8packed (the specification explicitly states: "no separate capability bit check is required, beyond assuring the use of Shader Model 6.4")
  • For SPIR-V, we check if the capabilities DotProduct and DotProductInput4x8BitPacked are available, in which case we use the "Packed Vector Format" argument for Op{S,U}Dot (and we emit the corresponding OpCapability and OpExtension statements at the beginning of the output).

If either of these tests fails, or if we are targeting Metal or GLSL, we fall back to the existing polyfills.
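For reference, the value that both the polyfills and the specialized instructions must compute can be modeled on the host side in Rust (an illustrative sketch, not code from this PR; each u32 packs four 8-bit lanes, least-significant byte first):

```rust
/// dot4I8Packed: reinterpret each byte as a signed i8, accumulate in i32.
fn dot4_i8_packed(a: u32, b: u32) -> i32 {
    (0..4)
        .map(|i| {
            let ai = ((a >> (8 * i)) & 0xff) as u8 as i8 as i32;
            let bi = ((b >> (8 * i)) & 0xff) as u8 as i8 as i32;
            ai * bi
        })
        .sum()
}

/// dot4U8Packed: treat each byte as an unsigned u8, accumulate in u32.
fn dot4_u8_packed(a: u32, b: u32) -> u32 {
    (0..4)
        .map(|i| ((a >> (8 * i)) & 0xff) * ((b >> (8 * i)) & 0xff))
        .sum()
}
```

For example, packing the lanes [1, 2, 3, 4] and [5, 6, 7, 8] gives `dot4_u8_packed(0x08070605, 0x04030201) == 70` (= 1·5 + 2·6 + 3·7 + 4·8).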

Testing

I added two tests with .toml configuration files that explicitly enable/prevent these specializations.

Squash or Rebase?

I think each commit should pass all CI tests.

Open Questions / Notes

  • On SPIR-V, the capabilities required for this specialization are only defined if lang_version >= 1.6, but I didn't check for high enough lang_version because I already check the capabilities_available field. This follows precedent in other places in the code base, but I'm not sure if it's sufficient to only check capabilities_available. If capabilities_available == None, then comments in the code (here and here) indicate that this should be interpreted as "all capabilities are permitted", and so the specialized code will be generated. But I guess "all capabilities" is still restricted to "all capabilities that are defined for lang_version". Should I check the lang_version in this case somehow, or is capabilities_available == None only used for debugging anyway? Unfortunately, lang_version isn't exposed anymore at the stage where we do the check, so I'd have to add it as an extra field to the Writer.
    • [Update: added a check in be9debd, but it's a bit hacky]
  • Unclear whether OpSDot with "Packed Vector Format" expects signed or unsigned arguments in spv (see comment below) [Update: resolved in this comment]
  • On Metal, we still fall back to the polyfill because, as far as I can tell, msl doesn't have builtin support for integer dot products (but it does support packed integer vectors, see packed_char4 in Table 2.4 in the specification).
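The capability check described in the first bullet, including the extra lang_version guard added in be9debd, can be sketched roughly as follows (all type and field names here are illustrative, not the actual naga internals):

```rust
// Hedged sketch of the SPIR-V capability gate. `capabilities_available ==
// None` means "all capabilities permitted", but only capabilities that are
// defined for the targeted SPIR-V version can actually be emitted, so the
// language version is checked up front in both cases.

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Capability {
    DotProduct,
    DotProductInput4x8BitPacked,
}

struct Writer {
    lang_version: (u8, u8),
    capabilities_available: Option<std::collections::HashSet<Capability>>,
}

impl Writer {
    /// True if the specialized Op{S,U}Dot path may be used.
    fn can_use_packed_dot(&self) -> bool {
        // The packed-dot capabilities are only defined for SPIR-V >= 1.6.
        if self.lang_version < (1, 6) {
            return false;
        }
        match &self.capabilities_available {
            None => true, // "all capabilities permitted"
            Some(caps) => {
                caps.contains(&Capability::DotProduct)
                    && caps.contains(&Capability::DotProductInput4x8BitPacked)
            }
        }
    }
}
```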

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@robamler force-pushed the packed-vector-format branch from e06d774 to 705dc6d on April 18, 2025 at 19:57
@cwfitzgerald changed the title from "Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HSLS" to "Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HLSL" on Apr 18, 2025
When checking for capabilities in SPIR-V,
`capabilities_available == None` indicates that all capabilities are
available. However, some capabilities are not even defined for all
language versions, so we still need to check if the requested
capabilities even exist in the language version we're using.
@robamler force-pushed the packed-vector-format branch from c93b2dd to be9debd on April 18, 2025 at 22:43
@robamler (Contributor, Author) commented Apr 18, 2025

> On SPIR-V, the capabilities required for this specialization are only defined if lang_version >= 1.6, but I didn't check for high enough lang_version because I already check the capabilities_available field. This follows precedent in other places in the code base, but I'm not sure if it's sufficient to only check capabilities_available. If capabilities_available == None, then comments in the code (here and here) indicate that this should be interpreted as "all capabilities are permitted", and so the specialized code will be generated. But I guess "all capabilities" is still restricted to "all capabilities that are defined for lang_version". Should I check the lang_version in this case somehow, or is capabilities_available == None only used for debugging anyway? Unfortunately, lang_version isn't exposed anymore at the stage where we do the check, so I'd have to add it as an extra field to the Writer.

In case the above text is confusing: I pushed be9debd as an example of how an additional check for lang_version could be implemented, but it's a bit hacky and I'm also not completely sure if this check is necessary (I don't know if capabilities_available == None is supposed to occur outside of tests).

@robamler (Contributor, Author) commented:
The sign issue has been resolved; the implementation should be correct as is. It turns out SPIR-V doesn't care about signedness in this case: "all signed and unsigned operations always work on unsigned types, and the semantics of operation come from the opcode" (credit to Nicol Bolas, see this StackOverflow answer).
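This resolution can be illustrated on the host side: the operand is just a 32-bit pattern, and only the choice of opcode (OpSDot vs. OpUDot) decides whether each byte is reinterpreted as signed (a hedged sketch, not naga code):

```rust
// The same 32-bit pattern yields different dot products depending only on
// whether the "opcode" treats the bytes as signed (OpSDot) or unsigned (OpUDot).

fn bytes(x: u32) -> [u8; 4] {
    x.to_le_bytes()
}

// Model of OpSDot with Packed Vector Format: bytes reinterpreted as i8.
fn sdot(a: u32, b: u32) -> i32 {
    bytes(a)
        .iter()
        .zip(bytes(b).iter())
        .map(|(&x, &y)| (x as i8 as i32) * (y as i8 as i32))
        .sum()
}

// Model of OpUDot with Packed Vector Format: bytes kept as u8.
fn udot(a: u32, b: u32) -> u32 {
    bytes(a)
        .iter()
        .zip(bytes(b).iter())
        .map(|(&x, &y)| (x as u32) * (y as u32))
        .sum()
}
```

For the bit pattern 0x000000FF (lane 0 is 0xFF), `sdot` sees -1 while `udot` sees 255, even though both consume the identical unsigned operand.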

robamler added a commit to robamler/wgpu that referenced this pull request Apr 22, 2025
See gfx-rs#7574, in particular
<gfx-rs#7574 (comment)>.

Adds `FeaturesWGPU::NATIVE_PACKED_INTEGER_DOT_PRODUCT`, which is
available on `Adapter`s that support the specialized implementations for
`dot4I8Packed` and `dot4U8Packed` implemented in gfx-rs#7574 (currently, this
includes DX12 with Shader Model >= 6.4 and Vulkan with device extension
"VK_KHR_shader_integer_dot_product").

If this feature is requested during `Device` creation, the device is set
up such that `dot4I8Packed` and `dot4U8Packed` will be compiled to their
respective specialized instructions. This means that, on a Vulkan
`Device`, the SPIR-V language version is set to 1.6, and the required
SPIR-V capabilities are marked as available (on DX12, requesting the
feature doesn't change anything since availability of the feature
already guarantees that Shader Model >= 6.4, which is all we need to
generate specialized code).
robamler added a commit to bamler-lab/compressed-nn-ops-demo that referenced this pull request Apr 22, 2025
Using the specialized instructions from
<gfx-rs/wgpu#7574>.
@robamler (Contributor, Author) commented Apr 22, 2025

I separated out the above issue of how these naga optimizations can be used in wgpu to #7595. I think the present PR can be reviewed independently from #7595.

@robamler requested a review from cwfitzgerald on April 22, 2025 at 17:13