[NVPTX] Add mixed-precision arithmetic intrinsics #136657

Status: Open. Wants to merge 1 commit into main.
llvm/include/llvm/IR/IntrinsicsNVVM.td (42 additions, 0 deletions)
@@ -1306,6 +1306,48 @@ let TargetPrefix = "nvvm" in {
DefaultAttrsIntrinsic<[llvm_double_ty], [llvm_double_ty, llvm_double_ty],
[IntrNoMem, IntrSpeculatable, Commutative]>;

// Mixed-precision add intrinsics for half and bfloat16 to float
Member:

Do we really need these new intrinsics? It seems like just calling fpext on the 16-bit operand before adding or multiplying it would be semantically equivalent. An idiom like this is also fairly concise and seems likely to be preserved through general optimizations, while still allowing for things like constant-folding and fma-fusion.

Contributor Author:

Yes, they are semantically equivalent. If we are not inclined to add new intrinsics, maybe we can evaluate adding a transformation to turn fpext + fadd into add.f32.f16, since the two scenarios lead to different SASS. What do you think?

Member:

I think transformations in the backend would be great! My only concern was with adding a new IR representation for something that can already be expressed fairly simply. Adding rules in ISel to handle these cases sounds like a good idea to me.

Contributor Author:

Understood. Thanks, Alex.
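For context, the fpext idiom discussed above can be written directly in LLVM IR. A minimal sketch (the function name is illustrative, and this models only the default round-to-nearest case):

```llvm
; Semantically equivalent to the proposed llvm.nvvm.add.rn.h.f intrinsic:
; widen the half operand to float, then add in single precision.
define float @mixed_add_h_f(half %h, float %f) {
  %w = fpext half %h to float
  %r = fadd float %w, %f
  ret float %r
}
```

The backend transformation discussed above would pattern-match this fpext + fadd pair in ISel and emit a single add.rn.f32.f16 instruction.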

foreach rnd = ["rn", "rz", "rm", "rp"] in {
foreach sat = ["", "_sat"] in {
// Half-precision to float
def int_nvvm_add_#rnd#sat#_h_f
: ClangBuiltin<"__nvvm_add_"#rnd#sat#"_h_f">,
DefaultAttrsIntrinsic<[llvm_float_ty],
[llvm_half_ty, llvm_float_ty],
[IntrNoMem, IntrSpeculatable]>;

// BFloat16 to float
def int_nvvm_add_#rnd#sat#_bf_f
: ClangBuiltin<"__nvvm_add_"#rnd#sat#"_bf_f">,
DefaultAttrsIntrinsic<[llvm_float_ty],
[llvm_bfloat_ty, llvm_float_ty],
[IntrNoMem, IntrSpeculatable]>;
}
}
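As a rough model of the intended semantics (a Python sketch, not part of the patch; it assumes the f16 operand is supplied as raw bits and models only the round-to-nearest case, ignoring saturation):

```python
import struct

def half_to_float(h_bits: int) -> float:
    # Reinterpret 16 raw bits as an IEEE 754 binary16 value and widen it.
    return struct.unpack('<e', struct.pack('<H', h_bits))[0]

def add_rn_f32_f16(h_bits: int, f: float) -> float:
    # Widen the f16 operand, add, and round the sum to float32
    # (round-to-nearest-even), mirroring add.rn.f32.f16.
    wide = half_to_float(h_bits) + f
    return struct.unpack('<f', struct.pack('<f', wide))[0]

print(add_rn_f32_f16(0x3E00, 2.0))  # 0x3E00 is 1.5 in f16; prints 3.5
```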

//
// Sub
//

// Mixed-precision subtraction intrinsics for half and bfloat16 to float
foreach rnd = ["rn", "rz", "rm", "rp"] in {
foreach sat = ["", "_sat"] in {
// Half-precision to float
def int_nvvm_sub_#rnd#sat#_h_f
: ClangBuiltin<"__nvvm_sub_"#rnd#sat#"_h_f">,
DefaultAttrsIntrinsic<[llvm_float_ty],
[llvm_half_ty, llvm_float_ty],
[IntrNoMem, IntrSpeculatable]>;

// BFloat16 to float
def int_nvvm_sub_#rnd#sat#_bf_f
: ClangBuiltin<"__nvvm_sub_"#rnd#sat#"_bf_f">,
DefaultAttrsIntrinsic<[llvm_float_ty],
[llvm_bfloat_ty, llvm_float_ty],
[IntrNoMem, IntrSpeculatable]>;
}
}

//
// Dot Product
//
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (45 additions, 0 deletions)
@@ -1656,6 +1656,51 @@ def INT_NVVM_ADD_RM_D : F_MATH_2<"add.rm.f64 \t$dst, $src0, $src1;",
def INT_NVVM_ADD_RP_D : F_MATH_2<"add.rp.f64 \t$dst, $src0, $src1;",
Float64Regs, Float64Regs, Float64Regs, int_nvvm_add_rp_d>;

// Define mixed-precision add instructions for half and bfloat16 to float
foreach rnd = ["rn", "rz", "rm", "rp"] in {
foreach sat = ["", "_sat"] in {
// Half-precision to float
def INT_NVVM_ADD_#!toupper(rnd#sat)#_H_F
: F_MATH_2<"add."#rnd#!subst("_", ".",
sat)#".f32.f16 \t$dst, $src0, $src1;",
Float32Regs, Int16Regs, Float32Regs,
!cast<Intrinsic>("int_nvvm_add_"#rnd#sat#"_h_f"),
[hasPTX<86>, hasSM<100>]>;

// BFloat16 to float
def INT_NVVM_ADD_#!toupper(rnd#sat)#_BF_F
: F_MATH_2<"add."#rnd#!subst("_", ".",
sat)#".f32.bf16 \t$dst, $src0, $src1;",
Float32Regs, Int16Regs, Float32Regs,
!cast<Intrinsic>("int_nvvm_add_"#rnd#sat#"_bf_f"),
[hasPTX<86>, hasSM<100>]>;
}
}
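The `!subst("_", ".", sat)` call above maps the empty and `_sat` suffixes into the PTX mnemonic. The resulting instruction names can be previewed with a small Python model of the TableGen string concatenation (illustrative only, not part of the patch):

```python
def ptx_mnemonic(op: str, rnd: str, sat: str, src_ty: str) -> str:
    # Mirrors: op # "." # rnd # !subst("_", ".", sat) # ".f32." # src_ty
    return f"{op}.{rnd}{sat.replace('_', '.')}.f32.{src_ty}"

print(ptx_mnemonic("add", "rn", "_sat", "f16"))  # add.rn.sat.f32.f16
print(ptx_mnemonic("sub", "rp", "", "bf16"))     # sub.rp.f32.bf16
```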

//
// Sub
//
// Define mixed-precision sub instructions for half and bfloat16 to float
foreach rnd = ["rn", "rz", "rm", "rp"] in {
foreach sat = ["", "_sat"] in {
// Half-precision to float
def INT_NVVM_SUB_#!toupper(rnd#sat)#_H_F
: F_MATH_2<"sub."#rnd#!subst("_", ".",
sat)#".f32.f16 \t$dst, $src0, $src1;",
Float32Regs, Int16Regs, Float32Regs,
!cast<Intrinsic>("int_nvvm_sub_"#rnd#sat#"_h_f"),
[hasPTX<86>, hasSM<100>]>;

// BFloat16 to float
def INT_NVVM_SUB_#!toupper(rnd#sat)#_BF_F
: F_MATH_2<"sub."#rnd#!subst("_", ".",
sat)#".f32.bf16 \t$dst, $src0, $src1;",
Float32Regs, Int16Regs, Float32Regs,
!cast<Intrinsic>("int_nvvm_sub_"#rnd#sat#"_bf_f"),
[hasPTX<86>, hasSM<100>]>;
}
}

//
// BFIND
//