Skip to content

Commit a03b225

Browse files
authored
[NVPTX][Docs] [NFC] Update docs on intrinsics (#133136)
Recently, we have added a set of complex intrinsics on the TMA, tcgen05, and Cvt family of instructions. This patch captures the key learnings from our experience so far and documents them as guidelines for future design. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
1 parent 5812516 commit a03b225

File tree

1 file changed

+78
-0
lines changed

1 file changed

+78
-0
lines changed

llvm/include/llvm/IR/IntrinsicsNVVM.td

+78
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,84 @@
1010
//
1111
//===----------------------------------------------------------------------===//
1212

13+
//===----------------------------------------------------------------------===//
14+
// Guidelines on NVPTX Intrinsic design
15+
//===----------------------------------------------------------------------===//
16+
//
17+
// The NVPTX intrinsics are used to model instructions in the PTX ISA.
18+
// While simpler intrinsics can represent certain features effectively,
19+
// more complex instructions like TMA and MMA are not as straightforward
20+
// to model. A single variant of these complex instructions can expand
21+
// into hundreds of intrinsics. Additionally, any expansion in the
22+
// corresponding ISA can exponentially increase these numbers, making it
23+
// difficult to manage them in the IR and backend passes. Therefore,
24+
// a careful design of intrinsic interfaces can ease maintenance and
25+
// contribute to a sustainable, long-term solution.
26+
//
27+
// The default approach is to have a 1:1 match between the intrinsic and
28+
// the instruction where the instruction suffixes map to the intrinsic name
29+
// and the instruction arguments map to the intrinsic arguments or return
30+
// value.
31+
//
32+
// However, when there are too many instruction/intrinsic variants like
33+
// the TMA/MMA family, it is desirable to encode some variants as a
34+
// constant argument, referred to as 'flags'.
35+
// TODO: Add a guideline to quantify the metric on 'how many intrinsics' here.
36+
//
37+
// Below are a set of guidelines that may help in choosing
38+
// an appropriate design for the complex intrinsics:
39+
//
40+
// 1. Each flag argument represents one set of instruction modifiers.
41+
// These flags are compile-time integer constants.
42+
//
43+
// 2. When an intrinsic uses flags, document it with details of the
44+
// flag usage in the ``NVPTXUsage.rst`` file.
45+
// 3. Annotate all flag arguments with ImmArg<ArgIdx<>>.
46+
// 4. Place the flag arguments at the end of the (actual)argument list.
47+
//
48+
// 5. Use `i1` for boolean flags and `i8` for others. Usually,
49+
// the `i8` types represent an `enum` encoding the family of
50+
// modifiers.
51+
// 6. Note that, the specific variant for non-boolean flags may not be
52+
// obvious in the IR. So, maintain consistency between the enum value
53+
// definitions and their usage in the backend.
54+
// * Provide a meaningful default value in the enums wherever applicable.
55+
// * TODO: Investigate auto-upgrade capability for intrinsics
56+
// when only flag value mappings change.
57+
//
58+
// 7. Identify the key features of an intrinsic and distinguish between
59+
// first-order and supplementary information. Typically, encoding the
60+
// first-order information in the intrinsic name while using flags
61+
// for supplementary details improves readability.
62+
// For example:
63+
//
64+
// i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature,
65+
// whereas an optional scaling applied to matrices is relatively secondary.
66+
//
67+
// ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order
68+
// information, while features like an optional cache hint tend to be
69+
// secondary.
70+
//
71+
// 8. If there are invalid combinations within a set of modifiers, avoid
72+
// encoding them as flags, as much as possible. This helps reduce the
73+
// need for error handling of unsupported cases in the backend.
74+
// For example, some 'cvt' intrinsics support only a subset of the
75+
// possible rounding modes; so it is preferable not to encode the
76+
// rounding modes as flags.
77+
// 9. Similarly, when there are invalid combinations across a set of
78+
// modifiers, avoid encoding them as flags to prevent additional
79+
// complexity in error handling.
80+
//
81+
// 10. Maintain a consistent design within an intrinsic family, including
82+
// argument ordering as well as the usage and ordering of flags.
83+
// 11. When designing an intrinsic corresponding to an instruction or its variant,
84+
// consider the entire instruction family. This may reveal common features
85+
// that can be modelled consistently across the family.
86+
//
87+
// In summary, strive to balance the aspects mentioned above, to achieve
88+
// a scalable design with maximum readability.
89+
//===----------------------------------------------------------------------===//
90+
1391
// The following intrinsics were once defined here, but are now auto-upgraded
1492
// to target-generic LLVM intrinsics.
1593
//

0 commit comments

Comments
 (0)