|
10 | 10 | //
|
11 | 11 | //===----------------------------------------------------------------------===//
|
12 | 12 |
|
| 13 | +//===----------------------------------------------------------------------===// |
| 14 | +// Guidelines on NVPTX Intrinsic design |
| 15 | +//===----------------------------------------------------------------------===// |
| 16 | +// |
| 17 | +// The NVPTX intrinsics are used to model instructions in the PTX ISA. |
| 18 | +// While simpler intrinsics can represent certain features effectively, |
| 19 | +// more complex instructions like TMA and MMA are not as straightforward |
| 20 | +// to model. A single variant of these complex instructions can expand |
| 21 | +// into hundreds of intrinsics. Additionally, any expansion in the |
| 22 | +// corresponding ISA can exponentially increase these numbers, making it |
| 23 | +// difficult to manage them in the IR and backend passes. Therefore, |
| 24 | +// a careful design of intrinsic interfaces can ease maintenance and |
| 25 | +// contribute to a sustainable, long-term solution. |
| 26 | +// |
| 27 | +// The default approach is to have a 1:1 match between the intrinsic and |
| 28 | +// the instruction where the instruction suffixes map to the intrinsic name |
| 29 | +// and the instruction arguments map to the intrinsic arguments or return |
| 30 | +// value. |
| 31 | +// |
| 32 | +// However, when there are too many instruction/intrinsic variants like |
| 33 | +// the TMA/MMA family, it is desirable to encode some variants as a |
| 34 | +// constant argument, referred to as 'flags'. |
| 35 | +// TODO: Add a guideline to quantify the metric on 'how many intrinsics' here. |
| 36 | +// |
| 37 | +// Below are a set of guidelines that may help in choosing |
| 38 | +// an appropriate design for the complex intrinsics: |
| 39 | +// |
| 40 | +// 1. Each flag argument represents one set of instruction modifiers. |
| 41 | +// These flags are compile-time integer constants. |
| 42 | +// |
| 43 | +// 2. When an intrinsic uses flags, document it with details of the |
| 44 | +// flag usage in the ``NVPTXUsage.rst`` file. |
| 45 | +// 3. Annotate all flag arguments with ImmArg<ArgIdx<>>. |
| 46 | +// 4. Place the flag arguments at the end of the (actual)argument list. |
| 47 | +// |
| 48 | +// 5. Use `i1` for boolean flags and `i8` for others. Usually, |
| 49 | +// the `i8` types represent an `enum` encoding the family of |
| 50 | +// modifiers. |
| 51 | +// 6. Note that, the specific variant for non-boolean flags may not be |
| 52 | +// obvious in the IR. So, maintain consistency between the enum value |
| 53 | +// definitions and their usage in the backend. |
| 54 | +// * Provide a meaningful default value in the enums wherever applicable. |
| 55 | +// * TODO: Investigate auto-upgrade capability for intrinsics |
| 56 | +// when only flag value mappings change. |
| 57 | +// |
| 58 | +// 7. Identify the key features of an intrinsic and distinguish between |
| 59 | +// first-order and supplementary information. Typically, encoding the |
| 60 | +// first-order information in the intrinsic name while using flags |
| 61 | +// for supplementary details improves readability. |
| 62 | +// For example: |
| 63 | +// |
| 64 | +// i. For MMA intrinsics, 'dense' vs. 'sparse' is a fundamental feature, |
| 65 | +// whereas an optional scaling applied to matrices is relatively secondary. |
| 66 | +// |
| 67 | +// ii. For TMAs, the mode of copy (e.g., 'Tile' or 'Im2col') is a first-order |
| 68 | +// information, while features like an optional cache hint tend to be |
| 69 | +// secondary. |
| 70 | +// |
| 71 | +// 8. If there are invalid combinations within a set of modifiers, avoid |
| 72 | +// encoding them as flags, as much as possible. This helps reduce the |
| 73 | +// need for error handling of unsupported cases in the backend. |
| 74 | +// For example, some 'cvt' intrinsics support only a subset of the |
| 75 | +// possible rounding modes; so it is preferable not to encode the |
| 76 | +// rounding modes as flags. |
| 77 | +// 9. Similarly, when there are invalid combinations across a set of |
| 78 | +// modifiers, avoid encoding them as flags to prevent additional |
| 79 | +// complexity in error handling. |
| 80 | +// |
| 81 | +// 10. Maintain a consistent design within an intrinsic family, including |
| 82 | +// argument ordering as well as the usage and ordering of flags. |
| 83 | +// 11. When designing an intrinsic corresponding to an instruction or its variant, |
| 84 | +// consider the entire instruction family. This may reveal common features |
| 85 | +// that can be modelled consistently across the family. |
| 86 | +// |
| 87 | +// In summary, strive to balance the aspects mentioned above, to achieve |
| 88 | +// a scalable design with maximum readability. |
| 89 | +//===----------------------------------------------------------------------===// |
| 90 | + |
13 | 91 | // The following intrinsics were once defined here, but are now auto-upgraded
|
14 | 92 | // to target-generic LLVM intrinsics.
|
15 | 93 | //
|
|
0 commit comments