[mlir][vector] Fix emulation of "narrow" type vector.store (#133231)
Below are two examples of "narrow" `vector.store` ops. The first example
does not require partial stores, and hence no read-modify-write (RMW)
stores. It is currently emulated correctly.
```mlir
func.func @example_1(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c4 = arith.constant 4 : index
  vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2>
  return
}
```
The second example requires a partial (and hence RMW) store because the
offset (`%c3`) does not fall on an emulated type boundary.
```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c3 = arith.constant 3 : index
  vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2>
  return
}
```
The second example is currently emulated incorrectly as a single "full"
store (note the incorrect offset, `%c0`) instead of as partial stores:
```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %alloc = memref.alloc() : memref<4xi8>
  %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8>
  %c0 = arith.constant 0 : index
  vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8>
  return
}
```
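To make the difference between the two examples concrete, the sketch
below works through the index arithmetic for an i2 element stored in an
i8 container. It is a standalone model, not MLIR code: the helper names
(`firstContainer`, `frontPadding`, `needsPartialStore`) and the constant
`kElemsPerContainer` are hypothetical, and it assumes non-negative,
constant offsets.

```cpp
#include <cstdint>

// Standalone model, not MLIR code. All names here are hypothetical.
// A "container" is the wider emulated element (i8); it holds
// `elemsPerContainer` narrow elements (four i2 values).

// Index of the first container element touched by the store.
constexpr int64_t firstContainer(int64_t offset, int64_t elemsPerContainer) {
  return offset / elemsPerContainer;
}

// Number of narrow elements skipped at the start of that container.
constexpr int64_t frontPadding(int64_t offset, int64_t elemsPerContainer) {
  return offset % elemsPerContainer;
}

// True if the store leaves part of some container untouched, so that a
// read-modify-write is needed to preserve the neighbouring elements.
constexpr bool needsPartialStore(int64_t offset, int64_t numElems,
                                 int64_t elemsPerContainer) {
  return offset % elemsPerContainer != 0 ||
         (offset + numElems) % elemsPerContainer != 0;
}

constexpr int64_t kElemsPerContainer = 8 / 2; // i8 / i2 = 4

// @example_1: offset 4 starts exactly at byte 1 and fills it completely.
static_assert(firstContainer(4, kElemsPerContainer) == 1, "");
static_assert(frontPadding(4, kElemsPerContainer) == 0, "");
static_assert(!needsPartialStore(4, 4, kElemsPerContainer), "");

// @example_2: offset 3 starts in the last slot of byte 0 and spills
// into byte 1, so both bytes are only partially written.
static_assert(firstContainer(3, kElemsPerContainer) == 0, "");
static_assert(frontPadding(3, kElemsPerContainer) == 3, "");
static_assert(needsPartialStore(3, 4, kElemsPerContainer), "");
```

In this model the correct front padding for `@example_2` is 3, whereas
the incorrect emulation above behaves as if it were 0.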
The incorrect emulation stems from this simplified (i.e. incomplete)
calculation of the front padding:
```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    isDivisibleInSize ? 0
                      : getConstantIntValue(linearizedInfo.intraDataOffset);
```
Since `isDivisibleInSize` is `true` (i8 / i2 = 4):
* front padding is set to `0` and, as a result,
* the input offset (`%c3`) is ignored, and
* we incorrectly assume that partial stores won't be needed.
In both examples we are storing `vector<4xi2>` into `memref<13xi2>`.
The trailing dims _differ_ (4 vs. 13), and hence partial stores might in
fact be required. The condition above is updated to:
```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    (isDivisibleInSize && trailingDimsMatch)
        ? 0
        : getConstantIntValue(linearizedInfo.intraDataOffset);
```
This change ensures that the input offset is properly taken into
account, which fixes the issue. It doesn't affect `@example_1`.
Additional comments are added to clarify the current logic.
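
The effect of the updated condition on the two examples can be sketched
with a small standalone model. This is not the MLIR implementation:
`frontPadBefore` and `frontPadAfter` are hypothetical names, the intra
data offset is passed as a plain integer rather than folded with
`getConstantIntValue`, and it is assumed to equal the store offset modulo
4 (0 for `@example_1`, 3 for `@example_2`).

```cpp
#include <cstdint>

// Standalone model of the two conditions; not the MLIR implementation.
// All function names are hypothetical. The offset is assumed constant.

// Front padding as computed before the fix.
constexpr int64_t frontPadBefore(bool isDivisibleInSize,
                                 int64_t intraDataOffset) {
  return isDivisibleInSize ? 0 : intraDataOffset;
}

// Front padding as computed after the fix.
constexpr int64_t frontPadAfter(bool isDivisibleInSize,
                                bool trailingDimsMatch,
                                int64_t intraDataOffset) {
  return (isDivisibleInSize && trailingDimsMatch) ? 0 : intraDataOffset;
}

// In both examples `isDivisibleInSize` is true and the trailing dims
// differ (vector<4xi2> vs. memref<13xi2>), so `trailingDimsMatch` is false.

// @example_1: intra data offset 0. Both conditions yield 0.
static_assert(frontPadBefore(true, 0) == 0, "");
static_assert(frontPadAfter(true, false, 0) == 0, "");

// @example_2: intra data offset 3. The old condition drops it, the new
// condition keeps it.
static_assert(frontPadBefore(true, 3) == 0, "");
static_assert(frontPadAfter(true, false, 3) == 3, "");
```

Under these assumptions the result for `@example_1` is unchanged, while
`@example_2` now reports a non-zero front padding and so no longer looks
like a full store.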