Skip to content

[mlir][memref] canonicalization for erasing copying subview to identical subview #125852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

fschlimb
Copy link
Contributor

@fschlimb fschlimb commented Feb 5, 2025

Extending the canonicalization pattern FoldSelfCopy:
In addition to erasing memref.copy if target and source are identical, it now also erases the copy if it copies a subview into the "same" subview. "Same" here conservatively means target and source are both subviews on the identical source and their dynamic and static offsets and strides are identical.

@llvmbot
Copy link
Member

llvmbot commented Feb 5, 2025

@llvm/pr-subscribers-mlir-memref

@llvm/pr-subscribers-mlir

Author: Frank Schlimbach (fschlimb)

Changes

Extending the canonicalization pattern FoldSelfCopy:
In addition to erasing memref.copy if target and source are identical, it now also erases the copy if it copies a subview into the "same" subview. "Same" here conservatively means target and source are both subviews on the identical source and their dynamic and static offsets and strides are identical.


Full diff: https://github.com/llvm/llvm-project/pull/125852.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp (+26-2)
  • (modified) mlir/test/Dialect/MemRef/canonicalize.mlir (+30)
diff --git a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
index e0930abc1887d42..f94c588f99361e2 100644
--- a/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
+++ b/mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
@@ -824,8 +824,32 @@ struct FoldSelfCopy : public OpRewritePattern<CopyOp> {
 
   LogicalResult matchAndRewrite(CopyOp copyOp,
                                 PatternRewriter &rewriter) const override {
-    if (copyOp.getSource() != copyOp.getTarget())
-      return failure();
+    if (copyOp.getSource() != copyOp.getTarget()) {
+      // If the source and target are SubViews and they are identical, we can fold.
+      auto source = copyOp.getSource().getDefiningOp<SubViewOp>();
+      auto target = copyOp.getTarget().getDefiningOp<SubViewOp>();
+      if (!source || !target ||
+          source.getSource() != target.getSource() ||
+          llvm::any_of(llvm::zip(source.getOffsets(), target.getOffsets()),
+                    [](std::tuple<Value, Value> offsetPair) {
+                      return std::get<0>(offsetPair) != std::get<1>(offsetPair);
+                    }) ||
+          llvm::any_of(llvm::zip(source.getStaticOffsets(), target.getStaticOffsets()),
+                    [](std::tuple<int64_t, int64_t> offsetPair) {
+                      return std::get<0>(offsetPair) != std::get<1>(offsetPair);
+                    }) ||
+          // sizes must be the same anyway
+          llvm::any_of(llvm::zip(source.getStrides(), target.getStrides()),
+                    [](std::tuple<Value, Value> stridePair) {
+                      return std::get<0>(stridePair) != std::get<1>(stridePair);
+                    }) ||
+          llvm::any_of(llvm::zip(source.getStaticStrides(), target.getStaticStrides()),
+                    [](std::tuple<int64_t, int64_t> stridePair) {
+                      return std::get<0>(stridePair) != std::get<1>(stridePair);
+                    })) {
+          return failure();
+      }
+    }
 
     rewriter.eraseOp(copyOp);
     return success();
diff --git a/mlir/test/Dialect/MemRef/canonicalize.mlir b/mlir/test/Dialect/MemRef/canonicalize.mlir
index 02110bc2892d051..56a7014047aa1dd 100644
--- a/mlir/test/Dialect/MemRef/canonicalize.mlir
+++ b/mlir/test/Dialect/MemRef/canonicalize.mlir
@@ -704,6 +704,36 @@ func.func @self_copy(%m1: memref<?xf32>) {
 
 // -----
 
+func.func @self_copy_subview(%arg0: memref<?xf32>, %arg1: memref<?xf32>, %s: index) {
+  %c3 = arith.constant 3: index
+  %0 = memref.subview %arg0[3] [4] [2] : memref<?xf32> to memref<4xf32, strided<[2], offset: 3>>
+  %1 = memref.subview %arg0[%c3] [4] [2] : memref<?xf32> to memref<4xf32, strided<[2], offset: ?>>
+  %2 = memref.subview %arg0[%c3] [4] [%s] : memref<?xf32> to memref<4xf32, strided<[?], offset: ?>>
+  %3 = memref.subview %arg0[3] [4] [%s] : memref<?xf32> to memref<4xf32, strided<[?], offset: 3>>
+  %4 = memref.subview %arg1[3] [4] [%s] : memref<?xf32> to memref<4xf32, strided<[?], offset: 3>>
+  // erase (source and destination subviews render the same)
+  memref.copy %0, %1 : memref<4xf32, strided<[2], offset: 3>> to memref<4xf32, strided<[2], offset: ?>>
+  // keep (strides differ)
+  memref.copy %2, %1 : memref<4xf32, strided<[?], offset: ?>> to memref<4xf32, strided<[2], offset: ?>>
+  // erase (source and destination subviews render the same)
+  memref.copy %2, %3 : memref<4xf32, strided<[?], offset: ?>> to memref<4xf32, strided<[?], offset: 3>>
+  // keep (source and destination differ)
+  memref.copy %3, %4 : memref<4xf32, strided<[?], offset: 3>> to memref<4xf32, strided<[?], offset: 3>>
+  return
+}
+
+// CHECK-LABEL: func.func @self_copy_subview(
+// CHECK-SAME: [[varg0:%.*]]: memref<?xf32>, [[varg1:%.*]]: memref<?xf32>, [[varg2:%.*]]: index) {
+  // CHECK: [[vsubview:%.*]] = memref.subview [[varg0]][3] [4] [2]
+  // CHECK: [[vsubview_0:%.*]] = memref.subview [[varg0]][3] [4] [[[varg2]]]
+  // CHECK: [[vsubview_1:%.*]] = memref.subview [[varg0]][3] [4] [[[varg2]]]
+  // CHECK: [[vsubview_2:%.*]] = memref.subview [[varg1]][3] [4] [[[varg2]]]
+  // CHECK-NEXT: memref.copy [[vsubview_0]], [[vsubview]]
+  // CHECK-NEXT: memref.copy [[vsubview_1]], [[vsubview_2]]
+  // CHECK-NEXT: return
+
+// -----
+
 // CHECK-LABEL: func @empty_copy
 //  CHECK-NEXT:   return
 func.func @empty_copy(%m1: memref<0x10xf32>, %m2: memref<?x10xf32>) {

@fschlimb fschlimb changed the title canonicalization for erasing copying subview to identical subview [mlir][memref] canonicalization for erasing copying subview to identical subview Feb 5, 2025
Copy link

github-actions bot commented Feb 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@mgehre-amd
Copy link
Contributor

It sounds like we could run cse first, and then both subview would be the same op. Which then the original pattern would pick up. Am I missing something?

@fschlimb
Copy link
Contributor Author

fschlimb commented Feb 5, 2025

It sounds like we could run cse first, and then both subview would be the same op. Which then the original pattern would pick up. Am I missing something?

Almost.
You can get the same result by doing "canonicalize -> cse -> canonicalize". Otherwise cse it will not catch if constants in the dynamic offsets/strides are the same as static ones.

Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but looks good to me.

// fold.
auto source = copyOp.getSource().getDefiningOp<SubViewOp>();
auto target = copyOp.getTarget().getDefiningOp<SubViewOp>();
if (!source || !target || source.getSource() != target.getSource() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read. Can we split this out int separate conditions. Checking source and target are non null and equal in one condition is fine. Better to split the rest into separate conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this.
I have split the condition and used simplified syntax (using operators !=).

std::get<1>(stridePair);
}) ||
llvm::any_of(
llvm::zip(source.getStaticStrides(), target.getStaticStrides()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you not testing the sizes as well since by semantics of copy they are the same size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I (hopefully) clarified the comment.

@fschlimb
Copy link
Contributor Author

Will rebase & merge next Monday if no other concerns have been raised.

Copy link
Member

@Groverkss Groverkss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the two subview operations are same, why wouldn't they cse to the same operation? You could implement the same canonicalization for any two duplicate operations.

@Groverkss Groverkss requested a review from joker-eph February 27, 2025 22:05
@fschlimb
Copy link
Contributor Author

If the two subview operations are same, why wouldn't they cse to the same operation?

See above, only a canonicalize->cse->canonicalize will do the same.

You could implement the same canonicalization for any two duplicate operations.

Maybe there are others with a similar potential. What is the implication of this statement?

@mgehre-amd
Copy link
Contributor

If the two subview operations are same, why wouldn't they cse to the same operation?

See above, only a canonicalize->cse->canonicalize will do the same.

You could implement the same canonicalization for any two duplicate operations.

Maybe there are others with a similar potential. What is the implication of this statement?

I guess the question is, why can't you run canonicalize->cse->canonicalize to solve this problem?
If you approach this problem by pure canonicalization, you will eventually recreate all functionality of CSE within this canonicalization pattern. E.g. what if the memref.copy has two operands that are both an equivalent memref.expand_shape? Or both are an equivalent memref.collapse_shape? Or both are an identical chain of expand_shape + subview. This would require even more code in this canonicalization pattern, where the problem is already solved today by running canonicalize->cse->canonicalize.

@joker-eph
Copy link
Collaborator

I guess the question is, why can't you run canonicalize->cse->canonicalize to solve this problem?

This is really expensive though: so rather than a "can" I see it as whether it is desirable.
Beyond the "purity" of the design, this gets into practical territory: the tradeoff for the canonicalization to be useful enough can be driven by actual patterns of IR observed.
Note also that because canonicalization can perform transformations that enable CSE, you may have to do loop_until_no_change(canonization, use) if you really wanna guarantee to catch it all.

@fschlimb
Copy link
Contributor Author

I guess the question is, why can't you run canonicalize->cse->canonicalize to solve this problem?

This is really expensive though: so rather than a "can" I see it as whether it is desirable. Beyond the "purity" of the design, this gets into practical territory: the tradeoff for the canonicalization to be useful enough can be driven by actual patterns of IR observed.

I second @joker-eph. The pattern covered here I actually do see, in particular MLIR's tiling machinery (like transform.structured.tile_using_for* ) creates such. Should other op combinations get observed they can still be added.

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
@mgehre-amd
Copy link
Contributor

I guess the question is, why can't you run canonicalize->cse->canonicalize to solve this problem?

This is really expensive though: so rather than a "can" I see it as whether it is desirable. Beyond the "purity" of the design, this gets into practical territory: the tradeoff for the canonicalization to be useful enough can be driven by actual patterns of IR observed. Note also that because canonicalization can perform transformations that enable CSE, you may have to do loop_until_no_change(canonization, use) if you really wanna guarantee to catch it all.

Ok, that makes sense to me. Thanks for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants