[mlir] Inconsistent results for memref.copy #114656

Open
@AnonymousBugreporter1

I have the following MLIR program:
test.mlir:

module {
  func.func nested @func1() -> f32 {
    %idx0 = index.constant 0
    %idx1 = index.constant 1
    %true = arith.constant true
    %false = arith.constant false

    %alloc_33 = memref.alloc() : memref<11xi1>
    linalg.fill ins(%true : i1) outs(%alloc_33 : memref<11xi1>)

    %alloc_147 = memref.alloc() : memref<11xi1>    
    linalg.fill ins(%false : i1) outs(%alloc_147 : memref<11xi1>)

    memref.copy %alloc_147, %alloc_33 : memref<11xi1> to memref<11xi1>
    
    %dim = memref.dim %alloc_33, %idx0 : memref<11xi1>
    %0 = scf.for %arg1 = %idx0 to %dim step %idx1 iter_args(%arg2 = %false) -> (i1) {
      %1 = memref.load %alloc_33[%arg1] : memref<11xi1>
      vector.print %1 : i1
      %2 = arith.addi %arg2, %1 : i1
      scf.yield %2 : i1
    }
    
    vector.print %0 : i1
    %1 = arith.sitofp %0 : i1 to f32
    return %1 : f32
  }
}

When I ran

/data/tmp/v1102/llvm-project/build/bin/mlir-opt \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-cf --finalize-memref-to-llvm --convert-arith-to-llvm --convert-func-to-llvm --convert-index-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1102/llvm-project/build/bin/mlir-cpu-runner -e func1 --shared-libs=/data/tmp/v1102/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/v1102/llvm-project/build/lib/libmlir_c_runner_utils.so

on the program, I got the result of:

0
0
0
0
0
0
0
0
0
0
0
0
0.000000e+00

However, when I ran

/data/tmp/v1102/llvm-project/build/bin/mlir-opt \
--test-linalg-transform-patterns=test-patterns \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-cf --finalize-memref-to-llvm --convert-arith-to-llvm --convert-func-to-llvm --convert-index-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1102/llvm-project/build/bin/mlir-cpu-runner -e func1 --shared-libs=/data/tmp/v1102/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/v1102/llvm-project/build/lib/libmlir_c_runner_utils.so

on the program, I got the result of:

0
0
1
1
1
1
1
1
1
1
1
1
-1.000000e+00

The two results above are inconsistent. I'm not sure whether there is a bug in my program or whether I am using these passes incorrectly.

My git version is 33bdb53.
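
For reference, the last two lines of each output follow from i1 semantics rather than from the copy itself: arith.addi on i1 wraps modulo 2 (so it behaves like XOR), and arith.sitofp reads a set i1 as the signed value -1. Below is a minimal C++ sketch of that arithmetic. The helper names addI1, reduceI1 and sitofpI1 are hypothetical and only model the ops; they are not MLIR APIs.

```cpp
#include <vector>

// Model of `arith.addi` on i1: addition modulo 2, i.e. XOR.
bool addI1(bool a, bool b) { return a != b; }

// Model of the scf.for reduction over the values loaded from the memref.
bool reduceI1(const std::vector<bool> &values) {
  bool acc = false; // corresponds to iter_args(%arg2 = %false)
  for (bool v : values)
    acc = addI1(acc, v);
  return acc;
}

// Model of `arith.sitofp ... : i1 to f32`: a set i1 is -1 as a signed value.
float sitofpI1(bool b) { return b ? -1.0f : 0.0f; }
```

With the loaded values from the second run (two zeros followed by nine ones), reduceI1 yields 1 and sitofpI1 yields -1.0, which matches the printed tail. With eleven zeros both are 0. So the only real discrepancy is in the eleven loaded values.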

Activity

AnonymousBugreporter1 (Author) commented on Nov 29, 2024

I tried to reproduce this issue on older commits. The inconsistent results can be reproduced on commit ebc8153 but not on the preceding commit 9c52a19.
To match the older syntax and pass names, I made some changes to the program. The adjusted test.mlir is:

module {
  func nested @func1() -> i32 {
    %idx0 = arith.constant 0 : index
    %idx1 = arith.constant 1 : index
    %true = arith.constant true
    %false = arith.constant false

    %alloc_33 = memref.alloc() : memref<11xi1>
    linalg.fill(%true, %alloc_33) : i1, memref<11xi1>
    %alloc_147 = memref.alloc() : memref<11xi1>
    linalg.fill(%false, %alloc_147) : i1, memref<11xi1>

    memref.copy %alloc_147, %alloc_33 : memref<11xi1> to memref<11xi1>

    %dim = memref.dim %alloc_33, %idx0 : memref<11xi1>
    %0 = scf.for %arg1 = %idx0 to %dim step %idx1 iter_args(%arg2 = %false) -> (i1) {
      %1 = memref.load %alloc_33[%arg1] : memref<11xi1>
      vector.print %1 : i1
      %2 = arith.addi %arg2, %1 : i1
      scf.yield %2 : i1
    }

    vector.print %0 : i1
    %1 = arith.extsi %0 : i1 to i32
    return %1 : i32
  }
}

When I ran

/data/tmp/ebc815378696/llvm-project/build/bin/mlir-opt \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-std --convert-memref-to-llvm --convert-arith-to-llvm --convert-std-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/ebc815378696/llvm-project/build/bin/mlir-cpu-runner --entry-point-result=i32 -e func1 --shared-libs=/data/tmp/ebc815378696/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/ebc815378696/llvm-project/build/lib/libmlir_c_runner_utils.so

on the program, I got the result of:

0
0
0
0
0
0
0
0
0
0
0
0
0

However, when I ran

/data/tmp/ebc815378696/llvm-project/build/bin/mlir-opt \
--test-linalg-transform-patterns=test-patterns \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-std --convert-memref-to-llvm --convert-arith-to-llvm --convert-std-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/ebc815378696/llvm-project/build/bin/mlir-cpu-runner --entry-point-result=i32 -e func1 --shared-libs=/data/tmp/ebc815378696/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/ebc815378696/llvm-project/build/lib/libmlir_c_runner_utils.so

on the program, I got the result of:

0
0
1
1
1
1
1
1
1
1
1
1
-1

Hi @pifon2a, sorry to disturb you, but would you mind taking a look at this problem?

AnonymousBugreporter1 (Author) commented on Dec 2, 2024

Hi @nicolasvasilache and @banach-space, sorry to disturb you, but I noticed that you have reviewed the related commit or have worked on the same file, and I was wondering if it might be possible for you to take a look at this problem when you have a moment?

pifon2a (Contributor) commented on Dec 2, 2024

We had a lot of different problems in TensorFlow and XLA, because i1 was actually an 8-bit type instead of 1-bit. Could it be something like that?
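
To make the storage-width point concrete, here is a small C++ sketch comparing how many bytes 11 i1 values occupy under two layouts. Both layouts are assumptions for illustration: one byte per element (which is what a 1-byte element stride implies for the memref) and bit-packing (which, as far as I understand the LLVM LangRef, is how a vector such as <11 x i1> is stored). The function names are hypothetical.

```cpp
#include <cstddef>

// Assumption: each i1 element of the memref occupies one full byte.
constexpr std::size_t bytePerElementSize(std::size_t n) { return n; }

// Assumption: n i1 values packed eight to a byte, rounded up to whole bytes.
constexpr std::size_t bitPackedSize(std::size_t n) { return (n + 7) / 8; }

static_assert(bytePerElementSize(11) == 11, "11 elements, 11 bytes");
static_assert(bitPackedSize(11) == 2, "11 bits fit in 2 bytes");
```

If both assumptions hold, an access that treats the buffer as packed would touch only 2 of its 11 bytes.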

banach-space (Contributor) commented on Dec 2, 2024

Thanks for reporting this @wangyongj1a !

Just to confirm: is this broken with LLVM ToT (top of tree)? And is it this specific flag that breaks things?

  • --test-linalg-transform-patterns=test-patterns

These patterns are defined here:

static void applyPatterns(func::FuncOp funcOp) {
  MLIRContext *ctx = funcOp.getContext();
  RewritePatternSet patterns(ctx);

  //===--------------------------------------------------------------------===//
  // Linalg distribution patterns.
  //===--------------------------------------------------------------------===//
  LinalgLoopDistributionOptions distributionOptions;

  //===--------------------------------------------------------------------===//
  // Linalg to vector contraction patterns.
  //===--------------------------------------------------------------------===//
  patterns.add<CopyVectorizationPattern>(ctx);

  (void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
}

My suspicion would be this bit:

  (void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));

I'd start by reverting these changes in MemRefOps.cpp (CopyOp::fold specifically). Could you try that?

AnonymousBugreporter1 (Author) commented on Dec 5, 2024

Thanks for your response!

> Thanks for reporting this @wangyongj1a!
>
> Just to confirm: is this broken with LLVM ToT (top of tree)? And is it this specific flag that breaks things?
>
>   • --test-linalg-transform-patterns=test-patterns
>
> These patterns are defined here:
>
> llvm-project/mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp
> Lines 136 to 151 in fe1c4f0
>
> static void applyPatterns(func::FuncOp funcOp) {
>   MLIRContext *ctx = funcOp.getContext();
>   RewritePatternSet patterns(ctx);
>
>   //===--------------------------------------------------------------------===//
>   // Linalg distribution patterns.
>   //===--------------------------------------------------------------------===//
>   LinalgLoopDistributionOptions distributionOptions;
>
>   //===--------------------------------------------------------------------===//
>   // Linalg to vector contraction patterns.
>   //===--------------------------------------------------------------------===//
>   patterns.add<CopyVectorizationPattern>(ctx);
>
>   (void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
> }
>
> My suspicion would be this bit:
>
>   (void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
>
> I'd start by reverting these changes in MemRefOps.cpp (CopyOp::fold specifically). Could you try that?

Reproducing this problem does require --test-linalg-transform-patterns=test-patterns. However, the inconsistency may not come from (void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
CopyOp::fold only applies when an operand of memref.copy is defined by a cast operation. Here both operands are defined by memref.alloc, so that condition is not met.

Furthermore, I ran --test-linalg-transform-patterns=test-patterns on its own and got the following output:

...
%0 = vector.transfer_read %alloc_0[%c0], %false {in_bounds = [true]} : memref<11xi1>, vector<11xi1>
vector.transfer_write %0, %alloc[%c0] {in_bounds = [true]} : vector<11xi1>, memref<11xi1>
...

Note that vector.transfer_read and vector.transfer_write are generated by CopyVectorizationPattern, which suggests the inconsistency comes from patterns.add<CopyVectorizationPattern>(ctx);.

Also, this problem can still be reproduced on 7d1c661 using the above commands.

I also compared the LLVM IR produced by the two pipelines. The LLVM IR produced by

/data/tmp/v1205/llvm-project/build/bin/mlir-opt \
--test-linalg-transform-patterns=test-patterns \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-cf --finalize-memref-to-llvm --convert-arith-to-llvm --convert-func-to-llvm --convert-index-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1205/llvm-project/build/bin/mlir-translate --mlir-to-llvmir

is:

...
%29 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %20, 1
%30 = getelementptr i1, ptr %29, i64 0
%31 = load <11 x i1>, ptr %30, align 1
%32 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %6, 1
%33 = getelementptr i1, ptr %32, i64 0
store <11 x i1> %31, ptr %33, align 1
...

The LLVM IR produced by

/data/tmp/v1205/llvm-project/build/bin/mlir-opt \
--convert-vector-to-llvm --convert-linalg-to-loops --convert-scf-to-cf --finalize-memref-to-llvm --convert-arith-to-llvm --convert-func-to-llvm --convert-index-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1205/llvm-project/build/bin/mlir-translate --mlir-to-llvmir

is:

...
%29 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %20, 3, 0
%30 = mul i64 1, %29
%31 = mul i64 %30, ptrtoint (ptr getelementptr (i1, ptr null, i32 1) to i64)
%32 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %20, 1
%33 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %20, 2    ; which is 0
%34 = getelementptr i1, ptr %32, i64 %33
%35 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %6, 1
%36 = extractvalue { ptr, ptr, i64, [1 x i64], [1 x i64] } %6, 2     ; which is 0
%37 = getelementptr i1, ptr %35, i64 %36
call void @llvm.memcpy.p0.p0.i64(ptr %37, ptr %34, i64 %31, i1 false)
...

The two LLVM IR snippets above look equivalent to me.
Does the inconsistency come from the step that lowers LLVM IR to assembly?
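
One way to explore that question is to model the two copy paths at the byte level. The C++ sketch below is a simulation under stated assumptions, not a confirmed diagnosis. It assumes the memref holds one byte per i1 (consistent with the 1-byte stride in the memcpy path) and that load/store of <11 x i1> moves only ceil(11 / 8) = 2 bytes because the vector is bit-packed. Buffer, copyWithMemcpy and copyWithPackedVector are hypothetical names. Padding bits in the last packed byte are ignored for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte per i1 element, mirroring the memref layout assumed above.
using Buffer = std::vector<std::uint8_t>;

// Path 1: memcpy-style copy that moves every element byte.
Buffer copyWithMemcpy(const Buffer &src, Buffer dst) {
  assert(dst.size() >= src.size());
  std::copy(src.begin(), src.end(), dst.begin());
  return dst;
}

// Path 2 (assumption): a bit-packed <n x i1> load/store pair that moves
// only ceil(n / 8) bytes and leaves the rest of dst untouched.
Buffer copyWithPackedVector(const Buffer &src, Buffer dst) {
  assert(dst.size() >= src.size());
  const auto packedBytes = static_cast<std::ptrdiff_t>((src.size() + 7) / 8);
  std::copy(src.begin(), src.begin() + packedBytes, dst.begin());
  return dst;
}
```

Calling both with an 11-byte source of zeros and an 11-byte destination of ones gives all zeros for copyWithMemcpy, and two zeros followed by nine ones for copyWithPackedVector. That is the same pattern as the two outputs reported above, so under these assumptions the two IR snippets would not be equivalent even before lowering to assembly.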
