
when offloading nested structs, sometimes "PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: an illegal memory access was encountered happens #126342

Open
@bschulz81


The code attached to this bug report has no memory problems, but it fails with:

Entirely on gpu
"PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: an illegal memory access was encountered
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
mdspan_acc.h:3807:9: omptarget fatal error 1: failure of target construct while offloading is mandatory

Process returned -1 (0xFFFFFFFF) execution time : 0.394 s

Upon investigation, it seems that dQ and dR are not offloaded correctly in the function qr_decomposition, which at line 3792 calls
create_in_struct(dA);
create_out_struct(dQ);
create_out_struct(dR);

that do the mappings with

template <typename T>
void inline create_in_struct(const datastruct<T>& dA)
{
    #pragma omp target enter data map(to: dA, dA.pdata[0:dA.pdatalength], dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank])
}

template <typename T>
void inline create_out_struct(datastruct<T>& dA)
{
    #pragma omp target enter data map(to: dA) map(alloc: dA.pdata[0:dA.pdatalength]) map(to: dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank])
}

and then calls

gpu_qr_decomposition.
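To narrow this down, a reduced reproducer of the same mapping pattern may help. The following is only a sketch under assumptions: the `datastruct` below is a hypothetical stand-in that copies just the member names quoted above (the real definition is in mdspan_acc.h), and `map_out_struct`, `unmap_out_struct` and `fill_on_device` are illustrative names that do not exist in the attached code. It maps the struct and its pointer members in separate directives, writes to the buffer in a target region, and copies the result back.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for datastruct from mdspan_acc.h; only the member
// names quoted in this report are reproduced.
template <typename T>
struct datastruct {
    T* pdata;
    std::size_t* pextents;
    std::size_t* pstrides;
    std::size_t pdatalength;
    std::size_t prank;
};

// Map the struct itself first, then the array sections behind its pointer
// members, one directive per step (illustrative helper, not from the source).
template <typename T>
inline void map_out_struct(datastruct<T>& dA) {
#pragma omp target enter data map(to: dA)
#pragma omp target enter data map(alloc: dA.pdata[0:dA.pdatalength])
#pragma omp target enter data map(to: dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank])
}

// Undo the mappings in reverse order and copy the data buffer back.
template <typename T>
inline void unmap_out_struct(datastruct<T>& dA) {
#pragma omp target exit data map(from: dA.pdata[0:dA.pdatalength])
#pragma omp target exit data map(release: dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank])
#pragma omp target exit data map(release: dA)
}

// Write `value` into every element through the mapped struct, then copy back.
template <typename T>
inline void fill_on_device(datastruct<T>& dA, T value) {
    assert(dA.pdata != nullptr);
    const std::size_t n = dA.pdatalength;  // loop bound evaluated on the host
    map_out_struct(dA);
#pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < n; ++i) {
        dA.pdata[i] = value;
    }
    unmap_out_struct(dA);
}
```

Calling `fill_on_device(d, 0.0)` on a 3x3 test matrix (9 elements, rank 2) should leave the host buffer zeroed. If this small case already triggers the illegal memory access, the problem is probably in how the nested pointers are attached, not in the decomposition code.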

If one removes lines 1863-1997 in gpu_qr_decomposition, in particular these lines:

#pragma omp parallel for
for (size_t i=0; i<Q.pdatalength; i++)
{
    Q.pdata[i]=0;
}

#pragma omp parallel for
for (size_t i=0; i<R.pdatalength; i++)
{
    R.pdata[i]=0;
}

then the code suddenly runs without the error...
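As a control experiment, the same zero-fill can be done through a plain pointer instead of a struct member. This sketch assumes nothing about mdspan_acc.h, and `zero_roundtrip` is a hypothetical helper name. It allocates the buffer on the device, zeroes it in a target region, copies it back, and checks the result on the host. If this variant works while the version going through Q.pdata and R.pdata does not, that would point at the struct member mapping and not at the loop itself.

```cpp
#include <cstddef>

// Zero a buffer on the device through a plain pointer and verify on the host.
// Returns true if every element is zero after the round trip.
inline bool zero_roundtrip(double* buf, std::size_t n) {
    // Allocate device storage for the buffer without copying host data.
#pragma omp target enter data map(alloc: buf[0:n])

    // buf is translated to the device address of the section mapped above.
#pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = 0.0;
    }

    // Copy the device contents back and drop the mapping.
#pragma omp target exit data map(from: buf[0:n])

    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] != 0.0) {
            return false;
        }
    }
    return true;
}
```

With a 9-element array filled with nonzero values, `zero_roundtrip(buf, 9)` is expected to return true both with offloading enabled and when the pragmas are ignored.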

The strange thing is that similar code is called earlier for a Cholesky and an LU decomposition.

There, the code works provided I compile it without optimization.

If I compile the code with -O2, then clang takes a very long time to finish, and in the resulting program the LU decomposition crashes too.

There is no problem with the array sizes, as I have checked these, and these are just test cases with 9 elements (3x3 matrix).

main_acc.cpp.txt

mdspan_acc.h.txt

CMakeLists.txt
