lld/mingw: Behavior difference with ld.bfd when re-exporting imported global #84424

Open
@Keno

Description

@Keno

Over in JuliaLang/julia#53421, we are seeing an issue where a library linked with ld.bfd is working, but lld is not (when targeting win64/mingw). The code in question is a bit strange, but it's essentially a sanity check to make sure that there weren't any linking mistakes and that there aren't multiple copies of the runtime library floating around (e.g. a common mistake is to load both debug and release copies of the runtime library into the same address space).

In reduced terms, we have a source file that at the LLVM level looks like this (fully minimized):

# cat metadata.ll
; ModuleID = 'metadata_opt.bc'
source_filename = "metadata"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu-coff"

@jl_RTLD_DEFAULT_handle = external dllimport constant ptr
@jl_RTLD_DEFAULT_handle_pointer = dllexport constant ptr @jl_RTLD_DEFAULT_handle

define x86_stdcallcc i32 @_DllMainCRTStartup(ptr %0, i32 %1, ptr %2) {
top:
  ret i32 1
}

where jl_RTLD_DEFAULT_handle is just some dllexported global defined in libjulia-internal.dll, but for the present purposes, we may just treat it as the following:

# cat fakelibjulia.c
__declspec(dllexport) void *jl_RTLD_DEFAULT_handle = 0;
# gcc -shared -o libfakejulia.dll fakelibjulia.c 

Doing the following:

# llc.exe --filetype=obj -o metadata.o metadata.ll
# ld --disable-runtime-pseudo-reloc -shared -o metadata-bfd.dll --whole-archive metadata.o --no-whole-archive libfakejulia.dll
# lld.exe  -flavor gnu -m i386pep -Bdynamic  --disable-runtime-pseudo-reloc -shared -o metadata-lld.dll --whole-archive metadata.o --no-whole-archive -L./usr/bin -ljulia -ljulia-internal
lld: error: automatic dllimport of jl_RTLD_DEFAULT_handle in metadata.o requires pseudo relocations
# lld.exe  -flavor gnu -m i386pep -Bdynamic  --enable-runtime-pseudo-reloc -shared -o metadata-lld.dll --whole-archive metadata.o --no-whole-archive -L./usr/bin -ljulia -ljulia-internal

And then loading both libraries and comparing the pointers:

# cat print_pointers.c
#include <stdio.h>

extern __declspec(dllimport) void *jl_RTLD_DEFAULT_handle;
extern __declspec(dllimport) void *jl_RTLD_DEFAULT_handle_pointer;

int main(void) {
        printf("Pointers: %p %p\n", &jl_RTLD_DEFAULT_handle, jl_RTLD_DEFAULT_handle_pointer);
        return 0;
}
# gcc -o print_pointers-bfd.exe print_pointers.c metadata-bfd.dll libfakejulia.dll
# gcc -o print_pointers-lld.exe print_pointers.c metadata-lld.dll libfakejulia.dll
# ./print_pointers-bfd.exe
Pointers: 00007ffafef77020 00007ffafef77020
# ./print_pointers-lld.exe
Pointers: 00007ffafef77020 00007ffb0d1b20e0

The BFD result is expected. The LLD result is not (separate and apart from the fact that lld and bfd disagree over the necessity of runtime pseudo relocations - the output is the same for bfd if those are enabled, although it does require linking the mingw dllcrt in that case).

Finally, I want to note that the issue is the library, not the executable. The same behavior is observed via dlsym (as in the original issue).

julia> lib_lld = dlopen("metadata-lld.dll")
Ptr{Nothing} @0x00007ffb0d1b0000

julia> lib_bfd = dlopen("metadata-bfd.dll")
Ptr{Nothing} @0x00007ffb08ff0000

julia> unsafe_load(Ptr{Ptr{Cvoid}}(dlsym(lib_lld, "jl_RTLD_DEFAULT_handle_pointer")))
Ptr{Nothing} @0x00007ffb0d1b20e0

julia> unsafe_load(Ptr{Ptr{Cvoid}}(dlsym(lib_bfd, "jl_RTLD_DEFAULT_handle_pointer")))
Ptr{Nothing} @0x00007ffab81a9818

julia> cglobal(:jl_RTLD_DEFAULT_handle, Ptr{Cvoid})
Ptr{Ptr{Nothing}} @0x00007ffab81a9818

In particular, the value that lld gives is one extra level of indirection removed from that used by bfd:

julia> unsafe_load(unsafe_load(Ptr{Ptr{Ptr{Cvoid}}}(dlsym(lib_lld, "jl_RTLD_DEFAULT_handle_pointer"))))
Ptr{Nothing} @0x00007ffb0d1b20e0
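
To make the extra level of indirection concrete, here is a minimal C model of the two outcomes. It is not the actual linker output. It assumes that the unfixed value in the lld-linked DLL is the address of an import table slot which in turn holds the address of the global. The names `import_slot`, `pointer_expected`, `pointer_unfixed`, `load_once` and `show_indirection` are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the global exported by the runtime DLL. */
static void *jl_RTLD_DEFAULT_handle = 0;

/* Stand-in (assumption) for the import table slot that the loader fills
 * with the address of jl_RTLD_DEFAULT_handle. */
static void *import_slot = &jl_RTLD_DEFAULT_handle;

/* Expected value of jl_RTLD_DEFAULT_handle_pointer: the global's address. */
static void *pointer_expected(void) {
    return (void *)&jl_RTLD_DEFAULT_handle;
}

/* Modelled unfixed value: the address of the import slot instead. */
static void *pointer_unfixed(void) {
    return (void *)&import_slot;
}

/* Follow one level of indirection. */
static void *load_once(void *p) {
    return *(void **)p;
}

/* Checks that one extra load through the unfixed value recovers the
 * expected one, then prints both addresses. */
static void show_indirection(void) {
    assert(pointer_unfixed() != pointer_expected());
    assert(load_once(pointer_unfixed()) == pointer_expected());
    printf("expected %p, unfixed %p\n", pointer_expected(), pointer_unfixed());
}
```

Calling `show_indirection()` from a `main` prints two different addresses, mirroring the mismatch in the `print_pointers` output above.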

Versions:

# ld --version
GNU ld (GNU Binutils) 2.42
Copyright (C) 2024 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
# lld.exe -flavor ld --version
LLD 16.0.6 (compatible with GNU linkers)

Activity

llvmbot

llvmbot commented on Mar 8, 2024

@llvmbot
Member

@llvm/issue-subscribers-lld-coff

llvmbot

llvmbot commented on Mar 8, 2024

@llvmbot
Member

@llvm/issue-subscribers-julialang

mstorsjo

mstorsjo commented on Apr 12, 2024

@mstorsjo
Member

I've had a look at this now...

The situation in metadata.ll cannot really be linked without any form of linker tricks for autoimporting jl_RTLD_DEFAULT_handle in one way or another. By default, ld.bfd would also like to create runtime pseudo relocations for handling this; if you remove the --disable-runtime-pseudo-reloc argument from the invocation of ld.bfd, you end up with this error message:

/usr/bin/x86_64-w64-mingw32-ld:o׆@V_ertr000004.o:(.rdata+0x0): undefined reference to `_pei386_runtime_relocator'

This makes it clear that runtime pseudo relocations rely on runtime support to be resolved.

Now in the case of LLD, LLD assumes that there will be runtime support, but won't throw an error when that's not linked in. If you'd link metadata-lld.dll by invoking Clang, so that you get the regular C runtime startup routines linked in, e.g. just clang metadata.o -shared -o metadata-lld.dll libfakejulia.dll, then it passes the test.

What ld.bfd does, when runtime pseudo relocations are disabled, is that it creates a different kind of hack to sort out the autoimport, by adding the same DLL multiple times in the import directory. For each autoimport reference that has to be fixed, it imports the DLL once more, importing only one symbol from it, but instead of pointing at the IAT (import address table), the entry points at the right place in the data section. So when the Windows loader loads the DLL and thinks it is filling in addresses within the IAT, it actually patches locations within the data section.

This approach doesn't work generally in all cases of autoimport, so I think this approach was deprecated when x86_64 started becoming a thing.
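
The ld.bfd trick described above can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: `import_entry`, `resolve_export`, `bind_imports` and `demo_bfd_trick` are hypothetical names, and the real import directory layout is more involved. The point is that the loader writes a resolved address wherever an import entry tells it to, so an extra entry aimed at a data variable gets patched just like an IAT slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the global exported by the other DLL. */
static void *jl_RTLD_DEFAULT_handle = 0;

/* A regular IAT slot, and the data-section variable that needs the address. */
static void *iat_slot;
static void *jl_RTLD_DEFAULT_handle_pointer;

/* Hypothetical, simplified import entry: which symbol to resolve and
 * where the loader should store the resolved address. */
struct import_entry {
    const char *name;
    void **slot;
};

/* Hypothetical export lookup in the other DLL. */
static void *resolve_export(const char *name) {
    if (strcmp(name, "jl_RTLD_DEFAULT_handle") == 0)
        return &jl_RTLD_DEFAULT_handle;
    return NULL;
}

/* What the loader does: fill every slot it is told about. */
static void bind_imports(const struct import_entry *entries, size_t n) {
    for (size_t i = 0; i < n; i++)
        *entries[i].slot = resolve_export(entries[i].name);
}

static void demo_bfd_trick(void) {
    /* The same symbol is imported twice: once into the IAT, and once
     * with the slot aimed directly at the data variable. */
    const struct import_entry entries[] = {
        { "jl_RTLD_DEFAULT_handle", &iat_slot },
        { "jl_RTLD_DEFAULT_handle", &jl_RTLD_DEFAULT_handle_pointer },
    };
    bind_imports(entries, sizeof entries / sizeof entries[0]);
    assert(iat_slot == (void *)&jl_RTLD_DEFAULT_handle);
    assert(jl_RTLD_DEFAULT_handle_pointer == (void *)&jl_RTLD_DEFAULT_handle);
}
```

No startup code runs in this model: the data variable ends up correct purely because the loader treated it as one more import slot.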

LLD doesn't support this kind of way of fixing autoimports (since it was long deprecated and practically unused already when I started implementing this) - but unfortunately, the mechanism that LLD uses does require runtime support for the pseudo relocations.

LLD does, on the other hand, have a different trick; in many cases in compiler generated code, LLD can avoid the runtime pseudo relocations altogether anyway - but it doesn't work in your case right now.

As an example:

extern int maybe_imported;
extern __declspec(dllimport) int explicitly_imported;
int get1(void) { return maybe_imported; }
int get2(void) { return explicitly_imported; }
$ clang -target x86_64-w64-mingw32 -S -o - get.c -O2
get1:
	movq	.refptr.maybe_imported(%rip), %rax
	movl	(%rax), %eax
	retq
get2:
	movq	__imp_explicitly_imported(%rip), %rax
	movl	(%rax), %eax
	retq

	.section	.rdata$.refptr.maybe_imported,"dr",discard,.refptr.maybe_imported
	.globl	.refptr.maybe_imported
.refptr.maybe_imported:
	.quad	maybe_imported

So for any variable that we're not sure is from the same DLL, we do indirection via a .refptr.<symbolname> stub, which is a comdat section.

When LLD notices that we need to autoimport maybe_imported, it also looks for symbols named .refptr.maybe_imported, and if found, and if this happens to be a single separate section which is the size of a pointer, it gets removed and .refptr.maybe_imported gets redirected towards the IAT entry. See https://github.com/llvm/llvm-project/blob/llvmorg-19-init/lld/COFF/SymbolTable.cpp#L374-L388 for the logic for that.

In this case, the jl_RTLD_DEFAULT_handle_pointer variable would almost qualify for this, if it were compiled with --data-sections. However, the linker doesn't know to look for any arbitrary variable that might be a suitable candidate for omitting; it just looks for a symbol named .refptr.<variable>.
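
As a rough illustration of the lookup described above, the following C sketch searches a toy symbol table for `.refptr.<name>` and accepts it only if its chunk is exactly pointer-sized. This is not LLD's code: `chunk`, `symbol`, `find_refptr` and `demo_refptr_lookup` are hypothetical, and the model ignores the requirement that the chunk be a single separate section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of a section chunk and a symbol. */
struct chunk { size_t size; };
struct symbol { const char *name; struct chunk *chunk; };

/* Returns the .refptr.<name> symbol if it exists and its chunk is
 * exactly one pointer in size; otherwise returns NULL. */
static struct symbol *find_refptr(struct symbol *syms, size_t n,
                                  const char *name) {
    char want[128];
    int len = snprintf(want, sizeof want, ".refptr.%s", name);
    if (len < 0 || (size_t)len >= sizeof want)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(syms[i].name, want) == 0 && syms[i].chunk != NULL &&
            syms[i].chunk->size == sizeof(void *))
            return &syms[i];
    }
    return NULL;
}

static int demo_refptr_lookup(void) {
    struct chunk ptr_chunk = { sizeof(void *) };
    struct chunk big_chunk = { 4 * sizeof(void *) };
    struct symbol syms[] = {
        { ".refptr.maybe_imported", &ptr_chunk },
        { ".refptr.too_big", &big_chunk },
        { "jl_RTLD_DEFAULT_handle_pointer", &ptr_chunk },
    };
    size_t n = sizeof syms / sizeof syms[0];
    /* Found: a pointer-sized .refptr stub exists. */
    assert(find_refptr(syms, n, "maybe_imported") == &syms[0]);
    /* Rejected: the stub's chunk is larger than a pointer. */
    assert(find_refptr(syms, n, "too_big") == NULL);
    /* Not found: no symbol is named .refptr.jl_RTLD_DEFAULT_handle. */
    assert(find_refptr(syms, n, "jl_RTLD_DEFAULT_handle") == NULL);
    return 1;
}
```

The last assertion mirrors the situation in this issue: a pointer-sized variable exists, but because it is not named `.refptr.jl_RTLD_DEFAULT_handle`, a name-based lookup never considers it.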

For this case, it might be possible to help LLD fix this case, if you could slip in an alias like this (and build metadata.ll with --data-sections):

module asm ".globl .refptr.jl_RTLD_DEFAULT_handle"
module asm ".refptr.jl_RTLD_DEFAULT_handle = jl_RTLD_DEFAULT_handle_pointer"

This does seem to somewhat have the desired effect on LLD, but when doing that, LLD then crashes on some other unexpected situation. I'll try to see if that is fixable...

So, TL;DR, the main options I see are:

  • Link in enough of the mingw base C runtime files, for fixing the pseudo relocations at runtime (on startup)
  • Implement the old way of fixing autoimports in LLD, without needing runtime support, as an optional behaviour. Not very keen on doing this, as I'm not sure how many others would use it.
  • Pursue the case where we make LLD merge jl_RTLD_DEFAULT_handle_pointer into the IAT of jl_RTLD_DEFAULT_handle, by telling it that it's the same as other .refptr variables. It doesn't work right now but I don't think it's unfixable. This requires injecting the extra alias though.
Keno

Keno commented on Apr 12, 2024

@Keno
Member, Author

> but won't throw an error when that's not linked in

This feels like it may be part of the confusion. I don't think we actually have a problem linking the mingw startup code, but if the linker doesn't complain, it's hard to know that it's required.

> Pursue the case where we make LLD merge jl_RTLD_DEFAULT_handle_pointer into the IAT of jl_RTLD_DEFAULT_handle, by telling it that it's the same as other .refptr variables. It doesn't work right now but I don't think it's unfixable. This requires injecting the extra alias though.

I'm interested in giving this approach a try. We fully control both sides of the build process (as well as the version of lld used), so if you manage to come up with a fix for the LLD issue, I'll happily give it a try.

Keno

Keno commented on Apr 12, 2024

@Keno
Member, Author

Also, thank you so much for your detailed analysis, this very much filled in some gaps, in particular, the piece I was missing was

> What ld.bfd does, when runtime pseudo relocations are disabled, is that it creates a different kind of hack to sort out the autoimport, by adding the same DLL multiple times in the import directory

so with that, I finally understand what's going on here.

(And for my own reference, the ld.bfd scheme is documented here: documentation implementation)

mstorsjo

mstorsjo commented on Apr 12, 2024

@mstorsjo
Member

> > but won't throw an error when that's not linked in
>
> This feels like it may be part of the confusion. I don't think we actually have a problem linking the mingw startup code, but if the linker doesn't complain, it's hard to know that it's required.

#88573 should address this

> > Pursue the case where we make LLD merge jl_RTLD_DEFAULT_handle_pointer into the IAT of jl_RTLD_DEFAULT_handle, by telling it that it's the same as other .refptr variables. It doesn't work right now but I don't think it's unfixable. This requires injecting the extra alias though.
>
> I'm interested in giving this approach a try. We fully control both sides of the build process (as well as the version of lld used), so if you manage to come up with a fix for the LLD issue, I'll happily give it a try.

I had a closer look at this, and unfortunately I don't have a good idea for how to proceed to fix it. The issue is that we have two symbols, .refptr.jl_RTLD_DEFAULT_handle and jl_RTLD_DEFAULT_handle_pointer, both pointing at the same pointer-sized section chunk. When doing the autoimport of jl_RTLD_DEFAULT_handle we find .refptr.jl_RTLD_DEFAULT_handle and conclude that we can remove it and replace it with __imp_jl_RTLD_DEFAULT_handle. But we still have jl_RTLD_DEFAULT_handle_pointer pointing at the now orphaned section chunk. So this section replacement logic doesn't work if we have other symbols pointing at the section, other than the .refptr one. To fix it, we'd need to sequentially scan over all symbols and see if there are other symbols that point at the same data, which seems awfully inefficient. (And if we'd do that, we could just ditch the extra .refptr alias anyway, and just extend the logic to always look for potential candidates by exhaustive search.)
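
The aliasing problem can be illustrated with a small C sketch. It uses a toy symbol table, not LLD's data structures; `chunk`, `symbol`, `count_other_referrers` and `demo_alias_count` are hypothetical names. It shows why a safe replacement would have to walk every symbol to find other referrers to the same chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a section chunk and a symbol. */
struct chunk { size_t size; };
struct symbol { const char *name; const struct chunk *chunk; };

/* Counts symbols other than `skip` that point at `target`.
 * This is a linear scan over the whole table for one lookup. */
static size_t count_other_referrers(const struct symbol *syms, size_t n,
                                    const struct chunk *target,
                                    const char *skip) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].chunk == target && strcmp(syms[i].name, skip) != 0)
            count++;
    }
    return count;
}

static size_t demo_alias_count(void) {
    static const struct chunk ptr_chunk = { sizeof(void *) };
    /* Two names for the same pointer-sized chunk, as with the alias. */
    const struct symbol syms[] = {
        { ".refptr.jl_RTLD_DEFAULT_handle", &ptr_chunk },
        { "jl_RTLD_DEFAULT_handle_pointer", &ptr_chunk },
        { "_DllMainCRTStartup", NULL },
    };
    size_t n = sizeof syms / sizeof syms[0];
    size_t others = count_other_referrers(
        syms, n, &ptr_chunk, ".refptr.jl_RTLD_DEFAULT_handle");
    /* One other symbol would be left pointing at the orphaned chunk. */
    assert(others == 1);
    return others;
}
```

A nonzero count means the chunk cannot simply be dropped. Doing this scan for every autoimported symbol is the cost that makes the approach unattractive.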

I think I need to leave this be for a while and see if I come up with a better way forward.

Implementing the ancient ld.bfd approach (which you managed to dig up the description for, thanks!) could be one option. But I'd mostly do that to satisfy my own curiosity; it's not a feature that one should be using in the general case. And I'm not sure how much work it is, or whether I'll actually get around to it. (As the description says, it's almost borderline invalid, but still mostly ok.)

Or, taking another step back, I wonder if it's possible to redesign your linking check to avoid the need for autoimported data in the first place. DLL exports can be in a different form, where they don't point at data in the current DLL at all, but point at data in another DLL. This is possible to do e.g. with -Wl,-Xlink=-export:jl_RTLD_DEFAULT_handle=libfakejulia.jl_RTLD_DEFAULT_handle (I don't think the mingw level linker interface exposes this feature very well though). Then you have an exported symbol jl_RTLD_DEFAULT_handle which should end up pointing at the same data as the same symbol in libfakejulia.dll. I.e. it's the same pointer, not a pointer containing the value of the other pointer.

Or another way is to expose a getter function, instead of a data symbol. A C level (or equivalent) getter function should generate a regular .refptr indirection, which we cope with nicely in lld, avoiding the runtime pseudo relocation.
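
A getter-based check might look roughly like the following C sketch. The names `jl_get_RTLD_DEFAULT_handle_addr` and `check_single_runtime` are hypothetical and not part of Julia's API. In a real build the global would be declared `extern` and live in the runtime DLL; it is defined here only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Assumption: in the real build this is an extern declaration and the
 * definition lives in the runtime DLL. */
void *jl_RTLD_DEFAULT_handle = 0;

/* Hypothetical getter exported in place of the data symbol. The address
 * is computed by compiled code instead of a data initializer. */
void **jl_get_RTLD_DEFAULT_handle_addr(void) {
    return &jl_RTLD_DEFAULT_handle;
}

/* Hypothetical sanity check: the address the caller sees must match the
 * address reported by the library. */
void check_single_runtime(void **seen_by_caller) {
    assert(seen_by_caller == jl_get_RTLD_DEFAULT_handle_addr());
}
```

A caller would run `check_single_runtime(&jl_RTLD_DEFAULT_handle)` at startup. Because the address is taken inside a function, the compiler can emit the `.refptr` indirection shown in the `get1` example, so no statically initialized pointer to imported data is needed.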
