Description
Over in JuliaLang/julia#53421, we are seeing an issue where a library linked with ld.bfd is working, but lld is not (when targeting win64/mingw). The code in question is a bit strange, but it's essentially a sanity check to make sure that there weren't any linking mistakes and that there aren't multiple copies of the runtime library floating around (e.g. a common mistake is to load both debug and release copies of the runtime library into the same address space).
In reduced terms, we have a source file that at the LLVM level looks like this (fully minimized):
# cat metadata.ll
; ModuleID = 'metadata_opt.bc'
source_filename = "metadata"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-w64-windows-gnu-coff"
@jl_RTLD_DEFAULT_handle = external dllimport constant ptr
@jl_RTLD_DEFAULT_handle_pointer = dllexport constant ptr @jl_RTLD_DEFAULT_handle
define x86_stdcallcc i32 @_DllMainCRTStartup(ptr %0, i32 %1, ptr %2) {
top:
ret i32 1
}
where jl_RTLD_DEFAULT_handle is just some dllexported global defined in libjulia-internal.dll, but for the present purposes, we may just treat it as the following:
# cat fakelibjulia.c
__declspec(dllexport) void *jl_RTLD_DEFAULT_handle = 0;
# gcc -shared -o libfakejulia.dll fakelibjulia.c
Doing the following:
# llc.exe --filetype=obj -o metadata.o metadata.ll
# ld --disable-runtime-pseudo-reloc -shared -o metadata-bfd.dll --whole-archive metadata.o --no-whole-archive libfakejulia.dll
# lld.exe -flavor gnu -m i386pep -Bdynamic --disable-runtime-pseudo-reloc -shared -o metadata-lld.dll --whole-archive metadata.o --no-whole-archive -L./usr/bin -ljulia -ljulia-internal
lld: error: automatic dllimport of jl_RTLD_DEFAULT_handle in metadata.o requires pseudo relocations
# lld.exe -flavor gnu -m i386pep -Bdynamic --enable-runtime-pseudo-reloc -shared -o metadata-lld.dll --whole-archive metadata.o --no-whole-archive -L./usr/bin -ljulia -ljulia-internal
And then loading both libraries and comparing the pointers:
# cat print_pointers.c
#include <stdio.h>
extern __declspec(dllimport) void *jl_RTLD_DEFAULT_handle;
extern __declspec(dllimport) void *jl_RTLD_DEFAULT_handle_pointer;
int main(void) {
printf("Pointers: %p %p\n", &jl_RTLD_DEFAULT_handle, jl_RTLD_DEFAULT_handle_pointer);
return 0;
}
# gcc -o print_pointers-bfd.exe print_pointers.c metadata-bfd.dll libfakejulia.dll
# gcc -o print_pointers-lld.exe print_pointers.c metadata-lld.dll libfakejulia.dll
# ./print_pointers-bfd.exe
Pointers: 00007ffafef77020 00007ffafef77020
# ./print_pointers-lld.exe
Pointers: 00007ffafef77020 00007ffb0d1b20e0
The BFD result is expected. The LLD result is not (separate and apart from the fact that lld and bfd disagree over the necessity of runtime pseudo relocations - the output is the same for bfd if those are enabled, although it does require linking the mingw dllcrt in that case).
Finally, I want to note that the issue is the library, not the executable. The same behavior is observed via dlsym (as in the original issue).
julia> lib_lld = dlopen("metadata-lld.dll")
Ptr{Nothing} @0x00007ffb0d1b0000
julia> lib_bfd = dlopen("metadata-bfd.dll")
Ptr{Nothing} @0x00007ffb08ff0000
julia> unsafe_load(Ptr{Ptr{Cvoid}}(dlsym(lib_lld, "jl_RTLD_DEFAULT_handle_pointer")))
Ptr{Nothing} @0x00007ffb0d1b20e0
julia> unsafe_load(Ptr{Ptr{Cvoid}}(dlsym(lib_bfd, "jl_RTLD_DEFAULT_handle_pointer")))
Ptr{Nothing} @0x00007ffab81a9818
julia> cglobal(:jl_RTLD_DEFAULT_handle, Ptr{Cvoid})
Ptr{Ptr{Nothing}} @0x00007ffab81a9818
In particular, the value that lld gives is one extra level of indirection removed from that used by bfd:
julia> unsafe_load(unsafe_load(Ptr{Ptr{Ptr{Cvoid}}}(dlsym(lib_lld, "jl_RTLD_DEFAULT_handle_pointer"))))
Ptr{Nothing} @0x00007ffb0d1b20e0
Versions:
# ld --version
GNU ld (GNU Binutils) 2.42
Copyright (C) 2024 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
# lld.exe -flavor ld --version
LLD 16.0.6 (compatible with GNU linkers)
llvmbot commented on Mar 8, 2024
@llvm/issue-subscribers-lld-coff
Author: Keno Fischer (Keno)
mstorsjo commented on Apr 12, 2024
I've had a look at this now...
The situation in metadata.ll cannot really be linked without some form of linker trick for autoimporting jl_RTLD_DEFAULT_handle in one way or another. By default, ld.bfd would also like to create runtime pseudo relocations for handling this; if you remove the --disable-runtime-pseudo-reloc argument from the invocation of ld.bfd, you end up with this error message:
This makes it clear that runtime pseudo relocations rely on runtime support to be sorted out.
Now in the case of LLD, LLD assumes that there will be runtime support, but won't throw an error when that's not linked in. If you link metadata-lld.dll by invoking Clang, so that you get the regular C runtime startup routines linked in, e.g. just clang metadata.o -shared -o metadata-lld.dll libfakejulia.dll, then it passes the test.
What ld.bfd does, when runtime pseudo relocations are disabled, is create a different kind of hack to sort out the autoimport, by adding the same DLL multiple times in the import directory; for each autoimport reference that has to be fixed, it imports the DLL once more, importing only one symbol from the DLL, but instead of pointing at the IAT (import address table), it points at the right data section of the DLL. So when the Windows loader loads the DLL, it thinks it is filling in addresses within the IAT, but it actually patches locations within the data section.
This approach doesn't work generally in all cases of autoimport, so I think this approach was deprecated when x86_64 started becoming a thing.
LLD doesn't support this kind of way of fixing autoimports (since it was long deprecated and practically unused already when I started implementing this) - but unfortunately, the mechanism that LLD uses does require runtime support for the pseudo relocations.
LLD does, on the other hand, have a different trick; in many cases in compiler generated code, LLD can avoid the runtime pseudo relocations altogether anyway - but it doesn't work in your case right now.
As an example:
So for any variable that we're not sure is from the same DLL, we do indirection via a .refptr.<symbolname> stub, which is a comdat section.
When LLD notices that we need to autoimport maybe_imported, it also looks for symbols named .refptr.maybe_imported, and if found, and if this happens to be a single separate section which is the size of a pointer, it gets removed and .refptr.maybe_imported gets redirected towards the IAT entry. See https://github.com/llvm/llvm-project/blob/llvmorg-19-init/lld/COFF/SymbolTable.cpp#L374-L388 for the logic for that.
In this case, the jl_RTLD_DEFAULT_handle_pointer variable would almost qualify for this, if it were compiled with --data-sections, but the linker doesn't know to look for any arbitrary variable that might be a suitable candidate for omitting; it just looks for a symbol .refptr.<variable>.
For this case, it might be possible to help LLD fix this, if you could slip in an alias like this (and build metadata.ll with --data-sections):
This does seem to somewhat have the desired effect on LLD, but when doing that, LLD then crashes on some other unexpected situation. I'll try to see if that is fixable...
So, TL;DR, the main options I see are:
jl_RTLD_DEFAULT_handle_pointer into the IAT of jl_RTLD_DEFAULT_handle, by telling it that it's the same as other .refptr variables. It doesn't work right now but I don't think it's unfixable. This requires injecting the extra alias though.
Keno commented on Apr 12, 2024
This feels like it may be part of the confusion. I don't think we actually have a problem linking the mingw startup code, but if the linker doesn't complain, it's hard to know that it's required.
I'm interested in giving this approach a try. We fully control both sides of the build process (as well as the version of lld used), so if you manage to come up with a fix for the LLD issue, I'll happily give it a try.
Keno commented on Apr 12, 2024
Also, thank you so much for your detailed analysis, this very much filled in some gaps, in particular, the piece I was missing was
so with that, I finally understand what's going on here.
(And for my own reference, the ld.bfd scheme is documented here: documentation implementation)
[LLD] [COFF] Warn if the runtime pseudo relocation function is missing
mstorsjo commented on Apr 12, 2024
#88573 should address this
I had a closer look at this, and unfortunately I don't have a good idea for how to proceed to fix it. The issue is that we have two symbols, .refptr.jl_RTLD_DEFAULT_handle and jl_RTLD_DEFAULT_handle_pointer, both pointing at the same pointer-sized section chunk. When doing the autoimport of jl_RTLD_DEFAULT_handle we find .refptr.jl_RTLD_DEFAULT_handle and conclude that we can remove it and replace it with __imp_jl_RTLD_DEFAULT_handle. But we still have jl_RTLD_DEFAULT_handle_pointer pointing at the now orphaned section chunk. So this section replacement logic doesn't work if we have other symbols pointing at the section, other than the .refptr one. To fix it, we'd need to sequentially scan over all symbols and see if there are other symbols that point at the same data, which seems awfully inefficient. (And if we'd do that, we could just ditch the extra .refptr alias anyway, and just extend the logic to always look for potential candidates by exhaustive search.)
I think I need to leave this be for a while and see if I come up with a better way forward.
Implementing the ancient ld.bfd approach (which you managed to dig up the description for, thanks!) could be one option. But I'd mostly do that to satisfy my own curiosity; it's not a feature that one should be using in the general case. And I'm not sure how much work it is, or whether I'll actually get around to it. (As the description says, it's almost borderline invalid, but still mostly ok.)
Or taking another step back, I wonder if it's possible to redesign your linking check, to avoid the need for autoimported data in the first place. DLL exports can be in a different form, where they don't point at data in the current DLL at all, but point at data in another DLL. This is possible to do e.g. with -Wl,-Xlink=-export:jl_RTLD_DEFAULT_handle=libfakejulia.jl_RTLD_DEFAULT_handle (I don't think the mingw level linker interface exposes this feature very well though.) Then you have an exported symbol jl_RTLD_DEFAULT_handle which should end up pointing at the same data as the same symbol in libfakejulia.dll. I.e. it's the same pointer, not a pointer containing the value of the other pointer.
Or another way is to expose a getter function, instead of a data symbol. A C level (or equivalent) getter function should generate a regular .refptr indirection, which we cope with nicely in lld, avoiding the runtime pseudo relocation.