Commit 2a6452d

2025-05-01 version of cuda.bindings.path_finder (#578)
* Undo changes to the nvJitLink, nvrtc, nvvm bindings
* Undo changes under .github, specific to nvvm, manipulating LD_LIBRARY_PATH or PATH
* PARTIALLY_SUPPORTED_LIBNAMES_LINUX, PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS
* Update EXPECTED_LIB_SYMBOLS for nvJitLink to cleanly support CTK versions 12.0, 12.1, 12.2
* Save result of factoring out load_dl_common.py, load_dl_linux.py, load_dl_windows.py with the help of Cursor.
* Fix an auto-generated docstring
* first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes
* Revert "first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes"
  This reverts commit 001a6a2.
  There were many GitHub Actions jobs that failed (all tests with 12.x): https://github.com/NVIDIA/cuda-python/actions/runs/14677553387
  This is not worth spending time debugging. Especially because
  * Cursor has been unresponsive for at least half an hour: We're having trouble connecting to the model provider. This might be temporary - please try again in a moment.
  * The refactored code does not seem easier to read.
* A couple trivial tweaks
* Prefix the public API (just two items) with underscores for now.
* Add SPDX-License-Identifier to all files under toolshed/ that don't have it already
* Add SPDX-License-Identifier under cuda_bindings/tests/
* Respond to "Do these need to be run as subprocesses?" review question (#578 (comment))
* Respond to "dead code?" review questions (e.g. #578 (comment))
* Respond to "Do we need to implement a cache separately ..." review question (#578 (comment))
* Remove cuDriverGetVersion() function for now.
* Move add_dll_directory() from load_dl_common.py to load_dl_windows.py (response to review question #578 (comment))
* Add SPDX-License-Identifier and # Forked from: URL in cuda_paths.py
* Add Add SPDX-License-Identifier and Original LICENSE in findlib.py
* Very first draft of README.md
* Update README.md, mostly as revised by perplexity, with various manual edits.
* Refork cuda_paths.py AS-IS: https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py
* ruff format cuda_paths.py (NO manual changes)
* Add back _get_numba_CUDA_INCLUDE_PATH from 2279bda (i.e. cuda_paths.py as it was right before re-forking)
* Remove cuda_paths.py dependency on numba.cuda.cudadrv.runtime
* Add Forked from URLs, two SPDX-License-Identifier, Original Numba LICENSE
* Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage
* Restore cuda_path.py AS-IT-WAS at commit 2279bda
* Revert "Restore cuda_path.py AS-IT-WAS at commit 2279bda"
  This reverts commit 1b88ec2.
* Force compute-sanitizer off unconditionally
* Revert "Force compute-sanitizer off unconditionally"
  This reverts commit 2bc7ef6.
* Add timeout=10 seconds to test_path_finder.py subprocess.run() invocations.
* Increase test_path_finder.py subprocess.run() timeout to 30 seconds:
  Under Windows, loading cublas or cusolver may exceed the 10 second timeout: #578 (comment)
* Revert "Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage"
  This reverts commit 47ad79f.
* Force compute-sanitizer off unconditionally
* Add: Note that the search is done on a per-library basis.
* Add Note for CUDA_HOME / CUDA_PATH
* Add 0. **Check if a library was loaded into the process already by some other means.**
* _find_dll_using_nvidia_bin_dirs(): reuse lib_searched_for in place of file_wild
* Systematically replace all relative imports with absolute imports.
* handle: int → ctypes.CDLL fix
* Make load_dl_windows.py abs_path_for_dynamic_library() implementation maximally robust.
* Change argument name → libname for self-consistency
* Systematically replace previously overlooked relative imports with absolute imports.
* Simplify code (also for self-consistency)
* Expand the 3. **System Installations** section with information produced by perplexity
* Pull out `**Environment variables**` into an added section, after manual inspection of cuda_paths.py. Minor additional edits.
* Revert "Force compute-sanitizer off unconditionally"
  This reverts commit aeaf4f0.
* Move _path_finder/sys_path_find_sub_dirs.py → find_sub_dirs.py, use find_sub_dirs_all_sitepackages() from find_nvidia_dynamic_library.py
* WIP (search priority updated in README.md but not in code)
* Revert "WIP (search priority updated in README.md but not in code)"
  This reverts commit bf9734c.
1 parent 86e901c commit 2a6452d

25 files changed: +1109 -405 lines changed

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in

+55-9
Original file line numberDiff line numberDiff line change
@@ -9,12 +9,13 @@
99
# This code was automatically generated with version 12.8.0. Do not modify it directly.
1010
{{if 'Windows' == platform.system()}}
1111
import os
12+
import site
13+
import struct
1214
import win32api
15+
from pywintypes import error
1316
{{else}}
1417
cimport cuda.bindings._lib.dlfcn as dlfcn
15-
from libc.stdint cimport uintptr_t
1618
{{endif}}
17-
from cuda.bindings import path_finder
1819

1920
cdef bint __cuPythonInit = False
2021
{{if 'nvrtcGetErrorString' in found_functions}}cdef void *__nvrtcGetErrorString = NULL{{endif}}
@@ -45,18 +46,65 @@ cdef bint __cuPythonInit = False
4546
{{if 'nvrtcSetFlowCallback' in found_functions}}cdef void *__nvrtcSetFlowCallback = NULL{{endif}}
4647

4748
cdef int cuPythonInit() except -1 nogil:
48-
{{if 'Windows' != platform.system()}}
49-
cdef void* handle = NULL
50-
{{endif}}
51-
5249
global __cuPythonInit
5350
if __cuPythonInit:
5451
return 0
5552
__cuPythonInit = True
5653

54+
# Load library
55+
{{if 'Windows' == platform.system()}}
56+
with gil:
57+
# First check if the DLL has been loaded by 3rd parties
58+
try:
59+
handle = win32api.GetModuleHandle("nvrtc64_120_0.dll")
60+
except:
61+
handle = None
62+
63+
# Check if DLLs can be found within pip installations
64+
if not handle:
65+
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000
66+
LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100
67+
site_packages = [site.getusersitepackages()] + site.getsitepackages()
68+
for sp in site_packages:
69+
mod_path = os.path.join(sp, "nvidia", "cuda_nvrtc", "bin")
70+
if os.path.isdir(mod_path):
71+
os.add_dll_directory(mod_path)
72+
try:
73+
handle = win32api.LoadLibraryEx(
74+
# Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
75+
os.path.join(mod_path, "nvrtc64_120_0.dll"),
76+
0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
77+
78+
# Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is
79+
# located in the same mod_path.
80+
# Update PATH environ so that the two dlls can find each other
81+
os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path))
82+
except:
83+
pass
84+
else:
85+
break
86+
else:
87+
# Else try default search
88+
# Only reached if DLL wasn't found in any site-package path
89+
LOAD_LIBRARY_SAFE_CURRENT_DIRS = 0x00002000
90+
try:
91+
handle = win32api.LoadLibraryEx("nvrtc64_120_0.dll", 0, LOAD_LIBRARY_SAFE_CURRENT_DIRS)
92+
except:
93+
pass
94+
95+
if not handle:
96+
raise RuntimeError('Failed to LoadLibraryEx nvrtc64_120_0.dll')
97+
{{else}}
98+
handle = dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW)
99+
if handle == NULL:
100+
with gil:
101+
raise RuntimeError('Failed to dlopen libnvrtc.so.12')
102+
{{endif}}
103+
104+
105+
# Load function
57106
{{if 'Windows' == platform.system()}}
58107
with gil:
59-
handle = path_finder.load_nvidia_dynamic_library("nvrtc").handle
60108
{{if 'nvrtcGetErrorString' in found_functions}}
61109
try:
62110
global __nvrtcGetErrorString
@@ -241,8 +289,6 @@ cdef int cuPythonInit() except -1 nogil:
241289
{{endif}}
242290

243291
{{else}}
244-
with gil:
245-
handle = <void*><uintptr_t>path_finder.load_nvidia_dynamic_library("nvrtc").handle
246292
{{if 'nvrtcGetErrorString' in found_functions}}
247293
global __nvrtcGetErrorString
248294
__nvrtcGetErrorString = dlfcn.dlsym(handle, 'nvrtcGetErrorString')
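
The Windows branch of `cuPythonInit()` above relies on two `else` clauses that are easy to misread: the `else` of the `try` (runs only when no exception was raised, and then breaks out of the loop) and the `else` of the `for` (runs only when the loop ended without `break`). The following is a minimal pure-Python sketch of that control flow, not the generated binding code. The names `first_success`, `try_load`, and `fallback` are hypothetical, and the sketch catches `OSError` where the diff uses a bare `except:`.

```python
def first_success(candidates, try_load, fallback):
    """Return the first handle that `try_load` produces, else `fallback()`.

    `try_load` and `fallback` are hypothetical callables standing in for
    the LoadLibraryEx calls in the diff.
    """
    handle = None
    for candidate in candidates:
        try:
            handle = try_load(candidate)
        except OSError:
            # This candidate failed; move on to the next one.
            pass
        else:
            # No exception: stop searching. Because of this break,
            # the for-loop's else clause below is skipped.
            break
    else:
        # Reached only if the loop finished without a break,
        # i.e. no candidate could be loaded.
        handle = fallback()
    return handle
```

With this shape the default search is attempted only after every candidate directory has failed, which matches the comment "Only reached if DLL wasn't found in any site-package path" in the diff.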

cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx

+14-6
@@ -4,11 +4,11 @@
 #
 # This code was automatically generated across versions from 12.0.1 to 12.8.0. Do not modify it directly.

-from libc.stdint cimport intptr_t, uintptr_t
+from libc.stdint cimport intptr_t

-from .utils import FunctionNotFoundError, NotSupportedError
+from .utils cimport get_nvjitlink_dso_version_suffix

-from cuda.bindings import path_finder
+from .utils import FunctionNotFoundError, NotSupportedError

 ###############################################################################
 # Extern
@@ -52,9 +52,17 @@ cdef void* __nvJitLinkGetInfoLog = NULL
 cdef void* __nvJitLinkVersion = NULL


-cdef void* load_library(int driver_ver) except* with gil:
-    cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle
-    return <void*>handle
+cdef void* load_library(const int driver_ver) except* with gil:
+    cdef void* handle
+    for suffix in get_nvjitlink_dso_version_suffix(driver_ver):
+        so_name = "libnvJitLink.so" + (f".{suffix}" if suffix else suffix)
+        handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL)
+        if handle != NULL:
+            break
+    else:
+        err_msg = dlerror()
+        raise RuntimeError(f'Failed to dlopen libnvJitLink ({err_msg.decode()})')
+    return handle


 cdef int _check_or_init_nvjitlink() except -1 nogil:
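
The restored Linux `load_library()` above tries each candidate soname in turn and fails only when none can be opened. A rough pure-Python analogue using `ctypes` is sketched below; it is an illustration, not the Cython code from the diff. The function name `load_first_available` and the injectable `loader` parameter are assumptions made for this example, and unlike the diff (which reports only the last `dlerror()`), the sketch collects one error message per candidate.

```python
import ctypes
import os

# os.RTLD_NOW is only defined on Unix; fall back to 0 elsewhere.
_MODE = getattr(os, "RTLD_NOW", 0) | ctypes.RTLD_GLOBAL


def load_first_available(sonames, loader=ctypes.CDLL):
    """Return the first library in `sonames` that can be loaded.

    `loader` defaults to ctypes.CDLL and is injectable (an assumption of
    this sketch) so the search logic can be exercised without the real
    shared library being installed.
    """
    errors = []
    for name in sonames:
        try:
            return loader(name, mode=_MODE)
        except OSError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("Failed to dlopen any of: " + "; ".join(errors))
```

A call such as `load_first_available(["libnvJitLink.so.12", "libnvJitLink.so"])` would mirror the candidate order produced by the `('12', '')` suffix tuple, assuming the library is on the dynamic loader's search path.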

cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx

+45-8
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,12 @@
66

77
from libc.stdint cimport intptr_t
88

9+
from .utils cimport get_nvjitlink_dso_version_suffix
10+
911
from .utils import FunctionNotFoundError, NotSupportedError
1012

11-
from cuda.bindings import path_finder
13+
import os
14+
import site
1215

1316
import win32api
1417

@@ -39,9 +42,44 @@ cdef void* __nvJitLinkGetInfoLog = NULL
3942
cdef void* __nvJitLinkVersion = NULL
4043

4144

42-
cdef void* load_library(int driver_ver) except* with gil:
43-
cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle
44-
return <void*>handle
45+
cdef inline list get_site_packages():
46+
return [site.getusersitepackages()] + site.getsitepackages()
47+
48+
49+
cdef load_library(const int driver_ver):
50+
handle = 0
51+
52+
for suffix in get_nvjitlink_dso_version_suffix(driver_ver):
53+
if len(suffix) == 0:
54+
continue
55+
dll_name = f"nvJitLink_{suffix}0_0.dll"
56+
57+
# First check if the DLL has been loaded by 3rd parties
58+
try:
59+
return win32api.GetModuleHandle(dll_name)
60+
except:
61+
pass
62+
63+
# Next, check if DLLs are installed via pip
64+
for sp in get_site_packages():
65+
mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin")
66+
if os.path.isdir(mod_path):
67+
os.add_dll_directory(mod_path)
68+
try:
69+
return win32api.LoadLibraryEx(
70+
# Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
71+
os.path.join(mod_path, dll_name),
72+
0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
73+
except:
74+
pass
75+
# Finally, try default search
76+
# Only reached if DLL wasn't found in any site-package path
77+
try:
78+
return win32api.LoadLibrary(dll_name)
79+
except:
80+
pass
81+
82+
raise RuntimeError('Failed to load nvJitLink')
4583

4684

4785
cdef int _check_or_init_nvjitlink() except -1 nogil:
@@ -50,24 +88,23 @@ cdef int _check_or_init_nvjitlink() except -1 nogil:
5088
return 0
5189

5290
cdef int err, driver_ver
53-
cdef intptr_t handle
5491
with gil:
5592
# Load driver to check version
5693
try:
57-
nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32)
94+
handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32)
5895
except Exception as e:
5996
raise NotSupportedError(f'CUDA driver is not found ({e})')
6097
global __cuDriverGetVersion
6198
if __cuDriverGetVersion == NULL:
62-
__cuDriverGetVersion = <void*><intptr_t>win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion')
99+
__cuDriverGetVersion = <void*><intptr_t>win32api.GetProcAddress(handle, 'cuDriverGetVersion')
63100
if __cuDriverGetVersion == NULL:
64101
raise RuntimeError('something went wrong')
65102
err = (<int (*)(int*) noexcept nogil>__cuDriverGetVersion)(&driver_ver)
66103
if err != 0:
67104
raise RuntimeError('something went wrong')
68105

69106
# Load library
70-
handle = <intptr_t>load_library(driver_ver)
107+
handle = load_library(driver_ver)
71108

72109
# Load function
73110
global __nvJitLinkCreate
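
The pip-installation step of the Windows `load_library()` above amounts to probing `<site-packages>/nvidia/<package>/bin` in every site-packages directory. Here is a small sketch of just the path probing, without any `win32api` calls. The helper name `pip_dll_candidates` and its parameters are hypothetical, and unlike the diff it checks for the DLL file itself rather than only for the `bin` directory.

```python
import os
import site


def pip_dll_candidates(dll_name, package_parts, site_dirs=None):
    """List absolute paths of `dll_name` found under pip-style layouts.

    `package_parts` is the path below "nvidia", e.g. ("nvJitLink",) or
    ("cuda_nvcc", "nvvm"). `site_dirs` defaults to the user and global
    site-packages directories, in the same order as in the diff.
    """
    if site_dirs is None:
        site_dirs = [site.getusersitepackages()] + site.getsitepackages()
    found = []
    for sp in site_dirs:
        candidate = os.path.join(sp, "nvidia", *package_parts, "bin", dll_name)
        if os.path.isfile(candidate):
            # LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an absolute path.
            found.append(os.path.abspath(candidate))
    return found
```

Returning every match instead of loading the first one keeps the sketch free of side effects; a caller could pass the first entry to a loader and fall back to the default search when the list is empty.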

cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx

+13-5
@@ -4,11 +4,11 @@
 #
 # This code was automatically generated across versions from 11.0.3 to 12.8.0. Do not modify it directly.

-from libc.stdint cimport intptr_t, uintptr_t
+from libc.stdint cimport intptr_t

-from .utils import FunctionNotFoundError, NotSupportedError
+from .utils cimport get_nvvm_dso_version_suffix

-from cuda.bindings import path_finder
+from .utils import FunctionNotFoundError, NotSupportedError

 ###############################################################################
 # Extern
@@ -51,8 +51,16 @@ cdef void* __nvvmGetProgramLog = NULL


 cdef void* load_library(const int driver_ver) except* with gil:
-    cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle
-    return <void*>handle
+    cdef void* handle
+    for suffix in get_nvvm_dso_version_suffix(driver_ver):
+        so_name = "libnvvm.so" + (f".{suffix}" if suffix else suffix)
+        handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL)
+        if handle != NULL:
+            break
+    else:
+        err_msg = dlerror()
+        raise RuntimeError(f'Failed to dlopen libnvvm ({err_msg.decode()})')
+    return handle


 cdef int _check_or_init_nvvm() except -1 nogil:

cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx

+53-8
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,12 @@
66

77
from libc.stdint cimport intptr_t
88

9+
from .utils cimport get_nvvm_dso_version_suffix
10+
911
from .utils import FunctionNotFoundError, NotSupportedError
1012

11-
from cuda.bindings import path_finder
13+
import os
14+
import site
1215

1316
import win32api
1417

@@ -37,9 +40,52 @@ cdef void* __nvvmGetProgramLogSize = NULL
3740
cdef void* __nvvmGetProgramLog = NULL
3841

3942

40-
cdef void* load_library(int driver_ver) except* with gil:
41-
cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle
42-
return <void*>handle
43+
cdef inline list get_site_packages():
44+
return [site.getusersitepackages()] + site.getsitepackages() + ["conda"]
45+
46+
47+
cdef load_library(const int driver_ver):
48+
handle = 0
49+
50+
for suffix in get_nvvm_dso_version_suffix(driver_ver):
51+
if len(suffix) == 0:
52+
continue
53+
dll_name = "nvvm64_40_0.dll"
54+
55+
# First check if the DLL has been loaded by 3rd parties
56+
try:
57+
return win32api.GetModuleHandle(dll_name)
58+
except:
59+
pass
60+
61+
# Next, check if DLLs are installed via pip or conda
62+
for sp in get_site_packages():
63+
if sp == "conda":
64+
# nvvm is not under $CONDA_PREFIX/lib, so it's not in the default search path
65+
conda_prefix = os.environ.get("CONDA_PREFIX")
66+
if conda_prefix is None:
67+
continue
68+
mod_path = os.path.join(conda_prefix, "Library", "nvvm", "bin")
69+
else:
70+
mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin")
71+
if os.path.isdir(mod_path):
72+
os.add_dll_directory(mod_path)
73+
try:
74+
return win32api.LoadLibraryEx(
75+
# Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
76+
os.path.join(mod_path, dll_name),
77+
0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
78+
except:
79+
pass
80+
81+
# Finally, try default search
82+
# Only reached if DLL wasn't found in any site-package path
83+
try:
84+
return win32api.LoadLibrary(dll_name)
85+
except:
86+
pass
87+
88+
raise RuntimeError('Failed to load nvvm')
4389

4490

4591
cdef int _check_or_init_nvvm() except -1 nogil:
@@ -48,24 +94,23 @@ cdef int _check_or_init_nvvm() except -1 nogil:
4894
return 0
4995

5096
cdef int err, driver_ver
51-
cdef intptr_t handle
5297
with gil:
5398
# Load driver to check version
5499
try:
55-
nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32)
100+
handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32)
56101
except Exception as e:
57102
raise NotSupportedError(f'CUDA driver is not found ({e})')
58103
global __cuDriverGetVersion
59104
if __cuDriverGetVersion == NULL:
60-
__cuDriverGetVersion = <void*><intptr_t>win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion')
105+
__cuDriverGetVersion = <void*><intptr_t>win32api.GetProcAddress(handle, 'cuDriverGetVersion')
61106
if __cuDriverGetVersion == NULL:
62107
raise RuntimeError('something went wrong')
63108
err = (<int (*)(int*) noexcept nogil>__cuDriverGetVersion)(&driver_ver)
64109
if err != 0:
65110
raise RuntimeError('something went wrong')
66111

67112
# Load library
68-
handle = <intptr_t>load_library(driver_ver)
113+
handle = load_library(driver_ver)
69114

70115
# Load function
71116
global __nvvmVersion
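
For nvvm on Windows, the diff above appends a `"conda"` sentinel to the site-packages list so that `$CONDA_PREFIX/Library/nvvm/bin` is probed after the pip locations. The sketch below shows one way to compute the same candidate directories without a sentinel value. The function name `nvvm_bin_dir_candidates` is hypothetical, and the sketch treats an empty `CONDA_PREFIX` as unset, which is slightly stricter than the `is None` check in the diff.

```python
import os


def nvvm_bin_dir_candidates(site_dirs, environ):
    """Return directories that may contain the nvvm DLL, in search order.

    `site_dirs` is a list of site-packages directories and `environ` is a
    mapping such as os.environ; both are passed in explicitly so the
    function has no hidden dependencies.
    """
    # pip wheels place nvvm under nvidia/cuda_nvcc/nvvm/bin.
    candidates = [
        os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin") for sp in site_dirs
    ]
    # conda places nvvm under Library/nvvm/bin, outside the default search path.
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(os.path.join(conda_prefix, "Library", "nvvm", "bin"))
    return candidates
```

Passing `os.environ` and the result of `site.getsitepackages()` would reproduce the order used in the diff: pip locations first, the conda location last.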

cuda_bindings/cuda/bindings/_internal/utils.pxd

+3
@@ -165,3 +165,6 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj,

 cdef bint is_nested_sequence(data)
 cdef void* get_buffer_pointer(buf, Py_ssize_t size, readonly=*) except*
+
+cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver)
+cdef tuple get_nvvm_dso_version_suffix(int driver_ver)

cuda_bindings/cuda/bindings/_internal/utils.pyx

+14
@@ -127,3 +127,17 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj,
 class FunctionNotFoundError(RuntimeError): pass

 class NotSupportedError(RuntimeError): pass
+
+
+cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver):
+    if 12000 <= driver_ver < 13000:
+        return ('12', '')
+    raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported')
+
+
+cdef tuple get_nvvm_dso_version_suffix(int driver_ver):
+    if 11000 <= driver_ver < 11020:
+        return ('3', '')
+    if 11020 <= driver_ver < 13000:
+        return ('4', '')
+    raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported')
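
The two helpers added to utils.pyx map the integer reported by `cuDriverGetVersion` (encoded as 1000 * major + 10 * minor, so 12040 means CUDA 12.4) to a tuple of soname suffixes, where the empty string means "try the unversioned name last". The sketch below restates that mapping in plain Python and adds a hypothetical helper, `candidate_sonames`, that expands the suffixes the way the Linux `load_library()` functions do. The Python function names and the stand-in exception class are assumptions for illustration.

```python
class NotSupportedError(RuntimeError):
    """Stand-in for the exception of the same name in utils.pyx."""


def nvjitlink_dso_version_suffixes(driver_ver):
    # nvJitLink is handled only for CUDA 12.x drivers in the diff.
    if 12000 <= driver_ver < 13000:
        return ("12", "")
    raise NotSupportedError(f"CUDA driver version {driver_ver} is not supported")


def nvvm_dso_version_suffixes(driver_ver):
    # Suffix 3 for CUDA 11.0 and 11.1, suffix 4 from CUDA 11.2 up to 12.x.
    if 11000 <= driver_ver < 11020:
        return ("3", "")
    if 11020 <= driver_ver < 13000:
        return ("4", "")
    raise NotSupportedError(f"CUDA driver version {driver_ver} is not supported")


def candidate_sonames(stem, suffixes):
    """Expand suffixes into sonames; an empty suffix yields the bare stem."""
    return [f"{stem}.{suffix}" if suffix else stem for suffix in suffixes]
```

For example, `candidate_sonames("libnvvm.so", nvvm_dso_version_suffixes(12040))` yields the versioned name first and the unversioned name second, which is the order in which the Linux loaders attempt `dlopen`.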
