Switch to use CUDA driver APIs in Device constructor #460
Draft: leofang wants to merge 6 commits into NVIDIA:main from leofang:reduce_cudart
Changes from all commits (6 commits):
2afcb20  cache cc to speed it up (leofang)
87405ad  avoid using cudart APIs in Device constructor (leofang)
95777c4  avoid silly, redundant lock (leofang)
4cfd505  Merge branch 'main' into cache_cc (leofang)
7852459  Merge branch 'cache_cc' into reduce_cudart (leofang)
7f11565  Merge branch 'main' into reduce_cudart (leofang)
Diff:

@@ -957,34 +957,42 @@ def __new__(cls, device_id=None):
         # important: creating a Device instance does not initialize the GPU!
         if device_id is None:
-            device_id = handle_return(runtime.cudaGetDevice())
-            assert_type(device_id, int)
+            err, dev = driver.cuCtxGetDevice()
+            if err == 0:
+                device_id = int(dev)
+            else:
+                ctx = handle_return(driver.cuCtxGetCurrent())
+                assert int(ctx) == 0
+                device_id = 0  # cudart behavior
+            assert isinstance(device_id, int), f"{device_id=}"
Inline review comment on the added `assert isinstance(device_id, int)` line:
    Is this intentionally not using the `assert_type` helper?

Reply:
    I think this PR predates the helper; I'll add it.
Diff (continued):

         else:
-            total = handle_return(runtime.cudaGetDeviceCount())
-            assert_type(device_id, int)
-            if not (0 <= device_id < total):
+            total = handle_return(driver.cuDeviceGetCount())
+            if not isinstance(device_id, int) or not (0 <= device_id < total):
                 raise ValueError(f"device_id must be within [0, {total}), got {device_id}")

         # ensure Device is singleton
         if not hasattr(_tls, "devices"):
-            total = handle_return(runtime.cudaGetDeviceCount())
+            total = handle_return(driver.cuDeviceGetCount())
             _tls.devices = []
             for dev_id in range(total):
                 dev = super().__new__(cls)

                 dev._id = dev_id
                 # If the device is in TCC mode, or does not support memory pools for some other reason,
                 # use the SynchronousMemoryResource which does not use memory pools.
                 if (
                     handle_return(
-                        runtime.cudaDeviceGetAttribute(runtime.cudaDeviceAttr.cudaDevAttrMemoryPoolsSupported, 0)
+                        driver.cuDeviceGetAttribute(
+                            driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED, dev_id
+                        )
                     )
                 ) == 1:
                     dev._mr = _DefaultAsyncMempool(dev_id)
                 else:
                     dev._mr = _SynchronousMemoryResource(dev_id)

                 dev._has_inited = False
                 dev._properties = None

                 _tls.devices.append(dev)

         return _tls.devices[device_id]
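For context, a minimal usage sketch of what this constructor change means at the API level. This is illustrative and not part of the PR; it assumes the public `Device` class exported from `cuda.core.experimental`.

```python
from cuda.core.experimental import Device

# Constructing a Device must not initialize the GPU or create a context
# (see the "# important" comment in the diff above).
dev = Device()    # if no context is current, falls back to device 0 (cudart behavior)
dev0 = Device(0)  # explicit device_id, validated against cuDeviceGetCount()

# Device objects are cached per thread (one instance per device id in _tls.devices),
# so both lookups return the same singleton object.
assert dev is dev0
```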
Review comment:
    What's going on here? Is there some requirement from cudart that requires `cuCtxGetCurrent()` to be called before the device can be queried?
Review comment:
    Could it be helpful to raise a more specific error here? It might also be helpful to add a comment right after the `else:` explaining the fallback.
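As a concrete illustration of this suggestion, one hypothetical shape the branch could take (not part of the PR; the import path and the exact CUresult values that mean "no current context" are assumptions that would need to be confirmed against the driver documentation):

```python
from cuda.bindings import driver  # assumed bindings import path

err, dev = driver.cuCtxGetDevice()
if err == driver.CUresult.CUDA_SUCCESS:
    device_id = int(dev)
elif err in (
    driver.CUresult.CUDA_ERROR_INVALID_CONTEXT,
    driver.CUresult.CUDA_ERROR_NOT_INITIALIZED,
):
    # No context is current yet; mirror cudaGetDevice() and default to device 0.
    device_id = 0
else:
    # Surface unexpected driver errors instead of relying on an assert.
    raise RuntimeError(f"cuCtxGetDevice() failed: {err}")
```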
Reply:
    This handles the case where no context is set to current. The logic is as follows: if `cuCtxGetDevice` errors out, there is no current context (for example, right after `cuInit(0)` and before anything else is called). We confirm that this is the case by checking that the `ctx` pointer is zero (the `err` from `cuCtxGetCurrent` will always indicate success), and then pick device 0, which matches the cudart behavior.
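A standalone snippet that exercises the behavior described in this reply. It is a sketch that assumes the `cuda.bindings.driver` module and relies on the claim above that `cuCtxGetCurrent` succeeds even when no context is current.

```python
from cuda.bindings import driver

# Run in a fresh process, before cuInit(0) or before any context is made current.
err, dev = driver.cuCtxGetDevice()
print("cuCtxGetDevice:", err)  # not CUDA_SUCCESS, since no context is current

err, ctx = driver.cuCtxGetCurrent()
print("cuCtxGetCurrent:", err, int(ctx))  # per the reply above: success, NULL (0) handle

# This is exactly the branch in which the Device constructor picks device 0,
# matching what cudaGetDevice() would have reported.
```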