add support for data installations #4474
base: develop
Tests need adjusting:
test_prefix_option: easybuild-framework/test/framework/options.py, line 5032 in 92d2968
def test_prefix_option(self):
test_show_config: easybuild-framework/test/framework/options.py, line 4886 in 92d2968
def test_show_config(self):
@@ -92,6 +92,7 @@
     'checksums': [[], "Checksums for sources and patches", BUILD],
     'configopts': ['', 'Extra options passed to configure (default already has --prefix)', BUILD],
     'cuda_compute_capabilities': [[], "List of CUDA compute capabilities to build with (if supported)", BUILD],
+    'data_sources': [[], "List of source files for data", BUILD],
Is there a need to separate data_sources from sources?
data sources can be very big, so i think it's good to at least have an option to separate them.
we can set data_sources equal to sources by default, would you prefer that?
hm, i see now that i misunderstood you.
the separate parameter data_sources was suggested by @boegel to make it clear they are different from software sources.
there is no real need for it, it's just cosmetic. i'm not sure if it's a good idea, happy to revert if you prefer.
as discussed during the EUM, we should keep data_sources to allow installing software and datasets in a single easyconfig (e.g. using components) if we want to add support for this later on. this PR does not support this, as it requires substantial changes and i'm unsure how useful it is. i've modified this PR to make it easier to implement support for it if/when desired.
@smoors Let's re-target this to |
motivation
- swap dataset versions with ml swap
changes
- --installpath-data, similar to --installpath-software
- --subdir-data (default = data), similar to --subdir-software
- --sourcepath-data, similar to --sourcepath
- data_sources, similar to sources
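The way these options could combine into install locations can be sketched as follows. This is only an illustration, not the code in this PR: resolve_install_dirs is a hypothetical helper, the "software" default for the software subdirectory is an assumption, and the rule that an explicit install path takes precedence over the common install path plus subdirectory is assumed by analogy with --installpath-software.

```python
import os


def resolve_install_dirs(installpath, installpath_software=None, installpath_data=None,
                         subdir_software='software', subdir_data='data'):
    """Return (software_dir, data_dir) for the given settings.

    Hypothetical helper, not EasyBuild's actual implementation.
    Assumption: an explicit installpath_* value takes precedence;
    otherwise the subdirectory is appended to the common install path.
    """
    software_dir = installpath_software or os.path.join(installpath, subdir_software)
    data_dir = installpath_data or os.path.join(installpath, subdir_data)
    return software_dir, data_dir


# default layout: both trees live under the common install path
print(resolve_install_dirs('/apps/easybuild'))

# datasets redirected to a separate location, e.g. a larger file system
print(resolve_install_dirs('/apps/easybuild', installpath_data='/bigdata/easybuild'))
```

With this layout, the data tree can be kept in place while the software tree is rebuilt for a new OS or architecture.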
design
- the rationale for subdir_data is reusability: in contrast to software, data does not have to be rebuilt/reinstalled when, for example, upgrading the OS or building for a new architecture
- the rationale for sourcepath_data is that datasets can be very large, so you may want to store them in a different file system or location