-
Notifications
You must be signed in to change notification settings - Fork 63
Subset arrays #411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Subset arrays #411
Conversation
…make the squeeze function and link it to the front end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! The core of these changes already looks good, but there are some things I would like to see changed. Please also change the base branch of the PR to dev
. See here on how to do that. EDIT: I could change it myself 🙂
@@ -39,3 +39,6 @@ docs/ | |||
|
|||
# MacOS | |||
.DS_Store | |||
|
|||
# Rproj | |||
.Rproj.user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfamiliar with R. What is this directory used for, and should all other users have it ignored too? Otherwise, please put this in your local .git/info/exclude
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to @stefanradev93, this should be .Rproj
bayesflow/adapters/adapter.py
Outdated
Additional keyword arguments passed to the transform. | ||
|
||
""" | ||
transform = FilterTransform( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a MapTransform
, but I could be mistaken. Do users often want to subsample everything in the batch, including non-arrays?
tests/test_adapters/conftest.py
Outdated
@@ -25,6 +25,7 @@ def adapter(): | |||
.one_hot("o1", 10) | |||
.keep(["x", "y", "z1", "p1", "p2", "s1", "s2", "t1", "t2", "o1"]) | |||
.rename("o1", "o2") | |||
.subsample_array("s3", sample_size = 3, axis = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 (see above). In this case, you can just run ruff format
and it should do the trick for you.
) | ||
assert not np.all( | ||
np.diff(output, axis=i) > 0 | ||
), f"is ordered along axis which is not meant to be ordered: {i}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this is being reordered now, are you running ruff version 0.11.2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently I was running ruff 0.8.1 but I will update it
def __init__(self): | ||
super().__init__() | ||
|
||
def forward(self, data, indices, axis=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the indices
need to be passed in the constructor for this transform. See above.
def __init__(self): | ||
super().__init__() | ||
|
||
def forward(self, data: np.ndarray, sample_size: int, axis=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the sample_size
should be part of the constructor for this transform. It seems like a bit of a hassle to have to pass this argument in the forward call of the Adapter, unless you have a specific use case in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also allow it to be a float in [0, 1] specifying a proportion of the sample to subsample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this, but it raises the concern of whether we should floor or ceil the resulting value. I am thinking ceiling should be better.
…r than internal shorthand
…ike the other transforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not sure that this should be a map transform as written, one consequence of the way its written now is that all keys specified by this transform will have random subsamples of the same size from the same axis. I think that all datasets that a user wants to subsample should be specified individually, so that axis and sample size are specified individually. If this is the case, would it not be better to force the map transform to reject a sequence of keys but rather only take one key?
… than only integer input
You also asked me to rename subsample_array to random_subsample my question is should all associated files also be renamed? |
This was accidentally closed. We will investigate how to restore the branch and reopen PRs. |
Yes, please rename all associated files so the structure of
In this case, I think we would want to have a regular |
…sample and random_subsample respectively
Addresses Issue #278
Implemented Take and subsample methods, however as discussed with Lars the requested squeeze functionality is essentially just the inverse of expand dims