Skip to content

Subset arrays #411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: dev
Choose a base branch
from
Open

Subset arrays #411

wants to merge 15 commits into from

Conversation

eodole
Copy link
Collaborator

@eodole eodole commented Apr 14, 2025

Addresses Issue #278

Implemented Take and subsample methods, however as discussed with Lars the requested squeeze functionality is essentially just the inverse of expand dims

@eodole eodole requested a review from LarsKue April 14, 2025 16:47
Copy link
Contributor

@LarsKue LarsKue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The core of these changes already looks good, but there are some things I would like to see changed. Please also change the base branch of the PR to dev. See here on how to do that. EDIT: I could change it myself 🙂

@@ -39,3 +39,6 @@ docs/

# MacOS
.DS_Store

# Rproj
.Rproj.user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with R. What is this directory used for, and should all other users have it ignored too? Otherwise, please put this in your local .git/info/exclude instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to @stefanradev93, this should be .Rproj

Additional keyword arguments passed to the transform.

"""
transform = FilterTransform(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be a MapTransform, but I could be mistaken. Do users often want to subsample everything in the batch, including non-arrays?

@@ -25,6 +25,7 @@ def adapter():
.one_hot("o1", 10)
.keep(["x", "y", "z1", "p1", "p2", "s1", "s2", "t1", "t2", "o1"])
.rename("o1", "o2")
.subsample_array("s3", sample_size = 3, axis = 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 (see above). In this case, you can just run ruff format and it should do the trick for you.

)
assert not np.all(
np.diff(output, axis=i) > 0
), f"is ordered along axis which is not meant to be ordered: {i}."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is being reordered now, are you running ruff version 0.11.2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently I was running ruff 0.8.1 but I will update it

def __init__(self):
super().__init__()

def forward(self, data, indices, axis=-1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the indices need to be passed in the constructor for this transform. See above.

def __init__(self):
super().__init__()

def forward(self, data: np.ndarray, sample_size: int, axis=-1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the sample_size should be part of the constructor for this transform. It seems like a bit of a hassle to have to pass this argument in the forward call of the Adapter, unless you have a specific use case in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also allow it to be a float in [0, 1] specifying a proportion of the sample to subsample.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second this, but it raises the concern of whether we should floor or ceil the resulting value. I am thinking ceiling should be better.

@LarsKue LarsKue changed the base branch from main to dev April 15, 2025 15:43
@LarsKue LarsKue added feature New feature or request user interface Changes to the user interface and improvements in usability good first issue Good for first-time contributors labels Apr 15, 2025
@LarsKue LarsKue moved this from Future to In Progress in bayesflow development Apr 15, 2025
Copy link
Collaborator Author

@eodole eodole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure that this should be a map transform as written, one consequence of the way its written now is that all keys specified by this transform will have random subsamples of the same size from the same axis. I think that all datasets that a user wants to subsample should be specified individually, so that axis and sample size are specified individually. If this is the case, would it not be better to force the map transform to reject a sequence of keys but rather only take one key?

@eodole
Copy link
Collaborator Author

eodole commented Apr 22, 2025

You also asked me to rename subsample_array to random_subsample my question is should all associated files also be renamed?

@stefanradev93 stefanradev93 deleted the branch bayesflow-org:dev April 22, 2025 14:37
@github-project-automation github-project-automation bot moved this from In Progress to Done in bayesflow development Apr 22, 2025
@LarsKue
Copy link
Contributor

LarsKue commented Apr 22, 2025

This was accidentally closed. We will investigate how to restore the branch and reopen PRs.

@LarsKue LarsKue reopened this Apr 22, 2025
@LarsKue
Copy link
Contributor

LarsKue commented Apr 22, 2025

Yes, please rename all associated files so the structure of file_name.py::class_or_function_name is consistent.

to force the map transform to reject a sequence of keys

In this case, I think we would want to have a regular Transform. It would also be fine to implement it as an ElementwiseTransform and wrap it in a MapTransform internally, but then the dispatch method on the adapter, i.e., adapter.random_subsample should raise an error if a Sequence of keys is passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request good first issue Good for first-time contributors user interface Changes to the user interface and improvements in usability
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants