Subset arrays #411

eodole · 2025-04-14T16:46:44Z

Addresses Issue #278

Implemented the Take and subsample methods. However, as discussed with Lars, the requested squeeze functionality is essentially just the inverse of expand dims.


LarsKue

Thank you for the PR! The core of these changes already looks good, but there are some things I would like to see changed. Please also change the base branch of the PR to dev. See here on how to do that. EDIT: I could change it myself 🙂

LarsKue · 2025-04-15T12:51:42Z

.gitignore

@@ -39,3 +39,6 @@ docs/

 # MacOS
 .DS_Store
+
+# Rproj
+.Rproj.user


I am unfamiliar with R. What is this directory used for, and should all other users have it ignored too? Otherwise, please put this in your local .git/info/exclude instead.

according to @stefanradev93, this should be .Rproj


LarsKue · 2025-04-15T12:56:13Z

bayesflow/adapters/adapter.py

+            Additional keyword arguments passed to the transform.
+
+        """
+        transform = FilterTransform(


I feel like this should be a MapTransform, but I could be mistaken. Do users often want to subsample everything in the batch, including non-arrays?

bayesflow/adapters/transforms/take.py

LarsKue · 2025-04-15T13:01:35Z

tests/test_adapters/conftest.py

@@ -25,6 +25,7 @@ def adapter():
        .one_hot("o1", 10)
        .keep(["x", "y", "z1", "p1", "p2", "s1", "s2", "t1", "t2", "o1"])
        .rename("o1", "o2")
+        .subsample_array("s3", sample_size = 3, axis = 0)


PEP8 (see above). In this case, you can just run ruff format and it should do the trick for you.

LarsKue · 2025-04-15T15:11:23Z

tests/test_links/test_links.py

-            )
+            assert not np.all(
+                np.diff(output, axis=i) > 0
+            ), f"is ordered along axis which is not meant to be ordered: {i}."


I'm not sure why this is being reordered now, are you running ruff version 0.11.2?

apparently I was running ruff 0.8.1 but I will update it

LarsKue · 2025-04-15T15:12:15Z

bayesflow/adapters/transforms/take.py

+    def __init__(self):
+        super().__init__()
+
+    def forward(self, data, indices, axis=-1):


IMO, the indices need to be passed in the constructor for this transform. See above.
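A minimal sketch of what the constructor-based design could look like, assuming plain NumPy and a hypothetical stand-alone class named `TakeSketch` (the real transform would subclass the library's transform base class, which is omitted here). The indices and axis are fixed at construction, so `forward` only receives the data:

```python
import numpy as np


class TakeSketch:
    """Hypothetical stand-in for the transform under review, not the PR's code."""

    def __init__(self, indices, axis=-1):
        # Configuration lives on the instance instead of the forward call.
        self.indices = indices
        self.axis = axis

    def forward(self, data: np.ndarray) -> np.ndarray:
        # np.take selects the given indices along the configured axis.
        return np.take(data, self.indices, axis=self.axis)


x = np.arange(12).reshape(3, 4)
take = TakeSketch(indices=[0, 2], axis=1)
result = take.forward(x)  # keeps columns 0 and 2, shape (3, 2)
```

With this layout the adapter can call `forward(data)` uniformly for every transform, without threading per-transform arguments through the call.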

LarsKue · 2025-04-15T15:13:25Z

bayesflow/adapters/transforms/subsample_array.py

+    def __init__(self):
+        super().__init__()
+
+    def forward(self, data: np.ndarray, sample_size: int, axis=-1):


IMO, the sample_size should be part of the constructor for this transform. It seems like a bit of a hassle to have to pass this argument in the forward call of the Adapter, unless you have a specific use case in mind?

I would also allow it to be a float in [0, 1] specifying a proportion of the sample to subsample.

I second this, but it raises the question of whether we should floor or ceil the resulting value. I think taking the ceiling would be better.
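To make the floor-versus-ceil question concrete, here is a rough sketch under several assumptions: the class name `RandomSubsampleSketch`, the `seed` argument, and sampling without replacement are all illustrative choices, not taken from the PR. A float in [0, 1] is treated as a proportion and rounded up with the ceiling, as proposed above:

```python
import numpy as np


class RandomSubsampleSketch:
    """Hypothetical sketch: sample_size and axis are set in the constructor."""

    def __init__(self, sample_size, axis=-1, seed=None):
        self.sample_size = sample_size
        self.axis = axis
        self.rng = np.random.default_rng(seed)  # seed is an assumed convenience

    def forward(self, data: np.ndarray) -> np.ndarray:
        n = data.shape[self.axis]
        if isinstance(self.sample_size, float):
            if not 0.0 <= self.sample_size <= 1.0:
                raise ValueError("A float sample_size must lie in [0, 1].")
            # Ceiling: any non-zero proportion yields at least one sample.
            k = int(np.ceil(self.sample_size * n))
        else:
            k = int(self.sample_size)
        # Assumption: draw distinct indices (no replacement).
        indices = self.rng.choice(n, size=k, replace=False)
        return np.take(data, indices, axis=self.axis)


data = np.arange(40).reshape(10, 4)
by_count = RandomSubsampleSketch(sample_size=3, axis=0, seed=0).forward(data)
by_fraction = RandomSubsampleSketch(sample_size=0.25, axis=0, seed=0).forward(data)
```

With 10 entries along the axis, a proportion of 0.25 gives 2.5, which the ceiling turns into 3 samples; flooring would give 2.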

bayesflow/adapters/transforms/subsample_array.py


eodole

I'm honestly not sure that this should be a map transform as written. One consequence of the way it's written now is that all keys specified by this transform will have random subsamples of the same size drawn from the same axis. I think every dataset that a user wants to subsample should be specified individually, so that the axis and sample size are set per dataset. If so, would it not be better to make the map transform reject a sequence of keys and accept only a single key?


eodole · 2025-04-22T13:59:29Z

You also asked me to rename subsample_array to random_subsample. Should all associated files be renamed as well?

LarsKue · 2025-04-22T14:42:15Z

This was accidentally closed. We will investigate how to restore the branch and reopen PRs.

LarsKue · 2025-04-22T14:53:46Z

Yes, please rename all associated files so the structure of file_name.py::class_or_function_name is consistent.

to force the map transform to reject a sequence of keys

In this case, I think we would want to have a regular Transform. It would also be fine to implement it as an ElementwiseTransform and wrap it in a MapTransform internally, but then the dispatch method on the adapter, i.e., adapter.random_subsample should raise an error if a Sequence of keys is passed.
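The key check in the dispatch method could be sketched roughly as follows. The helper name `validate_single_key` is hypothetical and the snippet does not use the library's actual classes. One detail worth noting is that `str` is itself a `Sequence`, so it has to be accepted before the general sequence check:

```python
from collections.abc import Sequence


def validate_single_key(key) -> str:
    """Hypothetical helper: accept one key, reject a sequence of keys."""
    # A plain string is a Sequence too, so handle it first.
    if isinstance(key, str):
        return key
    if isinstance(key, Sequence):
        raise TypeError(
            "Only one key can be subsampled at a time; "
            "call the method once per key instead."
        )
    raise TypeError(f"Expected a string key, got {type(key).__name__}.")


single = validate_single_key("s3")  # passes through unchanged

try:
    validate_single_key(["s1", "s2"])
    rejected = False
except TypeError:
    rejected = True  # a list of keys is refused
```

Calling the method once per key keeps the axis and sample size explicit for each dataset, which matches the concern raised earlier in the thread.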


eodole added 3 commits April 8, 2025 11:28

made initial backend functions for adapter subsetting, need to still …

69e236d

…make the squeeze function and link it to the front end

added subsample functionality, to do would be adding them to testing …

9c0da4c

…procedures

made the take function and ran the linter

d57aee4

eodole requested a review from LarsKue April 14, 2025 16:47

LarsKue requested changes Apr 15, 2025


LarsKue changed the base branch from main to dev April 15, 2025 15:43

LarsKue assigned eodole Apr 15, 2025

LarsKue added the feature, user interface, and good first issue labels Apr 15, 2025

LarsKue added this to bayesflow development Apr 15, 2025

github-project-automation bot moved this to Future in bayesflow development Apr 15, 2025

LarsKue moved this from Future to In Progress in bayesflow development Apr 15, 2025

eodole added 6 commits April 22, 2025 14:38

changed name of subsampling function

8d834da

changed documentation, to be consistent with external notation, rathe…

6c1d503

…r than internal shorthand

small formation change to documentation

2e83846

changed subsample to have sample size and axis in the constructor

dee4534

moved transforms in the adapter.py so they're in alphabetical order l…

71dc35a

…ike the other transforms

changed random_subsample to maptransform rather than filter transform

6c34a5d

eodole commented Apr 22, 2025


eodole added 4 commits April 22, 2025 15:24

updated documentation with new naming convention

c3640cb

added arguments of take to the constructor

f17322f

added feature to specify a percentage of the data to subsample rather…

5312c5f

… than only integer input

changed subsample in adapter.py to allow float as an input for the sa…

5361c04

…mple size

stefanradev93 deleted the branch bayesflow-org:dev April 22, 2025 14:37

stefanradev93 closed this Apr 22, 2025

github-project-automation bot moved this from In Progress to Done in bayesflow development Apr 22, 2025

LarsKue reopened this Apr 22, 2025

eodole added 2 commits April 22, 2025 17:47

renamed subsample_array and associated classes/functions to RandomSub…

504344b

…sample and random_subsample respectively

included TypeError to force users to only subsample one dataset at a …

4218b70

…time
