
Added the KeyPoints TVTensor #8817


Open · wants to merge 32 commits into main

Conversation

Alexandre-SCHOEPP

Description

Adds and integrates the KeyPoints TVTensor (requested in #8728), which is a representation of points (or vertices) attached to a picture.

Details

Inner workings

KeyPoints represents a tensor of shape [..., 2], which allows arbitrarily complex structures to be represented (polygons, skeletons, or even SAM-like point prompts). Whenever __new__ is called, the shape of the source tensor is checked.

Tensors of shape [2] are reshaped to [1, 2], similarly to BoundingBoxes.

KeyPoints, like BoundingBoxes, carries around a canvas_size attribute, which records the height and width of the image the points are attached to.
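The shape check described above can be sketched in plain Python (a minimal stand-in, not the actual torchvision implementation; normalize_keypoints_shape is a hypothetical helper name):

```python
def normalize_keypoints_shape(shape):
    """Sketch of the __new__ shape check: a bare [2] point is promoted
    to [1, 2] (mirroring BoundingBoxes); any shape ending in 2 passes."""
    shape = list(shape)
    if not shape or shape[-1] != 2:
        raise ValueError(f"Expected a shape of [..., 2], got {shape}")
    if len(shape) == 1:
        return [1, 2]
    return shape

print(normalize_keypoints_shape([2]))        # a single point, promoted
print(normalize_keypoints_shape([3, 4, 2]))  # 3 polygons with 4 vertices each
```

Any trailing dimension other than 2 is rejected, while arbitrary leading dimensions pass through untouched.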

Kernels

Kernels for all operations should be supported (if I missed one, I will fix it). They are mostly an adaptation of the BoundingBoxes code.

Particularities

Maintainers may notice that a TYPE_CHECKING section was added that differs significantly from the implementation:

class KeyPoints(TVTensor):

    if TYPE_CHECKING:
        # EVIL: just so that mypy, Pylance and others stop shouting that
        # everything is wrong when initializing the TVTensor.
        # Not read or defined at runtime (only at linting time).
        # TODO: BoundingBoxes needs something similar
        def __init__(
            self,
            data: Any,
            *,
            dtype: Optional[torch.dtype] = None,
            device: Optional[Union[torch.device, str, int]] = None,
            requires_grad: Optional[bool] = None,
            canvas_size: Tuple[int, int],
        ):
            ...

I marked this section as EVIL since it is a trick, but it cannot generate vulnerabilities: TYPE_CHECKING is always False at runtime, and only ever True for the linter.

For the last few months, I had issues with BoundingBoxes initialization in my weird PyLance + mypy mix: no overload was ever detected to match it. By "re-defining" __init__ this way, I got it solved on my machine.

Converters

Added a converter convert_box_to_points in torchvision.transforms.v2.functional._meta, exported in torchvision.transforms.v2, which (as its name states) converts an [N, 4] BoundingBoxes TVTensor into an [N, 4, 2] KeyPoints TVTensor.
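For illustration, the corner expansion can be sketched in plain Python. This assumes XYXY input and a top-left, top-right, bottom-right, bottom-left corner order, which is a guess at the layout, not necessarily the one the PR uses:

```python
def xyxy_to_corner_points(box):
    """Expand one XYXY box (x1, y1, x2, y2) into its 4 corner points,
    so an [N, 4] batch of boxes becomes an [N, 4, 2] batch of points."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]

boxes = [(0, 0, 10, 20), (5, 5, 8, 9)]
points = [xyxy_to_corner_points(b) for b in boxes]  # shape [N, 4, 2]
```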

Other changes

For the purposes of my custom type checking, I also changed tv_tensors.wrap to use Python 3.8-compatible generics.

Since wrap only ever outputs a subclass of its like argument, I used a TypeVar bound to TVTensor to ensure that type-checking passes no matter the checker used.
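The bound-TypeVar approach can be sketched as follows (stand-in classes; the real wrap lives in torchvision.tv_tensors and is more involved):

```python
from typing import Any, TypeVar

class TVTensor:  # stand-in for torchvision.tv_tensors.TVTensor
    pass

class KeyPoints(TVTensor):
    pass

T = TypeVar("T", bound=TVTensor)

def wrap(wrappee: Any, *, like: T) -> T:
    # Because T is bound to TVTensor, checkers infer that the output
    # has the same subclass as the `like` argument.
    return type(like)()
```

With this signature, wrap(t, like=some_keypoints) is inferred as KeyPoints by checkers, without needing Python 3.9+ generic syntax.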

Methodology

  • Formatted using ufmt
  • Flake8 compliance with line-length 120 enforced by editor
  • Documented the classes

Discussion

Since many BoundingBoxes converters are based on changing the bboxes to polygons and then operating on the points, I believe it is possible to lower line count and increase reliability, for a negligible computational latency cost, by using the KeyPoints kernels and converting via the method described in the details above.
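As a sketch of that idea (plain Python, hypothetical helper names, using a simple width − x mirror and leaving exact pixel conventions aside): a horizontal flip for boxes can be implemented by flipping the corner points and re-deriving the box from the point extrema:

```python
def flip_points_horizontal(points, canvas_width):
    """Mirror (x, y) points across the vertical canvas axis."""
    return [(canvas_width - x, y) for (x, y) in points]

def flip_box_horizontal(box, canvas_width):
    """Flip an XYXY box by converting it to its corner points, flipping
    those, then rebuilding the axis-aligned box from the extrema."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    flipped = flip_points_horizontal(corners, canvas_width)
    xs = [p[0] for p in flipped]
    ys = [p[1] for p in flipped]
    return (min(xs), min(ys), max(xs), max(ys))
```

The box kernel reduces to a point kernel plus a min/max reduction, which is the code-sharing the paragraph above argues for.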


pytorch-bot bot commented Dec 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8817

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

Hi @Alexandre-SCHOEPP!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@broomhead

I'd love to see this get included in a future release.

@Alexandre-SCHOEPP
Author

I'd love to see this get included in a future release.

Given that the PR is about 4 months old now... me too?

@NicolasHug
Member

Thank you for your patience @Alexandre-SCHOEPP , I don't have much bandwidth at the moment, but I will try to get this reviewed and merged for 0.23

@Alexandre-SCHOEPP
Author

No worries @NicolasHug, the project is very large given the size of the team behind it; I understand that prioritization needs to happen.

Member

AntoineSimoulin left a comment

Hey! Thanks a lot for the cool PR. I do think it will be a very relevant addition to torchvision. I feel the PR is well aligned with the structure of the codebase. However, this introduces a large number of changes, so I propose we make a couple of iterations to fully align with the main branch before merging. How does that sound?

I left some comments in the code, but here are my high level questions and remarks.

  • Regarding parameter naming, let's make sure we are sticking to "keypoint" for user-facing functions and not use "kp" and "kpoint" there;
  • I am not sure about the _xyxy_to_points output. Can we include some tests for both _xyxy_to_points and convert_box_to_points to make sure the input/output is as expected?
  • We have a keypoint RCNN model that supports keypoints in torchvision. I'm not sure we should change its code because of the risk of breaking it, but can we use this as some advanced integration test?
  • I think we are missing a certain number of tests for the transforms: TestResize.test_kernel_key_points, TestResize.test_functional for key_points, TestResize.test_key_points_correctness, TestResize.test_noop, TestResize.test_no_regression_5405, TestHorizontalFlip.test_kernel_key_points, TestHorizontalFlip.test_bounding_boxes_correctness ... Would it eventually be possible to add those?
  • More generally, the keypoints are defined here as a set of independent points. Is that the usual definition, and can those be used to define arbitrary polygons? In that case, will some other functionality be needed?

Thanks again for your time submitting this PR. Let me know what you think about my comments. On my side, I am excited to work on this with you and make sure we can merge the feature into torchvision.

Best,
Antoine

Comment on lines 165 to 178
def get_all_keypoints(flat_inputs: List[Any]) -> Iterable[tv_tensors.KeyPoints]:
    """Yields all KeyPoints in the input.

    Raises:
        ValueError: No KeyPoints can be found
    """
    generator = (inpt for inpt in flat_inputs if isinstance(inpt, tv_tensors.KeyPoints))
    try:
        yield next(generator)
    except StopIteration:
        raise ValueError("No KeyPoints were found in the sample.")
    yield from generator


Member

It seems this get_all_keypoints function is not used elsewhere. Should we delete it if this is not needed in the current state of the PR?

Author

This function is meant to parallel the one for bounding boxes, the schematic of which I followed here. I don't necessarily have an attachment to it, but I felt like it was better to have it if someone who worked on the BBoxes had feedback on it.

from ._tv_tensor import TVTensor


class KeyPoints(TVTensor):
Member

Overall, it seems to me there are a few discrepancies between the KeyPoints and BoundingBoxes classes (c.f. comments below). Maybe we can try to reduce the gap to align as much as possible with bbox?

Author

Alexandre-SCHOEPP · Apr 29, 2025

As stated in the response below, I think that having more freedom in the shape makes sense in this context. Points are inherently less structured than boxes, and whether it's a representation of polygons/polylines (N_poly, N_sides, 2) or a skeleton (N_bones, 2, 2), letting this freedom stay enables the development of future, more complex TVTensors without having to re-implement all of the kernels (which is frankly unfun, 0/10 would not recommend).

I also feel like there are use cases where bboxes shaped beyond (N, 4) would make sense (for example for groups of up to N objects to detect and group together), but the limitation feels far less restrictive there than it would be for points.



class KeyPoints(TVTensor):
""":class:`torch.Tensor` subclass for tensors with shape ``[..., 2]`` that represent points in an image.
Member

Is the expected shape [N, 2] here, where N is the number of keypoints?

Author

No. I left room for more complex shapes for use in more complex structures, like skeletons. A skeleton would be an (N_bones, 2, 2) tensor (a list of N_bones segments), and would be a specific type of KeyPoints in order to reuse the kernels as much as possible.

dtype: Optional[torch.dtype] = None,
device: Optional[Union[torch.device, str, int]] = None,
requires_grad: Optional[bool] = None,
canvas_size: Tuple[int, int],
Member

Maybe align the order of the arguments with the BoundingBoxes class? In this case, canvas_size should be the 4th argument and not the last.

Author

I agree for the sake of consistent code styling; however, given that this is a keyword-only argument, the difference isn't visible at runtime outside of inspecting the function's attributes.

Author

I'm putting it in third position since in BoundingBoxes the 3rd argument is the format, which does not exist in KeyPoints.

# Since a Tensor is a sequence of Tensor, had it not been the case, we may have had silent
# or complex errors
output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
elif isinstance(output, MutableSequence):
Member

What is the purpose of this case here?

Author

Alexandre-SCHOEPP · Apr 29, 2025

This case handles lists and list-like objects.

Unlike in the case of tuples (which are immutable sequences), where I can't reuse the container object, here I can.

I agree that it needs to be changed, however: custom immutable sequences are not supported, which is an oversight on my part.

The valid logic would be:

    if isinstance(output, torch.Tensor) and not isinstance(output, KeyPoints):
        output = KeyPoints(output, canvas_size=canvas_size)
    elif isinstance(output, MutableSequence):
        # For lists and list-like objects we don't create a new container; we just set the values in place
        for i, part in enumerate(output):
            output[i] = KeyPoints(part, canvas_size=canvas_size)
    elif isinstance(output, Sequence):  # Immutable sequences (like tuples) handled here
        # NB: output is only checked against Sequence after being checked against Tensor.
        # Since a Tensor is a sequence of Tensors, we may otherwise have had silent or complex errors.
        output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
    return output

new_format=BoundingBoxFormat.XYXY,
inplace=False,
)
return tv_tensors.KeyPoints(_xyxy_to_points(bbox), canvas_size=bounding_boxes.canvas_size)
Member

We should also make sure this function is covered in the test case.

Author

I agree

Author

Alexandre-SCHOEPP left a comment

Thanks for the review. I marked every comment where I agreed with the need to change things with the word "agree" in my response, just so I can make sure not to miss one whenever I get the time to fix it.


points.canvas_size = canvas_size
return points

if TYPE_CHECKING:
Author

I had no need for type: ignore. This solution works, and quite frankly, I think it should be the standard in the library (despite my jokingly marking it as EVIL).


Author

Alexandre-SCHOEPP left a comment

_geometry.py was hidden by the diff viewer, and so I failed to answer a few comments; see my responses in the threads above.


@broomhead

@Alexandre-SCHOEPP I pulled this fork and compiled it for my own project, as I really need keypoints that match the TVTensor design, particularly for transforms.

It's working well for me. One thing I did find is that KeyPoints needs to be added to transforms like SanitizeBoundingBoxes. The issue is that when a transform removes data on some criterion, like bounding box size, the associated data should be removed with it; KeyPoints were not getting removed. I looked at the code and it was as simple as adding KeyPoints to a check on one line. For now I just overrode the class and made the one-line change locally:

# This is a direct copy-and-paste of SanitizeBoundingBoxes, with KeyPoints added.
@register(name="SanitizeBoundingBoxesWithKeyPoints")
class SanitizeBoundingBoxesWithKeyPoints(T.SanitizeBoundingBoxes):
    def transform(self, inpt: Any, params: Dict[str, Any]) -> Any:
        is_label = params["labels"] is not None and any(inpt is label for label in params["labels"])
        is_bounding_boxes_or_mask_or_keypoints = isinstance(
            inpt, (BoundingBoxes, Mask, KeyPoints)
        )

        if not (is_label or is_bounding_boxes_or_mask_or_keypoints):
            return inpt

        output = inpt[params["valid"]]
        if is_label:
            return output
        else:
            wrapped = wrap(output, like=inpt)
            return wrapped

That isinstance check is all we need; I just added KeyPoints to the list. Now, I'm new to PyTorch, so I'm not sure whether other transforms require this sort of check as well; none of the others used in my project appear to.

@Alexandre-SCHOEPP
Author

Alexandre-SCHOEPP commented Apr 30, 2025

@Alexandre-SCHOEPP I pulled this fork and compiled it for my own project, as I really need keypoints that match the TVTensor design, particularly for transforms.

Thanks! Always good to have this feedback.

One thing I did find is KeyPoints needs to be added to transforms like SanitizeBoundingBoxes

I am waiting for feedback on it from the maintainers (@NicolasHug, @AntoineSimoulin). I am willing to do it as I correct all the other mistakes, but they may want it in another PR.

@Alexandre-SCHOEPP
Author

I just noticed that rotated bounding boxes are a thing now. @AntoineSimoulin, should I support them in this PR for convert_bounding_boxes_to_points, or would another PR be better?

Alex-S-H-P and others added 4 commits April 30, 2025 20:12
…sake of consistency with the other bounding_boxes converters in _meta.py
… BoundingBoxes and Masks in RandomErasing, AutoAugment, FiveCrop, and SanitizeBoundingBoxes now can handle them and its documentation now states that the underlying logic relies on masks and keypoints being a given shape
@Alexandre-SCHOEPP
Author

Alexandre-SCHOEPP commented Apr 30, 2025

@AntoineSimoulin I did some work on the various issues that had been marked. Don't hesitate to close whatever is resolved, I'll do another pass whenever I have time.

@Alexandre-SCHOEPP
Author

Alexandre-SCHOEPP commented Apr 30, 2025

Regarding parameter naming, let's make sure we are sticking to "keypoint" for user-facing functions and not use "kp" and "kpoint" there;

I think I fixed them wherever I found them. They were mostly in kernels, which I don't consider user-facing, but I agree that this will be easier to maintain.

  • We have a keypoint RCNN model that supports keypoints in torchvision. I'm not sure we should change its code because of the risk of breaking it, but can we use this as some advanced integration test?

I think this PR is already quite large. Is it OK if I leave that to someone else?
Especially given that KeypointRCNN uses 3-dimensional keypoints, with the third dimension being "visibility".

  • I think we are missing a certain number of tests for the transforms: TestResize.test_kernel_key_points, TestResize.test_functional for key_points, TestResize.test_key_points_correctness, TestResize.test_noop, TestResize.test_no_regression_5405, TestHorizontalFlip.test_kernel_key_points, TestHorizontalFlip.test_bounding_boxes_correctness ... Would it be eventually possible to add those?

Could you please provide a complete list? I'm not against writing them, but to be frank, the library is very arcane to someone discovering its structure (I had never used the registered-kernel design pattern before working on this PR).

  • More generally, the keypoints are defined here as a set of independent points. Is that the usual definition, and can those be used to define arbitrary polygons? In that case, will some other functionality be needed?

As I argue in other conversations, KeyPoints are the basis for anything that is a set of points. I chose the [..., 2] shape very deliberately to enable polygons (but also polylines, skeletons, etc.) to be defined as subclasses of KeyPoints, in which case they will inherit all the kernels, and adding more user-friendly functionality won't take the 2 days it took me to go over every kernel.
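The subclassing idea can be sketched with stand-ins (illustrative names only, not the PR's API):

```python
class KeyPoints:  # stand-in for the TVTensor subclass
    def __init__(self, points, canvas_size):
        self.points = points            # nested lists shaped [..., 2]
        self.canvas_size = canvas_size  # (height, width)

class Polygons(KeyPoints):
    """Shape [N_poly, N_vertices, 2]: every kernel written against the
    [..., 2] KeyPoints layout applies unchanged; only helpers are added."""
    def num_polygons(self):
        return len(self.points)

# One triangle on a 10x10 canvas; geometric kernels see only [..., 2].
tri = Polygons([[(0, 0), (4, 0), (0, 3)]], canvas_size=(10, 10))
```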

Member

NicolasHug left a comment

Thank you so much for the PR and for all your work so far @Alexandre-SCHOEPP . The torchvision code base is pretty complex as we have to account for a lot of features in a lot of settings, so we definitely appreciate your effort.

Agreed on leaving the keypoint RCNN integration for later, no worries. Agreed as well on having KeyPoints be [..., 2]; this makes the leading-dimension logic consistent with our other TVTensors like Image and Video.

I have one main high-level comment, and @AntoineSimoulin will follow up with more on the tests: we should assume (at least for now) that only a single instance of KeyPoints exists per input, i.e. one image should be associated with a single KeyPoints instance. This is what we assume / enforce for BoundingBoxes as well (see the note at https://pytorch.org/vision/main/generated/torchvision.tv_tensors.BoundingBoxes.html): the reason is that in SanitizeBoundingBoxes, we need an exact mapping between the BoundingBoxes and their corresponding labels to be deleted. And I think we'll eventually need to support labels for KeyPoints as well, which brings the same constraint: only one KeyPoints instance must exist per input.

I left more details below, thanks again for all the work!

# Every sequence that is not a mutable sequence is a non-mutable sequence
# We have to use a tuple here, since we know its initialization api, unlike for `output`
output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
return output
Member

The purpose of _wrap_output is to return a KeyPoints object; it's part of a broader interface where we expect the output to be a TVTensor. So we'll need some small changes to this. It also relates to my point above about having a single KeyPoints instance (in the code above there are many).

@@ -341,9 +341,9 @@ def transform(self, inpt: Any, params: dict[str, Any]) -> Any:


class SanitizeBoundingBoxes(Transform):
"""Remove degenerate/invalid bounding boxes and their corresponding labels and masks.
"""Remove degenerate/invalid bounding boxes and their corresponding labels, masks and keypoints.
Member

I wonder if we really need to special-case keypoints in SanitizeBoundingBoxes.
(Note: I do think we'll need SanitizeKeyPoints like you implemented, but that's a different thing).

This should only be needed if some keypoints are associated with some bounding boxes, so that when a bbox gets deleted, the corresponding keypoints need to be deleted with it. But is that an actual use case?
@broomhead, thanks a lot for your feedback on #8817 (comment). Can you please share more about why you needed this?

If really this is something necessary, then I think it's better addressed by users by relying on the labels_getter parameter (which could just return the keypoints instance).


Hey, yeah, I'm adding keypoint tracking to a model that only does bounding boxes. I had keypoints as plain tensors, but all the TVTensor transforms were leaving the keypoints unmodified and unfiltered. The training has random flips, sanitizing, all that.

So I eventually found out that, like BoundingBoxes, the type needs to have transform kernels defined. That led me here. I'm sure it's imperfect, but it already works great for now (given that I had to make my own SanitizeBoundingBoxes adjustment).

It also allowed me to add a normalize-keypoints transform for now as well. Perhaps that should work with the built-in normalize transform, but it didn't seem to. So I wrote a quick one and moved on.

Author

I think it makes sense for the sake of consistency to have SanitizeBoundingBoxes sanitize KeyPoints, since it sanitizes Masks.

Agreed on having the SanitizeKeyPoints transform class implemented fully later

Comment on lines 331 to 333
def sanitize_keypoints(
    keypoints: torch.Tensor, canvas_size: Optional[Tuple[int, int]] = None
) -> Tuple[torch.Tensor, torch.Tensor]:
Member

I think we'll need to do something similar as in SanitizeBoundingBoxes: we'll have to sanitize the corresponding labels as well. And for that reason, I think we'll need to assume that a single KeyPoints instance exists in the input (a single KeyPoint instance may contain an arbitrary number of keypoints, of course). We have the same assumption for bounding boxes: one BoundingBoxes instance per input.

Author

BTW, I have a stash that makes sanitize_keypoints work for inputs of more than 2 dimensions by aggregating.

For example, if the input is a sequence of N triangles, sanitize_keypoints could filter out all triangles that have at least one point outside of the canvas. Would that be relevant to this PR?
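That aggregation idea could look roughly like this (a plain-Python sketch with illustrative names, not the stashed implementation):

```python
def sanitize_keypoint_groups(groups, canvas_size):
    """Keep only point groups (e.g. triangles) whose every point lies
    inside the canvas; also return the validity mask, mirroring the
    (keypoints, valid) pair sanitize_keypoints returns."""
    height, width = canvas_size
    valid = [
        all(0 <= x < width and 0 <= y < height for (x, y) in group)
        for group in groups
    ]
    kept = [group for group, ok in zip(groups, valid) if ok]
    return kept, valid

triangles = [
    [(1, 1), (3, 1), (2, 4)],  # fully inside a 5x5 canvas
    [(1, 1), (9, 1), (2, 4)],  # one vertex outside
]
kept, valid = sanitize_keypoint_groups(triangles, canvas_size=(5, 5))
```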


@AntoineSimoulin
Member

Hey @Alexandre-SCHOEPP, thanks again for the amazing work to submit this PR! I appreciate your time submitting and amending this contribution.

As discussed above, I contributed some additional tests to align with what is done for bounding boxes. This required some small adjustments to your contributions to make sure all tests pass. Let me know if you are aligned with this. I listed the details below.

Details of modifications

  • Avoid doing the modification in place in horizontal_flip_keypoints;
  • Adjust to follow python 3.9 hinting;
  • Modified line keypoints = torch.cat([keypoints, torch.ones(keypoints.shape[0], 1, device=device, dtype=dtype)], dim=-1) in _affine_keypoints_with_expand to pass tests for inputs of shape torch.Size([2, 4, 2]). Also added clamping;
  • Modified vertical_flip_keypoints and vertical_flip_keypoints_dispatch (add clone to avoid inplace modification);
  • Removed unused interpolation and fill parameters in rotate_keypoints;
  • Remove **kwargs in _rotate_keypoints_dispatch;
  • Fix missing expand case in _affine_keypoints_with_expand;
  • Fix _elastic_keypoints_dispatch expected type;
  • Avoid inplace operation in elastic_keypoints (add clone to avoid inplace modification);
  • Avoid inplace operation in crop_keypoints (add clone to avoid inplace modification);
  • Fix type for _perspective_keypoints_dispatch;
  • Add imports in functional/init and fix perspective_keypoints typo in function name;
  • Fix dtype in perspective_keypoints;
  • Remove unused interpolation and antialias arguments in resize_keypoints;
  • Add case of no canvas-size change in resize_keypoints.
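The out-of-place pattern applied in several of the fixes above can be sketched as follows (a minimal standalone sketch, not the actual kernel signature): clone before writing so the caller's tensor is left untouched.

```python
import torch

def horizontal_flip_keypoints(keypoints: torch.Tensor, canvas_size) -> torch.Tensor:
    # Clone first: in-place writes on the input would corrupt the caller's data.
    out = keypoints.clone()
    out[..., 0] = canvas_size[1] - out[..., 0]  # mirror x across the canvas width
    return out

pts = torch.tensor([[2.0, 5.0], [7.0, 1.0]])
flipped = horizontal_flip_keypoints(pts, (10, 10))
print(pts[0].tolist(), flipped[0].tolist())  # the input stays [2.0, 5.0]
```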

Run tests

pytest test/test_transforms_v2.py -vvv -k "keypoints"
...
313 passed, 189 skipped, 7147 deselected in 4.12s

Next steps

As mentioned by @NicolasHug, the last step to merge would be to adjust the KeyPoints with the assumption that there must exist only one single instance of KeyPoints per input (so we should not return a list of KeyPoints but rather a single instance of KeyPoints in the wrapper). Let us know if you can adjust this so we can merge!

Thanks again for your time, I am excited about integrating the new feature into the codebase :)

@Alexandre-SCHOEPP

@NicolasHug @AntoineSimoulin

Thanks for the feedback, and glad to see the 3.9 type hints coming to the lib. I struggled a bit with the 3.8-compatible type hints here, having left those behind on all my other projects.

I reviewed the changes made to the branch on my fork, they seem fine.

I'll finish the compatibility change for batch-wise unicity of the KeyPoints object tomorrow, and will probably also update the related documentation. I wrote most of it 4+ months ago, so parts are bound to be out of date with respect to your feedback and the changes made since then.

In particular, the KeyPoints class needs documentation stating that it can be used for polylines/polygons and skeletons, since our conversation made it clear that an overview of the code alone does not convey this. I know of people working with polylines, and I think actually using these terms in the docs would help developers find what they want just by googling it.
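To make that concrete, here is the kind of shape convention such docs could spell out (illustrative shapes only, assuming the [..., 2] layout described in this PR):

```python
import torch

# How a single [..., 2] layout covers several point-based structures:
polygon   = torch.rand(1, 5, 2)    # one polygon with 5 vertices
polylines = torch.rand(8, 4, 2)    # 8 polylines of 4 points each
skeletons = torch.rand(2, 17, 2)   # 2 human-pose skeletons with 17 joints (COCO-style)
prompts   = torch.rand(3, 2)       # 3 SAM-like point prompts
```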

@Alexandre-SCHOEPP

Alexandre-SCHOEPP commented May 5, 2025

@AntoineSimoulin @NicolasHug

While working on the feedback on _wrap_output, I noticed that the code for BoundingBoxes does the exact same thing, just in a less generalized manner.

Code for BoundingBoxes._wrap_output as of writing
class BoundingBoxes(...):
    ...

    @classmethod
    def _wrap_output(
        cls,
        output: torch.Tensor,
        args: Sequence[Any] = (),
        kwargs: Mapping[str, Any] | None = None,
    ) -> BoundingBoxes:
        # If there are BoundingBoxes instances in the output, their metadata got lost when we called
        # super().__torch_function__. We need to restore the metadata somehow, so we choose to take
        # the metadata from the first bbox in the parameters.
        # This should be what we want in most cases. When it's not, it's probably a mis-use anyway, e.g.
        # something like some_xyxy_bbox + some_xywh_bbox; we don't guard against those cases.
        flat_params, _ = tree_flatten(args + (tuple(kwargs.values()) if kwargs else ()))  # type: ignore[operator]
        first_bbox_from_args = next(x for x in flat_params if isinstance(x, BoundingBoxes))
        format, canvas_size = first_bbox_from_args.format, first_bbox_from_args.canvas_size

        if isinstance(output, torch.Tensor) and not isinstance(output, BoundingBoxes):
            output = BoundingBoxes._wrap(output, format=format, canvas_size=canvas_size, check_dims=False)
        elif isinstance(output, (tuple, list)):
            output = type(output)(
                BoundingBoxes._wrap(part, format=format, canvas_size=canvas_size, check_dims=False) for part in output
            )
        return output

While my version handles any Python sequence object, BoundingBoxes' only handles lists and tuples, and can therefore assume that the constructor of the type of output accepts a generator as its only argument.

The offending code on my side attempts to reuse the object itself for mutable sequences, avoiding any assumption about class instantiation. For non-mutable sequences, it defaults to a tuple. This is the best generalization of the behavior of BoundingBoxes._wrap_output as it stands that I can imagine.
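That dispatch can be sketched as follows (an illustration of the described behavior with a hypothetical wrap_sequence helper, not the actual PR code):

```python
from collections.abc import MutableSequence, Sequence

def wrap_sequence(output, wrap_one):
    # Mutable sequences are reused in place, so custom list-like classes keep
    # their type without assuming anything about their constructor; other
    # sequences fall back to a plain tuple; strings and scalars are atomic.
    if isinstance(output, MutableSequence):
        for i, part in enumerate(output):
            output[i] = wrap_one(part)
        return output
    if isinstance(output, Sequence) and not isinstance(output, str):
        return tuple(wrap_one(part) for part in output)
    return wrap_one(output)

lst = [1, 2]
assert wrap_sequence(lst, lambda x: x * 10) is lst  # reused in place -> [10, 20]
print(wrap_sequence((1, 2), lambda x: x * 10))      # falls back to a tuple
```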

As such, I am reluctant to undo this logic until I get confirmation that the logic will also be undone for BoundingBoxes._wrap_output (in another PR, preferably)
