Add EoMT from ViT is Secretly an Image Segmentation Model #1132

Open
@tcourat

Hi, I'd like to share a new image segmentation paper that uses a plain ViT!

Paper : https://arxiv.org/abs/2503.19108
Code : https://github.com/tue-mps/eomt

This paper reaches near-SOTA results with a considerably simpler architecture (a vision transformer only), provided the ViT is already well pretrained. EoMT uses only the plain ViT architecture plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much less complex.
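
To make the idea concrete, here is a rough, dependency-free sketch of how I read it: the learned queries are appended to the patch tokens, the same blocks process the joint sequence, and each query is then scored against each patch to get mask logits. This is not the EoMT code. `toy_block` and `predict_masks` are hypothetical names, the block is a stand-in with no attention, and the dot-product mask scoring is my assumption (in the style of Mask2Former heads), so please check the paper and repo for the real details.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def toy_block(tokens):
    """Stand-in for a ViT block (assumption): mix each token with the mean token.

    A real block would use self-attention and an MLP instead.
    """
    dim = len(tokens[0])
    mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    return [[0.5 * t[i] + 0.5 * mean[i] for i in range(dim)] for t in tokens]


def predict_masks(patch_tokens, queries, num_blocks=2):
    """Run patches and queries jointly, then score every query against every patch."""
    num_patches = len(patch_tokens)
    tokens = patch_tokens + queries  # one joint sequence, no separate decoder
    for _ in range(num_blocks):
        tokens = toy_block(tokens)
    patches_out = tokens[:num_patches]
    queries_out = tokens[num_patches:]
    # Mask logits: one row per query, one column per patch (assumed dot-product head).
    return [[dot(q, p) for p in patches_out] for q in queries_out]


rng = random.Random(0)
dim, grid, num_queries = 8, 4, 3
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(grid * grid)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]

mask_logits = predict_masks(patches, queries)
print(len(mask_logits), len(mask_logits[0]))  # 3 16
```

Each row of `mask_logits` can be reshaped to the 4x4 patch grid to give one coarse mask per query. A real implementation would also need a class head per query and some upscaling of the patch features, which are left out here.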

It would be interesting to have it in this library!
