Self-Supervised Learning of Object Parts for Semantic Segmentation
- URL: http://arxiv.org/abs/2204.13101v1
- Date: Wed, 27 Apr 2022 17:55:17 GMT
- Title: Self-Supervised Learning of Object Parts for Semantic Segmentation
- Authors: Adrian Ziegler, Yuki M. Asano
- Abstract summary: We argue that self-supervised learning of object parts is a solution to the challenge of learning spatially dense representations without labels.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 3%-17%.
- Score: 7.99536002595393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Progress in self-supervised learning has brought strong general image
representation learning methods. Yet so far, it has mostly focused on
image-level learning. In turn, tasks such as unsupervised image segmentation
have not benefited from this trend as they require spatially-diverse
representations. However, learning dense representations is challenging, as in
the unsupervised context it is not clear how to guide the model to learn
representations that correspond to various potential object categories. In this
paper, we argue that self-supervised learning of object parts is a solution to
this issue. Object parts are generalizable: they are a priori independent of an
object definition, but can be grouped to form objects a posteriori. To this
end, we leverage the recently proposed Vision Transformer's capability of
attending to objects and combine it with a spatially dense clustering task for
fine-tuning the spatial tokens. Our method surpasses the state-of-the-art on
three semantic segmentation benchmarks by 3%-17%, showing that our
representations are versatile under various object definitions. Finally, we
extend this to fully unsupervised segmentation - which refrains completely from
using label information even at test-time - and demonstrate that a simple
method for automatically merging discovered object parts based on community
detection yields substantial gains.
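The abstract's pipeline can be pictured as: cluster dense spatial tokens into part prototypes, then merge parts into objects with a community-detection step over a part-affinity graph. Below is a minimal sketch of that idea, assuming generic k-means clustering and connected components over a thresholded cosine-similarity graph as the merging rule; the function names, thresholds, and toy data are illustrative, not the paper's actual implementation.

```python
import numpy as np

def cluster_tokens(tokens, k, iters=20):
    """Plain k-means over (N, D) token features; returns (N,) part ids.

    Centers are initialized at evenly spaced tokens to keep the demo
    deterministic (an assumption, not the paper's scheme).
    """
    centers = tokens[np.linspace(0, len(tokens) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = ((tokens[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(k):
            members = tokens[assign == c]
            if len(members):
                centers[c] = members.mean(0)
    return assign

def merge_parts(assign, tokens, sim_thresh=0.9):
    """Merge part clusters into 'objects' via connected components on the
    thresholded cosine-similarity graph of cluster-mean features
    (a stand-in for the paper's community-detection step)."""
    labels = np.unique(assign)
    means = np.stack([tokens[assign == c].mean(0) for c in labels])
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    sim = means @ means.T
    # union-find over the affinity graph
    parent = list(range(len(labels)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sim[i, j] >= sim_thresh:
                parent[find(i)] = find(j)
    comp = {c: find(i) for i, c in enumerate(labels)}
    remap = {r: n for n, r in enumerate(sorted(set(comp.values())))}
    return np.array([remap[comp[c]] for c in assign])

# Toy usage: two tight feature groups over-clustered into 4 parts
# should merge back into 2 "objects".
rng = np.random.default_rng(1)
group_a = rng.normal(0, 0.05, (50, 8)) + 1.0
group_b = rng.normal(0, 0.05, (50, 8)) - 1.0
tokens = np.concatenate([group_a, group_b])
parts = cluster_tokens(tokens, k=4)
objects = merge_parts(parts, tokens)
print(len(set(objects.tolist())))  # 4 parts merge into 2 objects
```

The over-cluster-then-merge design mirrors the abstract's claim that parts are "a priori independent of an object definition, but can be grouped to form objects a posteriori": clustering granularity is decided first, and the object grouping is recovered afterward from part affinities.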
Related papers
- Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Unsupervised Part Discovery by Unsupervised Disentanglement [10.664434993386525]
Part segmentations provide information about part localizations on the level of individual pixels.
Large annotation costs limit the scalability of supervised algorithms to other object categories.
Our work demonstrates the feasibility to discover semantic part segmentations without supervision.
arXiv Detail & Related papers (2020-09-09T12:34:37Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
- Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds [109.0016923028653]
We learn point cloud representation by bidirectional reasoning between the local structures and the global shape without human supervision.
We show that our unsupervised model surpasses the state-of-the-art supervised methods on both synthetic and real-world 3D object classification datasets.
arXiv Detail & Related papers (2020-03-29T08:26:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.