Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
- URL: http://arxiv.org/abs/2310.12904v2
- Date: Thu, 11 Jul 2024 11:46:20 GMT
- Title: Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
- Authors: Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, Patrick Pérez
- Abstract summary: Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features.
We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation.
- Score: 33.692534984177364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how can we find objects without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation, in the era of self-supervised ViTs. We gather links to the discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.
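Several of the surveyed methods (e.g. LOST, listed below) localize objects by exploiting the correlations between self-supervised ViT patch features. A minimal sketch of this idea, assuming the patch features have already been extracted from a frozen backbone (the seed-selection heuristic follows LOST; the function name and inputs are illustrative, not from the survey):

```python
import numpy as np

def localize_from_patch_features(feats, grid_hw):
    """Class-agnostic localization from self-supervised ViT patch features.

    feats: (N, D) array of patch features, N = H * W patches on a grid.
    Returns a binary object mask of shape grid_hw and a box (r0, c0, r1, c1).
    """
    h, w = grid_hw
    # L2-normalize features and compute patch-to-patch cosine similarities.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T
    # LOST-style seed: the patch positively correlated with the fewest
    # other patches tends to lie inside a (small) salient object.
    pos_degree = (sim > 0).sum(axis=1)
    seed = int(np.argmin(pos_degree))
    # Grow the object mask: all patches positively correlated with the seed.
    mask = (sim[seed] > 0).reshape(h, w)
    rows, cols = np.where(mask)
    box = (rows.min(), cols.min(), rows.max(), cols.max())
    return mask, box
```

No labels or training are involved: the object emerges purely from the structure of the pre-trained features, which is the common thread of the works discussed in the survey.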
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
This task amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z) - PEEKABOO: Hiding parts of an image for unsupervised object localization [7.161489957025654]
Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information.
We propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization.
The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision.
arXiv Detail & Related papers (2024-07-24T20:35:20Z) - Unsupervised Open-Vocabulary Object Localization in Videos [118.32792460772332]
We show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.
We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots.
arXiv Detail & Related papers (2023-09-18T15:20:13Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Unsupervised Object Localization: Observing the Background to Discover Objects [4.870509580034194]
In this work, we take a different approach and propose to look for the background instead.
This way, the salient objects emerge as a by-product without any strong assumption on what an object should be.
We propose FOUND, a simple model made of a single $1\times 1$ convolution, trained with coarse background masks extracted from self-supervised patch-based representations.
arXiv Detail & Related papers (2022-12-15T13:43:11Z) - Open World DETR: Transformer based Open World Object Detection [60.64535309016623]
We propose a two-stage training approach named Open World DETR for open world object detection based on Deformable DETR.
We fine-tune the class-specific components of the model with a multi-view self-labeling strategy and a consistency constraint.
Our proposed method outperforms other state-of-the-art open world object detection methods by a large margin.
arXiv Detail & Related papers (2022-12-06T13:39:30Z) - Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning [25.935629339091697]
Open-set object detection (OSOD) is a promising direction for handling this problem, which consists of two subtasks: object/background separation and open-set object classification.
We present Openset RCNN to address the challenging OSOD.
We show that our Openset RCNN can endow the robot with an open-set perception ability to support robotic rearrangement tasks in cluttered environments.
arXiv Detail & Related papers (2022-11-21T15:00:04Z) - 4D Unsupervised Object Discovery [53.561750858325915]
We propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.
We present the first practical approach for this task by proposing a ClusterNet on 3D point clouds, which is jointly optimized with a 2D localization network.
arXiv Detail & Related papers (2022-10-10T16:05:53Z) - Towards Open-Set Object Detection and Discovery [38.81806249664884]
We present a new task, namely Open-Set Object Detection and Discovery (OSODD).
We propose a two-stage method that first uses an open-set object detector to predict both known and unknown objects.
Then, we study the representation of predicted objects in an unsupervised manner and discover new categories from the set of unknown objects.
arXiv Detail & Related papers (2022-04-12T08:07:01Z) - Localizing Objects with Self-Supervised Transformers and no Labels [44.364726903520086]
Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns.
We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner.
We outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012.
arXiv Detail & Related papers (2021-09-29T09:01:07Z) - Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
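Several entries above (e.g. the paper reporting gains of "up to 8 CorLoc points on PASCAL VOC 2012") use CorLoc as their evaluation metric: the fraction of images in which the predicted box overlaps at least one ground-truth box with IoU of at least 0.5. A small sketch of how that metric is typically computed (the function names and box layout `(x0, y0, x1, y1)` are illustrative assumptions, not taken from any of the papers):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union > 0 else 0.0

def corloc(predictions, ground_truths, thresh=0.5):
    """CorLoc: fraction of images whose single predicted box overlaps
    any ground-truth box of that image with IoU >= thresh."""
    hits = sum(
        any(iou(pred, gt) >= thresh for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return hits / len(predictions)
```

Note that CorLoc only rewards localizing one object per image, which is why some of the works above complement it with detection-style metrics when evaluating multi-object discovery.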
This list is automatically generated from the titles and abstracts of the papers in this site.