Unsupervised Multi-object Segmentation Using Attention and Soft-argmax
- URL: http://arxiv.org/abs/2205.13271v1
- Date: Thu, 26 May 2022 10:58:48 GMT
- Title: Unsupervised Multi-object Segmentation Using Attention and Soft-argmax
- Authors: Bruno Sauvalle and Arnaud de La Fortelle
- Abstract summary: We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation.
We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks and provide examples of applications to real-world traffic videos.
- Score: 0.6853165736531939
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a new architecture for unsupervised object-centric
representation learning and multi-object detection and segmentation, which uses
an attention mechanism to associate a feature vector to each object present in
the scene and to predict the coordinates of these objects using soft-argmax. A
transformer encoder handles occlusions and redundant detections, and a separate
pre-trained background model is in charge of background reconstruction. We show
that this architecture significantly outperforms the state of the art on
complex synthetic benchmarks and provide examples of applications to real-world
traffic videos.
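The soft-argmax operation named in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function name `soft_argmax_2d`, the temperature parameter `beta`, and the toy score map are all assumptions chosen for illustration. The idea is to apply a softmax to a 2D score map and take the weighted mean of the pixel coordinates, which gives a differentiable estimate of the peak location.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Return the softmax-weighted mean (row, col) of a 2D score map.

    `beta` is a hypothetical temperature: larger values push the result
    towards the hard argmax, smaller values towards the map centre.
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Expected row and column index under the softmax distribution.
    row_coord = sum(i * w for i, row in enumerate(weights) for w in row) / total
    col_coord = sum(j * w for row in weights for j, w in enumerate(row)) / total
    return row_coord, col_coord

# Toy score map with a single strong response at row 1, column 2.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
row, col = soft_argmax_2d(heatmap, beta=10.0)
print(row, col)
```

With a sharp temperature the estimate lands very close to the peak location, while a flat score map yields the centre of the grid. Unlike a hard argmax, the result varies smoothly with the scores, which is what allows coordinates to be learned by gradient descent.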
Related papers
- MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism [67.56918651825056]
We propose a new decoder architecture with the parallel Multi-time Inquiries (MI) mechanism.
Our MI-based model, MI-DETR, outperforms all existing DETR-like models on the COCO benchmark.
A series of diagnostic and visualization experiments demonstrate the effectiveness, rationality, and interpretability of MI.
arXiv Detail & Related papers (2025-03-03T12:19:06Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
- Guided Slot Attention for Unsupervised Video Object Segmentation [16.69412563413671]
We propose a guided slot attention network to reinforce spatial structural information and obtain better foreground-background separation.
The proposed model achieves state-of-the-art performance on two popular datasets.
arXiv Detail & Related papers (2023-03-15T02:08:20Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.