Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
- URL: http://arxiv.org/abs/2406.09196v1
- Date: Thu, 13 Jun 2024 14:55:11 GMT
- Title: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
- Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang
- Abstract summary: A major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots.
Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots.
Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models.
- Score: 64.45419820717754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also overlooks the inherent variability in the number of objects present in each instance. To overcome this fundamental limitation, we present a novel complexity-aware object auto-encoder framework. Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots based on the content of the data. This is achieved by proposing a discrete slot sampling module that is responsible for selecting an appropriate number of slots from a candidate list. Furthermore, we introduce a masked slot decoder that suppresses unselected slots during the decoding process. Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models. Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research. Project will be available at https://kfan21.github.io/AdaSlot/
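The abstract names two components: a discrete slot sampling module that picks slots from a candidate list, and a masked slot decoder that suppresses the unselected ones. The sketch below is one possible reading of that description, not the authors' implementation. It assumes per-slot (drop, keep) logits, Gumbel-max sampling for the discrete choice, and a softmax over slots restricted to kept slots for the decoder's mixing weights. All function names, the toy logits, and the "keep at least one slot" fallback are hypothetical.

```python
import math
import random


def gumbel(rng):
    """Sample Gumbel(0, 1) noise; the clamp avoids log(0)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def sample_keep_mask(keep_logits, rng):
    """Draw a hard 0/1 keep decision per candidate slot (Gumbel-max trick).

    keep_logits: list of (drop_logit, keep_logit) pairs, one per slot.
    """
    mask = []
    for drop_logit, keep_logit in keep_logits:
        keep = keep_logit + gumbel(rng) > drop_logit + gumbel(rng)
        mask.append(1 if keep else 0)
    if sum(mask) == 0:
        # Assumption: keep the most confident slot so decoding stays defined.
        best = max(range(len(keep_logits)),
                   key=lambda k: keep_logits[k][1] - keep_logits[k][0])
        mask[best] = 1
    return mask


def masked_mixture_weights(alpha_logits, mask):
    """Softmax over slots at one pixel, restricted to kept slots."""
    shift = max(a for a, m in zip(alpha_logits, mask) if m)
    exps = [math.exp(a - shift) if m else 0.0
            for a, m in zip(alpha_logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def decode_pixel(slot_values, alpha_logits, mask):
    """Blend per-slot pixel values using the masked mixture weights."""
    weights = masked_mixture_weights(alpha_logits, mask)
    return sum(w * v for w, v in zip(weights, slot_values))


rng = random.Random(0)
# Toy logits: the first two slots lean towards "keep", the last two towards "drop".
keep_logits = [(0.0, 3.0), (0.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
mask = sample_keep_mask(keep_logits, rng)
print("kept slots:", sum(mask), "of", len(mask))
```

Dropped slots receive exactly zero mixing weight, so they contribute nothing to the reconstruction, and the number of kept slots can vary per input. Training such a sampler end to end would need a gradient estimator for the discrete choice, which this sketch leaves out.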
Related papers
- Masked Multi-Query Slot Attention for Unsupervised Object Discovery [7.613552182035413]
In this work, we consider an object-centric approach in which DINO ViT features are reconstructed via a set of queried representations called slots.
We propose a masking scheme on input features that disregards the background regions, inducing our model to focus more on salient objects during the reconstruction phase.
Our experimental results and ablations on the PASCAL-VOC 2012 dataset show the importance of each component and highlight how their combination consistently improves object localization.
arXiv Detail & Related papers (2024-04-30T15:51:05Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization [17.25953277219166]
We present a new method for object-centric representations using slots.
Our method outperforms prior works consistently.
We evaluate our method on object discovery and novel view synthesis tasks with various datasets.
arXiv Detail & Related papers (2023-08-22T11:48:43Z)
- Sensitivity of Slot-Based Object-Centric Models to their Number of Slots [15.990209329609275]
We study the sensitivity of slot-based methods to $K$ and how this affects their learned correspondence to objects in the data.
We find that, especially during training, incorrect choices of $K$ do not yield the desired object decomposition.
We demonstrate that the choice of the objective function and incorporating instance-level annotations can moderately mitigate this behavior.
arXiv Detail & Related papers (2023-05-30T09:44:12Z)
- Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames [18.84636947819183]
Slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress.
We present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames.
We evaluate our method on a range of synthetic object discovery benchmarks, namely CLEVR, Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet.
arXiv Detail & Related papers (2023-02-09T23:25:28Z)
- IoU-Enhanced Attention for End-to-End Task Specific Object Detection [17.617133414432836]
Sparse R-CNN achieves promising results without densely tiled anchor boxes or grid points in the image.
Due to its sparse nature and the one-to-one relation between each query and its attending region, it depends heavily on self-attention.
This paper proposes to use the IoU between different boxes as a prior for value routing in self-attention.
arXiv Detail & Related papers (2022-09-21T14:36:18Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark [53.9819155669618]
This paper presents a large-scale dataset, named PIDray, which covers various cases in real-world scenarios for prohibited item detection.
With an intensive amount of effort, our dataset contains 12 categories of prohibited items in 47,677 X-ray images with high-quality annotated segmentation masks and bounding boxes.
The proposed method performs favorably against the state-of-the-art methods, especially for detecting the deliberately hidden items.
arXiv Detail & Related papers (2021-08-16T11:14:16Z)
- Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling [65.09621991654745]
Cross-domain slot filling is an essential task in task-oriented dialog systems.
We propose a Coarse-to-fine approach (Coach) for cross-domain slot filling.
Experimental results show that our model significantly outperforms state-of-the-art approaches in slot filling.
arXiv Detail & Related papers (2020-04-24T13:07:12Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, the Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA) module.
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)