MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
- URL: http://arxiv.org/abs/2505.20772v1
- Date: Tue, 27 May 2025 06:23:03 GMT
- Title: MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
- Authors: Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen
- Abstract summary: We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. We show that MetaSlot achieves significant performance gains and markedly interpretable slot representations, compared with existing Slot Attention variants.
- Score: 11.365829102707014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects' super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count leads to an object being represented as multiple parts when the number of objects varies. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook that holds prototypes of objects in a dataset by vector-quantizing the resulting slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with the codebook; and (iii) injects progressively weaker noise into the Slot Attention iterations to accelerate and stabilize the aggregation. MetaSlot is a general Slot Attention variant that can be seamlessly integrated into existing OCL architectures. Across multiple public datasets and tasks--including object discovery and recognition--models equipped with MetaSlot achieve significant performance gains and markedly interpretable slot representations, compared with existing Slot Attention variants.
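The abstract outlines three mechanisms: a vector-quantized codebook of object prototypes, deduplication of aggregated slots by quantizing them against that codebook, and progressively weaker noise injected into the Slot Attention iterations. The PyTorch sketch below is a minimal, hedged illustration of how these pieces could fit together; the class name MetaSlotSketch, the codebook size, the noise schedule, and the duplicate-masking step are assumptions for orientation only, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MetaSlotSketch(nn.Module):
    """Illustrative sketch only: Slot Attention whose aggregated slots are
    vector-quantized against a learned prototype codebook, with annealed
    noise injected into the iterations."""

    def __init__(self, dim=64, num_slots=7, codebook_size=512, iters=3, noise_scale=0.1):
        super().__init__()
        self.num_slots, self.iters, self.noise_scale = num_slots, iters, noise_scale
        self.scale = dim ** -0.5
        self.slot_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slot_log_sigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q, self.to_k, self.to_v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # (i) codebook holding dataset-level object prototypes (size is an assumption)
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, feats):  # feats: (B, N, dim) encoder features
        B, N, D = feats.shape
        feats = self.norm_in(feats)
        k, v = self.to_k(feats), self.to_v(feats)
        slots = self.slot_mu + self.slot_log_sigma.exp() * torch.randn(
            B, self.num_slots, D, device=feats.device)
        for t in range(self.iters):
            # (iii) progressively weaker noise: strongest at the first iteration
            slots = slots + torch.randn_like(slots) * self.noise_scale * (1.0 - t / self.iters)
            q = self.to_q(self.norm_slots(slots))
            attn = torch.softmax(torch.einsum('bnd,bsd->bns', k, q) * self.scale, dim=-1)
            attn = attn / attn.sum(dim=1, keepdim=True).clamp_min(1e-8)  # weighted mean over inputs
            updates = torch.einsum('bns,bnd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, D), slots.reshape(-1, D)).view(B, self.num_slots, D)
        # (ii) quantize slots with the prototype codebook; slots mapped to the same
        # prototype are treated as duplicates and all but the first are masked out
        dists = torch.cdist(slots.reshape(-1, D), self.codebook.weight)  # (B*S, K)
        idx = dists.argmin(dim=-1).view(B, self.num_slots)
        quantized = slots + (self.codebook(idx) - slots).detach()  # straight-through estimator
        keep = torch.ones_like(idx, dtype=torch.bool)
        for b in range(B):
            seen = set()
            for s in range(self.num_slots):
                code = idx[b, s].item()
                keep[b, s] = code not in seen
                seen.add(code)
        return quantized, keep


# Usage: aggregate a batch of 196 feature tokens into at most 7 deduplicated slots.
feats = torch.randn(2, 196, 64)
slots, keep_mask = MetaSlotSketch()(feats)
```

The straight-through estimator keeps the aggregator trainable through the quantization step; in practice the codebook itself would also need a VQ-style update (e.g. a commitment loss or EMA), which is omitted here for brevity.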
Related papers
- Slot Attention with Re-Initialization and Self-Distillation [22.024377849671033]
We propose Slot Attention with re-Initialization and self-Distillation (DIAS) for object discovery and recognition.
DIAS achieves state-of-the-art on OCL tasks like object discovery and recognition, while also improving advanced visual prediction and reasoning.
arXiv Detail & Related papers (2025-07-31T17:41:18Z) - Are We Done with Object-Centric Learning? [65.67948794110212]
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene.
With recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently.
We address the OOD generalization challenge caused by spurious background cues through the lens of OCL.
arXiv Detail & Related papers (2025-04-09T17:59:05Z) - M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation [51.82272563578793]
We introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes.
We present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases.
arXiv Detail & Related papers (2024-12-18T12:50:11Z) - Guided Latent Slot Diffusion for Object-Centric Learning [13.721373817758307]
We introduce Guided Latent Slot Diffusion - GLASS, an object-centric model that uses generated captions as a guiding signal to better align slots with objects.
For object discovery, GLASS achieves approximately +35% and +10% relative improvements in mIoU over the previous state-of-the-art (SOTA) method.
For the segmentation task, GLASS surpasses SOTA weakly-supervised and language-based segmentation models, which were specifically designed for the task.
arXiv Detail & Related papers (2024-07-25T10:38:32Z) - Attention Normalization Impacts Cardinality Generalization in Slot Attention [6.9099729240700825]
We propose and investigate alternatives to the original normalization scheme that improve Slot Attention's ability to generalize to varying slot and object counts.
The newly proposed normalizations are minimal, easy-to-implement modifications of the usual Slot Attention module.
arXiv Detail & Related papers (2024-07-04T22:09:01Z) - Adaptive Slot Attention: Object Discovery with Dynamic Slot Number [64.45419820717754]
A major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots.
Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots.
Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models (a hedged, illustrative sketch of a per-slot keep/drop gate appears after this list).
arXiv Detail & Related papers (2024-06-13T14:55:11Z) - Masked Multi-Query Slot Attention for Unsupervised Object Discovery [7.613552182035413]
In this work, we consider an object-centric approach in which DINO ViT features are reconstructed via a set of queried representations called slots.
We propose a masking scheme on input features that disregards the background regions, inducing our model to focus more on salient objects during the reconstruction phase.
Our experimental results and ablations on the PASCAL-VOC 2012 dataset show the importance of each component and highlight how their combination consistently improves object localization.
arXiv Detail & Related papers (2024-04-30T15:51:05Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization [17.25953277219166]
We present a new method for object-centric representations using slots.
We evaluate our method on object discovery and novel view synthesis tasks with various datasets.
Our method consistently outperforms prior works.
arXiv Detail & Related papers (2023-08-22T11:48:43Z) - Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves unsupervised object discovery performance competitive with a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
We present two innovative approaches: Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z) - CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator for feature reassembly.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z)
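As flagged in the Adaptive Slot Attention entry above, the following is a hedged, purely illustrative sketch of one way a dynamic slot count can be realized: score each slot and sample a binary keep/drop gate with Gumbel-Softmax so the decision stays differentiable. The name SlotKeepGate, the scoring MLP, and the temperature are assumptions; this is not the AdaSlot implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotKeepGate(nn.Module):
    """Hypothetical per-slot keep/drop gate (illustrative, not AdaSlot code):
    each slot is scored and a hard binary gate is sampled with Gumbel-Softmax,
    so the effective number of slots can vary per image."""

    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, slots, tau=1.0):  # slots: (B, S, dim)
        logits = self.score(slots)                                    # (B, S, 2): [drop, keep]
        gate = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 1:]  # (B, S, 1) hard sample
        return slots * gate, gate.squeeze(-1)                         # dropped slots are zeroed


# Example: prune a fixed budget of 7 candidate slots to a per-image subset.
slots = torch.randn(2, 7, 64)
kept_slots, keep_mask = SlotKeepGate(64)(slots)
```

Because gumbel_softmax with hard=True uses a straight-through gradient, the gate can be trained jointly with the rest of the model, typically alongside a sparsity or reconstruction objective that discourages keeping redundant slots.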