QASA: Quality-Guided K-Adaptive Slot Attention for Unsupervised Object-Centric Learning
- URL: http://arxiv.org/abs/2601.12936v1
- Date: Mon, 19 Jan 2026 10:42:07 GMT
- Title: QASA: Quality-Guided K-Adaptive Slot Attention for Unsupervised Object-Centric Learning
- Authors: Tianran Ouyang, Xingping Dong, Jing Zhang, Mang Ye, Jun Chen, Bo Du
- Abstract summary: Slot Attention is an approach that binds different objects in a scene to a set of "slots". Previous K-adaptive methods do not explicitly constrain slot-binding quality. We propose Quality-Guided K-Adaptive Slot Attention (QASA).
- Score: 80.82392186401354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Slot Attention, an approach that binds different objects in a scene to a set of "slots", has become a leading method in unsupervised object-centric learning. Most methods assume a fixed slot count K, and to better accommodate the dynamic nature of object cardinality, a few works have explored K-adaptive variants. However, existing K-adaptive methods still suffer from two limitations. First, they do not explicitly constrain slot-binding quality, so low-quality slots lead to ambiguous feature attribution. Second, adding a slot-count penalty to the reconstruction objective creates conflicting optimization goals between reducing the number of active slots and maintaining reconstruction fidelity. As a result, they still lag significantly behind strong K-fixed baselines. To address these challenges, we propose Quality-Guided K-Adaptive Slot Attention (QASA). First, we decouple slot selection from reconstruction, eliminating the mutual constraints between the two objectives. Then, we propose an unsupervised Slot-Quality metric to assess per-slot quality, providing a principled signal for fine-grained slot-object binding. Based on this metric, we design a Quality-Guided Slot Selection scheme that dynamically selects a subset of high-quality slots and feeds them into our newly designed gated decoder for reconstruction during training. At inference, token-wise competition on slot attention yields a K-adaptive outcome. Experiments show that QASA substantially outperforms existing K-adaptive methods on both real and synthetic datasets. Moreover, on real-world datasets QASA surpasses K-fixed methods.
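The abstract does not spell out how token-wise competition produces a K-adaptive result, so the following is only a minimal sketch under stated assumptions: each token is claimed by the slot with the largest attention weight, and a slot counts as active if it claims at least one token. The function name `active_slots` and the `attn[k][n]` layout are hypothetical and are not taken from the paper.

```python
def active_slots(attn):
    """Assign each token to its highest-attention slot and list the active slots.

    attn[k][n] is the attention weight of slot k on token n
    (a hypothetical layout chosen for illustration, not the paper's).
    """
    num_slots = len(attn)
    num_tokens = len(attn[0])
    winners = []
    for n in range(num_tokens):
        # Token-wise competition: the slot with the largest weight claims the token.
        best = max(range(num_slots), key=lambda k: attn[k][n])
        winners.append(best)
    # Assumption: a slot is active if it wins at least one token.
    active = sorted(set(winners))
    return winners, active


# Three candidate slots, four tokens; the third slot never wins a token.
attn = [
    [0.7, 0.6, 0.1, 0.2],
    [0.2, 0.3, 0.8, 0.7],
    [0.1, 0.1, 0.1, 0.1],
]
winners, active = active_slots(attn)
print(winners, active, len(active))
```

Under this reading, the effective slot count is simply `len(active)`, so it can vary per image without a slot-count penalty in the training objective.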
Related papers
- Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search [31.460557834760873]
We develop an encoder-decoder paradigm to preserve feature selection knowledge into a continuous embedding space. We also employ a policy-based reinforcement learning approach to guide the exploration of the embedding space.
arXiv Detail & Related papers (2025-05-16T18:08:16Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- IoU-Enhanced Attention for End-to-End Task Specific Object Detection [17.617133414432836]
Sparse R-CNN achieves promising results without densely tiled anchor boxes or grid points in the image.
Because of its sparse nature and the one-to-one relation between each query and its attending region, it depends heavily on self-attention.
This paper proposes to use IoU between different boxes as a prior for the value routing in self attention.
arXiv Detail & Related papers (2022-09-21T14:36:18Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Uncertainty-aware Clustering for Unsupervised Domain Adaptive Object Re-identification [123.75412386783904]
State-of-the-art object Re-ID approaches adopt clustering algorithms to generate pseudo-labels for the unlabeled target domain.
We propose an uncertainty-aware clustering framework (UCF) for UDA tasks.
Our UCF method consistently achieves state-of-the-art performance in multiple UDA tasks for object Re-ID.
arXiv Detail & Related papers (2021-08-22T09:57:14Z)
- Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network [92.01145655155374]
We propose a quality attention generative adversarial network (QAGAN) trained on unpaired data.
The key novelty of the proposed QAGAN lies in the quality attention module (QAM) injected into the generator.
Our proposed method achieves better performance in both objective and subjective evaluations.
arXiv Detail & Related papers (2020-12-30T05:57:20Z)