Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization
- URL: http://arxiv.org/abs/2308.11369v1
- Date: Tue, 22 Aug 2023 11:48:43 GMT
- Title: Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization
- Authors: Ning Gao, Bernard Hohmann, Gerhard Neumann
- Abstract summary: We present a new method for object-centric representations using slots, in which the slots are initialized by clustering the input features.
Our method consistently outperforms prior work.
We evaluate our method on object discovery and novel view synthesis tasks across various datasets.
- Score: 17.25953277219166
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Object-centric representations using slots have shown advances toward
efficient, flexible and interpretable abstraction from low-level perceptual
features in a compositional scene. Current approaches randomly initialize the
slots and then refine them iteratively. As we show in this paper, this random
slot initialization significantly affects the accuracy of the final slot
prediction. Moreover, current approaches require the number of slots to be
fixed in advance from prior knowledge of the data, which limits their
applicability in the real world. In our work, we initialize the slot
representations with clustering algorithms conditioned on the perceptual input
features. This requires an additional layer in the architecture to initialize
the slots from the identified clusters. We design permutation-invariant and
permutation-equivariant versions of this layer to enable exchangeable slot
representations after clustering. Additionally, we employ mean-shift clustering
to automatically identify the number of slots for a given scene. We evaluate
our method on object discovery and novel view synthesis tasks with various
datasets. The results show that our method consistently outperforms prior
work, especially on complex scenes.
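The abstract describes the initialization only at a high level. The block below is a minimal sketch under our own assumptions, not the authors' implementation: Gaussian-kernel mean-shift over a set of per-location features, with the number of discovered modes taken as the slot count K, followed by a hypothetical DeepSets-style layer `init_slots` (a permutation-equivariant form chosen here for illustration) that maps cluster centers to initial slots. The bandwidth, the merge threshold of half a bandwidth, the `tanh` layer, and the toy blob data are all illustrative choices; the permutation-invariant variant and the iterative slot refinement that follows initialization are not shown.

```python
import numpy as np


def mean_shift(features, bandwidth, n_iters=100, tol=1e-5):
    """Gaussian-kernel mean-shift over features of shape (N, D).

    Returns (centers, labels). The number of slots K = len(centers) is
    inferred from the data rather than fixed in advance.
    """
    features = np.asarray(features, dtype=float)
    modes = features.copy()
    for _ in range(n_iters):
        # Squared distances from each mode estimate to each feature: shape (N, N).
        d2 = ((modes[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-d2 / (2.0 * bandwidth**2))
        # Shift every estimate to the kernel-weighted mean of the features.
        shifted = weights @ features / weights.sum(axis=1, keepdims=True)
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    # Merge estimates that converged to the same mode into one cluster center.
    centers = []
    labels = np.empty(len(modes), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < 0.5 * bandwidth:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.stack(centers), labels


def init_slots(centers, w_self, w_pool, bias):
    """Hypothetical DeepSets-style, permutation-equivariant slot initializer.

    Each slot depends on its own cluster center plus the mean of all centers,
    so permuting the centers permutes the output slots in the same way.
    """
    pooled = centers.mean(axis=0, keepdims=True)  # order-independent context
    return np.tanh(centers @ w_self + pooled @ w_pool + bias)


# Toy "encoder features": three well-separated blobs standing in for three objects.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.concatenate([m + 0.3 * rng.standard_normal((20, 2)) for m in true_means])

centers, labels = mean_shift(features, bandwidth=1.0)
slot_dim = 8
w_self = rng.standard_normal((2, slot_dim))
w_pool = rng.standard_normal((2, slot_dim))
bias = np.zeros(slot_dim)
slots = init_slots(centers, w_self, w_pool, bias)  # shape (K, slot_dim)
print(len(centers), slots.shape)
```

In this sketch `bandwidth` takes over the role that a fixed K plays in random initialization: a smaller bandwidth generally splits the features into more, finer clusters and therefore yields more slots.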
Related papers
- Adaptive Slot Attention: Object Discovery with Dynamic Slot Number [64.45419820717754]
A major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots.
Within this framework, we introduce an adaptive slot attention (AdaSlot) mechanism that dynamically determines the optimal number of slots.
Our framework, tested extensively on object discovery tasks with various datasets, shows performance matching or exceeding top fixed-slot models.
(2024-06-13T14:55:11Z)
- Object-Centric Learning with Slot Mixture Module [45.62331048595689]
Our work employs a learnable clustering method based on the Gaussian Mixture Model.
Unlike other approaches, we represent slots not only as centers of clusters but also incorporate information about the distance between clusters and assigned vectors.
Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios.
(2023-11-08T12:34:36Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
(2023-08-19T09:12:13Z)
- Sensitivity of Slot-Based Object-Centric Models to their Number of Slots [15.990209329609275]
We study the sensitivity of slot-based methods to $K$ and how this affects their learned correspondence to objects in the data.
We find that, especially during training, incorrect choices of $K$ do not yield the desired object decomposition.
We demonstrate that the choice of the objective function and incorporating instance-level annotations can moderately mitigate this behavior.
(2023-05-30T09:44:12Z)
- Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames [18.84636947819183]
Slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress.
We present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames.
We evaluate our method on a range of synthetic object discovery benchmarks, namely CLEVR, Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet.
(2023-02-09T23:25:28Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
In this work, we incorporate transformers into a hierarchical framework for shape classification and for part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art mean accuracy in shape classification and yields results on par with previous segmentation methods.
(2022-07-31T21:39:15Z)
- SE(3)-Equivariant Attention Networks for Shape Reconstruction in Function Space [50.14426188851305]
We propose the first SE(3)-equivariant coordinate-based network for learning occupancy fields from point clouds.
In contrast to previous shape reconstruction methods that align the input to a regular grid, we operate directly on the irregular, unoriented point cloud.
We show that our method outperforms previous SO(3)-equivariant methods, as well as non-equivariant methods trained on SO(3)-augmented datasets.
(2022-04-05T17:59:15Z)
- Learning Local Displacements for Point Cloud Completion [93.54286830844134]
We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud.
Our architecture relies on three novel layers that are used successively within an encoder-decoder structure.
We evaluate both architectures on object and indoor scene completion tasks, achieving state-of-the-art performance.
(2022-03-30T18:31:37Z)
- Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling [65.09621991654745]
Cross-domain slot filling is an essential task in task-oriented dialog systems.
We propose a Coarse-to-fine approach (Coach) for cross-domain slot filling.
Experimental results show that our model significantly outperforms state-of-the-art approaches in slot filling.
(2020-04-24T13:07:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.