Invariant Slot Attention: Object Discovery with Slot-Centric Reference
Frames
- URL: http://arxiv.org/abs/2302.04973v2
- Date: Fri, 21 Jul 2023 01:40:31 GMT
- Title: Invariant Slot Attention: Object Discovery with Slot-Centric Reference
Frames
- Authors: Ondrej Biza, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin F.
Elsayed, Aravindh Mahendran and Thomas Kipf
- Abstract summary: Slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress.
We present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames.
We evaluate our method on a range of synthetic object discovery benchmarks namely CLEVR, Tetrominoes, CLEVR, Objects Room and MultiShapeNet.
- Score: 18.84636947819183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically discovering composable abstractions from raw perceptual data is
a long-standing challenge in machine learning. Recent slot-based neural
networks that learn about objects in a self-supervised manner have made
exciting progress in this direction. However, they typically fall short at
adequately capturing spatial symmetries present in the visual world, which
leads to sample inefficiency, such as when entangling object appearance and
pose. In this paper, we present a simple yet highly effective method for
incorporating spatial symmetries via slot-centric reference frames. We
incorporate equivariance to per-object pose transformations into the attention
and generation mechanism of Slot Attention by translating, scaling, and
rotating position encodings. These changes result in little computational
overhead, are easy to implement, and can result in large gains in terms of data
efficiency and overall improvements to object discovery. We evaluate our method
on a wide range of synthetic object discovery benchmarks namely CLEVR,
Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising
improvements on the challenging real-world Waymo Open dataset.
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Mesh Denoising Transformer [104.5404564075393]
Mesh denoising is aimed at removing noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
New representation known as Local Surface Descriptor captures local geometric intricacies.
Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
arXiv Detail & Related papers (2024-05-10T15:27:43Z) - SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients [0.8873228457453465]
Small object detection in aerial imagery presents significant challenges in computer vision.
Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases.
This paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects.
arXiv Detail & Related papers (2024-05-02T19:47:08Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations.
Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z) - Persistent Homology Meets Object Unity: Object Recognition in Clutter [2.356908851188234]
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots.
We propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning.
THOR outperforms state-of-the-art methods on both the datasets and achieves substantially higher recognition accuracy for all the scenarios of the UW-IS Occluded dataset.
arXiv Detail & Related papers (2023-05-05T19:42:39Z) - Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.