Related papers: Complex-Valued Autoencoders for Object Discovery

URL: http://arxiv.org/abs/2204.02075v1
Date: Tue, 5 Apr 2022 09:25:28 GMT
Title: Complex-Valued Autoencoders for Object Discovery
Authors: Sindy Löwe, Phillip Lippe, Maja Rudolph, Max Welling
Abstract summary: We propose a distributed approach to object-centric representations: the Complex AutoEncoder. We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets. We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
Score: 62.26260974933819
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Object-centric representations form the basis of human perception and enable us to reason about the world and to systematically generalize to new settings. Currently, most machine learning work on unsupervised object discovery focuses on slot-based approaches, which explicitly separate the latent representations of individual objects. While the result is easily interpretable, it usually requires the design of involved architectures. In contrast to this, we propose a distributed approach to object-centric representations: the Complex AutoEncoder. Following a coding scheme theorized to underlie object representations in biological neurons, its complex-valued activations represent two messages: their magnitudes express the presence of a feature, while the relative phase differences between neurons express which features should be bound together to create joint object representations. We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets. Additionally, we show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
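The coding scheme described in the abstract can be illustrated with a small sketch. The abstract does not say how objects are read out from the phases, so the grouping step below is an assumption made for illustration only: each active feature is assigned to the nearest of a few hand-picked reference phases. The names `split_message`, `bind_by_phase`, `reference_phases`, and `min_magnitude` are hypothetical and do not come from the paper.

```python
import numpy as np

def split_message(z):
    """Return the two 'messages' of complex activations: (magnitude, phase)."""
    return np.abs(z), np.angle(z)

def bind_by_phase(z, reference_phases, min_magnitude=0.1):
    """Assign each active feature to the closest reference phase.

    Illustrative assumption, not the paper's procedure. Features whose
    magnitude is below min_magnitude are treated as absent (label -1).
    """
    magnitude, phase = split_message(z)
    # Circular distance between every feature phase and every reference phase.
    diff = phase[:, None] - np.asarray(reference_phases)[None, :]
    circ_dist = np.abs(np.angle(np.exp(1j * diff)))
    labels = np.argmin(circ_dist, axis=1)
    labels[magnitude < min_magnitude] = -1
    return labels

# Toy activations: two features near phase 0.2, two near 2.5, one nearly absent.
magnitudes = np.array([1.0, 0.8, 0.9, 0.7, 0.01])
phases = np.array([0.2, 0.25, 2.5, 2.45, 1.0])
z = magnitudes * np.exp(1j * phases)

labels = bind_by_phase(z, reference_phases=[0.2, 2.5])
print(labels)
```

In this toy setup, features with similar phases end up in the same group, while the magnitude alone decides whether a feature counts as present at all.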

Related papers

Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval [1.4272411349249627]
Self-supervised vision models like DINO have shown emergent object understanding. DINO representations excel at capturing global object attributes but struggle with object-level details like colour. We propose a method that combines global and local features by augmenting DINO representations with object-centric latent vectors.
arXiv Detail & Related papers (2025-03-12T21:57:41Z)
DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation [35.6022448037063]
Per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics. Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information. Our work aims to strengthen both local and global cues.
arXiv Detail & Related papers (2024-01-06T10:56:36Z)
Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines. It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module. Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations. Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z)
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities. We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames [18.84636947819183]
Slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress. We present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames. We evaluate our method on a range of synthetic object discovery benchmarks, namely CLEVR, Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet.
arXiv Detail & Related papers (2023-02-09T23:25:28Z)
A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images. Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism. We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions. We show the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
