Contrastive Training of Complex-Valued Autoencoders for Object Discovery
- URL: http://arxiv.org/abs/2305.15001v3
- Date: Thu, 9 Nov 2023 13:48:26 GMT
- Title: Contrastive Training of Complex-Valued Autoencoders for Object Discovery
- Authors: Aleksandar Stanić, Anand Gopalakrishnan, Kazuki Irie, Jürgen
Schmidhuber
- Abstract summary: We introduce architectural modifications and a novel contrastive learning method that greatly improve the state-of-the-art synchrony-based model.
For the first time, we obtain a class of synchrony-based models capable of discovering objects in an unsupervised manner in multi-object color datasets.
- Score: 55.280789409319716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current state-of-the-art object-centric models use slots and attention-based
routing for binding. However, this class of models has several conceptual
limitations: the number of slots is hardwired; all slots have equal capacity;
training has high computational cost; there are no object-level relational
factors within slots. Synchrony-based models in principle can address these
limitations by using complex-valued activations which store binding information
in their phase components. However, working examples of such synchrony-based
models have been developed only very recently, and are still limited to toy
grayscale datasets and simultaneous storage of less than three objects in
practice. Here we introduce architectural modifications and a novel contrastive
learning method that greatly improve the state-of-the-art synchrony-based
model. For the first time, we obtain a class of synchrony-based models capable
of discovering objects in an unsupervised manner in multi-object color datasets
and simultaneously representing more than three objects.
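The core idea in the abstract, storing binding information in the phase components of complex-valued activations, can be illustrated with a minimal NumPy sketch. This is a hypothetical toy example, not the paper's actual architecture or contrastive training method: units belonging to the same object synchronize to similar phases, so object groupings can be recovered from phase proximity alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" of 8 units: units 0-3 belong to object A, units 4-7 to object B.
# Magnitude encodes feature strength; phase encodes object assignment.
magnitudes = rng.uniform(0.5, 1.0, size=8)
phases = np.where(np.arange(8) < 4, 0.3, 2.1)   # two phase clusters
phases += rng.normal(0.0, 0.05, size=8)         # small phase noise

z = magnitudes * np.exp(1j * phases)            # complex-valued activations

# Recover object groupings by thresholding pairwise circular phase distance.
phase = np.angle(z)
diff = phase[:, None] - phase[None, :]
dist = np.abs(np.angle(np.exp(1j * diff)))      # wrap-around-safe distance
same_object = dist < 0.5

print(same_object[0])  # units 0-3 group together; units 4-7 do not
```

In the actual synchrony-based models, the network learns such phase assignments end-to-end rather than having them hand-set as here; the sketch only shows why phase is a usable binding mechanism.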
Related papers
- CerberusDet: Unified Multi-Dataset Object Detection [0.0]
CerberusDet is a framework with a multi-headed model designed for handling multiple object detection tasks.
The proposed model is built on the YOLO architecture and efficiently shares visual features across the backbone and neck components.
CerberusDet achieved state-of-the-art results with 36% less inference time.
arXiv Detail & Related papers (2024-07-17T15:00:35Z)
- Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery [62.43562856605473]
We argue for the computational advantages of a recurrent architecture with complex-valued weights.
We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction.
arXiv Detail & Related papers (2024-05-27T15:47:03Z)
- Slot Structured World Models [0.0]
State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interaction between these object embeddings.
We introduce Slot Structured World Models (SSWM), a class of world models that combines an object-centric encoder with a latent graph-based dynamics model.
arXiv Detail & Related papers (2024-01-08T21:19:30Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Multi-Task Learning of Object State Changes from Uncurated Videos [55.60442251060871]
We learn to temporally localize object state changes by observing people interacting with objects in long uncurated web videos.
We show that our multi-task model achieves a relative improvement of 40% over the prior single-task methods.
We also test our method on long egocentric videos of the EPIC-KITCHENS and the Ego4D datasets in a zero-shot setup.
arXiv Detail & Related papers (2022-11-24T09:42:46Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Federated Action Recognition on Heterogeneous Embedded Devices [16.88104153104136]
In this work, we enable clients with limited computing power to perform action recognition, a computationally heavy task.
We first perform model compression at the central server through knowledge distillation on a large dataset.
Fine-tuning is required because the limited data available in smaller datasets is not sufficient for action recognition models to learn complex temporal features.
arXiv Detail & Related papers (2021-07-18T02:33:24Z)
- Mutual Modality Learning for Video Action Classification [74.83718206963579]
We show how to embed multi-modality into a single model for video action classification.
We achieve state-of-the-art results on the Something-Something-v2 benchmark.
arXiv Detail & Related papers (2020-11-04T21:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.