Unsupervised Object-Based Transition Models for 3D Partially Observable Environments
- URL: http://arxiv.org/abs/2103.04693v1
- Date: Mon, 8 Mar 2021 12:10:02 GMT
- Title: Unsupervised Object-Based Transition Models for 3D Partially Observable Environments
- Authors: Antonia Creswell, Rishabh Kabra, Chris Burgess, Murray Shanahan
- Abstract summary: The model is trained end-to-end without supervision using losses at the level of the object-structured representation rather than pixels.
We show that the combination of an object-level loss and correct object alignment over time enables the model to outperform a state-of-the-art baseline.
- Score: 13.598250346370467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a slot-wise, object-based transition model that decomposes a scene
into objects, aligns them (with respect to a slot-wise object memory) to
maintain a consistent order across time, and predicts how those objects evolve
over successive frames. The model is trained end-to-end without supervision
using losses at the level of the object-structured representation rather than
pixels. Thanks to its alignment module, the model deals properly with two
issues that are not handled satisfactorily by other transition models, namely
object persistence and object identity. We show that the combination of an
object-level loss and correct object alignment over time enables the model to
outperform a state-of-the-art baseline, and allows it to deal well with object
occlusion and re-appearance in partially observable environments.
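No code accompanies this abstract, so the snippet below is only a minimal sketch of the alignment idea it describes: match the current frame's slot representations against a slot-wise object memory so each object keeps a stable slot index, then compute the loss in representation space rather than pixel space. The memory layout, cost function, and loss here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_slots(memory, slots):
    """Reorder `slots` so that slot i corresponds to memory entry i.

    memory: (K, D) array of per-object memory vectors from earlier frames.
    slots:  (K, D) array of object representations for the current frame.
    Returns the permuted slots and the permutation used.
    """
    # Pairwise L2 cost between every memory entry and every current slot.
    cost = np.linalg.norm(memory[:, None, :] - slots[None, :, :], axis=-1)
    # Hungarian matching gives the assignment with minimal total cost.
    row, col = linear_sum_assignment(cost)
    return slots[col], col

def object_level_loss(predicted_slots, target_slots):
    """Loss at the level of the object-structured representation."""
    return np.mean((predicted_slots - target_slots) ** 2)

# Toy usage: two objects whose slot order was swapped between frames.
rng = np.random.default_rng(0)
memory = rng.normal(size=(2, 4))
slots = memory[::-1] + 0.01 * rng.normal(size=(2, 4))  # swapped order
aligned, perm = align_slots(memory, slots)
print(perm)                                # -> [1 0]
print(object_level_loss(aligned, memory))  # small, since alignment fixed the order
```

Hungarian matching is one standard way to resolve slot permutations and shows why a consistent ordering makes object persistence and identity tractable; the paper's own alignment module may work differently.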
Related papers
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
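As a rough, hypothetical illustration of the slot-to-memory binding such a pipeline depends on (the paper's index-merge module is more involved than this), each memory entry could absorb its most similar current slot:

```python
import numpy as np

def update_memory(memory, slots, momentum=0.9):
    """Toy memory update: every memory entry pulls in its most similar
    slot by cosine similarity, so object identities persist over frames.
    Purely illustrative; not the paper's index-merge module."""
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    s = slots / np.linalg.norm(slots, axis=1, keepdims=True)
    best = (m @ s.T).argmax(axis=1)   # best-matching slot per memory entry
    return momentum * memory + (1 - momentum) * slots[best]
```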
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Boosting Object Representation Learning via Motion and Object Continuity [22.512380611375846]
We propose to exploit object motion and continuity, i.e., the fact that objects do not pop in and out of existence.
The resulting Motion and Object Continuity scheme can be instantiated using any baseline object detection model.
Our results show large improvements in the performance of a SOTA model in terms of object discovery, convergence speed and overall latent object representations.
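The scheme's actual losses are not spelled out in this summary; as a minimal sketch of the continuity idea under that caveat, a Chamfer-style penalty asks every object representation at time t to have a close counterpart at t+1, and vice versa:

```python
import numpy as np

def continuity_loss(slots_t, slots_t1):
    """Penalize objects popping in or out: each row of the (K, D) slots
    at time t should have a nearby slot at t+1, and vice versa."""
    d = np.linalg.norm(slots_t[:, None, :] - slots_t1[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```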
arXiv Detail & Related papers (2022-11-16T09:36:41Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves unsupervised object discovery performance competitive with a SlotAttention model on two datasets, and that it manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
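A minimal sketch of the general complex-valued mechanism, using a modReLU-style activation borrowed from the complex-network literature (the paper's exact layer may differ): real-valued weights act on complex activations, and after training, pixels belonging to the same object end up with similar phases.

```python
import numpy as np

def complex_layer(z, w, bias=-0.1):
    """Real weights applied to complex activations, then a modReLU-style
    nonlinearity: threshold the magnitude, keep the phase."""
    z = z @ w
    mag, phase = np.abs(z), np.angle(z)
    return np.maximum(mag + bias, 0.0) * np.exp(1j * phase)

# Toy forward pass; object masks would be read off by clustering phases.
rng = np.random.default_rng(0)
x = rng.random(64).astype(np.complex128)       # flattened image as complex
w_enc = rng.normal(size=(64, 16)) * 0.1
w_dec = rng.normal(size=(16, 64)) * 0.1
recon = complex_layer(complex_layer(x, w_enc), w_dec)
print(np.angle(recon)[:5])                     # phases ~ object assignment
```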
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition [69.90530987240899]
We present an unsupervised variational approach to this problem.
Our model learns to infer two sets of latent representations from RGB video input alone.
It represents object attributes in an allocentric manner which does not depend on viewpoint.
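Schematically, and with shapes and the combination rule chosen for illustration only, the two latent sets factor a video like this: object latents shared across frames, frame latents shared across objects, combined pairwise for decoding.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, D = 8, 4, 16                            # frames, objects, latent dim
object_latents = rng.normal(size=(K, D))      # time-invariant, allocentric
frame_latents = rng.normal(size=(T, D))       # per-frame, e.g. viewpoint

# Each (frame, object) pair gets a combined code for the decoder, so object
# identity is separated from viewpoint by construction.
codes = np.concatenate(
    np.broadcast_arrays(object_latents[None], frame_latents[:, None]),
    axis=-1)
print(codes.shape)                            # (8, 4, 32)
```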
arXiv Detail & Related papers (2021-06-07T17:59:23Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
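In rough outline (the names and sampling scheme below are ours, not the paper's), the photometric term says that a surface point projected into two frames under the current pose estimates should have the same color:

```python
import numpy as np

def photometric_loss(img_a, img_b, pts_a, pts_b):
    """pts_a, pts_b: (N, 2) integer pixel coordinates of the *same* model
    points projected into frames a and b; penalize color disagreement."""
    ca = img_a[pts_a[:, 1], pts_a[:, 0]].astype(float)   # (N, 3) colors
    cb = img_b[pts_b[:, 1], pts_b[:, 0]].astype(float)
    return np.abs(ca - cb).mean()
```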
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited to multi-object images (sketched below).
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
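For SceneFID above, here is a sketch of the two ingredients such a metric needs; feature extraction with Inception-v3 is omitted, and the cropping convention is our assumption from the summary: the standard Fréchet distance between Gaussians fitted to feature sets, applied to per-object crops rather than whole images.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Standard FID formula on two (N, D) feature matrices."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f).real
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2.0 * covmean))

def object_crops(image, boxes):
    """SceneFID-style preprocessing as we read the summary: score the
    object crops given by layout boxes (x0, y0, x1, y1), not whole images."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]
```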
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.