Boosting Object Representation Learning via Motion and Object Continuity
- URL: http://arxiv.org/abs/2211.09771v3
- Date: Wed, 21 Feb 2024 14:36:17 GMT
- Title: Boosting Object Representation Learning via Motion and Object Continuity
- Authors: Quentin Delfosse, Wolfgang Stammer, Thomas Rothenbacher, Dwarak
Vittal, Kristian Kersting
- Abstract summary: We propose to exploit object motion and continuity, i.e., objects do not pop in and out of existence.
The resulting Motion and Object Continuity scheme can be instantiated using any baseline object detection model.
Our results show large improvements in the performance of a SOTA model in terms of object discovery, convergence speed and overall latent object representations.
- Score: 22.512380611375846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent unsupervised multi-object detection models have shown impressive
performance improvements, largely attributed to novel architectural inductive
biases. Unfortunately, they may produce suboptimal object encodings for
downstream tasks. To overcome this, we propose to exploit object motion and
continuity, i.e., objects do not pop in and out of existence. This is
accomplished through two mechanisms: (i) providing priors on the location of
objects through integration of optical flow, and (ii) a contrastive object
continuity loss across consecutive image frames. Rather than developing an
explicit deep architecture, the resulting Motion and Object Continuity (MOC)
scheme can be instantiated using any baseline object detection model. Our
results show large improvements in the performance of a SOTA model in terms of
object discovery, convergence speed, and overall latent object representations,
particularly for playing Atari games. Overall, we show clear benefits of
integrating motion and object continuity for downstream tasks, moving beyond
object representation learning based only on reconstruction.
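The contrastive object continuity loss of mechanism (ii) is not spelled out in this summary; the following is a minimal NumPy sketch assuming an InfoNCE-style formulation over per-object latent encodings from consecutive frames. The function name, array shapes, and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def object_continuity_loss(z_t, z_t1, temperature=0.1):
    """InfoNCE-style contrastive loss over per-object (slot) encodings.

    z_t, z_t1: arrays of shape (num_objects, dim) holding latent encodings
    of the objects in two consecutive frames. Object i in frame t is
    treated as the positive match for object i in frame t+1; all other
    objects in frame t+1 act as negatives.
    """
    # L2-normalise so dot products become cosine similarities.
    z_t = z_t / np.linalg.norm(z_t, axis=1, keepdims=True)
    z_t1 = z_t1 / np.linalg.norm(z_t1, axis=1, keepdims=True)
    sim = z_t @ z_t1.T / temperature  # (N, N) similarity matrix
    # Row-wise cross-entropy with the diagonal (same object) as the target.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 8))
# Slightly perturbed encodings stand in for the same objects one frame later.
loss_same = object_continuity_loss(z_t, z_t + 0.01 * rng.normal(size=(4, 8)))
loss_rand = object_continuity_loss(z_t, rng.normal(size=(4, 8)))
# Persisting objects should typically incur a much lower loss than unrelated ones.
```

Minimising such a loss pushes the encoding of each object to stay consistent across frames, which is one way to operationalise "objects do not pop in and out of existence" without relying on reconstruction alone.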
Related papers
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z)
- Tackling Background Distraction in Video Object Segmentation [7.187425003801958]
Video object segmentation (VOS) aims to densely track certain objects in videos.
One of the main challenges in this task is the existence of background distractors that appear similar to the target objects.
We propose three novel strategies to suppress such distractors.
Our model achieves performance comparable to contemporary state-of-the-art approaches while running in real time.
arXiv Detail & Related papers (2022-07-14T14:25:19Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- AirDOS: Dynamic SLAM benefits from Articulated Objects [9.045690662672659]
Dynamic object-aware SLAM (DOS) exploits object-level information to enable robust motion estimation in dynamic environments.
AirDOS is the first dynamic object-aware SLAM system demonstrating that camera pose estimation can be improved by incorporating dynamic articulated objects.
arXiv Detail & Related papers (2021-09-21T01:23:48Z)
- Unsupervised Object-Based Transition Models for 3D Partially Observable Environments [13.598250346370467]
The model is trained end-to-end without supervision using losses at the level of the object-structured representation rather than pixels.
We show that the combination of an object-level loss and correct object alignment over time enables the model to outperform a state-of-the-art baseline.
arXiv Detail & Related papers (2021-03-08T12:10:02Z)
- Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)