Continual Learning of Conjugated Visual Representations through Higher-order Motion Flows
- URL: http://arxiv.org/abs/2409.11441v1
- Date: Mon, 16 Sep 2024 19:08:32 GMT
- Title: Continual Learning of Conjugated Visual Representations through Higher-order Motion Flows
- Authors: Simone Marullo, Matteo Tiezzi, Marco Gori, Stefano Melacci
- Abstract summary: Learning with neural networks from a continuous stream of visual information presents several challenges due to the non-i.i.d. nature of the data.
However, it also offers novel opportunities to develop representations that are consistent with the information flow.
In this paper we investigate the case of unsupervised continual learning of pixel-wise features subject to multiple motion-induced constraints.
- Score: 21.17248975377718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning with neural networks from a continuous stream of visual information presents several challenges due to the non-i.i.d. nature of the data. However, it also offers novel opportunities to develop representations that are consistent with the information flow. In this paper we investigate the case of unsupervised continual learning of pixel-wise features subject to multiple motion-induced constraints, therefore named motion-conjugated feature representations. Unlike existing approaches, motion is not a given signal (either ground-truth or estimated by external modules), but is the outcome of a progressive and autonomous learning process, occurring at various levels of the feature hierarchy. Multiple motion flows are estimated with neural networks and characterized by different levels of abstraction, spanning from traditional optical flow to other latent signals originating from higher-level features, hence called higher-order motions. Continuously learning to develop consistent multi-order flows and representations is prone to trivial solutions, which we counteract by introducing a self-supervised contrastive loss, spatially-aware and based on flow-induced similarity. We assess our model on photorealistic synthetic streams and real-world videos, comparing to pre-trained state-of-the-art feature extractors (also based on Transformers) and to recent unsupervised learning models, significantly outperforming these alternatives.
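The abstract names the key ingredient, a spatially-aware contrastive loss driven by flow-induced similarity, but not its exact form. As an illustration only, here is a minimal NumPy sketch of such an InfoNCE-style objective: a pixel feature at time t is pulled toward the feature its flow points to at time t+1 and pushed away from the other sampled locations. The function name, sampling scheme, and normalization are assumptions, not the authors' implementation (in the paper the flows themselves are learned jointly, at several levels of the feature hierarchy).

```python
import numpy as np

def flow_contrastive_loss(feat_t, feat_tp1, flow, num_pairs=128, tau=0.1, seed=0):
    """Illustrative sketch (not the paper's code): InfoNCE over pixel pairs
    linked by a motion flow. feat_*: (H, W, D) float feature maps;
    flow: (H, W, 2) displacements (dx, dy) from frame t to frame t+1."""
    rng = np.random.default_rng(seed)
    H, W, D = feat_t.shape
    ys = rng.integers(0, H, num_pairs)
    xs = rng.integers(0, W, num_pairs)
    # follow the flow and round to the nearest valid pixel in frame t+1
    ty = np.clip(np.rint(ys + flow[ys, xs, 1]).astype(int), 0, H - 1)
    tx = np.clip(np.rint(xs + flow[ys, xs, 0]).astype(int), 0, W - 1)
    a = feat_t[ys, xs]                                   # anchors   (num_pairs, D)
    p = feat_tp1[ty, tx]                                 # positives (num_pairs, D)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    logits = a @ p.T / tau           # row i: anchor i vs. all sampled targets
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))   # flow-matched pairs lie on the diagonal
```

Negatives here are simply the other sampled locations, which is what makes the loss spatial and counteracts the trivial constant-feature solution mentioned in the abstract.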
Related papers
- Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics [6.829711787905569]
We propose a new decomposed dynamical system model that represents complex non-stationary and nonlinear dynamics of time series data.
Our model is trained through a dictionary learning procedure, where we leverage recent results in tracking sparse vectors over time.
In both continuous-time and discrete-time instructional examples, we demonstrate that our model can approximate the original system well.
arXiv Detail & Related papers (2022-06-07T02:25:38Z)
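As a toy illustration of the decomposed form described in the entry above, assuming the common dLDS parameterization x_{t+1} = (sum_j c_j(t) A_j) x_t with sparse, time-varying coefficients; the paper learns the dictionary {A_j} and the coefficients from data, which this sketch does not do:

```python
import numpy as np

# Toy simulation of a decomposed linear dynamical system (assumed form):
# x_{t+1} = (sum_j c_j(t) * A_j) x_t, with sparse time-varying coefficients.
rng = np.random.default_rng(0)
n, J, T = 4, 3, 200                       # state dim, dictionary size, steps
# dictionary of dynamics operators, scaled to keep trajectories bounded
A = [0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n)) for _ in range(J)]
x = rng.standard_normal(n)
trajectory = [x]
for t in range(T):
    c = np.zeros(J)
    c[rng.integers(J)] = 1.0              # sparse: one active component per step
    x = sum(cj * Aj for cj, Aj in zip(c, A)) @ x
    trajectory.append(x)
```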
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
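The attention mechanism in the entry above is only named, not specified. A deliberately crude stand-in, assuming attention is drawn to the strongest temporal change with some stochastic jitter (the paper's actual human-like attention model is more sophisticated):

```python
import numpy as np

def attended_location(prev_frame, frame, jitter=2.0, rng=None):
    """Toy focus-of-attention over RGB frames of shape (H, W, C): pick the
    pixel with the largest temporal change, perturbed by Gaussian jitter.
    Purely illustrative, not the paper's attention model."""
    rng = rng or np.random.default_rng(0)
    motion = np.abs(frame.astype(float) - prev_frame.astype(float)).sum(axis=-1)
    y, x = np.unravel_index(np.argmax(motion), motion.shape)
    H, W = motion.shape
    y = int(np.clip(y + rng.normal(scale=jitter), 0, H - 1))
    x = int(np.clip(x + rng.normal(scale=jitter), 0, W - 1))
    return y, x   # learning updates would then use the features at (y, x)
```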
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Causal Navigation by Continuous-time Neural Networks [108.84958284162857]
We propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks.
We evaluate our method in the context of visual-control learning of drones over a series of complex tasks.
arXiv Detail & Related papers (2021-06-15T17:45:32Z)
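"Continuous-time neural networks" in the entry above are models whose hidden state evolves under an ODE. Below is the generic continuous-time RNN form under explicit Euler integration; the paper's specific architecture may differ, and all names and sizes here are assumptions.

```python
import numpy as np

def ctrnn_step(x, u, W, W_in, tau=1.0, dt=0.05):
    """One explicit-Euler step of a generic continuous-time RNN:
    dx/dt = (-x + tanh(W @ x + W_in @ u)) / tau."""
    return x + dt * (-x + np.tanh(W @ x + W_in @ u)) / tau

# Toy rollout: 16 hidden units driven by an 8-dim observation embedding.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((16, 16))
W_in = 0.1 * rng.standard_normal((16, 8))
x = np.zeros(16)
for _ in range(100):
    x = ctrnn_step(x, rng.standard_normal(8), W, W_in)
```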
- A Broad Study on the Transferability of Visual Representations with Contrastive Learning [15.667240680328922]
We study the transferability of learned representations of contrastive approaches for linear evaluation, full-network transfer, and few-shot recognition.
The results show that the contrastive approaches learn representations that are easily transferable to a different downstream task.
Our analysis reveals that the representations learned from the contrastive approaches contain more low/mid-level semantics than cross-entropy models.
arXiv Detail & Related papers (2021-03-24T22:55:04Z)
- Social NCE: Contrastive Learning of Socially-aware Motion Representations [87.82126838588279]
Experimental results show that the proposed method dramatically reduces the collision rates of recent trajectory forecasting, behavioral cloning and reinforcement learning algorithms.
Our method makes few assumptions about neural architecture designs, and hence can be used as a generic way to promote the robustness of neural motion models.
arXiv Detail & Related papers (2020-12-21T22:25:06Z)
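A minimal sketch of the socially-aware contrastive idea in the entry above, assuming an InfoNCE loss in which the encoded motion history should score high against the agent's true future position and low against "colliding" locations near neighbouring agents. The linear event encoder and all names are simplifying assumptions, not the paper's implementation:

```python
import numpy as np

def social_nce_loss(h, true_future, collision_points, W_event, tau=0.1):
    """h: (D,) encoded motion history; true_future: (2,) ground-truth next
    position (the positive); collision_points: (N, 2) locations near other
    agents (the negatives); W_event: (D, 2) toy linear event encoder."""
    cand = np.vstack([true_future, collision_points])   # row 0 is the positive
    z = np.tanh(cand @ W_event.T)                       # (1 + N, D) embeddings
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    h = h / (np.linalg.norm(h) + 1e-8)
    logits = z @ h / tau
    logits -= logits.max()                              # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))
```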
- Self-Supervised Learning of Non-Rigid Residual Flow and Ego-Motion [63.18340058854517]
We present an alternative method for end-to-end scene flow learning by joint estimation of non-rigid residual flow and ego-motion flow for dynamic 3D scenes.
We extend the supervised framework with self-supervisory signals based on the temporal consistency property of a point cloud sequence.
arXiv Detail & Related papers (2020-09-22T11:39:19Z)
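One plausible instance of the temporal-consistency self-supervision in the entry above is a one-sided Chamfer term: points advected by the predicted flow should land on the next frame's point cloud. The paper combines several self-supervisory signals, so this particular form is an assumption.

```python
import numpy as np

def chamfer_consistency(points_t, flow, points_tp1):
    """points_t, flow: (N, 3); points_tp1: (M, 3). Advect the first cloud by
    the predicted flow and penalise the squared distance to the nearest
    neighbour in the next frame (one-sided Chamfer distance)."""
    warped = points_t + flow                                      # (N, 3)
    d2 = ((warped[:, None, :] - points_tp1[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()
```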
- Hierarchical Contrastive Motion Learning for Video Action Recognition [100.9807616796383]
We present hierarchical contrastive motion learning, a new self-supervised learning framework to extract effective motion representations from raw video frames.
Our approach progressively learns a hierarchy of motion features that correspond to different abstraction levels in a network.
Our motion learning module is lightweight and can be flexibly embedded into various backbone networks.
arXiv Detail & Related papers (2020-07-20T17:59:22Z)