Continual Learning of Predictive Models in Video Sequences via
Variational Autoencoders
- URL: http://arxiv.org/abs/2006.01945v1
- Date: Tue, 2 Jun 2020 21:17:38 GMT
- Title: Continual Learning of Predictive Models in Video Sequences via
Variational Autoencoders
- Authors: Damian Campo, Giulia Slavic, Mohamad Baydoun, Lucio Marcenaro, Carlo
Regazzoni
- Abstract summary: This paper proposes a method for performing continual learning of predictive models that facilitate the inference of future frames in video sequences.
An initial Variational Autoencoder and a set of fully connected neural networks are used to learn, respectively, the appearance of video frames and their dynamics at the latent-space level.
- Score: 6.698751933050415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a method for performing continual learning of predictive
models that facilitate the inference of future frames in video sequences. For
a first given experience, an initial Variational Autoencoder and a set of
fully connected neural networks are used to learn, respectively, the
appearance of video frames and their dynamics at the latent-space level. By
employing an adapted Markov Jump Particle Filter, the proposed method
recognizes new situations and integrates them as predictive models avoiding
catastrophic forgetting of previously learned tasks. For evaluating the
proposed method, this article uses video sequences from a vehicle that performs
different tasks in a controlled environment.
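To make the pipeline concrete, below is a minimal PyTorch sketch of the two learned components the abstract describes: a VAE that encodes frame appearance into a latent code, and a fully connected network that models the latent dynamics used to infer future frames. All module names, layer sizes, and the toy frame dimension are illustrative assumptions, not the authors' implementation; the adapted Markov Jump Particle Filter is sketched separately under the related anomaly-detection paper below.

```python
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Minimal VAE: encodes a flattened video frame to a latent code z."""
    def __init__(self, frame_dim=4096, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

class LatentDynamics(nn.Module):
    """Fully connected network predicting the next latent state z_{t+1} from z_t.
    Keeping one such model per learned task/situation means a new experience
    adds a new dynamics model instead of overwriting old ones, which is what
    avoids catastrophic forgetting in a continual-learning setting."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, z):
        return self.net(z)

# Inference of a future frame: encode, roll the latent forward, decode.
vae, dyn = FrameVAE(), LatentDynamics()
frame = torch.rand(1, 4096)        # toy flattened 64x64 frame
_, mu, _ = vae(frame)
next_frame = vae.dec(dyn(mu))      # predicted appearance of the next frame
```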
Related papers
- Video In-context Learning [46.40277880351059]
In this paper, we study video in-context learning, where the model starts from an existing video clip and generates diverse potential future sequences.
To achieve this, we provide a clear definition of the task, and train an autoregressive Transformer on video datasets.
We design various evaluation metrics, including both objective and subjective measures, to demonstrate the visual quality and semantic accuracy of generation results.
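As a rough illustration of the autoregressive training described above, the sketch below trains a causal Transformer to predict each video token from its predecessors, so an existing clip can simply be prepended as a prompt at inference time. The tokenizer, vocabulary size, and shapes are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical setup: frames are discretized into token ids (e.g. by a VQ
# tokenizer, not shown); the Transformer learns next-token prediction.
vocab, d_model, seq_len = 1024, 256, 128
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=4)
head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (2, seq_len))           # toy video-token batch
causal = nn.Transformer.generate_square_subsequent_mask(seq_len)
logits = head(backbone(embed(tokens), mask=causal))
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),                   # predict token t+1 ...
    tokens[:, 1:].reshape(-1))                           # ... from tokens <= t
```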
arXiv Detail & Related papers (2024-07-10T04:27:06Z)
- Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models [96.97910688908956]
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models.
We propose a framework tailored for VSS based on pre-trained image and video diffusion models.
Experiments show that our proposed approach outperforms existing zero-shot image semantic segmentation approaches.
arXiv Detail & Related papers (2024-05-27T08:39:38Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Learning Knowledge-Rich Sequential Model for Planar Homography Estimation in Aerial Video [12.853493070295457]
We develop a sequential estimator that processes a sequence of video frames and estimates their pairwise planar homographic transformations in batches.
We also incorporate a set of spatial-temporal knowledge to regularize the learning of such a sequence-to-sequence model.
Empirical studies suggest that our sequential model achieves significant improvement over alternative image-based methods.
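One example of spatial-temporal knowledge that can regularize such a sequential model is transitivity of homographies: chaining the frame-to-frame estimates should agree with the direct estimate over a longer gap. The sketch below is an illustrative constraint of that kind, not necessarily the paper's exact regularizer.

```python
import torch

def chain(H_ab, H_bc):
    """Compose homographies: maps frame a -> c via b (3x3 matrices)."""
    return H_bc @ H_ab

def transitivity_loss(H_01, H_12, H_02):
    """Illustrative spatial-temporal regularizer (an assumption, not the
    paper's exact term): the chained estimate H_12 @ H_01 should agree
    with the directly estimated H_02, after scale normalization."""
    H_chain = chain(H_01, H_12)
    H_chain = H_chain / H_chain[..., 2:3, 2:3]   # normalize so H[2,2] = 1
    H_02 = H_02 / H_02[..., 2:3, 2:3]
    return (H_chain - H_02).pow(2).mean()

# Toy check: with consistent homographies the loss is ~0.
H_01 = torch.eye(3) + 0.01 * torch.randn(3, 3)
H_12 = torch.eye(3) + 0.01 * torch.randn(3, 3)
print(transitivity_loss(H_01, H_12, chain(H_01, H_12)).item())
```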
arXiv Detail & Related papers (2023-04-05T19:28:58Z)
- IDM-Follower: A Model-Informed Deep Learning Method for Long-Sequence Car-Following Trajectory Prediction [24.94160059351764]
Most car-following models are generative and consider only the speed, position, and acceleration of the last time step as inputs.
We implement a novel structure with two independent encoders and a self-attention decoder that sequentially predicts the following trajectories.
Numerical experiments with multiple settings on simulation and NGSIM datasets show that the IDM-Follower can improve the prediction performance.
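A hedged sketch of the stated structure (two independent encoders plus a self-attention decoder) follows; the choice of GRU encoders, the feature sizes, and the output quantities are assumptions for illustration.

```python
import torch
import torch.nn as nn

class IDMFollowerSketch(nn.Module):
    """Sketch of the described structure: two independent encoders (here,
    leader and follower state histories) and a self-attention decoder that
    predicts the follower's future trajectory over a fixed horizon."""
    def __init__(self, feat=3, hidden=64, horizon=10):
        super().__init__()
        self.enc_leader = nn.GRU(feat, hidden, batch_first=True)
        self.enc_follower = nn.GRU(feat, hidden, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(hidden, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.query = nn.Parameter(torch.randn(horizon, hidden))  # one per step
        self.out = nn.Linear(hidden, 2)   # e.g. future position and speed

    def forward(self, leader_hist, follower_hist):
        m1, _ = self.enc_leader(leader_hist)       # (B, T, hidden)
        m2, _ = self.enc_follower(follower_hist)
        memory = torch.cat([m1, m2], dim=1)        # fuse both encodings
        q = self.query.expand(leader_hist.size(0), -1, -1)
        return self.out(self.decoder(q, memory))   # (B, horizon, 2)

model = IDMFollowerSketch()
pred = model(torch.rand(4, 20, 3), torch.rand(4, 20, 3))   # (4, 10, 2)
```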
arXiv Detail & Related papers (2022-10-20T02:24:27Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic targets for efficient transfer learning.
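The core idea, sketched below under assumptions, is to replace a randomly initialized classifier head with frozen class-name embeddings produced by a pre-trained language model, so video features are classified by similarity to those semantic targets. A toy embedding table stands in for the real text encoder to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

# Toy stand-in for text-encoder outputs: one embedding per class name.
# A real implementation would embed the class names with an actual
# pre-trained language model; that step is omitted here.
num_classes, dim = 5, 128
text_targets = nn.functional.normalize(torch.randn(num_classes, dim), dim=-1)

# Features from a (hypothetical) video backbone, classified by cosine
# similarity against the frozen semantic targets.
video_feat = nn.functional.normalize(torch.randn(8, dim), dim=-1)
logits = 100.0 * video_feat @ text_targets.t()   # scaled cosine similarity
labels = torch.randint(0, num_classes, (8,))
loss = nn.functional.cross_entropy(logits, labels)
```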
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
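Under the latent-variable view, the training objective is a rate-distortion Lagrangian that coincides with a weighted negative ELBO: distortion plays the role of the reconstruction term, and the rate is the code length of the latent under a learned prior. A minimal sketch with illustrative names and toy tensors:

```python
import torch

def rd_loss(x, x_hat, z_likelihood, lam=0.01):
    """Rate-distortion Lagrangian as a (beta-weighted) negative ELBO.
    `z_likelihood` holds P(z) for each quantized latent under a learned
    prior, as used by an entropy coder; names are assumptions."""
    distortion = (x - x_hat).pow(2).mean()               # MSE ~ -log p(x|z)
    rate = -torch.log2(z_likelihood).sum() / x.numel()   # bits/pixel ~ -log p(z)
    return distortion + lam * rate

x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.01 * torch.randn_like(x)               # toy reconstruction
z_like = torch.rand(1, 192, 4, 4).clamp_min(1e-6)    # toy prior likelihoods
print(rd_loss(x, x_hat, z_like).item())
```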
arXiv Detail & Related papers (2021-07-28T02:19:39Z)
- CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This paper introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
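A minimal sketch of such conditioning, assuming the simplest mechanism (concatenating context features with an ancillary control signal before generation); the paper's actual conditioning is more elaborate.

```python
import torch
import torch.nn as nn

# The generator sees both context (features of past frames, for temporal
# continuity) and ancillary input (a control signal, for fine control).
# Dimensions and the concatenation scheme are illustrative assumptions.
ctx_dim, ctrl_dim, frame_dim = 64, 8, 1024
gen = nn.Sequential(nn.Linear(ctx_dim + ctrl_dim, 256), nn.ReLU(),
                    nn.Linear(256, frame_dim), nn.Sigmoid())

context = torch.randn(2, ctx_dim)     # encodes the preceding frames
control = torch.randn(2, ctrl_dim)    # fine control over the new clip
new_frame = gen(torch.cat([context, control], dim=-1))   # (2, 1024)
```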
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
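Instance recognition, one of the two adapted objectives, is commonly formulated as an InfoNCE loss in which two augmented views of the same clip are positives and other clips in the batch are negatives; a generic sketch follows (the paper's exact variant may differ).

```python
import torch
import torch.nn.functional as F

def instance_nce(feat_a, feat_b, tau=0.1):
    """InfoNCE-style instance-recognition loss: each clip is its own class;
    two augmented views of the same clip are pulled together, all other
    clips in the batch pushed apart."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    logits = a @ b.t() / tau                 # (B, B) similarity matrix
    labels = torch.arange(a.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = instance_nce(torch.randn(16, 128), torch.randn(16, 128))
```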
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
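A hedged sketch of the joint, non-adversarial optimization (in the spirit of generative latent optimization): the per-video latent vectors are themselves trainable parameters, updated together with the recurrent network and generator under a reconstruction loss instead of a discriminator. Sizes and the pixel-space loss are assumptions.

```python
import torch
import torch.nn as nn

# Per-video latents are parameters, optimized jointly with the RNN and
# generator weights; no discriminator is involved.
num_videos, z_dim, T, frame_dim = 100, 16, 8, 1024
latents = nn.Parameter(torch.randn(num_videos, z_dim))   # learned input noise
rnn = nn.GRU(z_dim, 64, batch_first=True)
gen = nn.Linear(64, frame_dim)
opt = torch.optim.Adam([latents, *rnn.parameters(), *gen.parameters()], lr=1e-3)

videos = torch.rand(num_videos, T, frame_dim)            # toy dataset
idx = torch.randint(0, num_videos, (8,))                 # minibatch of videos
z_seq = latents[idx].unsqueeze(1).repeat(1, T, 1)        # same code each step
frames = gen(rnn(z_seq)[0])                              # (8, T, frame_dim)
loss = (frames - videos[idx]).pow(2).mean()              # non-adversarial loss
opt.zero_grad(); loss.backward(); opt.step()
```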
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
- Anomaly Detection in Video Data Based on Probabilistic Latent Space Models [7.269230232703388]
A Variational Autoencoder (VAE) is used for reducing the dimensionality of video frames.
An Adapted Markov Jump Particle Filter defined by discrete and continuous inference levels is employed to predict the following frames.
Our method is evaluated on different video scenarios where a semi-autonomous vehicle performs a set of tasks in a closed environment.
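A toy sketch of the two-level inference: each particle carries a discrete mode (which learned dynamics model is active) and a continuous latent state, and a persistently large innovation signals a situation no stored model explains. This is a generic Markov-jump particle filter step, not the authors' exact adaptation.

```python
import numpy as np

rng = np.random.default_rng(0)
modes = [lambda z: z + 0.1,             # toy learned dynamics model 0
         lambda z: 0.9 * z]             # toy learned dynamics model 1

n = 200
z = rng.normal(size=n)                   # continuous particle states
m = rng.integers(0, len(modes), size=n)  # discrete mode per particle

def step(z, m, obs, noise=0.05):
    """One filter step: propagate each particle with its mode's dynamics,
    weight by likelihood of the observed latent, resample."""
    z_pred = np.array([modes[mi](zi) for zi, mi in zip(z, m)])
    z_pred += rng.normal(scale=noise, size=z_pred.shape)
    w = np.exp(-0.5 * ((obs - z_pred) / noise) ** 2)       # likelihood weights
    w /= w.sum()
    idx = rng.choice(len(z_pred), size=len(z_pred), p=w)   # resampling
    innovation = np.abs(obs - z_pred).min()                # anomaly signal
    return z_pred[idx], m[idx], innovation

z, m, innov = step(z, m, obs=0.12)
# If `innov` stays large over time, no stored model explains the scene:
# the approach would then flag an anomaly (or, in the continual-learning
# paper above, integrate a new predictive model for the new situation).
```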
arXiv Detail & Related papers (2020-03-17T10:32:22Z)