Fourier-based Video Prediction through Relational Object Motion
- URL: http://arxiv.org/abs/2110.05881v1
- Date: Tue, 12 Oct 2021 10:43:05 GMT
- Title: Fourier-based Video Prediction through Relational Object Motion
- Authors: Malte Mosbach, Sven Behnke
- Abstract summary: Deep recurrent architectures have been applied to the task of video prediction.
Here, we explore a different approach, using frequency-domain methods for video prediction.
The resulting predictions are consistent with the observed dynamics in a scene and do not suffer from blur.
- Score: 28.502280038100167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to predict future outcomes conditioned on observed video frames
is crucial for intelligent decision-making in autonomous systems. Recently,
deep recurrent architectures have been applied to the task of video prediction.
However, this often results in blurry predictions and requires tedious training
on large datasets. Here, we explore a different approach by (1) using
frequency-domain approaches for video prediction and (2) explicitly inferring
object-motion relationships in the observed scene. The resulting predictions
are consistent with the observed dynamics in a scene and do not suffer from
blur.
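The frequency-domain half of this idea can be pictured with a small self-contained sketch (my own illustration under simplifying assumptions, not the authors' implementation, and it omits the relational object-motion inference): estimate the dominant translation between two frames by phase correlation, then extrapolate the next frame by re-applying that shift via the Fourier shift theorem, which avoids the blur of pixel-space regression.

```python
# Sketch only: phase-correlation motion estimate + Fourier-shift extrapolation.
import numpy as np

def phase_correlation_shift(f0, f1):
    """Estimate the integer (dy, dx) translation taking f0 to f1."""
    F0, F1 = np.fft.fft2(f0), np.fft.fft2(f1)
    cross_power = F0 * np.conj(F1)
    cross_power /= np.abs(cross_power) + 1e-8
    peak = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(peak), peak.shape)
    h, w = f0.shape
    dy = dy - h if dy > h // 2 else dy   # map peak position to a signed shift
    dx = dx - w if dx > w // 2 else dx
    return -dy, -dx                      # shift from f0 to f1

def predict_next_frame(f0, f1):
    """Extrapolate the next frame by re-applying the estimated shift to f1."""
    dy, dx = phase_correlation_shift(f0, f1)
    h, w = f1.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    # Fourier shift theorem: a translation is a linear phase ramp in frequency.
    ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.fft.ifft2(np.fft.fft2(f1) * ramp).real

# Usage: a bright square moving 2 px to the right per frame stays sharp.
f0 = np.zeros((64, 64)); f0[20:30, 10:20] = 1.0
f1 = np.roll(f0, 2, axis=1)
f2 = predict_next_frame(f0, f1)
assert np.allclose(f2, np.roll(f1, 2, axis=1), atol=1e-8)
```

Because the prediction is a pure phase manipulation rather than a regression to the mean of possible futures, the extrapolated frame is exactly as sharp as its inputs.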
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions [27.112210225969733]
We propose a novel framework for object-centric video prediction, i.e., extracting the structure of a video sequence and modeling object dynamics and interactions from visual observations.
With the goal of learning meaningful object representations, we propose two object-centric video predictor (OCVP) transformer modules, which decouple the processing of temporal dynamics from object interactions.
In our experiments, we show that our object-centric prediction framework using the OCVP predictors outperforms object-agnostic video prediction models on two different datasets.
arXiv Detail & Related papers (2023-02-23T08:29:26Z)
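The decoupling idea above can be pictured schematically (a sketch under my own assumptions, not the released OCVP code; the class name, dimensions, and layer choices are invented): one attention block mixes information along time for each object slot independently, and a second block lets slots interact within each timestep.

```python
# Sketch: decoupled temporal vs. relational self-attention over object slots.
import torch
import torch.nn as nn

class DecoupledPredictorBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.relational = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, slots):              # slots: (batch, time, objects, dim)
        b, t, n, d = slots.shape
        # Temporal attention: each object slot attends over its own history.
        x = slots.permute(0, 2, 1, 3).reshape(b * n, t, d)
        q = self.norm1(x)
        x = x + self.temporal(q, q, q)[0]
        x = x.reshape(b, n, t, d).permute(0, 2, 1, 3)
        # Relational attention: slots interact within each timestep.
        y = x.reshape(b * t, n, d)
        q = self.norm2(y)
        y = y + self.relational(q, q, q)[0]
        return y.reshape(b, t, n, d)

# Usage: update 5 object slots observed over 8 frames.
slots = torch.randn(2, 8, 5, 128)
print(DecoupledPredictorBlock()(slots).shape)  # torch.Size([2, 8, 5, 128])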
- PREF: Predictability Regularized Neural Motion Fields [68.60019434498703]
Knowing 3D motions in a dynamic scene is essential to many vision applications.
We leverage a neural motion field for estimating the motion of all points in a multiview setting.
We propose to regularize the estimated motion to be predictable.
arXiv Detail & Related papers (2022-09-21T22:32:37Z)
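One way to read the predictability regularizer (a toy formulation based only on the abstract, not the paper's exact loss; `predictor` and `head` are invented stand-ins) is as a penalty forcing the motion estimated at each step to be forecastable from the motion history:

```python
# Sketch: penalize motion that a learned forecaster cannot explain.
import torch
import torch.nn as nn

predictor = nn.GRU(input_size=3, hidden_size=32, batch_first=True)
head = nn.Linear(32, 3)

def predictability_loss(motion):            # motion: (points, time, 3)
    hidden, _ = predictor(motion[:, :-1])   # encode the motion history
    forecast = head(hidden)                 # forecast next-step motion
    return ((forecast - motion[:, 1:]) ** 2).mean()

# Usage: 100 tracked points over 10 frames; add this term, scaled by a
# weight, to whatever reconstruction objective trains the motion field.
motion = torch.randn(100, 10, 3, requires_grad=True)
loss = predictability_loss(motion)
loss.backward()
```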
- Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose MSPred, a novel video prediction model able to forecast possible future outcomes at different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations in various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and achieves substantially better long-term prediction performance.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Semantic Prediction: Which One Should Come First, Recognition or Prediction? [21.466783934830925]
One of the primary downstream tasks is interpreting the scene's semantic composition and using it for decision-making.
Given a pre-trained video prediction model and a pre-trained semantic extraction model, there are two main ways to achieve the same outcome: running recognition first or prediction first.
We investigate these configurations using the Local Frequency Domain Transformer Network (LFDTN) as the video prediction model and U-Net as the semantic extraction model on synthetic and real datasets.
arXiv Detail & Related papers (2021-10-06T15:01:05Z)
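The two orderings being compared can be sketched with stand-in callables (placeholders, not the actual LFDTN or U-Net models; all function names here are invented): either predict future frames and then extract semantics, or extract semantics from the observed frames and predict in label space.

```python
# Sketch: prediction-first vs. recognition-first pipeline orderings.
import numpy as np

def predict_frames(frames):
    """Stand-in for a pre-trained video prediction model (e.g. LFDTN)."""
    return frames[-1]  # naive copy-last-frame placeholder

def extract_semantics(frame):
    """Stand-in for a pre-trained semantic extraction model (e.g. U-Net)."""
    return (frame > frame.mean()).astype(np.int64)  # placeholder labels

def prediction_first(frames):
    # Configuration A: predict the future RGB frame, then segment it.
    return extract_semantics(predict_frames(frames))

def recognition_first(frames):
    # Configuration B: segment each observed frame, then predict how the
    # semantic maps evolve (here the same predictor runs on label maps).
    return predict_frames([extract_semantics(f) for f in frames])

frames = [np.random.rand(64, 64) for _ in range(4)]
print(prediction_first(frames).shape, recognition_first(frames).shape)
```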
- Learning Semantic-Aware Dynamics for Video Prediction [68.04359321855702]
We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions.
The appearance of the scene is warped from past frames using the predicted motion in co-visible regions.
arXiv Detail & Related papers (2021-04-20T05:00:24Z)
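The warping step in co-visible regions can be pictured as plain backward warping (a simplified nearest-neighbor sketch under my own assumptions, not the paper's implementation): each output pixel samples the previous frame at its location displaced by the predicted motion.

```python
# Sketch: backward-warp a frame through a predicted flow field.
import numpy as np

def backward_warp(frame, flow):
    """Sample frame at coordinates shifted by flow (nearest neighbor)."""
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

# Usage: a constant flow of 3 px to the right reproduces a shifted frame;
# dis-occluded pixels (here the clipped left border) are exactly the
# regions a separate inpainting component would have to fill in.
frame = np.random.rand(32, 32)
flow = np.zeros((32, 32, 2)); flow[..., 0] = 3.0
warped = backward_warp(frame, flow)
assert np.allclose(warped[:, 3:], frame[:, :-3])
```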
- Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral to the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, while the other anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z)
- Motion Segmentation using Frequency Domain Transformer Networks [29.998917158604694]
We propose a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately.
Our approach can outperform some widely used video prediction methods like Video Ladder Network and Predictive Gated Pyramids on synthetic data.
arXiv Detail & Related papers (2020-04-18T15:05:11Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
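A generic way to picture the dynamic-inference idea (my own early-exit illustration, not the paper's architecture; all module names and sizes are invented) is a cascade of stages where easily distinguishable videos exit at a cheap classifier while harder ones pay for more computation:

```python
# Sketch: confidence-gated early exit over a cascade of stages.
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Linear(64, 64) for _ in range(3)])
exits = nn.ModuleList([nn.Linear(64, 10) for _ in range(3)])

def dynamic_inference(features, threshold=0.9):
    """Return (logits, stage_index) for the first sufficiently confident exit."""
    x = features
    for i, (stage, exit_head) in enumerate(zip(stages, exits)):
        x = torch.relu(stage(x))
        logits = exit_head(x)
        confidence = torch.softmax(logits, dim=-1).max()
        if confidence > threshold or i == len(stages) - 1:
            return logits, i

video_features = torch.randn(64)  # stand-in for pooled video features
logits, used_stage = dynamic_inference(video_features)
print(f"exited after stage {used_stage}")
```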
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.