A polar prediction model for learning to represent visual transformations
- URL: http://arxiv.org/abs/2303.03432v2
- Date: Tue, 31 Oct 2023 01:06:44 GMT
- Title: A polar prediction model for learning to represent visual transformations
- Authors: Pierre-Étienne H. Fiquet, Eero P. Simoncelli
- Abstract summary: We propose a self-supervised representation-learning framework that exploits the regularities of natural videos to compute accurate predictions.
When trained on natural video datasets, our framework achieves better prediction performance than traditional motion compensation.
Our approach offers a principled framework for understanding how the visual system represents sensory inputs in a form that simplifies temporal prediction.
- Score: 10.857320773825357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: All organisms make temporal predictions, and their evolutionary fitness level
depends on the accuracy of these predictions. In the context of visual
perception, the motions of both the observer and objects in the scene structure
the dynamics of sensory signals, allowing for partial prediction of future
signals based on past ones. Here, we propose a self-supervised
representation-learning framework that extracts and exploits the regularities
of natural videos to compute accurate predictions. We motivate the polar
architecture by appealing to the Fourier shift theorem and its group-theoretic
generalization, and we optimize its parameters on next-frame prediction.
Through controlled experiments, we demonstrate that this approach can discover
the representation of simple transformation groups acting in data. When trained
on natural video datasets, our framework achieves better prediction performance
than traditional motion compensation and rivals conventional deep networks,
while maintaining interpretability and speed. Furthermore, the polar
computations can be restructured into components resembling normalized simple
and direction-selective complex cell models of primate V1 neurons. Thus, polar
prediction offers a principled framework for understanding how the visual
system represents sensory inputs in a form that simplifies temporal prediction.
Related papers
- Learning predictable and robust neural representations by straightening image sequences [16.504807843249196]
We develop a self-supervised learning (SSL) objective that explicitly quantifies and promotes straightening.
We demonstrate the power of this objective in training deep feedforward neural networks on smoothly rendered synthetic image sequences.
arXiv Detail & Related papers (2024-11-04T03:58:09Z)
- Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving [45.886941596233974]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view of the scene.
Our proposed framework performs L-OGM prediction in the latent space of a generative architecture.
We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder.
arXiv Detail & Related papers (2024-07-30T18:37:59Z)
- GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z)
- Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach [54.429396802848224]
This paper proposes an interpretable generative model for motion prediction with robust generalizability to out-of-distribution cases.
For interpretability, the model achieves the target-driven motion prediction by estimating the spatial distribution of long-term destinations.
Experiments on motion prediction datasets validate that the fitted model can be interpretable and generalizable.
arXiv Detail & Related papers (2024-03-10T04:16:04Z)
- Brain-like representational straightening of natural movies in robust feedforward neural networks [2.8749107965043286]
Representational straightening refers to a decrease in curvature of visual feature representations of a sequence of frames taken from natural movies.
We show robustness to noise in the input image can produce representational straightening in feedforward neural networks.
arXiv Detail & Related papers (2023-08-26T13:04:36Z)
- LOPR: Latent Occupancy PRediction using Generative Models [49.15687400958916]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye-view scene representation.
We propose a framework that decouples occupancy prediction into: representation learning and prediction within the learned latent space.
arXiv Detail & Related papers (2022-10-03T22:04:00Z)
- Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs at minimal computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- Fourier-based Video Prediction through Relational Object Motion [28.502280038100167]
Deep recurrent architectures have been applied to the task of video prediction.
Here, we instead explore frequency-domain approaches to video prediction.
The resulting predictions are consistent with the observed dynamics in a scene and do not suffer from blur.
arXiv Detail & Related papers (2021-10-12T10:43:05Z)
- Local Frequency Domain Transformer Networks for Video Prediction [24.126513851779936]
Video prediction is of interest not only for anticipating visual changes in the real world but, above all, as an unsupervised learning rule.
This paper proposes a fully differentiable building block that can perform all of those tasks separately while maintaining interpretability.
arXiv Detail & Related papers (2021-05-10T19:48:42Z)
- Predicting Temporal Sets with Deep Neural Networks [50.53727580527024]
We propose an integrated solution based on deep neural networks for temporal sets prediction.
A unique perspective is to learn element relationships by constructing a set-level co-occurrence graph (sketched below).
We design an attention-based module to adaptively learn the temporal dependencies of elements and sets.
arXiv Detail & Related papers (2020-06-20T03:29:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.