Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for
Video Prediction
- URL: http://arxiv.org/abs/2212.11642v3
- Date: Sun, 8 Oct 2023 15:19:18 GMT
- Title: Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for
Video Prediction
- Authors: Chaofan Ling, Junpei Zhong and Weihua Li
- Abstract summary: We present a multi-scale predictive coding model for future video frames prediction.
Our model employs a multi-scale (coarse-to-fine) approach where higher-level neurons generate coarser predictions (lower resolution).
We propose several improvements to the training strategy to mitigate the accumulation of prediction errors in long-term prediction.
- Score: 1.2537993038844142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a multi-scale predictive coding model for future video frames
prediction. Drawing inspiration from the "Predictive Coding" theories in
cognitive science, it is updated by a combination of bottom-up and top-down
information flows, which can enhance the interaction between different network
levels. However, traditional predictive coding models only predict what is
happening hierarchically rather than predicting the future. To address the
problem, our model employs a multi-scale approach (Coarse to Fine), where the
higher level neurons generate coarser predictions (lower resolution), while the
lower-level neurons generate finer predictions (higher resolution). In terms of network
architecture, we directly incorporate the encoder-decoder network within the
LSTM module and share the final encoded high-level semantic information across
different network levels. This enables comprehensive interaction between the
current input and the historical states of LSTM compared with the traditional
Encoder-LSTM-Decoder architecture, thus learning more believable temporal and
spatial dependencies. Furthermore, to tackle the instability in adversarial
training and mitigate the accumulation of prediction errors in long-term
prediction, we propose several improvements to the training strategy. Our
approach achieves good performance on datasets such as KTH, Moving MNIST and
Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN.
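The coarse-to-fine scheme described in the abstract can be illustrated with a toy numpy sketch. This is not the paper's MSPN code: the linear-extrapolation predictor, the 2x2 pooling, and the blending weights are all illustrative assumptions; in the paper each level is an encoder-decoder LSTM, not a hand-written rule.

```python
import numpy as np

def downsample(frame):
    """Halve resolution via 2x2 average pooling."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(frame):
    """Double resolution via nearest-neighbour repetition."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine_predict(prev, last, levels=3):
    """Toy next-frame prediction: the coarsest level extrapolates at low
    resolution, and each finer level blends its own extrapolation with the
    upsampled coarser prediction (coarse guides fine)."""
    pyr_prev, pyr_last = [prev], [last]
    for _ in range(levels - 1):                   # build image pyramids
        pyr_prev.append(downsample(pyr_prev[-1]))
        pyr_last.append(downsample(pyr_last[-1]))
    pred = 2 * pyr_last[-1] - pyr_prev[-1]        # coarsest-level prediction
    for lvl in range(levels - 2, -1, -1):         # refine level by level
        fine = 2 * pyr_last[lvl] - pyr_prev[lvl]
        pred = 0.5 * upsample(pred) + 0.5 * fine
    return pred
```

For a static scene the extrapolation at every level reproduces the input, so the prediction equals the last frame; the point of the structure is that coarse, low-resolution guesses constrain the finer, high-resolution ones.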
Related papers
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, the autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - Dynamic Encoding and Decoding of Information for Split Learning in
Mobile-Edge Computing: Leveraging Information Bottleneck Theory [1.1151919978983582]
Split learning is a privacy-preserving distributed learning paradigm in which an ML model is split into two parts (i.e., an encoder and a decoder).
In mobile-edge computing, network functions can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network.
We present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations.
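The encoder/decoder split described in this summary can be sketched minimally as follows. All shapes, layer choices, and names here are illustrative assumptions, not the paper's architecture: the point is only that the UE runs the encoder, the edge runs the decoder, and only a small latent crosses the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical split: the first layer runs on the user equipment (UE),
# the remainder on the edge server; only the latent crosses the network.
W_enc = rng.normal(size=(8, 32))   # UE-side encoder: 32-dim input -> 8-dim latent
W_dec = rng.normal(size=(4, 8))    # edge-side decoder: latent -> 4-dim output

def ue_encode(x):
    """Runs on the device: compress the input to a small latent."""
    return np.tanh(W_enc @ x)

def edge_decode(z):
    """Runs on the edge server: finish inference from the latent."""
    return W_dec @ z

x = rng.normal(size=32)
z = ue_encode(x)    # 8 floats are transmitted instead of 32
y = edge_decode(z)
```

Making the latent dimension adaptive, so that transmission cost trades off against how informative the representation is, is the balancing act the framework above addresses.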
arXiv Detail & Related papers (2023-09-06T07:04:37Z) - Set-based Neural Network Encoding Without Weight Tying [91.37161634310819]
We propose a neural network weight encoding method for network property prediction.
Our approach is capable of encoding neural networks in a model zoo of mixed architecture.
We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
arXiv Detail & Related papers (2023-05-26T04:34:28Z) - Pyramidal Predictive Network: A Model for Visual-frame Prediction Based
on Predictive Coding Theory [1.4610038284393165]
We propose a novel neural network model for the task of visual-frame prediction.
The model is composed of a series of recurrent and convolutional units forming the top-down and bottom-up streams.
It learns to predict future frames in a visual sequence, with ConvLSTMs on each layer of the network making local predictions from top to bottom.
arXiv Detail & Related papers (2022-08-15T06:28:34Z) - On the Prediction Network Architecture in RNN-T for ASR [1.7262456746016954]
We compare 4 types of prediction networks based on a common state-of-the-art Conformer encoder.
Inspired by our scoreboard, we propose a new simple prediction network architecture, N-Concat.
arXiv Detail & Related papers (2022-06-29T13:11:46Z) - Learning Cross-Scale Prediction for Efficient Neural Video Compression [30.051859347293856]
We present the first neural video codec that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset for the low-latency mode.
We propose a novel cross-scale prediction module that achieves more effective motion compensation.
arXiv Detail & Related papers (2021-12-26T03:12:17Z) - Evaluation of deep learning models for multi-step ahead time series
prediction [1.3764085113103222]
We present an evaluation study that compares the performance of deep learning models for multi-step ahead time series prediction.
Our deep learning methods comprise simple recurrent neural networks, long short-term memory (LSTM) networks, bidirectional LSTMs, encoder-decoder LSTM networks, and convolutional neural networks.
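One of the compared variants, the encoder-decoder LSTM for multi-step forecasting, can be sketched in plain numpy. The weights below are random and untrained, and the single-layer shapes and linear readout are illustrative assumptions; the sketch only shows the mechanism: encode the history into an LSTM state, then decode the horizon autoregressively.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Gates are stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))
    g = np.tanh(z[2 * H:3 * H])
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def encoder_decoder_forecast(series, horizon, params):
    """Encode the input series step by step, then decode `horizon` future
    steps autoregressively, feeding each prediction back in as input."""
    We, Ue, be, Wd, Ud, bd, V = params
    H = be.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in series:                        # encoder: summarize the history
        h, c = lstm_step(np.atleast_1d(x), h, c, We, Ue, be)
    preds, x = [], np.atleast_1d(series[-1])
    for _ in range(horizon):                # decoder: roll out predictions
        h, c = lstm_step(x, h, c, Wd, Ud, bd)
        x = V @ h                           # linear readout to the next value
        preds.append(x.item())
    return preds
```

Feeding predictions back into the decoder is also where multi-step errors accumulate, which is the failure mode such evaluations measure.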
arXiv Detail & Related papers (2021-03-26T04:07:11Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.