Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for
Video Prediction
- URL: http://arxiv.org/abs/2212.11642v3
- Date: Sun, 8 Oct 2023 15:19:18 GMT
- Title: Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for
Video Prediction
- Authors: Chaofan Ling, Junpei Zhong and Weihua Li
- Abstract summary: We present a multi-scale predictive coding model for future video frames prediction.
Our model employs a multi-scale (coarse-to-fine) approach where higher-level neurons generate coarser predictions (lower resolution).
We propose several improvements to the training strategy to mitigate the accumulation of prediction errors in long-term prediction.
- Score: 1.2537993038844142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a multi-scale predictive coding model for future video frames
prediction. Drawing inspiration from the "Predictive Coding" theories in
cognitive science, it is updated by a combination of bottom-up and top-down
information flows, which can enhance the interaction between different network
levels. However, traditional predictive coding models only predict what is
happening hierarchically rather than predicting the future. To address the
problem, our model employs a multi-scale approach (Coarse to Fine), where the
higher level neurons generate coarser predictions (lower resolution), while the
lower-level neurons generate finer predictions (higher resolution). In terms of network
architecture, we directly incorporate the encoder-decoder network within the
LSTM module and share the final encoded high-level semantic information across
different network levels. This enables comprehensive interaction between the
current input and the historical states of LSTM compared with the traditional
Encoder-LSTM-Decoder architecture, thus learning more believable temporal and
spatial dependencies. Furthermore, to tackle the instability in adversarial
training and mitigate the accumulation of prediction errors in long-term
prediction, we propose several improvements to the training strategy. Our
approach achieves good performance on datasets such as KTH, Moving MNIST and
Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN.
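The coarse-to-fine scheme described in the abstract can be illustrated with a toy numpy sketch. This is not the paper's MSPN code: the linear-extrapolation predictor, the 2x2 pooling, and the blending weights are all illustrative assumptions; in the paper each level is an encoder-decoder LSTM, not a hand-written rule.

```python
import numpy as np

def downsample(frame):
    """Halve resolution via 2x2 average pooling."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(frame):
    """Double resolution via nearest-neighbour repetition."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine_predict(prev, last, levels=3):
    """Toy next-frame prediction: the coarsest level extrapolates at low
    resolution, and each finer level blends its own extrapolation with the
    upsampled coarser prediction (coarse guides fine)."""
    pyr_prev, pyr_last = [prev], [last]
    for _ in range(levels - 1):                   # build image pyramids
        pyr_prev.append(downsample(pyr_prev[-1]))
        pyr_last.append(downsample(pyr_last[-1]))
    pred = 2 * pyr_last[-1] - pyr_prev[-1]        # coarsest-level prediction
    for lvl in range(levels - 2, -1, -1):         # refine level by level
        fine = 2 * pyr_last[lvl] - pyr_prev[lvl]
        pred = 0.5 * upsample(pred) + 0.5 * fine
    return pred
```

For a static scene the extrapolation at every level reproduces the input, so the prediction equals the last frame; the point of the structure is that coarse, low-resolution guesses constrain the finer, high-resolution ones.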
Related papers
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, the autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - Dynamic Encoding and Decoding of Information for Split Learning in
Mobile-Edge Computing: Leveraging Information Bottleneck Theory [1.1151919978983582]
Split learning is a privacy-preserving distributed learning paradigm in which an ML model is split into two parts (i.e., an encoder and a decoder).
In mobile-edge computing, network functions can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network.
We present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations.
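The encoder/decoder split described in this summary can be sketched minimally as follows. All shapes, layer choices, and names here are illustrative assumptions, not the paper's architecture: the point is only that the UE runs the encoder, the edge runs the decoder, and only a small latent crosses the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical split: the first layer runs on the user equipment (UE),
# the remainder on the edge server; only the latent crosses the network.
W_enc = rng.normal(size=(8, 32))   # UE-side encoder: 32-dim input -> 8-dim latent
W_dec = rng.normal(size=(4, 8))    # edge-side decoder: latent -> 4-dim output

def ue_encode(x):
    """Runs on the device: compress the input to a small latent."""
    return np.tanh(W_enc @ x)

def edge_decode(z):
    """Runs on the edge server: finish inference from the latent."""
    return W_dec @ z

x = rng.normal(size=32)
z = ue_encode(x)    # 8 floats are transmitted instead of 32
y = edge_decode(z)
```

Making the latent dimension adaptive, so that transmission cost trades off against how informative the representation is, is the balancing act the framework above addresses.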
arXiv Detail & Related papers (2023-09-06T07:04:37Z) - Set-based Neural Network Encoding Without Weight Tying [91.37161634310819]
We propose a neural network weight encoding method for network property prediction.
Our approach is capable of encoding neural networks in a model zoo of mixed architecture.
We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
arXiv Detail & Related papers (2023-05-26T04:34:28Z) - Pyramidal Predictive Network: A Model for Visual-frame Prediction Based
on Predictive Coding Theory [1.4610038284393165]
We propose a novel neural network model for the task of visual-frame prediction.
The model is composed of a series of recurrent and convolutional units forming the top-down and bottom-up streams.
It learns to predict future frames in a visual sequence, with ConvLSTMs on each layer of the network making local predictions from top to bottom.
arXiv Detail & Related papers (2022-08-15T06:28:34Z) - On the Prediction Network Architecture in RNN-T for ASR [1.7262456746016954]
We compare 4 types of prediction networks based on a common state-of-the-art Conformer encoder.
Inspired by our scoreboard, we propose a new simple prediction network architecture, N-Concat.
arXiv Detail & Related papers (2022-06-29T13:11:46Z) - Learning Cross-Scale Prediction for Efficient Neural Video Compression [30.051859347293856]
We present the first neural video codec that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset for the low-latency mode.
We propose a novel cross-scale prediction module that achieves more effective motion compensation.
arXiv Detail & Related papers (2021-12-26T03:12:17Z) - Evaluation of deep learning models for multi-step ahead time series
prediction [1.3764085113103222]
We present an evaluation study that compares the performance of deep learning models for multi-step ahead time series prediction.
Our deep learning methods comprise simple recurrent neural networks, long short-term memory (LSTM) networks, bidirectional LSTMs, encoder-decoder LSTM networks, and convolutional neural networks.
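One of the compared variants, the encoder-decoder LSTM for multi-step forecasting, can be sketched in plain numpy. The weights below are random and untrained, and the single-layer shapes and linear readout are illustrative assumptions; the sketch only shows the mechanism: encode the history into an LSTM state, then decode the horizon autoregressively.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Gates are stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))
    g = np.tanh(z[2 * H:3 * H])
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def encoder_decoder_forecast(series, horizon, params):
    """Encode the input series step by step, then decode `horizon` future
    steps autoregressively, feeding each prediction back in as input."""
    We, Ue, be, Wd, Ud, bd, V = params
    H = be.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in series:                        # encoder: summarize the history
        h, c = lstm_step(np.atleast_1d(x), h, c, We, Ue, be)
    preds, x = [], np.atleast_1d(series[-1])
    for _ in range(horizon):                # decoder: roll out predictions
        h, c = lstm_step(x, h, c, Wd, Ud, bd)
        x = V @ h                           # linear readout to the next value
        preds.append(x.item())
    return preds
```

Feeding predictions back into the decoder is also where multi-step errors accumulate, which is the failure mode such evaluations measure.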
arXiv Detail & Related papers (2021-03-26T04:07:11Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.