Predictive coding, precision and natural gradients
- URL: http://arxiv.org/abs/2111.06942v1
- Date: Fri, 12 Nov 2021 21:05:03 GMT
- Title: Predictive coding, precision and natural gradients
- Authors: Andre Ofner, Raihan Kabir Ratul, Suhita Ghosh, Sebastian Stober
- Abstract summary: We show that hierarchical predictive coding networks with learnable precision are able to solve various supervised and unsupervised learning tasks.
When applied to unsupervised auto-encoding of image inputs, the deterministic network produces hierarchically organized and disentangled embeddings.
- Score: 2.1601966913620325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is an increasing convergence between biologically plausible
computational models of inference and learning with local update rules and the
global gradient-based optimization of neural network models employed in machine
learning. One particularly exciting connection is the correspondence between
the locally informed optimization in predictive coding networks and the error
backpropagation algorithm that is used to train state-of-the-art deep
artificial neural networks. Here we focus on the related, but still largely
under-explored connection between precision weighting in predictive coding
networks and the Natural Gradient Descent algorithm for deep neural networks.
Precision-weighted predictive coding is an interesting candidate for scaling up
uncertainty-aware optimization, particularly for models with large parameter
spaces, due to the distributed nature of its optimization process and its
underlying local approximation of the Fisher information metric, which supplies
the adaptive learning rate that is central to Natural Gradient Descent. Here,
we show that hierarchical predictive coding networks with learnable precision
are indeed able to solve various supervised and unsupervised learning tasks
with performance comparable to global backpropagation with natural gradients,
and that they outperform their classical gradient descent counterparts on tasks
where high amounts of noise are embedded in the data or label inputs. When
applied to
unsupervised auto-encoding of image inputs, the deterministic network produces
hierarchically organized and disentangled embeddings, hinting at the close
connections between predictive coding and hierarchical variational inference.
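
For intuition, the sketch below shows the kind of precision-weighted update the abstract refers to, for a single linear Gaussian layer with a learnable diagonal log-variance. The single-layer setup, variable names and learning rates are illustrative assumptions rather than the authors' implementation; the point is that the learned precision rescales each error dimension, acting as the per-unit adaptive step size that a local, diagonal approximation of the Fisher information metric would provide.

```python
# Minimal sketch (assumed setup, not the paper's code): precision-weighted
# updates for a single linear Gaussian layer with learnable log-variance.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
W = rng.normal(scale=0.1, size=(n_out, n_in))   # generative weights
log_var = np.zeros(n_out)                       # learnable log-variance per output unit

def precision_weighted_grads(x, target):
    """Gradients of the Gaussian log-likelihood for W and log_var."""
    err = target - W @ x              # prediction error
    precision = np.exp(-log_var)      # inverse variance
    # The precision rescales each error dimension before it drives learning:
    # a per-unit adaptive step size, i.e. a local, diagonal stand-in for the
    # Fisher information metric used by Natural Gradient Descent.
    grad_W = np.outer(precision * err, x)
    grad_log_var = 0.5 * (precision * err**2 - 1.0)
    return grad_W, grad_log_var

x, target = rng.normal(size=n_in), rng.normal(size=n_out)
lr = 0.02
for _ in range(50):
    grad_W, grad_log_var = precision_weighted_grads(x, target)
    W += lr * grad_W
    log_var += lr * grad_log_var
print("remaining prediction error:", np.abs(target - W @ x).max())
```

In the hierarchical networks studied in the paper, these precision-weighted errors drive purely local updates, which is the distributed, local approximation of the Fisher metric that the abstract highlights.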
Related papers
- Adaptive Class Emergence Training: Enhancing Neural Network Stability and Generalization through Progressive Target Evolution [0.0]
We propose a novel training methodology for neural networks in classification problems.
We evolve the target outputs from a null vector to one-hot encoded vectors throughout the training process.
This gradual transition allows the network to adapt more smoothly to the increasing complexity of the classification task.
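
A minimal sketch of this progressive target evolution, assuming a simple linear interpolation schedule (the schedule, function names and arguments are illustrative guesses, not the paper's exact method):

```python
# Minimal sketch: training targets grow from a null vector to one-hot vectors
# over the course of training (linear schedule assumed for illustration).
import numpy as np

def evolving_targets(labels, num_classes, epoch, total_epochs):
    """Return targets interpolated between all-zeros and full one-hot vectors."""
    one_hot = np.eye(num_classes)[labels]
    alpha = min(1.0, epoch / total_epochs)  # 0 at the start, 1 at the end
    return alpha * one_hot

labels = np.array([0, 2, 1])
for epoch in (0, 5, 10):
    print(epoch, evolving_targets(labels, num_classes=3, epoch=epoch, total_epochs=10)[0])
```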
arXiv Detail & Related papers (2024-09-04T03:25:48Z)
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called *optimum shifting*, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
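
To make the under-determined-system view concrete, the sketch below keeps a linear layer's outputs on a batch fixed while moving its weights to the minimum-norm solution of the resulting linear system (minimum norm is used here only as an illustrative stand-in for a flatter solution; the paper's actual selection criterion and procedure may differ):

```python
# Minimal sketch: with the layer inputs X and outputs Y fixed, W @ X = Y is an
# under-determined linear system in W, so W can be shifted without changing
# the layer's outputs on this batch. Sizes and the minimum-norm choice are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 16, 4, 8              # batch smaller than d_in => under-determined
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(d_in, n))         # fixed layer inputs
Y = W @ X                              # fixed layer outputs

W_shifted = Y @ np.linalg.pinv(X)      # minimum-norm W_shifted with W_shifted @ X == Y

print(np.allclose(W_shifted @ X, Y))                  # outputs preserved: True
print(np.linalg.norm(W), np.linalg.norm(W_shifted))   # Frobenius norm does not increase
```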
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Precision Machine Learning [5.15188009671301]
We compare various function approximation methods and study how they scale with increasing parameters and data.
We find that neural networks can often outperform classical approximation methods on high-dimensional examples.
We develop training tricks which enable us to train neural networks to extremely low loss, close to the limits allowed by numerical precision.
arXiv Detail & Related papers (2022-10-24T17:58:30Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
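
For reference, here is a minimal sketch of sparse coding via iterative soft thresholding (ISTA), the kind of efficient/sparse coding computation the summary alludes to; the dictionary, sizes, penalty and step size are illustrative assumptions and not taken from the paper's structure-learning algorithm:

```python
# Minimal sketch: sparse coding of a signal x with a fixed dictionary D using
# ISTA (iterative soft thresholding). All sizes and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 32                            # signal dimension, number of dictionary atoms
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x = rng.normal(size=d)

lam = 0.1                                # sparsity penalty
eta = 1.0 / np.linalg.norm(D, 2) ** 2    # step size from the largest singular value
z = np.zeros(k)
for _ in range(200):
    v = z - eta * (D.T @ (D @ z - x))                          # gradient step on the squared error
    z = np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0)    # soft threshold
print(np.count_nonzero(np.abs(z) > 1e-8), "active atoms out of", k)
```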
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate the role of the scale factors in the effectiveness of the skip connection.
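
A minimal sketch of a scaled skip connection combined with layer normalization (the placement of the scale factor and the tanh transform here are illustrative assumptions; the exact variants examined in the paper may differ):

```python
# Minimal sketch: residual block y = LayerNorm(skip_scale * x + F(x)), with an
# explicit scale factor on the skip branch. F and the scale value are
# illustrative assumptions.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, transform, skip_scale=1.0):
    return layer_norm(skip_scale * x + transform(x))

x = np.random.default_rng(0).normal(size=(2, 8))
y = residual_block(x, transform=np.tanh, skip_scale=0.5)
print(y.shape)  # (2, 8)
```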
arXiv Detail & Related papers (2021-05-15T11:44:49Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.