Faster Convergence in Deep-Predictive-Coding Networks to Learn Deeper
Representations
- URL: http://arxiv.org/abs/2101.06848v2
- Date: Fri, 5 Feb 2021 07:03:20 GMT
- Title: Faster Convergence in Deep-Predictive-Coding Networks to Learn Deeper
Representations
- Authors: Isaac J. Sledge and Jose C. Principe
- Abstract summary: Deep-predictive-coding networks (DPCNs) are hierarchical, generative models that rely on feed-forward and feed-back connections.
A crucial element of DPCNs is a forward-backward inference procedure to uncover sparse states of a dynamic model.
We propose an optimization strategy, with better empirical and theoretical convergence, based on accelerated proximal gradients.
- Score: 12.716429755564821
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep-predictive-coding networks (DPCNs) are hierarchical, generative models
that rely on feed-forward and feed-back connections to modulate latent feature
representations of stimuli in a dynamic and context-sensitive manner. A crucial
element of DPCNs is a forward-backward inference procedure to uncover sparse
states of a dynamic model, which are used for invariant feature extraction.
However, this inference and the corresponding backwards network parameter
updating are major computational bottlenecks. They severely limit the network
depths that can be reasonably implemented and easily trained. We therefore
propose an optimization strategy, with better empirical and theoretical
convergence, based on accelerated proximal gradients.
We demonstrate that the ability to construct deeper DPCNs leads to receptive
fields that capture the entire extent of the objects on which the networks are
trained, thereby improving the feature representations. This yields completely
unsupervised classifiers that surpass convolutional and convolutional-recurrent
autoencoders and are on par with convolutional networks trained in a supervised
manner. This is despite the DPCNs having orders of magnitude fewer parameters.
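As a rough illustration of the kind of inference step being accelerated, the sketch below applies a FISTA-style accelerated proximal gradient to a generic sparsity-regularized least-squares problem. The dictionary C, observation y, regularization weight, and iteration count are placeholder assumptions, not the authors' DPCN implementation.

import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm: shrink each entry toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_sparse_states(C, y, lam, n_iters=100):
    # Accelerated proximal gradient (FISTA-style) for
    #   min_x 0.5 * ||y - C x||^2 + lam * ||x||_1,
    # a generic stand-in for sparse-state inference in one layer.
    L = np.linalg.norm(C, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(C.shape[1])
    z = x.copy()                             # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iters):
        grad = C.T @ (C @ z - y)             # gradient of the smooth term at z
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x

# Toy usage with a random dictionary and observation.
rng = np.random.default_rng(0)
C = rng.standard_normal((64, 256))
y = rng.standard_normal(64)
states = fista_sparse_states(C, y, lam=0.1)

The Nesterov extrapolation step is what distinguishes this accelerated scheme from plain proximal gradient descent (ISTA) and is the source of its faster convergence rate.
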
Related papers
- Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning [17.454100169491497]
We propose a structured pruning approach based on the activity levels of convolutional kernels, named the Spiking Channel Activity-based (SCA) network pruning framework.
Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task.
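For intuition only, the following sketch scores convolutional output channels by their mean activation magnitude on a batch of data and zeroes out the least active kernels; the keep ratio and masking scheme are hypothetical stand-ins, not the SCA framework's pruning and regeneration rules.

import torch
import torch.nn as nn

def prune_inactive_channels(conv: nn.Conv2d, activations: torch.Tensor, keep_ratio=0.8):
    # Zero out the output channels of `conv` whose mean activation magnitude is
    # lowest -- a rough stand-in for activity-based structured pruning.
    # activations: (batch, out_channels, H, W) collected on some calibration data.
    scores = activations.abs().mean(dim=(0, 2, 3))           # one score per channel
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, n_keep).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    with torch.no_grad():
        conv.weight[~mask] = 0.0                              # prune whole kernels
        if conv.bias is not None:
            conv.bias[~mask] = 0.0
    return mask  # surviving channels; masked ones could later be regenerated

# Toy usage.
conv = nn.Conv2d(3, 16, 3, padding=1)
x = torch.randn(8, 3, 32, 32)
alive = prune_inactive_channels(conv, conv(x), keep_ratio=0.5)
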
arXiv Detail & Related papers (2024-06-03T07:44:37Z)
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
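As a generic illustration of low-rank and sparse recurrent connectivity (not the CfC model itself), the sketch below parameterizes a vanilla RNN cell's recurrent matrix as a rank-r product plus a sparsely masked dense term; the rank, sparsity level, and initialization scales are assumptions.

import torch
import torch.nn as nn

class LowRankSparseRNNCell(nn.Module):
    # Vanilla RNN cell whose recurrent matrix is U @ V (rank r) plus a sparsely
    # masked dense matrix -- an illustrative low-rank/sparse parameterization.
    def __init__(self, input_size, hidden_size, rank=4, sparsity=0.9):
        super().__init__()
        self.W_in = nn.Linear(input_size, hidden_size)
        self.U = nn.Parameter(torch.randn(hidden_size, rank) * 0.1)
        self.V = nn.Parameter(torch.randn(rank, hidden_size) * 0.1)
        self.S = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        # Fixed random mask keeps only a small fraction of the dense entries.
        self.register_buffer("mask", (torch.rand(hidden_size, hidden_size) > sparsity).float())

    def forward(self, x, h):
        W_rec = self.U @ self.V + self.S * self.mask
        return torch.tanh(self.W_in(x) + h @ W_rec.T)

# Toy usage: one recurrent step.
cell = LowRankSparseRNNCell(input_size=8, hidden_size=32)
h = cell(torch.randn(1, 8), torch.zeros(1, 32))
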
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing-gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
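A minimal sketch of the general pattern, a classifier trained jointly with an auto-encoding branch that must reconstruct the input from intermediate features, is given below; the layer sizes, decoder design, and loss weighting are illustrative assumptions, not the SIRe architecture.

import torch
import torch.nn as nn

class TinyCNNWithReconstruction(nn.Module):
    # Classifier with an auxiliary decoder that reconstructs the input from
    # intermediate features, so gradients also flow through a
    # structure-preserving reconstruction objective.
    def __init__(self, num_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(64, num_classes))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

# Toy usage: joint classification + reconstruction loss on a random batch.
model = TinyCNNWithReconstruction()
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 100, (4,))
logits, recon = model(x)
loss = nn.functional.cross_entropy(logits, y) + 0.1 * nn.functional.mse_loss(recon, x)
loss.backward()
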
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Latent Code-Based Fusion: A Volterra Neural Network Approach [21.25021807184103]
We propose a deep structure encoder using the recently introduced Volterra Neural Networks (VNNs).
We show that the proposed approach demonstrates much-improved sample complexity over CNN-based auto-encoders, with superb, robust classification performance.
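To make the Volterra idea concrete, the sketch below implements a single second-order Volterra layer whose quadratic kernel is factored into low-rank form, so explicit multiplicative input interactions stand in for a pointwise nonlinearity; it is a schematic toy, not the paper's VNN encoder.

import torch
import torch.nn as nn

class SecondOrderVolterraLayer(nn.Module):
    # y = W1 x + sum_r (q_r . x)^2 per output unit, i.e. a linear term plus a
    # low-rank second-order (quadratic) Volterra term. Purely illustrative.
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Low-rank factors of the quadratic kernel, one set per output unit.
        self.Q = nn.Parameter(torch.randn(out_features, rank, in_features) * 0.05)

    def forward(self, x):                                    # x: (batch, in_features)
        quad = torch.einsum("ork,bk->bor", self.Q, x) ** 2   # (batch, out, rank)
        return self.linear(x) + quad.sum(dim=-1)

# Toy usage.
layer = SecondOrderVolterraLayer(16, 8)
out = layer(torch.randn(4, 16))                              # (4, 8)
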
arXiv Detail & Related papers (2021-04-10T18:29:01Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose widths are quadratic in the sample size and linear in their depth, in time that is logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
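The sketch below shows the generic pattern of forking an auxiliary classifier from an intermediate layer and adding its loss to the main objective, in the spirit of deeply supervised training; the backbone, branch placement, and loss weight are assumptions rather than the proposed mechanism.

import torch
import torch.nn as nn

class BackboneWithSideBranch(nn.Module):
    # Small CNN whose intermediate feature map feeds an auxiliary classifier;
    # its loss is added to the main loss so that shallow and deep layers are
    # optimized toward consistent objectives.
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))
        # Auxiliary (side-branch) classifier attached to the first stage.
        self.aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return self.head(f2), self.aux_head(f1)

# Toy usage: main loss plus down-weighted auxiliary loss.
model = BackboneWithSideBranch()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
main_logits, aux_logits = model(x)
loss = nn.functional.cross_entropy(main_logits, y) \
     + 0.3 * nn.functional.cross_entropy(aux_logits, y)
loss.backward()
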
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
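As a loose illustration of local, backprop-free weight updates (here in a feedback-alignment style, where a fixed random matrix carries the error signal instead of the transposed forward weights), the sketch below updates a two-layer network without an end-to-end backward pass; it is not the recursive local representation alignment algorithm itself.

import numpy as np

rng = np.random.default_rng(0)

# Two-layer network trained with purely local updates: the output error is
# projected back through a fixed random matrix rather than via backpropagation
# through the forward weights.
W1 = rng.standard_normal((64, 784)) * 0.01
W2 = rng.standard_normal((10, 64)) * 0.01
B2 = rng.standard_normal((64, 10)) * 0.1      # fixed random feedback weights

def relu(z):
    return np.maximum(z, 0.0)

def local_update(x, y_onehot, lr=0.01):
    global W1, W2
    h = relu(W1 @ x)                          # forward pass, hidden layer
    y_hat = W2 @ h                            # forward pass, linear readout
    e2 = y_hat - y_onehot                     # output error (local to layer 2)
    e1 = (B2 @ e2) * (h > 0)                  # locally generated error for layer 1
    W2 -= lr * np.outer(e2, h)                # each layer updates from local signals only
    W1 -= lr * np.outer(e1, x)

# Toy usage on one random example.
x = rng.standard_normal(784)
y = np.zeros(10); y[3] = 1.0
local_update(x, y)
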
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.