Neuron Campaign for Initialization Guided by Information Bottleneck
Theory
- URL: http://arxiv.org/abs/2108.06530v1
- Date: Sat, 14 Aug 2021 13:19:43 GMT
- Title: Neuron Campaign for Initialization Guided by Information Bottleneck
Theory
- Authors: Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han and Dongmei Zhang
- Abstract summary: Initialization plays a critical role in the training of deep neural networks (DNNs).
We use Information Bottleneck (IB) theory to explain the generalization of DNNs.
Experiments on the MNIST dataset show that our method achieves better generalization with faster convergence.
- Score: 31.44355490646638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Initialization plays a critical role in the training of deep neural
networks (DNNs). Existing initialization strategies mainly focus on stabilizing
training to mitigate the vanishing/exploding gradient problem; however, they
give little consideration to enhancing generalization ability. The Information
Bottleneck (IB) theory is a well-known framework for explaining the
generalization of DNNs. Guided by the insights of IB theory, we design two
criteria for better DNN initialization, and we further design a neuron campaign
initialization algorithm that efficiently selects a good initialization for a
neural network on a given dataset. Experiments on the MNIST dataset show that
our method achieves better generalization performance with faster convergence.
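The abstract does not spell out the two IB-guided criteria, so the following is only a minimal sketch of a "campaign"-style layer initializer: over-sample candidate neurons, score each on a data sample with two illustrative proxy criteria (label relevance and redundancy with already selected neurons), and keep the top scorers. The function name and both scoring rules are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def neuron_campaign_init(X, y, n_neurons, n_candidates=10, seed=0):
    """Campaign-style initializer for one hidden layer: over-sample random
    candidate neurons, score each with two proxy criteria, keep the best.
    The scoring rules are illustrative stand-ins, not the paper's criteria."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pool = rng.normal(0.0, np.sqrt(2.0 / n_features),
                      size=(n_neurons * n_candidates, n_features))

    # Candidate responses on a (sub)sample of the training data.
    H = np.maximum(X @ pool.T, 0.0)                     # ReLU, shape (N, pool)
    H_c = (H - H.mean(0)) / (H.std(0) + 1e-8)           # standardized responses
    y_c = (y - y.mean()) / (y.std() + 1e-8)

    # Proxy criterion 1 (relevance, stand-in for I(T; Y)):
    # absolute correlation between a candidate's response and the labels.
    relevance = np.abs(H_c.T @ y_c) / len(y)

    selected = []
    for _ in range(n_neurons):
        score = relevance.copy()
        if selected:
            # Proxy criterion 2 (compression/diversity): penalize candidates
            # whose responses are redundant with already selected neurons.
            R = np.abs(H_c.T @ H_c[:, selected]) / len(y)
            score -= R.max(axis=1)
            score[selected] = -np.inf                   # no duplicate picks
        selected.append(int(np.argmax(score)))
    return pool[selected]                               # (n_neurons, n_features)

# Example (hypothetical names): initialize a 64-unit layer on flattened MNIST.
#   W0 = neuron_campaign_init(X_train, y_train.astype(float), n_neurons=64)
```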
Related papers
- On the Initialization of Graph Neural Networks [10.153841274798829]
We analyze the variance of forward and backward propagation across Graph Neural Network layers.
We propose a new method for Variance Instability Reduction within GNN Optimization (Virgo).
We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance.
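The summary above does not include Virgo's formula, but the idea of variance-aware GNN initialization can be illustrated with a minimal sketch: for a GCN layer H' = A_hat @ H @ W, scale the weight variance by the mean squared row norm of the normalized adjacency so the forward signal variance stays roughly constant across layers. The function name and scaling rule are assumptions, not the paper's method.

```python
import numpy as np

def variance_aware_gcn_init(A_hat, fan_in, fan_out, seed=0):
    """Hypothetical variance-preserving initializer for a GCN layer
    H' = A_hat @ H @ W.  The aggregation step multiplies the signal variance
    by roughly the mean squared row norm of A_hat, so that gain is folded
    into the usual 1/fan_in scaling (a simplified stand-in for Virgo)."""
    rng = np.random.default_rng(seed)
    agg_gain = np.mean(np.sum(np.asarray(A_hat) ** 2, axis=1))
    std = 1.0 / np.sqrt(fan_in * agg_gain)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Example with a symmetrically normalized adjacency A_hat = D^-1/2 (A+I) D^-1/2:
#   W0 = variance_aware_gcn_init(A_hat, fan_in=16, fan_out=16)
```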
arXiv Detail & Related papers (2023-12-05T09:55:49Z) - Learning Expressive Priors for Generalization and Uncertainty Estimation
in Neural Networks [77.89179552509887]
We propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.
The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees.
We exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
arXiv Detail & Related papers (2023-07-15T09:24:33Z) - Imbedding Deep Neural Networks [0.0]
Continuous depth neural networks, such as Neural ODEs, have refashioned the understanding of residual neural networks in terms of non-linear vector-valued optimal control problems.
We propose a new approach which explicates the network's 'depth' as a fundamental variable, thus reducing the problem to a system of forward-facing initial-value problems.
arXiv Detail & Related papers (2022-01-31T22:00:41Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - A Weight Initialization Based on the Linear Product Structure for Neural
Networks [0.0]
We study neural networks from a nonlinear point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks.
The proposed strategy is derived from the approximation of activation functions using theories of numerical algebra, so as to guarantee finding all local minima.
arXiv Detail & Related papers (2021-09-01T00:18:59Z) - Initialization Matters: Regularizing Manifold-informed Initialization
for Neural Recommendation Systems [47.49065927541129]
We propose a new scheme for user embeddings called Laplacian Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID).
LEPORID endows the embeddings with information regarding multi-scale neighborhood structures on the data manifold and performs adaptive regularization to compensate for high embedding variance on the tail of the data distribution.
We show that existing neural systems with LEPORID often perform on par with or better than KNN.
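As a rough illustration of this kind of manifold-informed initialization, the sketch below builds a user-user co-interaction graph, takes the smallest non-trivial eigenvectors of its normalized Laplacian as initial embeddings, and shrinks low-popularity (tail) users more strongly. The graph construction and shrinkage rule are illustrative assumptions and may differ from LEPORID's exact formulation.

```python
import numpy as np

def laplacian_embedding_init(interactions, dim, reg_strength=0.1):
    """Hedged sketch of a Laplacian-eigenmap-style user-embedding initializer
    with popularity-based shrinkage (illustrative, not LEPORID's exact recipe).

    interactions: (n_users, n_items) binary interaction matrix."""
    X = np.asarray(interactions, dtype=float)
    S = X @ X.T                                  # user-user co-interaction counts
    np.fill_diagonal(S, 0.0)
    d = S.sum(axis=1)                            # graph degree per user
    d_safe = np.where(d > 0, d, 1.0)
    # Symmetrically normalized Laplacian L = I - D^{-1/2} S D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(d_safe)
    L = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    # Smallest non-trivial eigenvectors encode multi-scale neighborhood structure.
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, 1:dim + 1]
    # Popularity-based shrinkage: tail users (low degree, hence high embedding
    # variance) are damped more strongly toward zero.
    popularity = d / (d.max() + 1e-8)
    return E * (popularity / (popularity + reg_strength))[:, None]

# Example (hypothetical names): E0 = laplacian_embedding_init(user_item, dim=32)
```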
arXiv Detail & Related papers (2021-06-09T11:26:18Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Persistent Neurons [4.061135251278187]
We propose a trajectory-based strategy that optimizes the learning task using information from previous solutions.
Persistent neurons can be regarded as a method with gradient-informed bias, where individual updates are corrupted by deterministic error terms.
We evaluate the fully and partially persistent models and show that they can boost performance on a range of NN structures.
arXiv Detail & Related papers (2020-07-02T22:36:49Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
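Tracking a Hessian norm during training can be done generically with power iteration over Hessian-vector products; the PyTorch sketch below estimates the top Hessian eigenvalue of the loss with respect to the parameters. It is a generic diagnostic under that assumption, not necessarily the estimator used in the paper.

```python
import torch

def hessian_spectral_norm(loss, parameters, n_iter=20):
    """Estimate the largest-magnitude Hessian eigenvalue of `loss`
    w.r.t. `parameters` by power iteration with Hessian-vector products
    (Pearlmutter trick).  Generic diagnostic, not the paper's exact method."""
    params = [p for p in parameters if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.zeros(())
    for _ in range(n_iter):
        norm = torch.sqrt(sum((x * x).sum() for x in v)) + 1e-12
        v = [x / norm for x in v]
        # Hessian-vector product: differentiate the gradient against v.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(hv, v))   # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.abs().item()

# Usage during training (assumed names):
#   loss = criterion(model(batch_x), batch_y)
#   curvature = hessian_spectral_norm(loss, model.parameters())
```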
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based learning combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)