Related papers: Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

URL: http://arxiv.org/abs/2305.08013v2
Date: Wed, 8 May 2024 21:52:52 GMT
Title: Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression
Authors: Ivan Butakov, Alexander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, Kirill Andreev,
Abstract summary: The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs) In this paper, we introduce a framework for conducting IB analysis of general NNs. We also perform IB analysis on a close-to-real-scale, which reveals new features of the MI dynamics.
Score: 37.69303106863453
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: between the hidden layer output and the DNN input/target. According to the hypothesis put forth by Shwartz-Ziv & Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis was only partially verified for NNs of tiny sizes or specific types, such as quantized NNs. In this paper, we introduce a framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values and comparison with MINE (Belghazi et al., 2018). Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.

Related papers

An Analysis Framework for Understanding Deep Neural Networks Based on Network Dynamics [11.44947569206928]
Deep neural networks (DNNs) maximize information extraction by rationally allocating the proportion of neurons in different modes across deep layers. This framework provides a unified explanation for fundamental DNN behaviors such as the "flat minima effect," "grokking," and double descent phenomena.
arXiv Detail & Related papers (2025-01-05T04:23:21Z)
Empowering Bayesian Neural Networks with Functional Priors through Anchored Ensembling for Mechanics Surrogate Modeling Applications [0.0]
We present a novel BNN training scheme based on anchored ensembling that can integrate a priori information available in the function space. The anchoring scheme makes use of low-rank correlations between NN parameters, learnt from pre-training to realizations of the functional prior. We also perform a study to demonstrate how correlations between NN weights, which are often neglected in existing BNN implementations, is critical to appropriately transfer knowledge between the function-space and parameter-space priors.
arXiv Detail & Related papers (2024-09-08T22:27:50Z)
Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [4.554284689395686]
implicit gradient descent (IGD) outperforms the common gradient descent (GD) algorithm in handling certain multi-scale problems.<n>We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z)
Residual-based attention and connection to information bottleneck theory in PINNs [0.393259574660092]
Physics-informed neural networks (PINNs) have seen a surge of interest in recent years. We propose an efficient, gradient-less weighting scheme for PINNs, that accelerates the convergence of dynamic or static systems.
arXiv Detail & Related papers (2023-07-01T16:29:55Z)
Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification. Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs) INNs are a class of implicit learning models that use implicit equations as layers. We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN) In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation. We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
Inter-layer Information Similarity Assessment of Deep Neural Networks Via Topological Similarity and Persistence Analysis of Data Neighbour Dynamics [93.4221402881609]
The quantitative analysis of information structure through a deep neural network (DNN) can unveil new insights into the theoretical performance of DNN architectures. Inspired by both LS and ID strategies for quantitative information structure analysis, we introduce two novel complimentary methods for inter-layer information similarity assessment. We demonstrate their efficacy in this study by performing analysis on a deep convolutional neural network architecture on image data.
arXiv Detail & Related papers (2020-12-07T15:34:58Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs) In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs) We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
A deep learning framework for solution and discovery in solid mechanics [1.4699455652461721]
We present the application of a class of deep learning, known as Physics Informed Neural Networks (PINN), to learning and discovery in solid mechanics. We explain how to incorporate the momentum balance and elasticity relations into PINN, and explore in detail the application to linear elasticity.
arXiv Detail & Related papers (2020-02-14T08:24:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.