A Generalized Information Bottleneck Theory of Deep Learning
- URL: http://arxiv.org/abs/2509.26327v2
- Date: Tue, 14 Oct 2025 14:46:14 GMT
- Title: A Generalized Information Bottleneck Theory of Deep Learning
- Authors: Charles Westphal, Stephen Hailes, Mirco Musolesi
- Abstract summary: The Information Bottleneck (IB) principle offers a compelling theoretical framework to understand how neural networks (NNs) learn. We present a framework that reformulates the original IB principle through the lens of synergy.
- Score: 10.454976783057086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Information Bottleneck (IB) principle offers a compelling theoretical framework to understand how neural networks (NNs) learn. However, its practical utility has been constrained by unresolved theoretical ambiguities and significant challenges in accurate estimation. In this paper, we present a Generalized Information Bottleneck (GIB) framework that reformulates the original IB principle through the lens of synergy, i.e., the information obtainable only through joint processing of features. We provide theoretical and empirical evidence demonstrating that synergistic functions achieve superior generalization compared to their non-synergistic counterparts. Building on these foundations, we reformulate the IB using a computable definition of synergy based on the average interaction information (II) of each feature with those remaining. We demonstrate that the original IB objective is upper bounded by our GIB in the case of perfect estimation, ensuring compatibility with existing IB theory while addressing its limitations. Our experimental results demonstrate that GIB consistently exhibits compression phases across a wide range of architectures (including those with ReLU activations, where the standard IB fails), while yielding interpretable dynamics in both CNNs and Transformers and aligning more closely with our understanding of adversarial robustness.
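Since the abstract does not spell out the estimator, the following is a minimal sketch of what an average-interaction-information synergy score could look like for small discrete data; the function names and the collapsing of the remaining features into a single discrete variable are our assumptions, not the authors' method.

```python
# Hedged sketch: a plug-in estimator of average interaction information
# (II) over features, following the abstract's description. Assumes small,
# discrete X and y; all names here are illustrative.
import numpy as np

def entropy(*cols):
    """Joint Shannon entropy (nats) of one or more discrete 1-D arrays."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def interaction_information(a, b, c):
    """II(A;B;C) = I(A;B|C) - I(A;B); symmetric in its three arguments."""
    i_ab = entropy(a) + entropy(b) - entropy(a, b)
    i_ab_given_c = entropy(a, c) + entropy(b, c) - entropy(a, b, c) - entropy(c)
    return i_ab_given_c - i_ab

def average_synergy(X, y):
    """Mean II of each feature with the remaining features and the label."""
    d = X.shape[1]
    scores = []
    for i in range(d):
        rest = np.delete(X, i, axis=1)
        # collapse the remaining features into a single discrete variable
        rest_id = np.unique(rest, axis=0, return_inverse=True)[1]
        scores.append(interaction_information(X[:, i], rest_id, y))
    return float(np.mean(scores))
```

Under the II = I(A;B|C) - I(A;B) sign convention used here, a positive score indicates that, on average, a feature carries more information jointly with the remaining features than it does alone.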
Related papers
- Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime [56.89793618576349]
Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps.
This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of deep neural networks (DNNs).
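For reference, the classical criterion (standard form, in our notation; normalization conventions vary across references) corrects the training log-likelihood with a trace term:

```latex
% Takeuchi's information criterion for an MLE \hat\theta on n samples:
\[
\mathrm{TIC}
  = -\,2\,\ell(\hat\theta)
  \;+\; 2\,\operatorname{tr}\!\left( \hat{J}(\hat\theta)^{-1}\, \hat{I}(\hat\theta) \right),
\]
% where \hat{I} is the empirical covariance of per-sample score vectors
% and \hat{J} is the empirical Hessian of the negative log-likelihood.
% When the model is well specified, \hat{I} \approx \hat{J} and TIC
% reduces to AIC.
```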
arXiv Detail & Related papers (2026-02-26T17:01:14Z) - Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks [0.0]
We develop a conjugate learning theoretical framework based on convex conjugate duality to characterize this learnability property.
We demonstrate that training deep neural networks (DNNs) with mini-batch stochastic gradient descent (SGD) achieves global optima of the empirical risk.
We derive deterministic and probabilistic bounds on generalization error based on conditional generalized entropy measures.
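The framework's central tool is the convex (Fenchel) conjugate; as a reminder of the standard definitions (how the paper specializes them to DNNs is in the full text):

```latex
% Fenchel conjugate of f : R^d -> R ∪ {+∞} and the biconjugation theorem:
\[
f^{*}(y) \;=\; \sup_{x \in \mathbb{R}^{d}} \bigl\{ \langle x, y \rangle - f(x) \bigr\},
\qquad
f^{**} = f \quad \text{iff } f \text{ is proper, convex, and lower semicontinuous.}
\]
```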
arXiv Detail & Related papers (2026-02-18T04:26:55Z) - Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems [58.95962217043371]
We present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity.
Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance.
We propose a novel topology design approach, EIB-Learner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs.
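The abstract does not say how the fusion is performed; purely as an illustration of "fusing connectivity patterns", one could blend two adjacency matrices and re-sparsify. The helper below is hypothetical, not EIB-Learner's actual construction.

```python
# Hypothetical illustration of fusing dense and sparse communication
# graphs; EIB-Learner's actual construction is in the cited paper.
import numpy as np

def fuse_topologies(dense_adj, sparse_adj, alpha=0.5, keep_ratio=0.3):
    """Blend two adjacency matrices, then keep only the strongest edges."""
    blended = alpha * sparse_adj + (1.0 - alpha) * dense_adj
    cutoff = np.quantile(blended, 1.0 - keep_ratio)
    return (blended >= cutoff).astype(float)
```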
arXiv Detail & Related papers (2025-05-29T11:21:48Z) - Structured IB: Improving Information Bottleneck with Structured Feature Learning [32.774660308233635]
We introduce Structured IB, a framework for investigating potential structured features.
Our experiments demonstrate superior prediction accuracy and preservation of task-relevant information compared to the original IB Lagrangian method.
arXiv Detail & Related papers (2024-12-11T09:17:45Z) - Elastic Information Bottleneck [34.90040361806197]
The Information Bottleneck (IB) is an information-theoretic principle of representation learning.
We propose an Elastic Information Bottleneck (EIB) to interpolate between the IB and deterministic IB (DIB) regularizers.
Simulations and real-data experiments show that EIB achieves better domain adaptation results than IB and DIB.
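Under standard notation, with T the learned representation, one natural parametrization of such an interpolation is the following; the paper's exact form may differ.

```latex
% IB minimizes I(X;T), DIB minimizes H(T); an elastic interpolation:
\[
\mathcal{L}_{\mathrm{EIB}}
  \;=\; (1-\alpha)\, I(X;T) \;+\; \alpha\, H(T) \;-\; \beta\, I(T;Y),
\qquad \alpha \in [0,1],
\]
% recovering IB at \alpha = 0 and DIB at \alpha = 1.
```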
arXiv Detail & Related papers (2023-11-07T12:53:55Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework.
We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases.
We show that our networked approach has significant advantages over both alternatives in terms of robustness to update failures and to changes in population size.
arXiv Detail & Related papers (2023-06-05T10:45:39Z) - Recognizable Information Bottleneck [31.993478081354958]
Information Bottlenecks (IBs) learn representations that generalize to unseen data through information compression.
However, in practice, IBs cannot guarantee generalization in real-world scenarios because the associated generalization bound is vacuous.
We propose a Recognizable Information Bottleneck (RIB) which regularizes the recognizability of representations through a recognizability critic.
arXiv Detail & Related papers (2023-04-28T03:55:33Z) - Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
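As a rough sketch of such a bi-level scheme (our illustration with Lasso-style inner heads; the paper's actual optimization is more sophisticated):

```python
# Illustrative bi-level step (not the authors' algorithm): fit a *sparse*
# linear head per task in the inner problem, then update the shared
# encoder against the fixed heads in the outer problem.
import torch

def bilevel_step(encoder, tasks, encoder_opt, l1=1e-2, inner_steps=50, lr=0.1):
    encoder_opt.zero_grad()
    outer_loss = 0.0
    for X, y in tasks:
        Z = encoder(X)                          # shared representation
        w = torch.zeros(Z.shape[1], requires_grad=True)
        inner_opt = torch.optim.SGD([w], lr=lr)
        for _ in range(inner_steps):            # inner: sparse base-predictor
            inner_opt.zero_grad()
            loss = ((Z.detach() @ w - y) ** 2).mean() + l1 * w.abs().sum()
            loss.backward()
            inner_opt.step()
        # outer: only the encoder receives gradients here
        outer_loss = outer_loss + ((Z @ w.detach() - y) ** 2).mean()
    outer_loss.backward()
    encoder_opt.step()
    return float(outer_loss)
```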
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and the feed-forward pass as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z) - A Theoretical Framework for Target Propagation [75.52598682467817]
We analyze target propagation (TP), a popular but not yet fully understood alternative to backpropagation (BP).
Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP.
We provide a first solution to the problem of training the feedback weights through a novel reconstruction loss; a toy version is sketched below.
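As a toy version of this idea, the plain denoising form from earlier TP work trains the feedback mapping g to locally invert the forward layer f; the paper's difference reconstruction loss adds further terms.

```python
# Toy reconstruction loss for training feedback weights in target
# propagation: g is pushed to locally invert the forward layer f.
import torch

def feedback_reconstruction_loss(f, g, h, sigma=0.1):
    """f: forward layer, g: feedback layer, h: batch of layer inputs."""
    h_noisy = h + sigma * torch.randn_like(h)   # perturb the activation
    recon = g(f(h_noisy))                       # forward, then feedback
    return ((recon - h_noisy) ** 2).mean()      # push g toward f^{-1}
```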
arXiv Detail & Related papers (2020-06-25T12:07:06Z) - The Dual Information Bottleneck [1.6559345531428509]
The Information Bottleneck (IB) framework is a general characterization of optimal representations obtained using a principled approach for balancing accuracy and complexity.
We present a new framework, the Dual Information Bottleneck (dualIB) which resolves some of the known drawbacks of the IB.
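The defining change, as we recall it (see the paper for the exact formulation), is swapping the argument order in the KL-based distortion:

```latex
% IB distortion vs. the dual distortion used by dualIB:
\[
d_{\mathrm{IB}}(x,t) = D_{\mathrm{KL}}\!\bigl[\,p(y \mid x)\;\|\;p(y \mid t)\,\bigr],
\qquad
d_{\mathrm{dualIB}}(x,t) = D_{\mathrm{KL}}\!\bigl[\,p(y \mid t)\;\|\;p(y \mid x)\,\bigr].
\]
```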
arXiv Detail & Related papers (2020-06-08T14:43:11Z)