Perturbation Theory for the Information Bottleneck
- URL: http://arxiv.org/abs/2105.13977v1
- Date: Fri, 28 May 2021 16:59:01 GMT
- Title: Perturbation Theory for the Information Bottleneck
- Authors: Vudtiwat Ngampruetikorn, David J. Schwab
- Abstract summary: The information bottleneck (IB) method formalizes the extraction of relevant information from data.
The nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general.
We derive a perturbation theory for the IB method and report the first complete characterization of the learning onset.
- Score: 6.117084972237769
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extracting relevant information from data is crucial for all forms of
learning. The information bottleneck (IB) method formalizes this, offering a
mathematically precise and conceptually appealing framework for understanding
learning phenomena. However, the nonlinearity of the IB problem makes it
computationally expensive and analytically intractable in general. Here we
derive a perturbation theory for the IB method and report the first complete
characterization of the learning onset, the limit of maximum relevant
information per bit extracted from data. We test our results on synthetic
probability distributions, finding good agreement with the exact numerical
solution near the onset of learning. We explore the differences and subtleties
between our derivation and previous attempts at deriving a perturbation theory
for the learning onset, and attribute the discrepancy to a flawed assumption. Our
work also provides a fresh perspective on the intimate relationship between the
IB method and the strong data processing inequality.
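To make the objective behind these results concrete, below is a minimal numerical sketch (not the paper's perturbation theory): it evaluates the IB Lagrangian $\mathcal{L} = I(X;T) - \beta I(T;Y)$ for a discrete joint distribution $p(x,y)$ and an explicit stochastic encoder $p(t|x)$. The toy distribution, encoder, and function names are illustrative assumptions.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in nats for a joint distribution given as a 2-D array."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])))

def ib_lagrangian(pxy, enc, beta):
    """IB objective L = I(X;T) - beta * I(T;Y) for an encoder enc[x, t] = p(t|x)."""
    px = pxy.sum(axis=1)
    pxt = px[:, None] * enc        # p(x, t) = p(x) p(t|x)
    pty = enc.T @ pxy              # p(t, y) = sum_x p(t|x) p(x, y)
    return mutual_information(pxt) - beta * mutual_information(pty)

rng = np.random.default_rng(0)
pxy = rng.random((4, 3)); pxy /= pxy.sum()                       # toy joint p(x, y)
enc = rng.random((4, 2)); enc /= enc.sum(axis=1, keepdims=True)  # toy encoder p(t|x)
print(ib_lagrangian(pxy, enc, beta=2.0))
```

Minimizing this objective over encoders while sweeping $\beta$ traces the IB curve numerically; roughly speaking, the learning onset studied in the paper is the critical $\beta$ below which the optimal encoder remains trivial, with $T$ carrying no information about $X$.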
Related papers
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- Information-Theoretic Generalization Bounds for Transductive Learning and its Applications [16.408850979966623]
We develop generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayesian theory.
Our theoretical results are validated on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-11-08T09:48:42Z)
- Elastic Information Bottleneck [34.90040361806197]
The information bottleneck is an information-theoretic principle of representation learning.
We propose an elastic information bottleneck (EIB) to interpolate between the IB and DIB regularizers.
Simulations and real-data experiments show that EIB can achieve better domain adaptation results than IB and DIB.
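For orientation, one natural convex interpolation between the standard IB and deterministic IB (DIB) objectives is sketched below; this form is an assumption for illustration (EIB's exact parameterization may differ), with DIB taken as replacing the compression term $I(X;T)$ with $H(T)$ following Strouse & Schwab (2017).

```latex
% Assumed interpolation between IB and DIB (illustrative, not EIB's exact form):
%   IB:  min_{p(t|x)} I(X;T) - beta I(T;Y)
%   DIB: min_{p(t|x)} H(T)   - beta I(T;Y)
\[
  \mathcal{L}_{\alpha}
  = \alpha\, H(T) + (1-\alpha)\, I(X;T) - \beta\, I(T;Y),
  \qquad \alpha \in [0,1],
\]
% recovering IB at alpha = 0 and DIB at alpha = 1.
```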
arXiv Detail & Related papers (2023-11-07T12:53:55Z)
- On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
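As a quick concretization of the quantity appearing in that bound, here is a minimal sketch of the KL divergence $D(\mu\|\mu')$ between two discrete distributions; the example distributions and function name are illustrative.

```python
import numpy as np

def kl_divergence(mu, mu_prime):
    """D(mu || mu') = sum_i mu_i log(mu_i / mu'_i), in nats.

    Finite only when mu is absolutely continuous w.r.t. mu'
    (mu'_i > 0 wherever mu_i > 0).
    """
    mu, mu_prime = np.asarray(mu, float), np.asarray(mu_prime, float)
    support = mu > 0
    if np.any(mu_prime[support] == 0):
        return np.inf
    return float(np.sum(mu[support] * np.log(mu[support] / mu_prime[support])))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ln(5/3) ~ 0.51 nats
```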
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under a rigorous theoretical guarantee, our approach enables IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
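For context, a generic MINE-style neural mutual information estimator based on the Donsker-Varadhan bound (Belghazi et al., 2018) is sketched below; this is a standard construction, not the CSAD paper's exact cross-sample estimator, and the network architecture, data, and training loop are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

# Donsker-Varadhan lower bound on I(X;Z):
#   I(X;Z) >= E_{p(x,z)}[T(x,z)] - log E_{p(x)p(z)}[exp T(x,z)]
class Critic(nn.Module):
    def __init__(self, dx, dz, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dz, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)

def dv_lower_bound(critic, x, z):
    joint = critic(x, z).mean()                        # E_p[T(x,z)]
    z_perm = z[torch.randperm(z.size(0))]              # break pairing -> p(x)p(z)
    marginal = torch.logsumexp(critic(x, z_perm), 0) - math.log(x.size(0))
    return joint - marginal

x = torch.randn(256, 4)
z = x @ torch.randn(4, 3) + 0.1 * torch.randn(256, 3)  # z correlated with x
critic = Critic(4, 3)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    (-dv_lower_bound(critic, x, z)).backward()         # maximize the bound
    opt.step()
print(dv_lower_bound(critic, x, z).item())             # estimated I(X;Z) in nats
```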
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z)
- Leveraging Unlabeled Data for Entity-Relation Extraction through Probabilistic Constraint Satisfaction [54.06292969184476]
We study the problem of entity-relation extraction in the presence of symbolic domain knowledge.
Our approach employs a semantic loss, which captures the precise meaning of a logical sentence.
With a focus on low-data regimes, we show that semantic loss outperforms the baselines by a wide margin.
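For reference, the semantic loss of Xu et al. (2018) is $-\log$ of the probability mass the model assigns to assignments satisfying the constraint. Below is a minimal sketch for an illustrative "exactly one variable true" constraint; the constraint and names are assumptions, not necessarily those used in the paper.

```python
import numpy as np

# Semantic loss for constraint alpha:
#   L(alpha, p) = -log sum_{x satisfying alpha} prod_i p_i^{x_i} (1 - p_i)^{1 - x_i}
def semantic_loss_exactly_one(p):
    """p: predicted probabilities for n binary variables."""
    p = np.asarray(p, float)
    n = len(p)
    total = 0.0
    for i in range(n):                       # assignments with exactly one true
        x = np.zeros(n); x[i] = 1.0
        total += np.prod(np.where(x == 1, p, 1 - p))
    return -np.log(total)

print(semantic_loss_exactly_one([0.7, 0.2, 0.1]))  # low loss: near one-hot
```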
arXiv Detail & Related papers (2021-03-20T00:16:29Z)
- Information fusion between knowledge and data in Bayesian network structure learning [5.994412766684843]
This paper describes and evaluates a set of information fusion methods that have been implemented in the open-source Bayesys structure learning system.
The results are illustrated with both limited and big data, with application to three BN structure learning algorithms available in Bayesys.
arXiv Detail & Related papers (2021-01-31T15:45:29Z)
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
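As a concrete example of the plug-in style of meta-learner, below is a minimal T-learner sketch: fit separate outcome models on treated and control units and difference their predictions. The data-generating process, model choice, and names are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with a known heterogeneous treatment effect tau(x) = x_0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w = rng.integers(0, 2, size=500)                        # treatment assignment
tau = X[:, 0]                                           # true effect varies with X
y = X.sum(axis=1) + w * tau + rng.normal(scale=0.1, size=500)

# T-learner: one outcome model per arm, then a plug-in CATE estimate.
m1 = RandomForestRegressor().fit(X[w == 1], y[w == 1])  # treated outcome model
m0 = RandomForestRegressor().fit(X[w == 0], y[w == 0])  # control outcome model
cate = m1.predict(X) - m0.predict(X)
print(np.corrcoef(cate, tau)[0, 1])                     # agreement with true effect
```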
arXiv Detail & Related papers (2021-01-26T17:11:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.