The Information Bottleneck's Ordinary Differential Equation: First-Order
Root-Tracking for the IB
- URL: http://arxiv.org/abs/2306.09790v2
- Date: Tue, 25 Jul 2023 19:07:07 GMT
- Title: The Information Bottleneck's Ordinary Differential Equation: First-Order
Root-Tracking for the IB
- Authors: Shlomi Agmon
- Abstract summary: The Information Bottleneck (IB) is a method of lossy compression of relevant information.
We exploit the dynamics underlying the IB's optimal tradeoff curve.
We translate an understanding of IB bifurcations into a surprisingly accurate numerical algorithm.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Information Bottleneck (IB) is a method of lossy compression of relevant
information. Its rate-distortion (RD) curve describes the fundamental tradeoff
between input compression and the preservation of relevant information embedded
in the input. However, it conceals the underlying dynamics of optimal input
encodings. We argue that these typically follow a piecewise smooth trajectory
when input information is being compressed, as recently shown in RD. These
smooth dynamics are interrupted when an optimal encoding changes qualitatively,
at a bifurcation. By leveraging the IB's intimate relations with RD, we provide
substantial insights into its solution structure, highlighting caveats in its
finite-dimensional treatments. Sub-optimal solutions are seen to collide or
exchange optimality at its bifurcations.
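For concreteness, the IB seeks a stochastic encoder p(t|x) minimizing I(X;T) - beta*I(T;Y), and its optimal encoders satisfy self-consistent fixed-point equations. The sketch below implements the classical iterative-IB (Blahut-Arimoto-style) solver for a known finite joint distribution; the toy distribution, cluster count, and tolerance are hypothetical placeholders, and this fixed-point iteration is the standard baseline against which the paper's ODE-based method is positioned, not the paper's algorithm itself.

```python
# Minimal iterative-IB sketch (the standard self-consistent equations of
# Tishby, Pereira & Bialek, 1999) for a known finite p(x, y).
# Toy setup; NOT the ODE-based root-tracking algorithm of this paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy joint distribution p(x, y) over |X| = 4 and |Y| = 3.
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()
p_x = p_xy.sum(axis=1)                 # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]      # conditional p(y | x)

def iterative_ib(beta, n_clusters=4, n_iters=500, tol=1e-10):
    """Fixed-point iteration for the IB encoder p(t | x) at a given beta."""
    q_t_given_x = rng.random((len(p_x), n_clusters))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        q_t = q_t_given_x.T @ p_x                        # marginal p(t)
        # Decoder via Bayes: p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t).
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x / q_t[:, None]
        # KL divergence D[p(y|x) || p(y|t)] for every (x, t) pair.
        log_ratio = np.log(p_y_given_x[:, None, :]) - np.log(q_y_given_t[None, :, :])
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # Self-consistent encoder update: p(t|x) proportional to p(t) exp(-beta * KL).
        new = q_t[None, :] * np.exp(-beta * kl)
        new /= new.sum(axis=1, keepdims=True)
        if np.max(np.abs(new - q_t_given_x)) < tol:
            return new
        q_t_given_x = new
    return q_t_given_x

encoder = iterative_ib(beta=5.0)       # one point on the tradeoff curve
```

Sweeping beta with such fixed-point iterations solves each point on the tradeoff curve from scratch, and the iteration slows down near bifurcations; these are the inefficiencies that motivate following the curve's underlying dynamics directly.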
Despite the acceptance of the IB and its applications, there are surprisingly
few techniques to solve it numerically, even for finite problems whose
distribution is known. We derive anew the IB's first-order Ordinary
Differential Equation, which describes the dynamics underlying its optimal
tradeoff curve. To exploit these dynamics, we not only detect IB bifurcations
but also identify their type in order to handle them accordingly. Rather than
approaching the IB's optimal curve from sub-optimal directions, the latter
allows us to follow a solution's trajectory along the optimal curve under mild
assumptions. We thereby translate an understanding of IB bifurcations into a
surprisingly accurate numerical algorithm.
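The root-tracking idea can be illustrated generically. If the optimality conditions are written as F(x, beta) = 0, the implicit function theorem gives the branch's ODE, dx/dbeta = -(dF/dx)^(-1) dF/dbeta, which a predictor-corrector scheme can integrate, with a (near-)singular Jacobian flagging a bifurcation. The sketch below uses a hypothetical scalar F whose roots collide and vanish at a fold, echoing how sub-optimal IB solutions collide at bifurcations; it illustrates first-order root tracking in general, not the paper's IB-specific equations or bifurcation classification.

```python
# Generic first-order root tracking (numerical continuation) for F(x, beta) = 0.
# Toy scalar example; NOT the IB equations of this paper.
import numpy as np

def F(x, beta):
    return x**3 - beta * x + 0.1       # hypothetical: two roots merge at a fold

def dF_dx(x, beta):
    return 3 * x**2 - beta

def dF_dbeta(x, beta):
    return -x

def track_root(x0, beta0, beta1, steps=200):
    """Follow a root x(beta) of F(x, beta) = 0 from beta0 toward beta1."""
    betas = np.linspace(beta0, beta1, steps + 1)
    xs, x = [x0], x0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Euler predictor along the branch ODE dx/dbeta = -F_beta / F_x.
        x_pred = x + (b_next - b_prev) * (-dF_dbeta(x, b_prev) / dF_dx(x, b_prev))
        # Newton corrector back onto F(., beta_next) = 0.
        x_new = x_pred
        for _ in range(8):
            x_new -= F(x_new, b_next) / dF_dx(x_new, b_next)
        # A failed corrector, a vanishing F_x, or a large jump signals an
        # approaching bifurcation: here the tracked root collides with
        # another root and vanishes.
        if (abs(F(x_new, b_next)) > 1e-8
                or abs(dF_dx(x_new, b_next)) < 1e-3
                or abs(x_new - x) > 0.5):
            print(f"branch lost near beta = {b_next:.3f} (fold bifurcation)")
            break
        x = x_new
        xs.append(x)
    return betas[: len(xs)], np.array(xs)

# Track the largest root from beta = 1.5 down toward 0.2; tracking stops
# near the fold of this toy branch, at beta of about 0.41.
bs, xs = track_root(x0=1.2, beta0=1.5, beta1=0.2)
```

In the IB setting, the abstract stresses that detecting a bifurcation is not enough: its type must also be identified so that it can be handled accordingly.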
Related papers
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence
Embedding [51.48582649050054]
We propose a representation normalization method which aims at disentangling the correlations between features of encoded sentences.
We also propose Kernel-Whitening, a Nyström kernel approximation method to achieve more thorough debiasing of nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z) - Perturbation Theory for the Information Bottleneck [6.117084972237769]
The Information Bottleneck (IB) method formalizes extracting relevant information from data.
The nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general.
We derive a perturbation theory for the IB method and report the first complete characterization of the learning onset.
arXiv Detail & Related papers (2021-05-28T16:59:01Z) - Physics-aware deep neural networks for surrogate modeling of turbulent
natural convection [0.0]
We investigate the use of PINNs for surrogate modeling of turbulent Rayleigh-Bénard convection flows.
We show how it comes to play as a regularization close to the training boundaries which are zones of poor accuracy for standard PINNs.
The predictive accuracy of the surrogate over the entire half-billion DNS coordinates yields errors for all flow variables ranging between 0.3% and 4% in the relative L2 norm.
arXiv Detail & Related papers (2021-03-05T09:48:57Z) - Adversarial Information Bottleneck [2.66512000865131]
The information bottleneck (IB) principle has been adopted to explain deep learning in terms of information compression and prediction.
Previous methods attempted to optimize the IB principle by introducing random noise into learning the representation.
We propose an adversarial information bottleneck (AIB) method without any explicit assumptions about the underlying distribution of the representations.
arXiv Detail & Related papers (2021-02-28T03:14:56Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit variational inference (SIVI) method.
Our method optimizes a rigorous lower bound on SIVI's evidence.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - Disentangled Information Bottleneck [22.587164077221917]
We introduce the Disentangled Information Bottleneck (DisenIB), which consistently achieves maximum compression of the source without loss of target-prediction performance.
Our method performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.
arXiv Detail & Related papers (2020-12-14T09:44:07Z) - BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model [34.093978443640616]
Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing their effective and efficient training remains an unsolved problem.
We propose a BAMSProd algorithm with a key observation that the convergence property of optimizing deep binary model is strongly related to the quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr-convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)