On the Difference Between the Information Bottleneck and the Deep
Information Bottleneck
- URL: http://arxiv.org/abs/1912.13480v1
- Date: Tue, 31 Dec 2019 18:31:42 GMT
- Title: On the Difference Between the Information Bottleneck and the Deep
Information Bottleneck
- Authors: Aleksander Wieczorek and Volker Roth
- Abstract summary: We revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation.
Requiring both Markov chains $T-X-Y$ and $X-T-Y$ to hold during optimisation can be limiting, so we show how to circumvent this by optimising a lower bound for $I(T;Y)$ for which only the latter Markov chain has to be satisfied.
- Score: 81.89141311906552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining the Information Bottleneck model with deep learning by replacing
mutual information terms with deep neural nets has proved successful in areas
ranging from generative modelling to interpreting deep neural networks. In this
paper, we revisit the Deep Variational Information Bottleneck and the
assumptions needed for its derivation. The two assumed properties of the data
$X$, $Y$ and their latent representation $T$ take the form of two Markov chains
$T-X-Y$ and $X-T-Y$. Requiring both to hold during the optimisation process can
be limiting for the set of potential joint distributions $P(X,Y,T)$. We
therefore show how to circumvent this limitation by optimising a lower bound
for $I(T;Y)$ for which only the latter Markov chain has to be satisfied. The
actual mutual information consists of the lower bound, which is optimised in
DVIB and cognate models in practice, and of two terms measuring how much the
former requirement $T-X-Y$ is violated. Finally, we propose to interpret the
family of information bottleneck models as directed graphical models and show
that in this framework the original and deep information bottlenecks are
special cases of a fundamental IB model.
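For orientation, here is a minimal sketch of the variational lower bound optimised in DVIB-style models, following the standard derivation rather than this paper's exact notation: with an encoder $p_\phi(t|x)$ and a variational decoder $q_\theta(y|t)$,
$$ I(T;Y) \;\geq\; \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p_\phi(t|x)}\big[\log q_\theta(y|t)\big] \;+\; H(Y), $$
where the step that replaces the true conditional $p(t|x,y)$ by the encoder $p_\phi(t|x)$ is precisely where the Markov chain $T-X-Y$ is assumed. The full DVIB objective additionally penalises $I(T;X)$ through a term of the form $\beta\,\mathrm{KL}\big(p_\phi(t|x)\,\|\,r(t)\big)$ for a fixed prior $r(t)$.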
Related papers
- $\alpha$-TCVAE: On the relationship between Disentanglement and Diversity [21.811889512977924]
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound.
We present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity.
Our results demonstrate that $\alpha$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations.
arXiv Detail & Related papers (2024-11-01T13:50:06Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Gibbs-Based Information Criteria and the Over-Parameterized Regime [20.22034560278484]
Double descent refers to the unexpected drop in test loss of a learning algorithm beyond the interpolation threshold.
We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm.
arXiv Detail & Related papers (2023-06-08T22:54:48Z) - Improving Robustness and Uncertainty Modelling in Neural Ordinary
Differential Equations [0.2538209532048866]
We propose a novel approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver.
We also propose adaptive latent time NODE (ALT-NODE), which allows each data point to have a distinct posterior distribution over end-times.
We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification datasets.
arXiv Detail & Related papers (2021-12-23T16:56:10Z) - Besov Function Approximation and Binary Classification on
Low-Dimensional Manifolds Using Convolutional Residual Networks [42.43493635899849]
We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification.
Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
arXiv Detail & Related papers (2021-09-07T02:58:11Z) - Analysis of feature learning in weight-tied autoencoders via the mean
field lens [3.553493344868413]
We analyze a class of two-layer weight-tied nonlinear autoencoders in the mean field framework.
Models trained with gradient descent are shown to admit a mean field limiting dynamics.
Experiments on real-life data demonstrate an interesting match with the theory.
arXiv Detail & Related papers (2021-02-16T18:58:37Z) - Improving Robustness and Generality of NLP Models Using Disentangled
Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - The Information Bottleneck Problem and Its Applications in Machine
Learning [53.57797720793437]
Inference capabilities of machine learning systems have skyrocketed in recent years, now playing a pivotal role in various aspects of society.
The information bottleneck (IB) theory emerged as a bold information-theoretic paradigm for analyzing deep learning (DL) systems.
In this tutorial we survey the information-theoretic origins of this abstract principle, and its recent impact on DL.
arXiv Detail & Related papers (2020-04-30T16:48:51Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.