The Information Bottleneck Problem and Its Applications in Machine
Learning
- URL: http://arxiv.org/abs/2004.14941v2
- Date: Fri, 1 May 2020 16:24:03 GMT
- Title: The Information Bottleneck Problem and Its Applications in Machine
Learning
- Authors: Ziv Goldfeld and Yury Polyanskiy
- Abstract summary: Inference capabilities of machine learning systems skyrocketed in recent years, now playing a pivotal role in various aspects of society.
The information bottleneck (IB) theory emerged as a bold information-theoretic paradigm for analyzing deep learning (DL) systems.
In this tutorial we survey the information-theoretic origins of this abstract principle, and its recent impact on DL.
- Score: 53.57797720793437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inference capabilities of machine learning (ML) systems skyrocketed in recent
years, now playing a pivotal role in various aspects of society. The goal in
statistical learning is to use data to obtain simple algorithms for predicting
a random variable $Y$ from a correlated observation $X$. Since the dimension of
$X$ is typically huge, computationally feasible solutions should summarize it
into a lower-dimensional feature vector $T$, from which $Y$ is predicted. The
algorithm will successfully make the prediction if $T$ is a good proxy of $Y$,
despite this dimensionality reduction. A myriad of ML algorithms (mostly
employing deep learning (DL)) for finding such representations $T$ based on
real-world data are now available. While these methods are often effective in
practice, their success is hindered by the lack of a comprehensive theory to
explain it. The information bottleneck (IB) theory recently emerged as a bold
information-theoretic paradigm for analyzing DL systems. Adopting mutual
information as the figure of merit, it suggests that the best representation
$T$ should be maximally informative about $Y$ while minimizing the mutual
information with $X$. In this tutorial we survey the information-theoretic
origins of this abstract principle, and its recent impact on DL. For the
latter, we cover implications of the IB problem on DL theory, as well as
practical algorithms inspired by it. Our goal is to provide a unified and
cohesive description. A clear view of current knowledge is particularly
important for further leveraging IB and other information-theoretic ideas to
study DL models.
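For concreteness, the trade-off described above is usually written as a Lagrangian over stochastic encoders (a standard formulation from the IB literature; the notation here is ours): minimize $I(X;T) - \beta I(T;Y)$ over $p(t|x)$, subject to the Markov chain $Y - X - T$, where $\beta \ge 0$ sets the price of retaining information about $X$ relative to preserving information about $Y$.
As a rough illustration of the "practical algorithms inspired by it" mentioned above, the sketch below shows a variational-IB-style loss with a Gaussian encoder. This is illustrative PyTorch code, not taken from the paper; the layer sizes, the standard-normal prior on $T$, and the weight $\beta$ are assumptions.
```python
# Illustrative variational IB (VIB) loss with a Gaussian encoder; dimensions,
# layer sizes, and the standard-normal prior on T are assumptions for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIB(nn.Module):
    def __init__(self, x_dim=784, t_dim=32, n_classes=10):
        super().__init__()
        # Encoder outputs mean and log-variance of q(T | X).
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * t_dim)
        )
        # Decoder predicts Y from the compressed representation T.
        self.decoder = nn.Linear(t_dim, n_classes)

    def loss(self, x, y, beta=1e-3):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterized sample of T ~ q(T | X).
        t = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Cross-entropy: variational surrogate for maximizing I(T;Y).
        ce = F.cross_entropy(self.decoder(t), y)
        # KL(q(T|X) || N(0, I)): variational upper bound on I(X;T).
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return ce + beta * kl

# Toy usage on random data.
model = VIB()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
model.loss(x, y).backward()
```
The cross-entropy term is a variational surrogate for maximizing $I(T;Y)$, while the KL term upper-bounds $I(X;T)$, mirroring the two sides of the IB trade-off.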
Related papers
- Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data [4.971690889257356]
We introduce an adaptation of the alternating minimization-descent scheme proposed by Collins and Nayer and Vaswani.
We show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data.
Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications.
arXiv Detail & Related papers (2023-08-08T17:56:20Z)
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- Spatially heterogeneous learning by a deep student machine [0.0]
Deep neural networks (DNN) with a huge number of adjustable parameters remain largely black boxes.
We study supervised learning by a DNN of width $N$ and depth $L$ consisting of $NL$ perceptrons with $c$ inputs by a statistical mechanics approach called the teacher-student setting.
We show that the problem becomes exactly solvable in what we call the 'dense limit': $N \gg c \gg 1$ and $M \gg 1$ with fixed $\alpha = M/c$ ($M$ being the number of training examples), using the replica method developed in (H. Yoshino, ...).
arXiv Detail & Related papers (2023-02-15T01:09:03Z)
- Analysis of feature learning in weight-tied autoencoders via the mean field lens [3.553493344868413]
We analyze a class of two-layer weight-tied nonlinear autoencoders in the mean field framework.
Models trained with gradient descent are shown to admit a mean field limiting dynamics.
Experiments on real-life data demonstrate an interesting match with the theory.
arXiv Detail & Related papers (2021-02-16T18:58:37Z)
- Budget Learning via Bracketing [50.085728094234476]
The budget learning (BL) problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy.
We propose a new formulation for the BL problem via the concept of bracketings.
We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.
arXiv Detail & Related papers (2020-04-14T04:38:14Z)
- Towards Deep Learning Models Resistant to Large Perturbations [0.0]
Adversarial robustness has proven to be a required property of machine learning algorithms.
We show that the well-established algorithm called "adversarial training" fails to train a deep neural network given a large, but reasonable, perturbation magnitude.
arXiv Detail & Related papers (2020-03-30T12:03:09Z)
- An Information-Theoretic Approach to Personalized Explainable Machine Learning [92.53970625312665]
We propose a simple probabilistic model for the predictions and user knowledge.
We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction.
arXiv Detail & Related papers (2020-03-01T13:06:29Z)
- A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data.
arXiv Detail & Related papers (2020-02-25T06:09:30Z)
- On the Difference Between the Information Bottleneck and the Deep Information Bottleneck [81.89141311906552]
We revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation, which take the form of two Markov chains, $T-X-Y$ and $X-T-Y$.
We show how to circumvent this limitation by optimising a lower bound on $I(T;Y)$ for which only the latter Markov chain has to be satisfied.
arXiv Detail & Related papers (2019-12-31T18:31:42Z)