Learning Optimal Representations with the Decodable Information Bottleneck
- URL: http://arxiv.org/abs/2009.12789v2
- Date: Fri, 16 Jul 2021 09:22:20 GMT
- Title: Learning Optimal Representations with the Decodable Information Bottleneck
- Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam
- Abstract summary: In machine learning, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest.
We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family.
As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees.
- Score: 43.30367159353152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the question of characterizing and finding optimal representations
for supervised learning. Traditionally, this question has been tackled using
the Information Bottleneck, which compresses the inputs while retaining
information about the targets, in a decoder-agnostic fashion. In machine
learning, however, our goal is not compression but rather generalization, which
is intimately linked to the predictive family or decoder of interest (e.g.
linear classifier). We propose the Decodable Information Bottleneck (DIB) that
considers information retention and compression from the perspective of the
desired predictive family. As a result, DIB gives rise to representations that
are optimal in terms of expected test performance and can be estimated with
guarantees. Empirically, we show that the framework can be used to enforce a
small generalization gap on downstream classifiers and to predict the
generalization ability of neural networks.
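As a rough illustration of the decoder-aware idea (a minimal sketch, not the authors' implementation), the code below trains an encoder so that a probe from the chosen predictive family, here a linear classifier, can decode the label from the representation while an equally expressive probe cannot decode a nuisance variable. All module names, dimensions, and the trade-off weight are assumptions.

```python
# Sketch: decoder-aware retention/compression with linear probes (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
label_probe = nn.Linear(64, 10)      # member of the predictive family (linear)
nuisance_probe = nn.Linear(64, 100)  # probes information the family should not retain

enc_opt = torch.optim.Adam(list(encoder.parameters()) + list(label_probe.parameters()), lr=1e-3)
probe_opt = torch.optim.Adam(nuisance_probe.parameters(), lr=1e-3)
beta = 0.1  # assumed trade-off between decodable sufficiency and minimality

def training_step(x, y, nuisance):
    # 1) fit the nuisance probe on the current (detached) representation
    z = encoder(x).detach()
    probe_loss = F.cross_entropy(nuisance_probe(z), nuisance)
    probe_opt.zero_grad(); probe_loss.backward(); probe_opt.step()

    # 2) update the encoder: keep the label decodable, make the nuisance undecodable
    z = encoder(x)
    label_loss = F.cross_entropy(label_probe(z), y)
    nuisance_loss = F.cross_entropy(nuisance_probe(z), nuisance)
    loss = label_loss - beta * nuisance_loss
    enc_opt.zero_grad(); loss.backward(); enc_opt.step()
    return label_loss.item(), nuisance_loss.item()
```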
Related papers
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
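A minimal Monte Carlo sketch of the prediction-space acquisition described above: with K posterior samples, the mutual information between a candidate's label and the prediction at target inputs can be estimated from the sampled class probabilities. Shapes, names, and the toy data are assumptions, not the paper's code.

```python
# Sketch: EPIG-style acquisition from ensemble/posterior-sample probabilities.
import numpy as np

def epig_score(probs_pool, probs_targets, eps=1e-12):
    """probs_pool: (K, C) predictions for one candidate x under K posterior samples.
    probs_targets: (K, M, C) predictions for M target inputs x*.
    Returns the average mutual information I(y; y* | x, x*) over targets."""
    K, _ = probs_pool.shape
    # joint predictive p(y, y*) = E_theta[ p(y|x,theta) p(y*|x*,theta) ], per target
    joint = np.einsum('kc,kmd->mcd', probs_pool, probs_targets) / K      # (M, C, C)
    marg_y = joint.sum(axis=2, keepdims=True)                            # (M, C, 1)
    marg_ystar = joint.sum(axis=1, keepdims=True)                        # (M, 1, C)
    mi = (joint * (np.log(joint + eps) - np.log(marg_y * marg_ystar + eps))).sum(axis=(1, 2))
    return mi.mean()  # expectation over target inputs

# Toy example: 8 posterior samples, 3 classes, 5 target inputs (random placeholders)
rng = np.random.default_rng(0)
pool = rng.dirichlet(np.ones(3), size=8)
targets = rng.dirichlet(np.ones(3), size=(8, 5))
print(epig_score(pool, targets))
```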
- FUNCK: Information Funnels and Bottlenecks for Invariant Representation Learning [7.804994311050265]
We investigate a set of related information funnels and bottleneck problems that claim to learn invariant representations from the data.
We propose a new element to this family of information-theoretic objectives: The Conditional Privacy Funnel with Side Information.
Given the generally intractable objectives, we derive tractable approximations using amortized variational inference parameterized by neural networks.
arXiv Detail & Related papers (2022-11-02T19:37:55Z)
- Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z)
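The quantity maximized in entropy-coding objectives of this kind is a log-determinant coding length of the batch representation; the paper above optimizes a Taylor-expanded surrogate over two augmented views. The simplified sketch below only illustrates the coding-length term itself; the distortion parameter and dimensions are assumed values.

```python
# Sketch: log-determinant coding length of a batch of embeddings (illustrative only).
import torch

def coding_length(z, eps=0.5):
    """z: (n, d) L2-normalized embeddings for a batch."""
    n, d = z.shape
    cov = z.T @ z                      # (d, d) unnormalized feature covariance
    scaled = torch.eye(d) + (d / (n * eps**2)) * cov
    return 0.5 * torch.logdet(scaled)  # maximize to spread information across dimensions

z = torch.nn.functional.normalize(torch.randn(128, 32), dim=1)
loss = -coding_length(z)               # minimize the negative coding length
print(loss.item())
```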
- Interpretable by Design: Learning Predictors by Composing Interpretable Queries [8.054701719767293]
We argue that machine learning algorithms should be interpretable by design.
We minimize the expected number of queries needed for accurate prediction.
Experiments on vision and NLP tasks demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2022-07-03T02:40:34Z)
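One way to minimize the expected number of queries, sketched below under a toy world model, is to greedily ask the interpretable query whose answer carries the highest information gain about the label under the current posterior. The three-label, four-query setup and all probabilities are illustrative assumptions, not the paper's construction.

```python
# Sketch: greedy query selection by expected information gain (toy example).
import numpy as np

rng = np.random.default_rng(1)
p_y = np.array([0.5, 0.3, 0.2])              # prior over labels
p_ans = rng.uniform(0.05, 0.95, size=(4, 3)) # P(query q answers "yes" | label y)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def next_query(posterior, asked):
    best_q, best_gain = None, -1.0
    for q in range(p_ans.shape[0]):
        if q in asked:
            continue
        p_yes = (p_ans[q] * posterior).sum()
        post_yes = p_ans[q] * posterior / max(p_yes, 1e-12)
        post_no = (1 - p_ans[q]) * posterior / max(1 - p_yes, 1e-12)
        gain = entropy(posterior) - (p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no))
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain

q, gain = next_query(p_y.copy(), set())
print(f"ask query {q} first (expected information gain {gain:.3f} nats)")
```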
- Approximability and Generalisation [0.0]
We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
arXiv Detail & Related papers (2022-03-15T15:21:48Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
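The predictive information mentioned above, the mutual information between past and future windows of the latent sequence, can be estimated in closed form under a Gaussian assumption from window covariances. The sketch below shows that estimator on a toy sequence; the window length, regularization, and data are assumptions.

```python
# Sketch: Gaussian estimate of predictive information I(past; future) for a latent sequence.
import numpy as np

def predictive_information(z, window=3, reg=1e-4):
    """z: (T, d) latent sequence. Returns an estimate of I(past; future) in nats."""
    T, d = z.shape
    pairs = np.array([
        np.concatenate([z[t - window:t].ravel(), z[t:t + window].ravel()])
        for t in range(window, T - window + 1)
    ])                                        # (N, 2*window*d) stacked past||future windows
    cov = np.cov(pairs, rowvar=False) + reg * np.eye(2 * window * d)
    k = window * d
    logdet = lambda m: np.linalg.slogdet(m)[1]
    return 0.5 * (logdet(cov[:k, :k]) + logdet(cov[k:, k:]) - logdet(cov))

z = np.cumsum(np.random.default_rng(0).normal(size=(200, 4)), axis=0)  # toy correlated sequence
print(predictive_information(z))
```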
- The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget [164.65771897804404]
In many applications, it is desirable to extract only the relevant information from complex input data.
The information bottleneck method formalizes this as an information-theoretic optimization problem.
We propose the variational bandwidth bottleneck, which decides, for each example, whether to access the privileged information based on its estimated value.
arXiv Detail & Related papers (2020-04-24T18:29:31Z)
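A heavily simplified sketch of the per-example decision described above: a gate estimates, from the ordinary input alone, whether the privileged input is worth an information cost, and the training loss averages task loss over the access decision plus a bandwidth penalty. This illustrates the concept only, not the paper's variational objective; all modules and the cost value are assumptions.

```python
# Sketch: gated access to privileged information under an assumed per-access cost.
import torch
import torch.nn as nn
import torch.nn.functional as F

gate = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))   # access probability
head_plain = nn.Linear(16, 10)            # predicts without privileged input
head_priv = nn.Linear(16 + 8, 10)         # predicts with privileged input
cost = 0.05                               # assumed price per access (information budget)

def loss_fn(x, priv, y):
    p_access = torch.sigmoid(gate(x)).squeeze(-1)                       # (B,)
    loss_without = F.cross_entropy(head_plain(x), y, reduction='none')
    loss_with = F.cross_entropy(head_priv(torch.cat([x, priv], dim=-1)), y, reduction='none')
    # expected task loss under the access decision, plus a bandwidth penalty
    return (p_access * loss_with + (1 - p_access) * loss_without + cost * p_access).mean()

x, priv, y = torch.randn(4, 16), torch.randn(4, 8), torch.randint(0, 10, (4,))
print(loss_fn(x, priv, y).item())
```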
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.