The Variational Bandwidth Bottleneck: Stochastic Evaluation on an
Information Budget
- URL: http://arxiv.org/abs/2004.11935v1
- Date: Fri, 24 Apr 2020 18:29:31 GMT
- Title: The Variational Bandwidth Bottleneck: Stochastic Evaluation on an
Information Budget
- Authors: Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine
- Abstract summary: In many applications, it is desirable to extract only the relevant information from complex input data.
The information bottleneck method formalizes this as an information-theoretic optimization problem.
We propose the variational bandwidth bottleneck, which estimates, for each example and based only on the standard input, the value of the privileged information and stochastically decides whether to access it.
- Score: 164.65771897804404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many applications, it is desirable to extract only the relevant
information from complex input data, which involves making a decision about
which input features are relevant. The information bottleneck method formalizes
this as an information-theoretic optimization problem by maintaining an optimal
tradeoff between compression (throwing away irrelevant input information), and
predicting the target. In many problem settings, including the reinforcement
learning problems we consider in this work, we might prefer to compress only
part of the input. This is typically the case when we have a standard
conditioning input, such as a state observation, and a "privileged" input,
which might correspond to the goal of a task, the output of a costly planning
algorithm, or communication with another agent. In such cases, we might prefer
to compress the privileged input, either to achieve better generalization
(e.g., with respect to goals) or to minimize access to costly information
(e.g., in the case of communication). Practical implementations of the
information bottleneck based on variational inference require access to the
privileged input in order to compute the bottleneck variable, so although they
perform compression, this compression operation itself needs unrestricted,
lossless access. In this work, we propose the variational bandwidth bottleneck,
which estimates, for each example, the value of the privileged information before
seeing it, i.e., based only on the standard input, and then stochastically
chooses whether or not to access the privileged input. We formulate a tractable
approximation to this framework and demonstrate
in a series of reinforcement learning experiments that it can improve
generalization and reduce access to computationally costly information.
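As a rough illustration of the idea (not the authors' implementation), the sketch below uses hypothetical names and shapes (`BandwidthGate`, `state_dim`, `priv_dim`): a gate predicts an access probability from the standard input alone, samples a stochastic access decision, zeroes out the privileged code when access is denied, and charges a simple information cost. The paper itself optimizes a variational objective with a tractable approximation rather than the naive penalty used here.

```python
# A minimal sketch of the gating idea, with illustrative names and shapes;
# this is not the authors' architecture or objective.
import torch
import torch.nn as nn

class BandwidthGate(nn.Module):
    def __init__(self, state_dim, priv_dim, action_dim, hidden=64):
        super().__init__()
        # Predicts how likely the privileged input is worth accessing,
        # using ONLY the standard input (e.g., the state observation).
        self.access_prob = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        # Encodes the privileged input; consulted only when the gate opens.
        self.priv_encoder = nn.Linear(priv_dim, hidden)
        self.policy = nn.Sequential(
            nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, state, privileged):
        p = self.access_prob(state)                  # (batch, 1), in (0, 1)
        access = torch.bernoulli(p)                  # stochastic access decision
        z = self.priv_encoder(privileged) * access   # zeroed when not accessed
        logits = self.policy(torch.cat([state, z], dim=-1))
        # Crude stand-in for the bandwidth/KL cost in the paper's objective:
        # penalize the expected frequency of accessing the privileged input.
        info_cost = p.mean()
        return logits, info_cost
```

A forward pass such as `BandwidthGate(8, 16, 4)(torch.randn(32, 8), torch.randn(32, 16))` runs as written; training end-to-end would additionally require a gradient estimator (e.g., a relaxation or straight-through estimator) for the discrete access sample.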
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
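A minimal sketch of the general recipe, assuming a single agent sending one compressed update per round (the quantizer and variable names here are illustrative, not the paper's scheme):

```python
# Hypothetical sketch of differential quantization combined with error
# feedback; illustrative only, not the cited paper's algorithm.
import numpy as np

def quantize(x, num_levels=16):
    """Uniform quantizer over the dynamic range of x."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    step = (hi - lo) / (num_levels - 1)
    return lo + step * np.round((x - lo) / step)

def compress_update(grad, prev_sent, error):
    """Quantize the *difference* from what was last sent, after adding back
    the residual quantization error left over from previous rounds."""
    diff = grad - prev_sent + error      # differential coding + error feedback
    q = quantize(diff)                   # what actually gets transmitted
    new_error = diff - q                 # remember what quantization lost
    new_sent = prev_sent + q             # receiver can reconstruct this state
    return q, new_sent, new_error
```

The key ingredient is that the residual `new_error` is carried into the next round rather than discarded, which is the error-feedback mechanism that lets such schemes approach uncompressed performance at a finite bit rate.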
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - Incrementally-Computable Neural Networks: Efficient Inference for
Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
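A toy sketch of the incremental-computation idea (not the paper's algorithm; a real transformer layer mixes positions through attention, which the cited work handles with additional machinery): cache per-position results and recompute only the positions whose inputs changed.

```python
# Illustrative sketch: reuse cached per-position computations for unchanged
# positions. Assumes a purely position-wise layer for simplicity.
import numpy as np

def expensive_tokenwise_layer(x):
    return np.tanh(x * 2.0 + 1.0)   # stand-in for a costly computation

class IncrementalLayer:
    def __init__(self):
        self.cached_in = None
        self.cached_out = None

    def __call__(self, x):
        if self.cached_in is None or self.cached_in.shape != x.shape:
            self.cached_out = expensive_tokenwise_layer(x)   # full recompute
        else:
            changed = np.any(self.cached_in != x, axis=-1)   # per position
            self.cached_out = self.cached_out.copy()
            self.cached_out[changed] = expensive_tokenwise_layer(x[changed])
        self.cached_in = x.copy()
        return self.cached_out
```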
arXiv Detail & Related papers (2023-07-27T16:30:27Z) - Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
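A bare-bones sketch of what such a discrete key-value bottleneck might look like (hypothetical names and shapes; how the keys are initialized and updated in the actual paper is more involved): the encoder output is snapped to its nearest key in a codebook, and the paired learnable value code is passed on to the task head.

```python
# Illustrative sketch of a discrete key-value bottleneck layer.
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    def __init__(self, dim, num_codes=128, value_dim=64):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_codes, dim))         # key codes
        self.values = nn.Parameter(torch.randn(num_codes, value_dim)) # value codes

    def forward(self, z):                      # z: (batch, dim) encoder output
        dists = torch.cdist(z, self.keys)      # (batch, num_codes) distances
        idx = dists.argmin(dim=-1)             # nearest key per example
        return self.values[idx]                # fetch the paired value code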
arXiv Detail & Related papers (2022-07-22T17:52:30Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - Scalable Vector Gaussian Information Bottleneck [19.21005180893519]
We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference type algorithm for general sources with unknown distribution; and show means of parametrizing it using neural networks.
arXiv Detail & Related papers (2021-02-15T12:51:26Z) - On the Relevance-Complexity Region of Scalable Information Bottleneck [15.314757778110955]
We study a variation of the problem, called scalable information bottleneck, where the encoder outputs multiple descriptions of the observation.
The problem at hand is motivated by some application scenarios that require varying levels of accuracy depending on the allowed level of generalization.
arXiv Detail & Related papers (2020-11-02T22:25:28Z) - Learning Optimal Representations with the Decodable Information
Bottleneck [43.30367159353152]
In machine learning, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest.
We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family.
As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees.
arXiv Detail & Related papers (2020-09-27T08:33:08Z) - Information-theoretic User Interaction: Significant Inputs for Program
Synthesis [11.473616777800318]
We introduce the significant questions problem, and show that it is hard in general.
We develop an information-theoretic greedy approach for solving the problem.
In the context of interactive program synthesis, we use the above result to develop an active program learner.
Our active learner is able to trade off false negatives for false positives and converge in a small number of iterations on a real-world dataset.
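As a hedged illustration of what one greedy, information-theoretic selection step could look like (purely illustrative; the paper's notion of significant inputs and its guarantees are more specific): pick the candidate input on which the surviving candidate programs disagree most, measured by the entropy of their predicted outputs.

```python
# Illustrative greedy question selection by output entropy.
import math
from collections import Counter

def pick_most_informative_input(candidate_programs, candidate_inputs):
    """candidate_programs: callables; candidate_inputs: possible questions."""
    best_input, best_entropy = None, -1.0
    for x in candidate_inputs:
        outputs = Counter(p(x) for p in candidate_programs)
        total = sum(outputs.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in outputs.values())
        if entropy > best_entropy:               # most disagreement = most informative
            best_input, best_entropy = x, entropy
    return best_input
```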
arXiv Detail & Related papers (2020-06-22T21:46:40Z) - Focus of Attention Improves Information Transfer in Visual Features [80.22965663534556]
This paper focuses on unsupervised learning for transferring visual information in a truly online setting.
The computation of the entropy terms is carried out by a temporal process which yields online estimation of the entropy terms.
In order to better structure the input probability distribution, we use a human-like focus of attention model.
arXiv Detail & Related papers (2020-06-16T15:07:25Z) - On the Information Bottleneck Problems: Models, Connections,
Applications and Information Theoretic Views [39.49498500593645]
This tutorial paper focuses on the variants of the bottleneck problem taking an information theoretic perspective.
It discusses practical methods to solve it, as well as its connection to coding and learning aspects.
arXiv Detail & Related papers (2020-01-31T15:23:19Z)