The Variational Bandwidth Bottleneck: Stochastic Evaluation on an
Information Budget
- URL: http://arxiv.org/abs/2004.11935v1
- Date: Fri, 24 Apr 2020 18:29:31 GMT
- Title: The Variational Bandwidth Bottleneck: Stochastic Evaluation on an
Information Budget
- Authors: Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine
- Abstract summary: In many applications, it is desirable to extract only the relevant information from complex input data.
The information bottleneck method formalizes this as an information-theoretic optimization problem.
We propose the variational bandwidth bottleneck, which estimates, for each example and based only on the standard input, the value of the privileged information and stochastically decides whether to access it.
- Score: 164.65771897804404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many applications, it is desirable to extract only the relevant
information from complex input data, which involves making a decision about
which input features are relevant. The information bottleneck method formalizes
this as an information-theoretic optimization problem by maintaining an optimal
tradeoff between compression (throwing away irrelevant input information), and
predicting the target. In many problem settings, including the reinforcement
learning problems we consider in this work, we might prefer to compress only
part of the input. This is typically the case when we have a standard
conditioning input, such as a state observation, and a "privileged" input,
which might correspond to the goal of a task, the output of a costly planning
algorithm, or communication with another agent. In such cases, we might prefer
to compress the privileged input, either to achieve better generalization
(e.g., with respect to goals) or to minimize access to costly information
(e.g., in the case of communication). Practical implementations of the
information bottleneck based on variational inference require access to the
privileged input in order to compute the bottleneck variable, so although they
perform compression, this compression operation itself needs unrestricted,
lossless access. In this work, we propose the variational bandwidth bottleneck,
which estimates, for each example, the value of the privileged information before
seeing it, i.e., based only on the standard input, and then stochastically
chooses whether or not to access the privileged input. We formulate a tractable
approximation to this framework and demonstrate
in a series of reinforcement learning experiments that it can improve
generalization and reduce access to computationally costly information.
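As a rough illustration of the idea (not the authors' implementation), the sketch below uses hypothetical names and shapes (`BandwidthGate`, `state_dim`, `priv_dim`): a gate predicts an access probability from the standard input alone, samples a stochastic access decision, zeroes out the privileged code when access is denied, and charges a simple information cost. The paper itself optimizes a variational objective with a tractable approximation rather than the naive penalty used here.

```python
# A minimal sketch of the gating idea, with illustrative names and shapes;
# this is not the authors' architecture or objective.
import torch
import torch.nn as nn

class BandwidthGate(nn.Module):
    def __init__(self, state_dim, priv_dim, action_dim, hidden=64):
        super().__init__()
        # Predicts how likely the privileged input is worth accessing,
        # using ONLY the standard input (e.g., the state observation).
        self.access_prob = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        # Encodes the privileged input; consulted only when the gate opens.
        self.priv_encoder = nn.Linear(priv_dim, hidden)
        self.policy = nn.Sequential(
            nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, state, privileged):
        p = self.access_prob(state)                  # (batch, 1), in (0, 1)
        access = torch.bernoulli(p)                  # stochastic access decision
        z = self.priv_encoder(privileged) * access   # zeroed when not accessed
        logits = self.policy(torch.cat([state, z], dim=-1))
        # Crude stand-in for the bandwidth/KL cost in the paper's objective:
        # penalize the expected frequency of accessing the privileged input.
        info_cost = p.mean()
        return logits, info_cost
```

A forward pass such as `BandwidthGate(8, 16, 4)(torch.randn(32, 8), torch.randn(32, 16))` runs as written; training end-to-end would additionally require a gradient estimator (e.g., a relaxation or straight-through estimator) for the discrete access sample.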
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
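A minimal sketch of the general recipe, assuming a single agent sending one compressed update per round (the quantizer and variable names here are illustrative, not the paper's scheme):

```python
# Hypothetical sketch of differential quantization combined with error
# feedback; illustrative only, not the cited paper's algorithm.
import numpy as np

def quantize(x, num_levels=16):
    """Uniform quantizer over the dynamic range of x."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    step = (hi - lo) / (num_levels - 1)
    return lo + step * np.round((x - lo) / step)

def compress_update(grad, prev_sent, error):
    """Quantize the *difference* from what was last sent, after adding back
    the residual quantization error left over from previous rounds."""
    diff = grad - prev_sent + error      # differential coding + error feedback
    q = quantize(diff)                   # what actually gets transmitted
    new_error = diff - q                 # remember what quantization lost
    new_sent = prev_sent + q             # receiver can reconstruct this state
    return q, new_sent, new_error
```

The key ingredient is that the residual `new_error` is carried into the next round rather than discarded, which is the error-feedback mechanism that lets such schemes approach uncompressed performance at a finite bit rate.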
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - Incrementally-Computable Neural Networks: Efficient Inference for
Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
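A toy sketch of the incremental-computation idea (not the paper's algorithm; a real transformer layer mixes positions through attention, which the cited work handles with additional machinery): cache per-position results and recompute only the positions whose inputs changed.

```python
# Illustrative sketch: reuse cached per-position computations for unchanged
# positions. Assumes a purely position-wise layer for simplicity.
import numpy as np

def expensive_tokenwise_layer(x):
    return np.tanh(x * 2.0 + 1.0)   # stand-in for a costly computation

class IncrementalLayer:
    def __init__(self):
        self.cached_in = None
        self.cached_out = None

    def __call__(self, x):
        if self.cached_in is None or self.cached_in.shape != x.shape:
            self.cached_out = expensive_tokenwise_layer(x)   # full recompute
        else:
            changed = np.any(self.cached_in != x, axis=-1)   # per position
            self.cached_out = self.cached_out.copy()
            self.cached_out[changed] = expensive_tokenwise_layer(x[changed])
        self.cached_in = x.copy()
        return self.cached_out
```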
arXiv Detail & Related papers (2023-07-27T16:30:27Z) - Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
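A bare-bones sketch of what such a discrete key-value bottleneck might look like (hypothetical names and shapes; how the keys are initialized and updated in the actual paper is more involved): the encoder output is snapped to its nearest key in a codebook, and the paired learnable value code is passed on to the task head.

```python
# Illustrative sketch of a discrete key-value bottleneck layer.
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    def __init__(self, dim, num_codes=128, value_dim=64):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_codes, dim))         # key codes
        self.values = nn.Parameter(torch.randn(num_codes, value_dim)) # value codes

    def forward(self, z):                      # z: (batch, dim) encoder output
        dists = torch.cdist(z, self.keys)      # (batch, num_codes) distances
        idx = dists.argmin(dim=-1)             # nearest key per example
        return self.values[idx]                # fetch the paired value code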
arXiv Detail & Related papers (2022-07-22T17:52:30Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - Scalable Vector Gaussian Information Bottleneck [19.21005180893519]
We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference type algorithm for general sources with unknown distribution; and show means of parametrizing it using neural networks.
arXiv Detail & Related papers (2021-02-15T12:51:26Z) - On the Relevance-Complexity Region of Scalable Information Bottleneck [15.314757778110955]
We study a variation of the problem, called scalable information bottleneck, where the encoder outputs multiple descriptions of the observation.
The problem at hand is motivated by some application scenarios that require varying levels of accuracy depending on the allowed level of generalization.
arXiv Detail & Related papers (2020-11-02T22:25:28Z) - Learning Optimal Representations with the Decodable Information
Bottleneck [43.30367159353152]
In machine learning, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest.
We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family.
As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees.
arXiv Detail & Related papers (2020-09-27T08:33:08Z) - Information-theoretic User Interaction: Significant Inputs for Program
Synthesis [11.473616777800318]
We introduce the significant questions problem, and show that it is hard in general.
We develop an information-theoretic greedy approach for solving the problem.
In the context of interactive program synthesis, we use the above result to develop an active program learner.
Our active learner is able to trade off false negatives for false positives and converge in a small number of iterations on a real-world dataset.
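As a hedged illustration of what one greedy, information-theoretic selection step could look like (purely illustrative; the paper's notion of significant inputs and its guarantees are more specific): pick the candidate input on which the surviving candidate programs disagree most, measured by the entropy of their predicted outputs.

```python
# Illustrative greedy question selection by output entropy.
import math
from collections import Counter

def pick_most_informative_input(candidate_programs, candidate_inputs):
    """candidate_programs: callables; candidate_inputs: possible questions."""
    best_input, best_entropy = None, -1.0
    for x in candidate_inputs:
        outputs = Counter(p(x) for p in candidate_programs)
        total = sum(outputs.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in outputs.values())
        if entropy > best_entropy:               # most disagreement = most informative
            best_input, best_entropy = x, entropy
    return best_input
```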
arXiv Detail & Related papers (2020-06-22T21:46:40Z) - Focus of Attention Improves Information Transfer in Visual Features [80.22965663534556]
This paper focuses on unsupervised learning for transferring visual information in a truly online setting.
The computation of the entropy terms is carried out by a temporal process which yields online estimation of the entropy terms.
In order to better structure the input probability distribution, we use a human-like focus of attention model.
arXiv Detail & Related papers (2020-06-16T15:07:25Z) - On the Information Bottleneck Problems: Models, Connections,
Applications and Information Theoretic Views [39.49498500593645]
This tutorial paper focuses on the variants of the bottleneck problem taking an information theoretic perspective.
It discusses practical methods to solve it, as well as its connection to coding and learning aspects.
arXiv Detail & Related papers (2020-01-31T15:23:19Z)