Scalable Vector Gaussian Information Bottleneck
- URL: http://arxiv.org/abs/2102.07525v1
- Date: Mon, 15 Feb 2021 12:51:26 GMT
- Title: Scalable Vector Gaussian Information Bottleneck
- Authors: Mohammad Mahdi Mahvari and Mari Kobayashi and Abdellatif Zaidi
- Abstract summary: We study a variation of the problem, called scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation.
We derive a variational inference type algorithm for general sources with unknown distribution; and show means of parametrizing it using neural networks.
- Score: 19.21005180893519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of statistical learning, the Information Bottleneck method
seeks the right balance between accuracy and generalization capability through a
suitable tradeoff between compression complexity, measured by minimum
description length, and distortion, evaluated under a logarithmic loss measure. In
this paper, we study a variation of the problem, called scalable information
bottleneck, in which the encoder outputs multiple descriptions of the
observation with increasingly richer features. The model, which is of
successive-refinement type with degraded side information streams at the
decoders, is motivated by some application scenarios that require varying
levels of accuracy depending on the allowed (or targeted) level of complexity.
We establish an analytic characterization of the optimal relevance-complexity
region for vector Gaussian sources. Then, we derive a variational inference
type algorithm for general sources with unknown distribution; and show means of
parametrizing it using neural networks. Finally, we provide experimental
results on the MNIST dataset which illustrate that the proposed method
generalizes better to data unseen during training.
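The analytic characterization for vector Gaussian sources builds on the classical scalar Gaussian IB curve: for jointly Gaussian (X, Y) with correlation rho, a description U with complexity I(U;X) = R nats achieves relevance I(U;Y) = -1/2 log(1 - rho^2 (1 - e^{-2R})), saturating at I(X;Y) as R grows. A minimal numerical sketch of that curve (the function name and grid values are illustrative choices, not from the paper):

```python
import numpy as np

def gaussian_ib_relevance(R, rho):
    """Relevance I(U;Y) in nats achievable at complexity I(U;X) = R nats
    for jointly Gaussian scalar (X, Y) with correlation coefficient rho.
    Scalar Gaussian IB curve: I(U;Y) = -1/2 log(1 - rho^2 (1 - e^{-2R}))."""
    R = np.asarray(R, dtype=float)
    return -0.5 * np.log(1.0 - rho**2 * (1.0 - np.exp(-2.0 * R)))

rho = 0.9
R_grid = np.linspace(0.0, 6.0, 50)          # complexity budget in nats
relevance = gaussian_ib_relevance(R_grid, rho)

# As R -> infinity the curve saturates at the total relevance I(X;Y):
i_xy = -0.5 * np.log(1.0 - rho**2)
```

Sweeping R traces the relevance-complexity frontier; in the scalable setting of the paper, each successive description corresponds to a larger complexity budget and hence a point further along this curve (the vector case replaces rho with the eigenvalues of the joint covariance).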
Related papers
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples
using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines the usage of gradient information and invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z) - IB-UQ: Information bottleneck based uncertainty quantification for
neural function regression and neural operator learning [11.5992081385106]
We propose a novel framework for uncertainty quantification via information bottleneck (IB-UQ) for scientific machine learning tasks.
We incorporate the bottleneck by a confidence-aware encoder, which encodes inputs into latent representations according to the confidence of the input data.
We also propose a data augmentation based information bottleneck objective which can enhance the quality of the extrapolation uncertainty.
arXiv Detail & Related papers (2023-02-07T05:56:42Z) - Information bottleneck theory of high-dimensional regression: relevancy,
efficiency and optimality [6.700873164609009]
Overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss.
We quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data.
arXiv Detail & Related papers (2022-08-08T00:09:12Z) - Nonlinear Isometric Manifold Learning for Injective Normalizing Flows [58.720142291102135]
We use isometries to separate manifold learning and density estimation.
We also employ autoencoders to design embeddings with explicit inverses that do not distort the probability distribution.
arXiv Detail & Related papers (2022-03-08T08:57:43Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and that accurate flow estimation can be achieved with only a fraction of its elements.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - On the Relevance-Complexity Region of Scalable Information Bottleneck [15.314757778110955]
We study a variation of the problem, called scalable information bottleneck, where the encoder outputs multiple descriptions of the observation.
The problem at hand is motivated by some application scenarios that require varying levels of accuracy depending on the allowed level of generalization.
arXiv Detail & Related papers (2020-11-02T22:25:28Z) - Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z) - Evaluating representations by the complexity of learning low-loss
predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z) - Anomaly Detection in Trajectory Data with Normalizing Flows [0.0]
We propose an approach based on normalizing flows that enables complex density estimation from data with neural networks.
Our proposal computes exact model likelihood values, an important feature of normalizing flows, for each segment of the trajectory.
We evaluate our methodology, named aggregated anomaly detection with normalizing flows (GRADINGS), using real world trajectory data and compare it with more traditional anomaly detection techniques.
arXiv Detail & Related papers (2020-04-13T14:16:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.