Disentangled Information Bottleneck
- URL: http://arxiv.org/abs/2012.07372v3
- Date: Tue, 22 Dec 2020 03:06:22 GMT
- Title: Disentangled Information Bottleneck
- Authors: Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang
- Abstract summary: We introduce Disentangled Information Bottleneck (DisenIB), which compresses the source maximally without any loss in target prediction performance.
Our method consistently achieves maximum compression and performs well in terms of generalization, robustness to adversarial attacks, out-of-distribution detection, and supervised disentangling.
- Score: 22.587164077221917
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The information bottleneck (IB) method is a technique for extracting
information that is relevant for predicting the target random variable from the
source random variable, which is typically implemented by optimizing the IB
Lagrangian that balances the compression and prediction terms. However, the IB
Lagrangian is hard to optimize, and multiple trials are required to tune the
value of the Lagrangian multiplier. Moreover, we show that the prediction
performance strictly decreases as the compression gets stronger when optimizing
the IB Lagrangian. In this paper, we implement the IB method from the
perspective of supervised disentangling. Specifically, we introduce
Disentangled Information Bottleneck (DisenIB), which compresses the source
maximally without any loss in target prediction performance (maximum
compression). Theoretical and experimental results demonstrate that our method
consistently achieves maximum compression and performs well in terms of
generalization, robustness to adversarial attacks, out-of-distribution
detection, and supervised disentangling.
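For reference, the IB Lagrangian referred to above is conventionally written as the following trade-off (a standard formulation; the paper's sign and multiplier conventions may differ):

$$\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y),$$

where $X$ is the source, $Y$ the target, $T$ the compressed representation, and $\beta > 0$ the Lagrangian multiplier balancing compression $I(X;T)$ against prediction $I(T;Y)$.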
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD without clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
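A minimal sketch of the normalized-SGD update these results concern (a generic illustration with assumed names and step size, not the paper's exact algorithm; the variance-reduced variant is omitted):

```python
import numpy as np

def normalized_sgd_step(params, grad, lr=1e-2, eps=1e-12):
    """One normalized SGD update: move along grad / ||grad||.

    Normalizing bounds the update magnitude, which is the mechanism
    for coping with heavy-tailed gradient noise without clipping.
    """
    return params - lr * grad / (np.linalg.norm(grad) + eps)
```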
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
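As background for the splitting structure such methods build on, here is a sketch of classical consensus ADMM for distributed least squares (an optimization routine with assumed names and parameters, not the paper's sampling scheme):

```python
import numpy as np

def consensus_admm_least_squares(As, bs, rho=1.0, iters=100):
    """Classical consensus ADMM: worker i minimizes 0.5*||A_i x - b_i||^2,
    with all local copies x_i forced to agree with a global variable z."""
    n, d = len(As), As[0].shape[1]
    xs = [np.zeros(d) for _ in range(n)]
    us = [np.zeros(d) for _ in range(n)]
    z = np.zeros(d)
    for _ in range(iters):
        for i in range(n):  # local x-updates (closed form for least squares)
            lhs = As[i].T @ As[i] + rho * np.eye(d)
            rhs = As[i].T @ bs[i] + rho * (z - us[i])
            xs[i] = np.linalg.solve(lhs, rhs)
        # global averaging (z-update) followed by dual updates
        z = np.mean([xs[i] + us[i] for i in range(n)], axis=0)
        for i in range(n):
            us[i] = us[i] + xs[i] - z
    return z
```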
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Stochastic Bayesian Optimization with Unknown Continuous Context
Distribution via Kernel Density Estimation [28.413085548038932]
We propose two algorithms that employ kernel density estimation to learn the probability density function (PDF) of the continuous context variable online.
Theoretical results demonstrate that both algorithms have sub-linear Bayesian cumulative regret on the expectation objective.
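A minimal sketch of the kernel density estimation building block the two algorithms rely on (a plain 1-D Gaussian KDE with an assumed fixed bandwidth, not the paper's full Bayesian optimization procedure):

```python
import numpy as np

def gaussian_kde_pdf(query, samples, bandwidth=0.1):
    """Gaussian kernel density estimate of the context PDF at `query`.

    `samples` are the context values observed so far; the estimate can be
    refreshed online as new contexts arrive. The bandwidth is a fixed guess
    here; in practice it would be chosen by a rule such as Silverman's.
    """
    diffs = (query - np.asarray(samples, dtype=float)) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean() / bandwidth

# Example: estimate the density of the context variable at 0.5.
print(gaussian_kde_pdf(0.5, np.random.rand(200)))
```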
arXiv Detail & Related papers (2023-12-16T11:32:28Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
We use quantiles of the gradient norm as clipping thresholds.
We propose an implementation of the algorithm using rolling quantiles.
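A rough sketch of the idea of quantile-based gradient clipping (the window size, quantile level, and class name are illustrative choices of ours, not the paper's estimator or theory):

```python
import numpy as np
from collections import deque

class QuantileClippedSGD:
    """SGD whose clipping threshold is an empirical quantile of recent
    gradient norms, so the threshold adapts to the gradient distribution."""

    def __init__(self, lr=1e-2, q=0.9, window=100):
        self.lr, self.q = lr, q
        self.norms = deque(maxlen=window)

    def step(self, params, grad):
        norm = np.linalg.norm(grad)
        self.norms.append(norm)
        tau = np.quantile(self.norms, self.q)  # clipping threshold
        if norm > tau:
            grad = grad * (tau / norm)          # rescale to the threshold
        return params - self.lr * grad
```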
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Towards Efficient and Accurate Approximation: Tensor Decomposition Based
on Randomized Block Krylov Iteration [27.85452105378894]
This work designs an rBKI-based Tucker decomposition (rBKI-TK) for accurate approximation, together with a hierarchical tensor ring decomposition based on rBKI-TK for efficient compression of large-scale data.
Numerical experiments demonstrate the efficiency, accuracy and scalability of the proposed methods in both data compression and denoising.
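For context, a sketch of the randomized block Krylov iteration (rBKI) primitive for low-rank matrix approximation, following the standard construction (a minimal version of ours; the paper embeds this primitive in Tucker and tensor-ring decompositions):

```python
import numpy as np

def rbki_lowrank(A, rank, power=3):
    """Rank-`rank` approximation of A via randomized block Krylov iteration."""
    n = A.shape[1]
    Y = A @ np.random.randn(n, rank)          # random block A * Omega
    blocks = []
    for _ in range(power + 1):
        blocks.append(Y)
        Y = A @ (A.T @ Y)                     # next Krylov block (A A^T) Y
    Q, _ = np.linalg.qr(np.hstack(blocks))    # orthonormal Krylov basis
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ Ub[:, :rank]
    return U, s[:rank], Vt[:rank]             # A ~= U @ diag(s) @ Vt
```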
arXiv Detail & Related papers (2022-11-27T13:45:28Z) - Unified Multivariate Gaussian Mixture for Efficient Neural Image
Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when latent variables are observed from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
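As a rough illustration of how a Gaussian mixture prior assigns probabilities (and hence rates) to quantized latents in learned image compression (a generic discretized-mixture likelihood with assumed parameters, not the paper's unified multivariate model):

```python
import numpy as np
from math import erf, sqrt

def gaussian_cdf(x, mean, scale):
    return 0.5 * (1.0 + erf((x - mean) / (scale * sqrt(2.0))))

def mixture_likelihood(y, weights, means, scales):
    """Probability of an integer-quantized latent y under a Gaussian mixture,
    obtained by integrating each component over the bin [y - 0.5, y + 0.5]."""
    p = sum(w * (gaussian_cdf(y + 0.5, mu, s) - gaussian_cdf(y - 0.5, mu, s))
            for w, mu, s in zip(weights, means, scales))
    return max(p, 1e-12)  # avoid log(0) when computing the rate

# Example: rate in bits of latent value 2 under a 3-component mixture.
p = mixture_likelihood(2, [0.5, 0.3, 0.2], [0.0, 1.5, -2.0], [1.0, 0.7, 2.0])
print(-np.log2(p))
```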
arXiv Detail & Related papers (2022-03-21T11:44:17Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
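Stated as a formula (up to the normalization fixed by the paper's constraint), the quoted result says the optimal SGLD noise covariance satisfies

$$\Sigma^{*} \;\propto\; \Big(\mathbb{E}\big[\nabla \ell(\theta)\, \nabla \ell(\theta)^{\top}\big]\Big)^{1/2},$$

i.e. the matrix square root of the expected gradient covariance.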
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
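A minimal sketch of annealed importance sampling with the Metropolis-Hastings correction dropped, leaving unadjusted Langevin transitions so every operation is differentiable (a generic 1-D illustration with assumed names and step sizes, not the paper's algorithm):

```python
import numpy as np

def ais_log_z(log_prior, grad_log_prior, log_target, grad_log_target,
              sample_prior, n_steps=200, n_chains=512, step=0.05):
    """AIS estimate of the target's log normalizer, using unadjusted
    Langevin moves between annealed densities (no MH accept/reject)."""
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = sample_prior(n_chains)
    log_w = np.zeros(n_chains)
    for t in range(1, n_steps + 1):
        # incremental importance weight between consecutive annealed densities
        log_w += (betas[t] - betas[t - 1]) * (log_target(x) - log_prior(x))
        # unadjusted Langevin step targeting the t-th annealed density
        grad = (1 - betas[t]) * grad_log_prior(x) + betas[t] * grad_log_target(x)
        x = x + 0.5 * step * grad + np.sqrt(step) * np.random.randn(n_chains)
    return np.log(np.mean(np.exp(log_w)))  # prior assumed normalized
```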
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Adversarial Information Bottleneck [2.66512000865131]
The information bottleneck (IB) principle has been adopted to explain deep learning in terms of information compression and prediction.
Previous methods attempted to optimize the IB principle by introducing random noise into learning the representation.
We propose an adversarial information bottleneck (AIB) method without any explicit assumptions about the underlying distribution of the representations.
arXiv Detail & Related papers (2021-02-28T03:14:56Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable solver for semi-implicit variational inference (SIVI).
Our method maps SIVI's evidence lower bound (ELBO) to a form that admits rigorous gradient estimation.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - On Compression Principle and Bayesian Optimization for Neural Networks [0.0]
We propose a compression principle that states that an optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability.
We show that dropout can be used for continuous dimensionality reduction, which allows finding the optimal network dimensions as required by the compression principle.
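In its classical two-part form, a description-length objective of this kind reads (our restatement; the paper's exact coding scheme may differ):

$$\hat{M} \;=\; \arg\min_{M}\ \big[\, L(M) + L(D \mid M)\,\big],$$

where $L(M)$ is the compressed length of the model definition and $L(D \mid M)$ the compressed length of the data given the model.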
arXiv Detail & Related papers (2020-06-23T03:23:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.