Variational Classification
- URL: http://arxiv.org/abs/2305.10406v5
- Date: Tue, 9 Jan 2024 11:25:48 GMT
- Title: Variational Classification
- Authors: Shehzaad Dhuliawala, Mrinmaya Sachan, Carl Allen
- Abstract summary: We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, this abstracted perspective reveals a potential inconsistency between their anticipated and empirical distributions.
We induce a chosen latent distribution instead of the one implicitly assumed by a standard softmax layer.
- Score: 51.2541371924591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a latent variable model for classification that provides a novel
probabilistic interpretation of neural network softmax classifiers. We derive a
variational objective to train the model, analogous to the evidence lower bound
(ELBO) used to train variational auto-encoders, that generalises the softmax
cross-entropy loss. Treating inputs to the softmax layer as samples of a latent
variable, our abstracted perspective reveals a potential inconsistency between
their anticipated distribution, required for accurate label predictions, and
their empirical distribution found in practice. We augment the variational
objective to mitigate such inconsistency and induce a chosen latent
distribution, instead of the implicit assumption found in a standard softmax
layer. Overall, we provide new theoretical insight into the inner workings of
widely-used softmax classifiers. Empirical evaluation on image and text
classification datasets demonstrates that our proposed approach, variational
classification, maintains classification accuracy while the reshaped latent
space improves other desirable properties of a classifier, such as calibration,
adversarial robustness, robustness to distribution shift, and sample efficiency in
low-data settings.
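The objective itself is not reproduced in the abstract; as a point of reference, the following is a minimal sketch of the general form such a bound takes, by direct analogy with the ELBO. The notation (encoder q(z|x), label head p(y|z)) is an assumption here, not taken from the paper:

```latex
% Hedged sketch: a classification analogue of the ELBO, where z denotes the
% softmax-layer input treated as a latent variable between x and the label y.
\log p(y \mid x)
  \;\geq\;
  \mathbb{E}_{q(z \mid x)}\big[\log p(y \mid z)\big]
  \;-\;
  \mathrm{KL}\big(q(z \mid x)\,\|\,p(z \mid x)\big)
```

With a deterministic encoder and a softmax head for p(y|z), the expectation term reduces to the standard softmax cross-entropy, consistent with the claim that the objective generalises it; the augmentation described above can then be read as an extra term steering the empirical latent distribution toward a chosen prior.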
Related papers
- SoftCVI: Contrastive variational inference with self-generated soft labels [2.5398014196797614]
Variational inference and Markov chain Monte Carlo methods are the predominant tools for approximating intractable posterior distributions.
We introduce Soft Contrastive Variational Inference (SoftCVI), which allows a family of variational objectives to be derived through a contrastive estimation framework.
We find that SoftCVI can be used to form objectives which are stable to train and mass-covering, frequently outperforming inference with other variational approaches.
arXiv Detail & Related papers (2024-07-22T14:54:12Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection is a learning paradigm that allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - Variational autoencoder with weighted samples for high-dimensional
non-parametric adaptive importance sampling [0.0]
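The hierarchical part of HDL is not detailed in the summary above; the sketch below only illustrates the underlying shift it proposes, generating pseudo-labels from embedding-space neighbours rather than from model logits. The function and its parameters are assumptions for illustration:

```python
import numpy as np

def knn_pseudo_labels(emb_unlab: np.ndarray, emb_lab: np.ndarray,
                      y_lab: np.ndarray, k: int = 5) -> np.ndarray:
    """Label each unlabelled embedding by majority vote of its k nearest
    labelled neighbours under cosine similarity (a flat stand-in for HDL's
    hierarchical scheme)."""
    a = emb_unlab / np.linalg.norm(emb_unlab, axis=1, keepdims=True)
    b = emb_lab / np.linalg.norm(emb_lab, axis=1, keepdims=True)
    sims = a @ b.T                          # (n_unlab, n_lab) similarities
    nn = np.argsort(-sims, axis=1)[:, :k]   # k nearest labelled neighbours
    votes = y_lab[nn]                       # (n_unlab, k) neighbour labels
    return np.array([np.bincount(v).argmax() for v in votes])
```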
- Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling [0.0]
We extend the existing variational autoencoder framework to the case of weighted samples by introducing a new objective function.
In order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution.
We exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension.
arXiv Detail & Related papers (2023-10-13T15:40:55Z) - Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
- Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z) - ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic
Divergence [17.665255113864795]
We present a novel divergence-like metric that corresponds to an upper bound on the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.
We also propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed latent representations.
arXiv Detail & Related papers (2022-05-09T08:11:46Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Shaping Deep Feature Space towards Gaussian Mixture for Visual
Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
arXiv Detail & Related papers (2020-11-18T03:32:27Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
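A minimal sketch of that idea: keep one learnable Gaussian centre per class, score classes by negative squared distance to each centre, and add a likelihood term pulling features toward their own centre. The margin term and covariance handling of the paper are omitted, and the names are assumptions:

```python
import torch
import torch.nn.functional as F

def gm_loss(features: torch.Tensor, labels: torch.Tensor,
            centres: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Gaussian-mixture-style classification loss with identity covariances.

    features: (B, D) deep features; centres: (C, D) learnable class means.
    Logits are negative squared distances to the centres; the second term
    is the likelihood regulariser pulling features toward their class mean.
    """
    d2 = torch.cdist(features, centres).pow(2)      # (B, C) squared distances
    ce = F.cross_entropy(-d2, labels)               # margin-free classification term
    lik = d2.gather(1, labels[:, None]).mean()      # likelihood regularisation
    return ce + lam * lik

# Toy usage: three classes in a 16-dimensional feature space.
feats = torch.randn(8, 16)
centres = torch.nn.Parameter(torch.randn(3, 16))
labels = torch.randint(0, 3, (8,))
print(gm_loss(feats, labels, centres))
```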
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)