Learning Sparsity of Representations with Discrete Latent Variables
- URL: http://arxiv.org/abs/2304.00935v1
- Date: Mon, 3 Apr 2023 12:47:18 GMT
- Title: Learning Sparsity of Representations with Discrete Latent Variables
- Authors: Zhao Xu, Daniel Onoro Rubio, Giuseppe Serra, Mathias Niepert
- Abstract summary: We propose a sparse deep latent generative model, SDLGM, that explicitly models the degree of sparsity.
The resulting sparsity of a representation is not fixed but adapts to the observation itself under the pre-defined restriction.
For inference and learning, we develop an amortized variational method based on an MC gradient estimator.
- Score: 15.05207849434673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep latent generative models have attracted increasing attention due to their capacity to combine the strengths of deep learning and probabilistic models in an elegant way. The data representations learned with these models are often continuous and dense. However, in many applications sparse representations are expected, such as learning sparse high-dimensional embeddings of data in an unsupervised setting, or learning multi-label assignments from thousands of candidate tags in a supervised setting. In some scenarios there may be a further restriction on the degree of sparsity: the number of non-zero features of a representation cannot be larger than a pre-defined threshold $L_0$. In this paper we propose a sparse deep latent generative model, SDLGM, that explicitly models the degree of sparsity and thus makes it possible to learn the sparse structure of the data under a quantified sparsity constraint. The resulting sparsity of a representation is not fixed but adapts to the observation itself under the pre-defined restriction. In particular, we introduce for each observation $i$ an auxiliary random variable $L_i$ that models the sparsity of its representation. The sparse representations are then generated with a two-step sampling process via two Gumbel-Softmax distributions. For inference and learning, we develop an amortized variational method based on a Monte Carlo (MC) gradient estimator, so the resulting sparse representations remain differentiable and can be trained with backpropagation. The experimental evaluation on multiple datasets for unsupervised and supervised learning problems shows the benefits of the proposed method.
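To make the two-step sampling described in the abstract concrete, here is a minimal sketch in PyTorch (the framework choice is an assumption; the abstract does not specify one). It is illustrative rather than the authors' SDLGM implementation: the helper names, the use of `F.gumbel_softmax`, and the particular way the sampled sparsity level $L_i$ gates the feature picks are simplifying assumptions, and the real model additionally learns generative and amortized inference networks.

```python
import torch
import torch.nn.functional as F


def gumbel_softmax_sample(logits, tau=1.0, hard=True):
    # Relaxed one-hot sample; hard=True applies the straight-through trick,
    # so the forward pass is discrete while gradients flow through the soft sample.
    return F.gumbel_softmax(logits, tau=tau, hard=hard)


def sample_sparse_code(sparsity_logits, feature_logits, L0, tau=1.0):
    """Two-step sampling sketch (illustrative, not the paper's exact construction).

    sparsity_logits: (batch, L0) logits over the allowed sparsity levels 1..L0.
    feature_logits:  (batch, D)  logits over which of the D features to activate.
    Returns a (batch, D) mask with at most L0 active features per observation.
    """
    device = feature_logits.device

    # Step 1: sample the per-observation sparsity level L_i via a first
    # Gumbel-Softmax over {1, ..., L0}.
    level_one_hot = gumbel_softmax_sample(sparsity_logits, tau)        # (batch, L0)
    levels = torch.arange(1, L0 + 1, device=device).float()
    L_i = (level_one_hot * levels).sum(dim=-1)                         # (batch,)

    # Step 2: draw L0 candidate one-hot feature indicators via a second
    # Gumbel-Softmax over the D features, then keep only the first L_i of them.
    picks = torch.stack(
        [gumbel_softmax_sample(feature_logits, tau) for _ in range(L0)], dim=1
    )                                                                  # (batch, L0, D)
    keep = (torch.arange(L0, device=device).float() < L_i.unsqueeze(-1)).float()
    mask = (picks * keep.unsqueeze(-1)).sum(dim=1).clamp(max=1.0)      # (batch, D)
    return mask


# Example: a batch of 4 observations, 16-dimensional codes, at most L0 = 3 active.
if __name__ == "__main__":
    sparsity_logits = torch.randn(4, 3, requires_grad=True)
    feature_logits = torch.randn(4, 16, requires_grad=True)
    mask = sample_sparse_code(sparsity_logits, feature_logits, L0=3)
    print(mask.sum(dim=-1))  # active features per observation (at most 3)
```

In a full model, `sparsity_logits` and `feature_logits` would be produced by an amortized inference network conditioned on the observation, and the resulting mask would gate the latent code before decoding, which is what keeps the sampled sparse representations trainable with backpropagation.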
Related papers
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - Interpretable time series neural representation for classification purposes [3.1201323892302444]
The proposed model produces consistent, discrete, interpretable, and visualizable representations.
The experiments show that the proposed model yields, on average, better results than other interpretable approaches on multiple datasets.
arXiv Detail & Related papers (2023-10-25T15:06:57Z) - ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Our model, named "ChiroDiff", is non-autoregressive and learns to capture holistic concepts, and therefore remains resilient to higher temporal sampling rates.
arXiv Detail & Related papers (2023-04-07T15:17:48Z) - Learning Sparse Latent Representations for Generator Model [7.467412443287767]
We present a new unsupervised learning method to enforce sparsity on the latent space for the generator model.
Our model consists of only one top-down generator network that maps the latent variable to the observed data.
arXiv Detail & Related papers (2022-09-20T18:58:24Z) - Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data [42.517927809224275]
We provide an algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover an attribute graph.
We show that one can fit an effective graphical model in the form of a structural equation model over the latent codes.
Using a pre-trained generative autoencoder trained on a large dataset of small molecules, we demonstrate that the graphical model can be used to predict a specific property.
arXiv Detail & Related papers (2022-07-14T19:20:30Z) - Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z) - Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Hierarchical Few-Shot Generative Models [18.216729811514718]
We study a latent variable approach that extends the Neural Statistician to a fully hierarchical model with attention-based point-to-set-level aggregation.
Our results show that the hierarchical formulation better captures the intrinsic variability within the sets in the small data regime.
arXiv Detail & Related papers (2021-10-23T19:19:39Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.