Variational Sparse Coding with Learned Thresholding
- URL: http://arxiv.org/abs/2205.03665v1
- Date: Sat, 7 May 2022 14:49:50 GMT
- Title: Variational Sparse Coding with Learned Thresholding
- Authors: Kion Fallah and Christopher J. Rozell
- Abstract summary: We propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples.
We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation.
- Score: 6.737133300781134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse coding strategies have been lauded for their parsimonious
representations of data that leverage low dimensional structure. However,
inference of these codes typically relies on an optimization procedure with
poor computational scaling in high-dimensional problems. For example, sparse
inference in the representations learned in the high-dimensional intermediary
layers of deep neural networks (DNNs) requires an iterative minimization to be
performed at each training step. As such, recent, quick methods in variational
inference have been proposed to infer sparse codes by learning a distribution
over the codes with a DNN. In this work, we propose a new approach to
variational sparse coding that allows us to learn sparse distributions by
thresholding samples, avoiding the use of problematic relaxations. We first
evaluate and analyze our method by training a linear generator, showing that it
has superior performance, statistical efficiency, and gradient estimation
compared to other sparse distributions. We then compare to a standard
variational autoencoder using a DNN generator on the Fashion MNIST and CelebA
datasets
Related papers
- Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for
Deep Learning [8.173034693197351]
We propose a new per-layer adaptive step-size procedure for first-order optimization methods in deep learning.
The proposed approach exploits the layer-wise curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer.
Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules.
arXiv Detail & Related papers (2023-05-23T04:12:55Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP)
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Partitioning sparse deep neural networks for scalable training and
inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z) - Consistent Sparse Deep Learning: Theory and Computation [11.24471623055182]
We propose a frequentist-like method for learning sparse deep learning networks (DNNs)
The proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection.
arXiv Detail & Related papers (2021-02-25T23:31:24Z) - Non-Convex Optimization with Spectral Radius Regularization [17.629499015699704]
We develop a regularization method which finds flat minima during the training of deep neural networks and other machine learning models.
These minima generalize better than sharp minima, allowing models to better generalize to real word test data.
arXiv Detail & Related papers (2021-02-22T17:39:05Z) - Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN)
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs)
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT)
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.