On Compression Principle and Bayesian Optimization for Neural Networks
- URL: http://arxiv.org/abs/2006.12714v1
- Date: Tue, 23 Jun 2020 03:23:47 GMT
- Title: On Compression Principle and Bayesian Optimization for Neural Networks
- Authors: Michael Tetelman
- Abstract summary: We propose a compression principle which states that an optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability.
We show that dropout can be used for a continuous dimensionality reduction that allows optimal network dimensions to be found, as required by the compression principle.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding methods for making generalizable predictions is a fundamental problem
of machine learning. By examining the similarities between the problem of predicting
unknown data and lossless compression, we arrive at an approach that gives a solution.
In this paper we propose a compression principle which states that an optimal predictive
model is the one that minimizes the total compressed message length of all data and the
model definition while guaranteeing decodability. Following the compression principle,
we use a Bayesian approach to build probabilistic models of the data and the network
definitions. A method for approximating Bayesian integrals using a sequence of variational
approximations is implemented as an optimizer for hyper-parameters: Bayesian Stochastic
Gradient Descent (BSGD). Training with BSGD is completely defined by setting only three
parameters: the number of epochs, the size of the dataset, and the size of the minibatch,
which together determine the learning rate and the number of iterations. We show that
dropout can be used for a continuous dimensionality reduction that allows optimal network
dimensions to be found, as required by the compression principle.
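To make the compression principle concrete, here is a minimal sketch (not the paper's BSGD implementation; the logistic-regression stand-in for a network and the fixed learning rate are assumptions) that minimizes a two-part code length, the negative log-likelihood of the data plus the code length of the weights under a Gaussian prior, with training specified only by the number of epochs, the dataset size and the minibatch size, as in the abstract.

```python
# Minimal sketch of the compression-principle objective:
#     L(w) = -sum_i log p(y_i | x_i, w) - log p(w)
# i.e. code length of the data given the model plus code length of the model.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset for a logistic-regression "network".
N, d = 1000, 20
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=N) > 0).astype(float)

def total_code_length(w, X, y, prior_sigma=1.0):
    """Two-part description length in nats (constants dropped)."""
    logits = X @ w
    data_nats = np.sum(np.logaddexp(0.0, logits) - y * logits)   # -log p(y | x, w)
    model_nats = 0.5 * np.sum(w ** 2) / prior_sigma ** 2          # -log p(w), Gaussian prior
    return data_nats + model_nats

# As in the paper, training is specified by three quantities only;
# the learning rate below is a placeholder, not the paper's derived value.
epochs, dataset_size, batch_size = 10, N, 100
num_iterations = epochs * dataset_size // batch_size
lr = 0.5

w = np.zeros(d)
for _ in range(num_iterations):
    idx = rng.choice(dataset_size, size=batch_size, replace=False)
    logits = X[idx] @ w
    grad_data = X[idx].T @ (1.0 / (1.0 + np.exp(-logits)) - y[idx]) * (dataset_size / batch_size)
    grad_model = w                                 # gradient of the model-code term
    w -= lr * (grad_data + grad_model) / dataset_size

print("iterations:", num_iterations)
print("total code length (nats):", round(total_code_length(w, X, y), 1))
```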
Related papers
- Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF).
VBMF is employed to estimate the rank of the weight tensor at each layer.
Experimental results show that for both high and low compression ratios, our compressed model achieves strong performance.
arXiv Detail & Related papers (2024-08-29T06:40:34Z)
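As a companion to the entry above, a hedged sketch of low-rank compression of a single convolutional kernel; the VBMF rank estimation used in that paper is not reproduced, so a manually chosen rank stands in for it.

```python
# Low-rank compression of a conv kernel via truncated SVD (rank chosen by hand).
import numpy as np

rng = np.random.default_rng(1)

# a conv kernel of shape (out_channels, in_channels, kH, kW)
W = rng.normal(size=(64, 32, 3, 3))
out_c = W.shape[0]
W_mat = W.reshape(out_c, -1)                # (64, 288) matrix view of the kernel

rank = 16                                   # assumption: stand-in for the VBMF rank estimate
U, s, Vt = np.linalg.svd(W_mat, full_matrices=False)
A = U[:, :rank] * s[:rank]                  # (64, rank): acts like a 1x1 conv after the low-rank conv
B = Vt[:rank, :]                            # (rank, 288): low-rank conv over the input

orig_params = W_mat.size
compressed_params = A.size + B.size
print("relative error:", round(float(np.linalg.norm(W_mat - A @ B) / np.linalg.norm(W_mat)), 3))
print("compression ratio:", round(orig_params / compressed_params, 2))
```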
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
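The entry above motivates adding a denoising step to a shallow autoencoder for sparse data. The toy sketch below (the parameters and the soft-threshold denoiser are illustrative choices, not the paper's construction) compares a purely linear decoder with the same decoder followed by soft-thresholding.

```python
# Shallow autoencoder for sparse inputs: random linear encoder, then either a
# linear decoder or the same decoder followed by a denoising soft-threshold.
import numpy as np

rng = np.random.default_rng(2)

d, k, n = 200, 80, 500                                      # ambient dim, code dim, samples
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.03)   # ~3% sparse inputs

A = rng.normal(size=(k, d)) / np.sqrt(k)                    # linear encoder: z = A x
Z = X @ A.T

def soft_threshold(v, lam):
    """Shrink small entries to zero -- the denoising step."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

X_linear = Z @ A                                            # decoder x_hat = A^T z
X_denoised = soft_threshold(Z @ A, lam=0.2)                 # same decoder + denoiser

print("linear decoder MSE:   ", round(float(np.mean((X_linear - X) ** 2)), 4))
print("denoising decoder MSE:", round(float(np.mean((X_denoised - X) ** 2)), 4))
```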
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or data with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple-hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
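A minimal multiple-hypotheses RBF regressor, offered as an illustration of the idea in the entry above rather than the paper's structured construction: a shared layer of Gaussian radial basis functions feeds several linear heads refit in a winner-takes-all fashion, so each input yields several candidate predictions.

```python
# Multiple-hypotheses regression with a shared RBF layer and K linear heads.
import numpy as np

rng = np.random.default_rng(3)

def rbf_features(X, centers, gamma):
    """phi_j(x) = exp(-gamma * ||x - c_j||^2)"""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# toy bimodal target: y = +/- sin(x), chosen with equal probability
n = 400
X = rng.uniform(-3, 3, size=(n, 1))
signs = rng.choice([-1.0, 1.0], size=n)
y = signs * np.sin(X[:, 0]) + 0.05 * rng.normal(size=n)

M, K, gamma = 30, 2, 2.0                       # RBF centers, hypotheses, kernel width
centers = rng.uniform(-3, 3, size=(M, 1))
Phi = rbf_features(X, centers, gamma)          # (n, M)

# initial assignment split by target sign, purely to keep this toy demo short
assign = (y > 0).astype(int)
W = np.zeros((K, M))
for _ in range(10):
    for k in range(K):
        mask = assign == k
        A = Phi[mask]
        W[k] = np.linalg.solve(A.T @ A + 1e-3 * np.eye(M), A.T @ y[mask])  # ridge refit
    preds = Phi @ W.T                                       # (n, K) candidate outputs
    assign = np.argmin((preds - y[:, None]) ** 2, axis=1)   # winner-takes-all reassignment

x_test = np.array([[1.0]])
print("hypotheses at x=1.0:", rbf_features(x_test, centers, gamma) @ W.T)
```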
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist among latent variables when they are observed from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
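A hedged sketch of rate estimation with a Gaussian-mixture entropy model, as a generic illustration of the kind of prior discussed in the entry above (not the paper's unified multivariate model): each latent is rounded to an integer and its probability mass is obtained by integrating the mixture density over the unit-width quantization bin, giving a code length in bits.

```python
# Rate estimate for quantized latents under a Gaussian mixture entropy model.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def bits_for_latent(y_hat, weights, means, sigmas):
    """-log2 P(y_hat), where P is the mixture mass on [y_hat - 0.5, y_hat + 0.5]."""
    p = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        p += w * (gaussian_cdf(y_hat + 0.5, mu, s) - gaussian_cdf(y_hat - 0.5, mu, s))
    return -np.log2(max(p, 1e-12))

# mixture parameters that, in a real codec, a hyperprior network would predict
weights = [0.6, 0.3, 0.1]
means   = [0.0, 3.0, -4.0]
sigmas  = [1.0, 0.5, 2.0]

y = rng.normal(size=1000) * 2.0          # stand-in continuous latents
y_hat = np.round(y)                      # scalar quantization
total_bits = sum(bits_for_latent(v, weights, means, sigmas) for v in y_hat)
print("estimated rate: %.1f bits (%.3f bits per latent)" % (total_bits, total_bits / len(y_hat)))
```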
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
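A small illustration of the dimension reduction mentioned in the entry above (the paper's distributed variants, exact characterization and lower bounds are not reproduced): a tall least-squares system is replaced by a much smaller Gaussian-sketched system.

```python
# Sketch-and-solve for least squares: solve (S A) x ~ S b instead of A x ~ b.
import numpy as np

rng = np.random.default_rng(5)

n, d, m = 10000, 50, 400                 # rows, columns, sketch size
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m) # Gaussian sketch matrix
SA, Sb = S @ A, S @ b                    # dimension-reduced problem

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)

print("relative error vs full solve:",
      round(float(np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full)), 4))
```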
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework that maximizes the final model accuracy by strategically adjusting the compression at each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
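A hedged sketch of federated averaging with compressed model updates, to show where the compression knob discussed above sits in the training loop; the top-k sparsifier and the per-round keep-ratio schedule are illustrative stand-ins, not the paper's optimal rate-adaptation strategy.

```python
# Federated averaging where each client's update is top-k sparsified,
# with a (purely illustrative) per-round compression schedule.
import numpy as np

rng = np.random.default_rng(6)

def top_k_compress(update, keep_ratio):
    """Keep only the largest-magnitude entries of a model update."""
    k = max(1, int(keep_ratio * update.size))
    out = np.zeros_like(update)
    idx = np.argpartition(np.abs(update), -k)[-k:]
    out[idx] = update[idx]
    return out

d, num_clients, rounds = 1000, 10, 5
global_w = np.zeros(d)
client_data = [rng.normal(size=d) for _ in range(num_clients)]   # toy per-client "optima"

for t in range(rounds):
    keep_ratio = 0.05 + 0.05 * t             # assumption: illustrative schedule
    updates = []
    for target in client_data:
        local_w = global_w + 0.5 * (target - global_w)            # one toy local step
        updates.append(top_k_compress(local_w - global_w, keep_ratio))
    global_w = global_w + np.mean(updates, axis=0)
    err = np.linalg.norm(global_w - np.mean(client_data, axis=0))
    print(f"round {t}: keep_ratio={keep_ratio:.2f}, distance to average optimum={err:.3f}")
```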
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
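A simplified illustration of randomizing a dynamic program in the spirit of the entry above (not the paper's RDP estimator): an HMM forward pass that, at each step, propagates only a random subset of latent states and reweights them, trading exactness for a per-step cost that no longer scales with the full state space.

```python
# Exact vs. randomized (subsampled) HMM forward algorithm on random parameters.
import numpy as np

rng = np.random.default_rng(7)

S, T, keep = 1000, 50, 100                        # states, sequence length, states kept per step
logA = np.log(rng.dirichlet(np.ones(S), size=S))  # transition log-probs, rows sum to 1
loge = np.log(rng.random((T, S)))                 # per-step emission log-likelihoods

def logsumexp(v, axis=None):
    m = np.max(v, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(v - m), axis=axis, keepdims=True))
    return out.squeeze()

# exact forward pass: alpha_t(s) = log p(x_1..t, z_t = s)
alpha = -np.log(S) + loge[0]
for t in range(1, T):
    alpha = loge[t] + logsumexp(alpha[:, None] + logA, axis=0)
exact = logsumexp(alpha)

# randomized forward pass: keep a sampled subset of states, reweight by S/keep
# so the estimate stays unbiased in the (linear) probability domain
alpha = -np.log(S) + loge[0]
for t in range(1, T):
    subset = rng.choice(S, size=keep, replace=False)
    new_alpha = np.full(S, -np.inf)
    new_alpha[subset] = loge[t, subset] + logsumexp(alpha[:, None] + logA[:, subset], axis=0)
    alpha = new_alpha + np.log(S / keep)
approx = logsumexp(alpha)

print("exact log-likelihood:     %.2f" % exact)
print("randomized approximation: %.2f" % approx)
```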
- Information Theoretic Structured Generative Modeling [13.117829542251188]
A novel generative model framework called the structured generative model (SGM) is proposed that makes straightforward optimization possible.
The implementation employs a single neural network, driven by an orthonormal input and a single white noise source, that is adapted to learn an infinite Gaussian mixture model.
Preliminary results show that SGM significantly improves MINE estimation in terms of data efficiency and variance, improves on conventional and variational Gaussian mixture models, and is also useful for training adversarial networks.
arXiv Detail & Related papers (2021-10-12T07:44:18Z)
- Data-Independent Structured Pruning of Neural Networks via Coresets [21.436706159840018]
We propose the first efficient structured pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample.
Unlike previous works, our coreset is data-independent, meaning that it provably guarantees the accuracy of the function for any input $x \in \mathbb{R}^d$, including an adversarial one.
arXiv Detail & Related papers (2020-08-19T08:03:09Z)
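A hedged illustration of coreset-style structured pruning (the paper's provable construction is not reproduced): hidden neurons are sampled with probability proportional to a weight-based importance score, and the kept neurons are reweighted so the pruned layer remains an unbiased estimate of the original one.

```python
# Importance-sampled neuron pruning of one hidden layer of a ReLU MLP.
import numpy as np

rng = np.random.default_rng(8)

d_in, hidden, d_out = 32, 256, 10
W1 = rng.normal(size=(hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, hidden)) / np.sqrt(hidden)

def forward(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)           # ReLU layer pair

# data-independent importance of neuron j: product of incoming/outgoing norms
scores = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
probs = scores / scores.sum()

m = 64                                            # neurons to keep (coreset size)
kept = rng.choice(hidden, size=m, replace=True, p=probs)
W1_small = W1[kept]
W2_small = W2[:, kept] / (m * probs[kept])        # importance-sampling reweighting

x = rng.normal(size=d_in)
print("original output (first 3):", np.round(forward(x, W1, W2)[:3], 3))
print("pruned output   (first 3):", np.round(forward(x, W1_small, W2_small)[:3], 3))
```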
- Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size.
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)
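A schematic of the quantization-noise idea from the entry above (not the paper's implementation): during training, only a random fraction of the weights is fake-quantized in the forward pass, so the remaining weights still receive exact gradients while the network learns to tolerate the quantizer. Only the weight transformation is shown here.

```python
# Fake int8 quantization applied to a random subset of weights each step.
import numpy as np

rng = np.random.default_rng(9)

def int8_fake_quantize(w, scale):
    """Round to the int8 grid and map back to floats (fake quantization)."""
    return np.clip(np.round(w / scale), -128, 127) * scale

def quant_noise(w, noise_rate, scale):
    """Quantize a random fraction of the weights; leave the rest untouched."""
    mask = rng.random(w.shape) < noise_rate
    return np.where(mask, int8_fake_quantize(w, scale), w)

W = rng.normal(size=(4, 8)) * 0.1
scale = np.abs(W).max() / 127.0

W_train_view = quant_noise(W, noise_rate=0.5, scale=scale)   # used in a training forward pass
W_deploy = int8_fake_quantize(W, scale)                      # fully quantized at inference

print("fraction of weights quantized this step (~ noise_rate):",
      round(float(np.mean(W_train_view != W)), 2))
```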
- Variational Bayesian Quantization [31.999462074510305]
We propose a novel algorithm for quantizing continuous latent representations in trained models.
Unlike current end-to-end neural compression methods that cater the model to a fixed quantization scheme, our algorithm separates model design and training from quantization.
Our algorithm can be seen as a novel extension of arithmetic coding to the continuous domain.
arXiv Detail & Related papers (2020-02-18T00:15:37Z)
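A simplified illustration of the idea behind the entry above, quantizing a trained model's latents after training with precision guided by posterior uncertainty; the paper's arithmetic-coding-based algorithm is not reproduced, and the grid rule below is an assumption made for illustration.

```python
# Post-hoc quantization of latents with grid spacing proportional to posterior sigma:
# uncertain latents are stored coarsely, confident latents are stored finely.
import numpy as np

rng = np.random.default_rng(10)

mu = rng.normal(size=16)                 # posterior means of the latents
sigma = rng.uniform(0.01, 1.0, size=16)  # posterior standard deviations

def quantize(mu, sigma, base_step=0.5):
    """Round each latent to a grid whose spacing scales with its posterior sigma."""
    step = base_step * sigma
    return np.round(mu / step) * step

z_hat = quantize(mu, sigma)
print("posterior sigmas:    ", np.round(sigma, 2))
print("grid step per latent:", np.round(0.5 * sigma, 2))
print("quantized latents:   ", np.round(z_hat, 3))
```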