MAST: Model-Agnostic Sparsified Training
- URL: http://arxiv.org/abs/2311.16086v1
- Date: Mon, 27 Nov 2023 18:56:03 GMT
- Title: MAST: Model-Agnostic Sparsified Training
- Authors: Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik
- Abstract summary: We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators.
We present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation.
- Score: 4.962431253126472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel optimization problem formulation that departs from the
conventional way of minimizing machine learning model loss as a black-box
function. Unlike traditional formulations, the proposed approach explicitly
incorporates an initially pre-trained model and random sketch operators,
allowing for sparsification of both the model and gradient during training. We
establish insightful properties of the proposed objective function and
highlight its connections to the standard formulation. Furthermore, we present
several variants of the Stochastic Gradient Descent (SGD) method adapted to the
new problem formulation, including SGD with general sampling, a distributed
version, and SGD with variance reduction techniques. We achieve tighter
convergence rates and relax assumptions, bridging the gap between theoretical
principles and practical applications, covering several important techniques
such as Dropout and Sparse training. This work presents promising opportunities
to enhance the theoretical understanding of model training through a
sparsification-aware optimization approach.
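To make the sparsification-aware formulation more concrete, below is a minimal NumPy sketch of how a Dropout-style random diagonal sketch could enter both the objective and an SGD step. This is an illustrative assumption, not the paper's exact formulation: the objective form E_S[f(Sw)], the rescaled Bernoulli sketch, the quadratic loss, and all names (sample_sketch, mast_sgd_step) are made up for this toy, and the pre-trained model is only used here as the starting point.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, A, b):
    """Quadratic stand-in loss f(w) = 0.5 * ||A w - b||^2."""
    r = A @ w - b
    return 0.5 * r @ r

def grad(w, A, b):
    """Gradient of the quadratic stand-in loss."""
    return A.T @ (A @ w - b)

def sample_sketch(dim, keep_prob):
    """Dropout-like random diagonal sketch: keep each coordinate with
    probability keep_prob, rescaled so that E[S] = I (assumed distribution)."""
    mask = rng.random(dim) < keep_prob
    return mask / keep_prob

def mast_sgd_step(w, A, b, keep_prob, lr):
    """One SGD step on the sketched objective f(S w): draw a sketch S,
    evaluate the gradient at the sparsified point S*w, and apply the chain
    rule (S is diagonal, so it also sparsifies the gradient)."""
    s = sample_sketch(w.size, keep_prob)
    g = s * grad(s * w, A, b)  # d/dw f(S w) = S^T grad f(S w)
    return w - lr * g

# Toy run: start from a "pre-trained" point and train with sketched SGD.
d, n = 20, 100
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
w = rng.standard_normal(d) * 0.1  # stands in for the pre-trained model
for _ in range(500):
    w = mast_sgd_step(w, A, b, keep_prob=0.8, lr=0.002)
print("final full loss:", loss(w, A, b))
```

Setting keep_prob=1 makes the sketch the identity and recovers plain SGD on f, which mirrors the abstract's point that the sketched objective is connected to the standard formulation.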
Related papers
- Sparse Bayesian Generative Modeling for Compressive Sensing [8.666730973498625]
This work addresses the fundamental linear inverse problem in compressive sensing (CS) by introducing a new type of regularizing generative prior.
We support our approach theoretically through the concept of variational inference and validate it empirically using different types of compressible signals.
arXiv Detail & Related papers (2024-11-14T14:37:47Z)
- Variational Sequential Optimal Experimental Design using Reinforcement Learning [0.0]
We introduce variational sequential Optimal Experimental Design (vsOED), a new method for optimally designing a finite sequence of experiments under a Bayesian framework with information-gain utilities.
Our vsOED results indicate substantially improved sample efficiency and a reduced number of forward model simulations compared to previous sequential design algorithms.
arXiv Detail & Related papers (2023-06-17T21:47:19Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Self-Supervised Primal-Dual Learning for Constrained Optimization [19.965556179096385]
This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems.
It proposes Primal-Dual Learning (PDL), a self-supervised training method that requires neither a set of pre-solved instances nor an optimization solver for training and inference.
arXiv Detail & Related papers (2022-08-18T20:07:10Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- Predictive machine learning for prescriptive applications: a coupled training-validating approach [77.34726150561087]
We propose a new method for training predictive machine learning models for prescriptive applications.
This approach is based on tweaking the validation step in the standard training-validating-testing scheme.
Several experiments with synthetic data demonstrate promising results in reducing prescription costs in both deterministic and real models.
arXiv Detail & Related papers (2021-10-22T15:03:20Z)
- Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound on the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI that naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each one to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA).
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques for improving model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)