MAST: Model-Agnostic Sparsified Training
- URL: http://arxiv.org/abs/2311.16086v1
- Date: Mon, 27 Nov 2023 18:56:03 GMT
- Title: MAST: Model-Agnostic Sparsified Training
- Authors: Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik
- Abstract summary: We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators.
We present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation.
- Score: 4.962431253126472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel optimization problem formulation that departs from the
conventional way of minimizing machine learning model loss as a black-box
function. Unlike traditional formulations, the proposed approach explicitly
incorporates an initially pre-trained model and random sketch operators,
allowing for sparsification of both the model and gradient during training. We
establish insightful properties of the proposed objective function and
highlight its connections to the standard formulation. Furthermore, we present
several variants of the Stochastic Gradient Descent (SGD) method adapted to the
new problem formulation, including SGD with general sampling, a distributed
version, and SGD with variance reduction techniques. We achieve tighter
convergence rates and relax assumptions, bridging the gap between theoretical
principles and practical applications, covering several important techniques
such as Dropout and Sparse training. This work presents promising opportunities
to enhance the theoretical understanding of model training through a
sparsification-aware optimization approach.
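To make the sparsification-aware formulation more concrete, below is a minimal NumPy sketch of how a Dropout-style random diagonal sketch could enter both the objective and an SGD step. This is an illustrative assumption, not the paper's exact formulation: the objective form E_S[f(Sw)], the rescaled Bernoulli sketch, the quadratic loss, and all names (sample_sketch, mast_sgd_step) are made up for this toy, and the pre-trained model is only used here as the starting point.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, A, b):
    """Quadratic stand-in loss f(w) = 0.5 * ||A w - b||^2."""
    r = A @ w - b
    return 0.5 * r @ r

def grad(w, A, b):
    """Gradient of the quadratic stand-in loss."""
    return A.T @ (A @ w - b)

def sample_sketch(dim, keep_prob):
    """Dropout-like random diagonal sketch: keep each coordinate with
    probability keep_prob, rescaled so that E[S] = I (assumed distribution)."""
    mask = rng.random(dim) < keep_prob
    return mask / keep_prob

def mast_sgd_step(w, A, b, keep_prob, lr):
    """One SGD step on the sketched objective f(S w): draw a sketch S,
    evaluate the gradient at the sparsified point S*w, and apply the chain
    rule (S is diagonal, so it also sparsifies the gradient)."""
    s = sample_sketch(w.size, keep_prob)
    g = s * grad(s * w, A, b)  # d/dw f(S w) = S^T grad f(S w)
    return w - lr * g

# Toy run: start from a "pre-trained" point and train with sketched SGD.
d, n = 20, 100
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
w = rng.standard_normal(d) * 0.1  # stands in for the pre-trained model
for _ in range(500):
    w = mast_sgd_step(w, A, b, keep_prob=0.8, lr=0.002)
print("final full loss:", loss(w, A, b))
```

Setting keep_prob=1 makes the sketch the identity and recovers plain SGD on f, which mirrors the abstract's point that the sketched objective is connected to the standard formulation.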
Related papers
- Sparse Bayesian Generative Modeling for Compressive Sensing [8.666730973498625]
This work addresses the fundamental linear inverse problem in compressive sensing (CS) by introducing a new type of regularizing generative prior.
We support our approach theoretically through the concept of variational inference and validate it empirically using different types of compressible signals.
arXiv Detail & Related papers (2024-11-14T14:37:47Z)
- Variational Sequential Optimal Experimental Design using Reinforcement Learning [0.0]
We introduce variational sequential Optimal Experimental Design (vsOED), a new method for optimally designing a finite sequence of experiments under a Bayesian framework with information-gain utilities.
Our vsOED results indicate substantially improved sample efficiency and a reduced number of forward model simulations compared to previous sequential design algorithms.
arXiv Detail & Related papers (2023-06-17T21:47:19Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Self-Supervised Primal-Dual Learning for Constrained Optimization [19.965556179096385]
This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems.
It proposes Primal-Dual Learning (PDL), a self-supervised training method that requires neither a set of pre-solved instances nor an optimization solver for training and inference.
arXiv Detail & Related papers (2022-08-18T20:07:10Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- Predictive machine learning for prescriptive applications: a coupled training-validating approach [77.34726150561087]
We propose a new method for training predictive machine learning models for prescriptive applications.
This approach is based on tweaking the validation step in the standard training-validating-testing scheme.
Several experiments with synthetic data demonstrate promising results in reducing prescription costs in both deterministic and real models.
arXiv Detail & Related papers (2021-10-22T15:03:20Z)
- Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound on the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI that naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each one to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA).
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques for improving model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)