Randomized Sharpness-Aware Training for Boosting Computational
Efficiency in Deep Learning
- URL: http://arxiv.org/abs/2203.09962v2
- Date: Mon, 10 Apr 2023 06:43:15 GMT
- Title: Randomized Sharpness-Aware Training for Boosting Computational
Efficiency in Deep Learning
- Authors: Yang Zhao, Hao Zhang and Xiuyuan Hu
- Abstract summary: We propose a simple yet efficient training scheme, called Randomized Sharpness-Aware Training (RST). Optimizers in RST perform a Bernoulli trial at each iteration to choose randomly between a base algorithm (SGD) and a sharpness-aware algorithm (SAM).
We show that G-RST can outperform SAM in most cases while saving 50% of the extra computation cost.
- Score: 13.937644559223548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By driving models to converge to flat minima, sharpness-aware learning
algorithms (such as SAM) have shown the power to achieve state-of-the-art
performance. However, these algorithms generally incur one extra
forward-backward propagation at each training iteration, which substantially
increases the computational burden, especially for large-scale models. To this
end, we propose a simple yet efficient training scheme, called Randomized
Sharpness-Aware Training (RST). Optimizers in RST perform a Bernoulli trial at
each iteration to choose randomly between the base algorithm (SGD) and the
sharpness-aware algorithm (SAM), with a probability set by a predefined
scheduling function. Because a fraction of the iterations fall to the base
algorithm, the overall count of propagation pairs can be largely reduced. We
also give a theoretical analysis of the convergence of RST. We then
empirically study the computational cost and effect of various types of
scheduling functions, and give directions for setting appropriate scheduling
functions. Further, we extend RST to a general framework (G-RST), in which the
degree of sharpness regularization can be adjusted freely for any scheduling
function. We show that G-RST can outperform SAM in most cases while saving 50%
of the extra computation cost.
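As a concrete illustration of the scheme described in the abstract, below is a minimal PyTorch-style sketch of a single RST iteration. This is an assumption-laden sketch, not the authors' released code: the function signature, the `schedule` argument, and the neighborhood radius `rho` are illustrative choices.

```python
import torch

def rst_step(model, loss_fn, batch, optimizer, t, schedule, rho=0.05):
    """One RST iteration (illustrative sketch, not the authors' code):
    with probability schedule(t), take a SAM-style sharpness-aware step;
    otherwise take a plain SGD step on the base gradient."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()  # first forward-backward pair (always performed)

    # Bernoulli trial driven by the scheduling function.
    if torch.rand(1).item() < schedule(t):
        params = [p for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        with torch.no_grad():
            # Ascend to the approximate worst case in an L2 ball of radius rho.
            eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
            for p, e in zip(params, eps):
                p.add_(e)
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()  # second propagation pair
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)  # restore the original weights before the update

    optimizer.step()  # SGD step with either the base or the SAM gradient
    return loss.item()

# Illustrative scheduling functions (the paper studies several types).
constant_schedule = lambda t: 0.5
linear_schedule = lambda t, T=10000: min(t / T, 1.0)  # more SAM steps later
```

Under a constant schedule p(t) = p, the expected number of forward-backward pairs per iteration is 1 + p, compared with 2 for plain SAM; p = 0.5 recovers the 50% saving of the extra cost quoted in the abstract.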
Related papers
- Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve better accuracy than the best single configuration.
OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration.
arXiv Detail & Related papers (2024-10-29T19:17:55Z) - Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z) - Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics [39.07258580928359]
We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting.
This setting uses linear function approximation to capture value functions and unifies existing models like linear Markov Decision Processes (MDP) and Linear Quadratic Regulators (LQR).
Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards but relies on the underlying dynamics to be deterministic.
arXiv Detail & Related papers (2024-06-17T17:52:38Z) - Distributive Pre-Training of Generative Modeling Using Matrix-Product
States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have the desired properties, admit a natural error measure, and yield algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - Practical, Provably-Correct Interactive Learning in the Realizable
Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous convergence on both the primal and dual sides.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.