First-Passage Approach to Optimizing Perturbations for Improved Training of Machine Learning Models
- URL: http://arxiv.org/abs/2502.04121v2
- Date: Thu, 13 Mar 2025 18:41:50 GMT
- Title: First-Passage Approach to Optimizing Perturbations for Improved Training of Machine Learning Models
- Authors: Sagi Meir, Tommer D. Keidar, Shlomi Reuveni, Barak Hirshberg,
- Abstract summary: Several protocols have been developed to improve the training of machine learning models.<n>We frame them as first-passage processes and consider their response to perturbations.<n>We show that if the unperturbed learning process reaches a quasi-steady state, the response at a single perturbation frequency can predict the behavior at a wide range of timescales.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models have become indispensable tools in applications across the physical sciences. Their training is often time-consuming, vastly exceeding the inference timescales. Several protocols have been developed to perturb the learning process and improve the training, such as shrink and perturb, warm restarts, and stochastic resetting. For classifiers, these perturbations have been shown to result in enhanced speedups or improved generalization. However, the design of such perturbations is usually done ad hoc by intuition and trial and error. To rationally optimize training protocols, we frame them as first-passage processes and consider their response to perturbations. We show that if the unperturbed learning process reaches a quasi-steady state, the response at a single perturbation frequency can predict the behavior at a wide range of frequencies. We employ this approach to a CIFAR-10 classifier using the ResNet-18 model and identify a useful perturbation and frequency among several possibilities. Our work allows optimization of perturbations for improving the training of machine learning models using a first-passage approach.
Related papers
- Instance-dependent Early Stopping [57.912273923450726]
We propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level.
IES considers an instance as mastered if the second-order differences of its loss value remain within a small range around zero.
IES can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
arXiv Detail & Related papers (2025-02-11T13:34:09Z) - Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference [22.106900089984318]
Realtime environments change even as agents perform action inference and learning.<n>Recent advances in machine learning involve larger neural networks with longer inference times.<n>We present an analysis of lower bounds on regret in realtime reinforcement learning.
arXiv Detail & Related papers (2024-12-18T21:43:40Z) - A Cost-Aware Approach to Adversarial Robustness in Neural Networks [1.622320874892682]
We propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy.
We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously.
arXiv Detail & Related papers (2024-09-11T20:43:59Z) - Always-Sparse Training by Growing Connections with Guided Stochastic
Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG, and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z) - Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning [2.270857464465579]
This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning.
The repeated resetting of weights in the last layer, which we nickname "zapping," was originally designed for a meta-continual-learning procedure.
We show it is surprisingly applicable in many settings beyond both meta-learning and continual learning.
arXiv Detail & Related papers (2023-10-12T02:52:14Z) - Accelerating Multiframe Blind Deconvolution via Deep Learning [0.0]
Ground-based solar image restoration is a computationally expensive procedure.
We propose a new method to accelerate the restoration based on algorithm unrolling.
We show that both methods significantly reduce the restoration time compared to the standard optimization procedure.
arXiv Detail & Related papers (2023-06-21T07:53:00Z) - Dynamic Scheduled Sampling with Imitation Loss for Neural Text
Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the schedule based solely on the training time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z) - Stabilizing Machine Learning Prediction of Dynamics: Noise and
Noise-inspired Regularization [58.720142291102135]
Recent has shown that machine learning (ML) models can be trained to accurately forecast the dynamics of chaotic dynamical systems.
In the absence of mitigating techniques, this technique can result in artificially rapid error growth, leading to inaccurate predictions and/or climate instability.
We introduce Linearized Multi-Noise Training (LMNT), a regularization technique that deterministically approximates the effect of many small, independent noise realizations added to the model input during training.
arXiv Detail & Related papers (2022-11-09T23:40:52Z) - Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse
Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamic and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performances exceeding or very close to stateof-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facility the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z) - Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z) - The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime"
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.