Improving Line Search Methods for Large Scale Neural Network Training
- URL: http://arxiv.org/abs/2403.18519v1
- Date: Wed, 27 Mar 2024 12:50:27 GMT
- Title: Improving Line Search Methods for Large Scale Neural Network Training
- Authors: Philip Kenneweg, Tristan Kenneweg, Barbara Hammer,
- Abstract summary: We identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness.
We improve the Armijo line search by integrating the momentum term from ADAM in its search direction, enabling efficient large-scale training.
Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data.
- Score: 4.862490782515929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent studies, line search methods have shown significant improvements in the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from ADAM in its search direction, enabling efficient large-scale training, a task that was previously prone to failure using Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter free Pytorch optimizer.
Related papers
- Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective [4.453137996095194]
grid search is computationally expensive, requires carving out a validation set, and requires practitioners to specify candidate values.
Our proposed technique overcomes all three disadvantages of grid search.
We demonstrate effectiveness on image classification tasks on several datasets, yielding heldout accuracy comparable to existing approaches.
arXiv Detail & Related papers (2024-10-25T16:32:11Z) - No learning rates needed: Introducing SALSA -- Stable Armijo Line Search Adaptation [4.45108516823267]
We identify problems of current state-of-the-art line search methods, propose enhancements, and rigorously assess their effectiveness.
We evaluate these methods on orders of magnitude and more complex data domains than previously done.
Our work is publicly available in a Python package, which provides a simple Pytorch.
arXiv Detail & Related papers (2024-07-30T08:47:02Z) - Faster Convergence for Transformer Fine-tuning with Line Search Methods [6.138522679357102]
In this work we succeed in extending line search methods to the novel and highly popular Transformer architecture and dataset domains.
Our work is publicly available as a python package, which provides a hyper-free gradient pytorch that is compatible with arbitrary network architectures.
arXiv Detail & Related papers (2024-03-27T12:35:23Z) - Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
arXiv Detail & Related papers (2023-10-02T17:52:24Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $mathbf2.5times$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm
Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - Learning the Step-size Policy for the Limited-Memory
Broyden-Fletcher-Goldfarb-Shanno Algorithm [3.7470451129384825]
We consider the problem of how to learn a step-size policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm.
We propose a neural network architecture with local information of the current gradient as the input.
The step-length policy is learned from data of similar optimization problems, avoids additional evaluations of the objective function, and guarantees that the output step remains inside a pre-defined interval.
arXiv Detail & Related papers (2020-10-03T09:34:03Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.