Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- URL: http://arxiv.org/abs/2102.03018v1
- Date: Fri, 5 Feb 2021 06:23:56 GMT
- Title: Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- Authors: Kanchan Chowdhury, Ankita Sharma and Arun Deepak Chandrasekar
- Abstract summary: We apply the LARS optimizer to a deep learning model implemented in SystemML.
We run experiments with various batch sizes and compare the performance of LARS against Stochastic Gradient Descent within this distributed machine learning framework.
- Score: 0.3857494091717916
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Increasing the batch size of a deep learning model is a challenging task.
Although it helps utilize the full available system memory during the training
phase, it most often results in a significant loss of test accuracy. LARS
addresses this issue by introducing an adaptive learning rate for each layer of
a deep learning model. However, it is unclear how popular distributed machine
learning systems such as SystemML or MLlib will perform with this optimizer. In
this work, we apply the LARS optimizer to a deep learning model implemented
using SystemML. We perform experiments with various batch sizes and compare the
performance of the LARS optimizer with Stochastic Gradient Descent. Our
experimental results show that the LARS optimizer performs significantly better
than Stochastic Gradient Descent for large batch sizes, even with the
distributed machine learning framework SystemML.
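The core of LARS is a per-layer trust ratio that scales the global learning rate by the ratio of the layer's weight norm to its gradient norm, which is what keeps large-batch training stable where plain SGD degrades. The sketch below is a minimal NumPy version of that update rule for a single layer; it is an illustration of the general LARS technique, not the paper's actual SystemML/DML implementation, and the function name lars_update and the hyperparameter values are assumptions chosen for readability.

    import numpy as np

    def lars_update(w, grad, velocity, lr=0.1, momentum=0.9,
                    weight_decay=1e-4, trust_coeff=0.001, eps=1e-9):
        # One LARS step for a single layer (illustrative sketch only).
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(grad)
        # Layer-wise trust ratio: ||w|| / (||g|| + wd * ||w||),
        # scaled by the trust coefficient.
        local_lr = trust_coeff * w_norm / (g_norm + weight_decay * w_norm + eps)
        # Momentum update using the effective (global * local) learning rate.
        velocity = momentum * velocity + lr * local_lr * (grad + weight_decay * w)
        return w - velocity, velocity

Layers whose gradients are large relative to their weights get a smaller effective step, so a single large global learning rate can be used across all layers even at large batch sizes.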
Related papers
- CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has proven effective in improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We prove that CubicML can effectively optimize training speed of in-house recommendation models with 73 billion parameters and large language models up to 405 billion parameters at Meta ads.
arXiv Detail & Related papers (2024-09-06T19:55:21Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - SLMRec: Empowering Small Language Models for Sequential Recommendation [38.51895517016953]
The sequential recommendation task involves predicting the next item a user is likely to interact with, given their past interactions.
Recent research demonstrates the great impact of LLMs on sequential recommendation systems.
Due to the huge size of LLMs, it is inefficient and impractical to apply an LLM-based model in real-world platforms.
arXiv Detail & Related papers (2024-05-28T07:12:06Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast
Self-Adaptation [145.7321032755538]
Learning to Optimize (L2O) has drawn increasing attention as it often remarkably accelerates the optimization procedure of complex tasks.
This paper investigates the open challenge of generalizing L2O to unseen tasks by meta-training an L2O that can perform fast test-time self-adaptation to an out-of-distribution task.
arXiv Detail & Related papers (2023-02-28T19:23:20Z) - Machine Learning Methods for Spectral Efficiency Prediction in Massive
MIMO Systems [0.0]
We study several machine learning approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest possible time.
The best results in terms of mean absolute percentage error (MAPE) are obtained with gradient boosting over sorted features, while linear models demonstrate worse prediction quality.
We investigate the practical applicability of the proposed algorithms in a wide range of scenarios generated by the Quadriga simulator.
arXiv Detail & Related papers (2021-12-29T07:03:10Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Robust MAML: Prioritization task buffer with adaptive learning process
for model-agnostic meta-learning [15.894925018423665]
Model-agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm.
This paper proposes a more robust MAML based on an adaptive learning scheme and a prioritization task buffer.
Experimental results on meta reinforcement learning environments demonstrate a substantial performance gain.
arXiv Detail & Related papers (2021-03-15T09:34:34Z) - Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose the first robust AutoML framework, Robusta, which is based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z) - Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor in the MAML framework, inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance maximization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)