Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- URL: http://arxiv.org/abs/2102.03018v1
- Date: Fri, 5 Feb 2021 06:23:56 GMT
- Title: Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- Authors: Kanchan Chowdhury, Ankita Sharma and Arun Deepak Chandrasekar
- Abstract summary: We apply LARS to a deep learning model implemented using SystemML.
We perform experiments with various batch sizes and compare the performance of LARS with that of Stochastic Gradient Descent in the distributed machine learning framework SystemML.
- Score: 0.3857494091717916
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Increasing the batch size of a deep learning model is a challenging task.
Although it might help in utilizing the full available system memory during the
training phase of a model, it most often results in a significant loss of test
accuracy. LARS solved this issue by introducing an adaptive learning rate for
each layer of a deep learning model. However, there are doubts about how popular
distributed machine learning systems such as SystemML or MLlib will perform
with this optimizer. In this work, we apply the LARS optimizer to a deep
learning model implemented using SystemML. We perform experiments with various
batch sizes and compare the performance of the LARS optimizer with Stochastic
Gradient Descent. Our experimental results show that the LARS optimizer performs
significantly better than Stochastic Gradient Descent for large batch sizes,
even with the distributed machine learning framework, SystemML.
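The layer-wise adaptation described above can be illustrated with a short, self-contained sketch: each layer's step is rescaled by the ratio of its weight norm to its gradient norm. The NumPy snippet below is a minimal illustration of this idea only; names such as lars_step, trust_coeff, and base_lr are assumptions made for the example and are not taken from the paper's SystemML (DML) implementation.

```python
import numpy as np

def lars_step(w, grad, velocity, base_lr=0.1, momentum=0.9,
              weight_decay=1e-4, trust_coeff=0.001, eps=1e-9):
    """One illustrative LARS update for a single layer's weights."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    # Layer-wise adaptive rate: the global learning rate is rescaled for this
    # layer by the ratio of the weight norm to the regularized gradient norm.
    local_lr = trust_coeff * w_norm / (g_norm + weight_decay * w_norm + eps)
    # Momentum update using the rescaled step (hyperparameters here are
    # placeholders, not values reported in the paper).
    velocity = momentum * velocity + local_lr * base_lr * (grad + weight_decay * w)
    return w - velocity, velocity

# Toy usage on one randomly initialized "layer".
w = np.random.randn(256, 128)
v = np.zeros_like(w)
g = np.random.randn(256, 128)
w, v = lars_step(w, g, v)
```

Unlike plain Stochastic Gradient Descent, where a single global learning rate is shared by all layers, each layer receives its own effective step size, which is what allows large batch sizes without the usual loss of test accuracy.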
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Models (LLMs) have the ability to capture semantic relationships between items, independent of their popularity.
We introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance Sequential Recommender Systems (SRS) performance.
arXiv Detail & Related papers (2024-09-30T03:59:06Z)
- CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has been proven effective in improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML, which uses ML to automatically optimize the training performance of large distributed ML systems.
We show that CubicML can effectively optimize the training speed of in-house recommendation models with 73 billion parameters and of large language models with up to 405 billion parameters at Meta ads.
arXiv Detail & Related papers (2024-09-06T19:55:21Z)
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- An Adaptive Plug-and-Play Network for Few-Shot Learning [12.023266104119289]
Few-shot learning requires a model to classify new samples after learning from only a few samples.
Deep networks and complex metrics tend to induce overfitting, making it difficult to further improve the performance.
We propose plug-and-play model-adaptive resizer (MAR) and adaptive similarity metric (ASM) without any other losses.
arXiv Detail & Related papers (2023-02-18T13:25:04Z)
- Machine Learning Methods for Spectral Efficiency Prediction in Massive MIMO Systems [0.0]
We study several machine learning approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest possible time.
The best results in terms of mean absolute percentage error (MAPE) are obtained with gradient boosting over sorted features, while linear models demonstrate worse prediction quality.
We investigate the practical applicability of the proposed algorithms in a wide range of scenarios generated by the Quadriga simulator.
arXiv Detail & Related papers (2021-12-29T07:03:10Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning [15.894925018423665]
Model-agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm.
This paper proposes a more robust MAML based on an adaptive learning scheme and a prioritization task buffer.
Experimental results on meta reinforcement learning environments demonstrate a substantial performance gain.
arXiv Detail & Related papers (2021-03-15T09:34:34Z)
- Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose Robusta, the first robust AutoML framework, based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts.
Due to the large size of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.