Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- URL: http://arxiv.org/abs/2102.03018v1
- Date: Fri, 5 Feb 2021 06:23:56 GMT
- Title: Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate
Scaling (LARS) Optimizer
- Authors: Kanchan Chowdhury, Ankita Sharma and Arun Deepak Chandrasekar
- Abstract summary: We apply the LARS optimizer to a deep learning model implemented in SystemML.
We run experiments with various batch sizes and compare the performance of LARS against Stochastic Gradient Descent within this distributed machine learning framework.
- Score: 0.3857494091717916
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Increasing the batch size of a deep learning model is a challenging task.
Although it helps utilize the full available system memory during the training
phase, it most often results in a significant loss of test accuracy. LARS
addresses this issue by introducing an adaptive learning rate for each layer of
a deep learning model. However, it is unclear how popular distributed machine
learning systems such as SystemML or MLlib will perform with this optimizer. In
this work, we apply the LARS optimizer to a deep learning model implemented
using SystemML. We perform experiments with various batch sizes and compare the
performance of the LARS optimizer with Stochastic Gradient Descent. Our
experimental results show that the LARS optimizer performs significantly better
than Stochastic Gradient Descent for large batch sizes, even with the
distributed machine learning framework SystemML.
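The core of LARS is a per-layer trust ratio that scales the global learning rate by the ratio of the layer's weight norm to its gradient norm, which is what keeps large-batch training stable where plain SGD degrades. The sketch below is a minimal NumPy version of that update rule for a single layer; it is an illustration of the general LARS technique, not the paper's actual SystemML/DML implementation, and the function name lars_update and the hyperparameter values are assumptions chosen for readability.

    import numpy as np

    def lars_update(w, grad, velocity, lr=0.1, momentum=0.9,
                    weight_decay=1e-4, trust_coeff=0.001, eps=1e-9):
        # One LARS step for a single layer (illustrative sketch only).
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(grad)
        # Layer-wise trust ratio: ||w|| / (||g|| + wd * ||w||),
        # scaled by the trust coefficient.
        local_lr = trust_coeff * w_norm / (g_norm + weight_decay * w_norm + eps)
        # Momentum update using the effective (global * local) learning rate.
        velocity = momentum * velocity + lr * local_lr * (grad + weight_decay * w)
        return w - velocity, velocity

Layers whose gradients are large relative to their weights get a smaller effective step, so a single large global learning rate can be used across all layers even at large batch sizes.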
Related papers
- CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has proven effective in improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We prove that CubicML can effectively optimize training speed of in-house recommendation models with 73 billion parameters and large language models up to 405 billion parameters at Meta ads.
arXiv Detail & Related papers (2024-09-06T19:55:21Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - SLMRec: Empowering Small Language Models for Sequential Recommendation [38.51895517016953]
The sequential recommendation task involves predicting the next item a user is likely to interact with, given their past interactions.
Recent research demonstrates the great impact of LLMs on sequential recommendation systems.
Due to the huge size of LLMs, it is inefficient and impractical to apply an LLM-based model in real-world platforms.
arXiv Detail & Related papers (2024-05-28T07:12:06Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast
Self-Adaptation [145.7321032755538]
Learning to Optimize (L2O) has drawn increasing attention as it often remarkably accelerates the optimization procedure of complex tasks.
This paper investigates the open challenge of generalizing L2O to unseen tasks by meta-training an L2O that can perform fast test-time self-adaptation to an out-of-distribution task.
arXiv Detail & Related papers (2023-02-28T19:23:20Z) - Machine Learning Methods for Spectral Efficiency Prediction in Massive
MIMO Systems [0.0]
We study several machine learning approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest possible time.
The best results in terms of mean absolute percentage error (MAPE) are obtained with gradient boosting over sorted features, while linear models demonstrate worse prediction quality.
We investigate the practical applicability of the proposed algorithms in a wide range of scenarios generated by the Quadriga simulator.
arXiv Detail & Related papers (2021-12-29T07:03:10Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Robust MAML: Prioritization task buffer with adaptive learning process
for model-agnostic meta-learning [15.894925018423665]
Model-agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm.
This paper proposes a more robust MAML based on an adaptive learning scheme and a prioritization task buffer.
Experimental results on meta reinforcement learning environments demonstrate a substantial performance gain.
arXiv Detail & Related papers (2021-03-15T09:34:34Z) - Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose the first robust AutoML framework, Robusta, which is based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z) - Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor in the MAML framework, inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance maximization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)