AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
- URL: http://arxiv.org/abs/2601.11568v1
- Date: Sat, 27 Dec 2025 14:11:08 GMT
- Title: AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
- Authors: Quang-Hung Bui, Anh Son Ta
- Abstract summary: Training Large Language Models (LLMs) is highly memory-intensive due to optimizer state overhead. AdaFRUGAL introduces two dynamic controls: (i) a linear decay for $ρ$ to progressively reduce memory, and (ii) a loss-aware schedule for $T$ to lower computational overhead. It maintains competitive performance against AdamW and static FRUGAL while significantly reducing both GPU memory and training time.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Large Language Models (LLMs) is highly memory-intensive due to optimizer state overhead. The FRUGAL framework mitigates this with gradient splitting, but its static hyperparameters -- the subspace ratio ($ρ$) and update frequency ($T$) -- require costly manual tuning, limiting adaptability. We present AdaFRUGAL, which automates this process by introducing two dynamic controls: (i) a linear decay for $ρ$ to progressively reduce memory, and (ii) a loss-aware schedule for $T$ to lower computational overhead. Experiments across large-scale pre-training (English C4, Vietnamese VietVault) and fine-tuning (GLUE) demonstrate that AdaFRUGAL achieves a compelling trade-off. It maintains competitive performance against AdamW and static FRUGAL while significantly reducing both GPU memory and training time, offering a more practical, autonomous solution for resource-constrained LLM training.
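The two controls described in the abstract lend themselves to a compact illustration. The sketch below is a minimal Python rendering of what such schedules could look like, assuming a linear decay of $ρ$ between two endpoints and a patience-based plateau test for enlarging $T$; the function names, endpoints, and thresholds are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of AdaFRUGAL-style dynamic controls (illustrative only).
# Assumptions: the subspace ratio rho decays linearly from rho_0 to rho_min
# over training, and the subspace-update period T is enlarged when the loss
# plateaus. Endpoints, thresholds, and names are not taken from the paper.

def rho_schedule(step, total_steps, rho_0=0.5, rho_min=0.05):
    """Linear decay of rho: the state-full subspace (and its optimizer
    memory) shrinks as training progresses."""
    frac = min(step / max(total_steps, 1), 1.0)
    return rho_0 + (rho_min - rho_0) * frac

def next_update_period(T, recent_losses, patience=3, tol=1e-3, T_max=1000):
    """Loss-aware schedule for T: when recent losses stop improving,
    re-select the subspace less often to cut selection overhead."""
    if len(recent_losses) > patience:
        best_before = min(recent_losses[:-patience])
        best_recent = min(recent_losses[-patience:])
        if best_before - best_recent < tol:   # plateau: improvement below tol
            return min(2 * T, T_max)          # stretch the period, capped
    return T
```

In a FRUGAL-style training loop, `rho_schedule` would size the state-full subspace selected every `T` steps, and `next_update_period` would be consulted at each re-selection to decide how long to keep the current subspace.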
Related papers
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models [54.405206032785706]
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. Existing methods suffer from computational inefficiency and objective mismatches between training and inference. We introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference.
arXiv Detail & Related papers (2025-12-23T08:33:19Z)
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
- DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training [41.09085549544767]
We introduce a Dynamic Activation Framework (DAF) that enables scalable and efficient on-device training through system-level optimizations. DAF achieves both memory- and time-efficient dynamic quantization training by addressing key system bottlenecks. Evaluations on various deep learning models across embedded and mobile platforms demonstrate up to a $22.9\times$ reduction in memory usage and a $3.2\times$ speedup.
arXiv Detail & Related papers (2025-07-09T08:59:30Z)
- Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models [50.260693393896716]
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but constrained by high computational costs. We propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our work pioneers a resource-efficient paradigm for searching for high-speed DMs without sacrificing quality.
arXiv Detail & Related papers (2025-06-03T06:02:50Z)
- MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation [24.943207005554246]
We propose a memory-efficient training paradigm called Momentum Low-rank Compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training to reduce memory consumption.
arXiv Detail & Related papers (2025-06-02T17:21:10Z)
- CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation [19.447967755388092]
We propose CoLA and its memory-efficient implementation, CoLA-M, to replace full-size layers with compute-efficient auto-encoders. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $2\times$. CoLA-M further squeezes memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency.
arXiv Detail & Related papers (2025-02-16T01:05:16Z)
- Forget Forgetting: Continual Learning in a World of Abundant Memory [55.64184779530581]
Continual learning has traditionally focused on minimizing exemplar memory. This paper challenges this paradigm by investigating a more realistic regime. We find that the core challenge shifts from stability to plasticity, as models become biased toward prior tasks and struggle to learn new ones.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
- FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training [51.39495282347475]
We introduce $\texttt{FRUGAL}$ ($\textbf{F}$ull-$\textbf{R}$ank $\textbf{U}$pdates with $\textbf{G}$r$\textbf{A}$dient sp$\textbf{L}$itting), a new memory-efficient optimization framework. Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. A minimal, assumed sketch of the underlying gradient-splitting idea is given after this list.
arXiv Detail & Related papers (2024-11-12T14:41:07Z)
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [84.70637613266835]
EoRA is a fine-tuning-free method that augments compressed Large Language Models with low-rank matrices. EoRA consistently outperforms prior training-free low-rank methods in recovering the accuracy of compressed LLMs.
arXiv Detail & Related papers (2024-10-28T17:59:03Z)
- Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning [18.776903525210933]
We introduce an efficient fine-tuning method for ViTs called $\textbf{ALaST}$ ($\textit{Adaptive Layer Selection Fine-Tuning for Vision Transformers}$).
Our approach is based on the observation that not all layers are equally critical during fine-tuning, and their importance varies depending on the current mini-batch.
We show that this adaptive compute allocation enables a nearly-optimal schedule for distributing computational resources.
arXiv Detail & Related papers (2024-08-16T11:27:52Z)
- CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization [9.826264204082095]
Training large AI models such as LLMs and DLRMs costs massive GPU resources and computing time. CoMERA achieves rank-adaptive tensor-compressed (pre-)training via a multi-objective optimization formulation. CoMERA is $2\times$ faster per training epoch and $9\times$ more memory-efficient than GaLore.
arXiv Detail & Related papers (2024-05-23T09:52:15Z)
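For context on the base framework referenced throughout, the sketch below illustrates the gradient-splitting idea behind FRUGAL in its simplest form: a fraction $ρ$ of the parameter columns receives a state-full, Adam-like update, while the complement receives a stateless SGD update, so optimizer memory is kept only for the selected subspace. The column-wise split, the plain-SGD choice for the complement, the omitted bias correction, and all names are simplifying assumptions for illustration, not the reference implementation.

```python
import torch

# Toy illustration of FRUGAL-style gradient splitting (assumed form, not the
# reference code). Columns listed in `cols` form the state-full subspace and
# receive an Adam-like update; every other column receives a stateless SGD
# step. In the actual framework `cols` would be re-drawn every T steps and
# the optimizer state reset at that point.

def split_update(param, grad, cols, state, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam-like update with per-coordinate state kept only for `cols`.
    g = grad[:, cols]
    if "m" not in state:
        state["m"] = torch.zeros_like(g)
        state["v"] = torch.zeros_like(g)
    m, v = state["m"], state["v"]
    m.mul_(beta1).add_(g, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
    param[:, cols] -= lr * m / (v.sqrt() + eps)   # bias correction omitted

    # Stateless SGD on the complement: no optimizer memory is kept here.
    mask = torch.ones(param.shape[1], dtype=torch.bool)
    mask[cols] = False
    param[:, mask] -= lr * grad[:, mask]


# Example: with rho = 0.25, a quarter of the 128 columns carry Adam state.
p, g = torch.randn(64, 128), torch.randn(64, 128)
cols = torch.randperm(128)[:32]
split_update(p, g, cols, state={})
```

Under this reading, AdaFRUGAL's schedules would drive how the selected subspace shrinks over training (via the decaying $ρ$) and how often it is re-drawn (via the loss-aware $T$).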