Fugu-MT Paper Translation (Summary): On-Device Fine-Tuning via Backprop-Free Zeroth-Order Optimization

Paper overview: On-Device Fine-Tuning via Backprop-Free Zeroth-Order Optimization

arxiv url: http://arxiv.org/abs/2511.11362v1
Date: Fri, 14 Nov 2025 14:46:29 GMT
Status: Translation complete
Last updated in system: 2025-11-17 22:42:18.65968
Title: On-Device Fine-Tuning via Backprop-Free Zeroth-Order Optimization
Authors: Prabodh Katti, Sangwoo Park, Bipin Rajendran, Osvaldo Simeone
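Abstract summary: Memory-efficient zeroth-order optimization (MeZO) alleviates the memory bottleneck of backpropagation-based on-device fine-tuning. This paper first provides a theoretical estimate of the relative model sizes that can be accommodated under BP and MeZO training. It then shows that MeZO offers accuracy advantages under on-device memory constraints, provided sufficient wall-clock time is available for fine-tuning.
Reference score (site-computed attention score): 27.237134457089194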
License: http://creativecommons.org/licenses/by/4.0/
Abstract: On-device fine-tuning is a critical capability for edge AI systems, which must support adaptation to different agentic tasks under stringent memory constraints. Conventional backpropagation (BP)-based training requires storing layer activations and optimizer states, a demand that can be only partially alleviated through checkpointing. In edge deployments in which the model weights must reside entirely in device memory, this overhead severely limits the maximum model size that can be deployed. Memory-efficient zeroth-order optimization (MeZO) alleviates this bottleneck by estimating gradients using forward evaluations alone, eliminating the need for storing intermediate activations or optimizer states. This enables significantly larger models to fit within on-chip memory, albeit at the cost of potentially longer fine-tuning wall-clock time. This paper first provides a theoretical estimate of the relative model sizes that can be accommodated under BP and MeZO training. We then numerically validate the analysis, demonstrating that MeZO exhibits accuracy advantages under on-device memory constraints, provided sufficient wall-clock time is available for fine-tuning.
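The forward-only gradient estimate described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a flat list of float parameters and a caller-supplied loss function (both hypothetical), and follows the SPSA-style two-point estimator used by MeZO, in which the random direction z is regenerated from a seed rather than stored. The only state kept beyond the weights is therefore an integer seed; no activations, gradients, or optimizer moments are held in memory.

```python
import random


def perturb(params, seed, scale):
    """In place: params[i] += scale * z[i], where z ~ N(0, I) is drawn from `seed`.

    Re-seeding a fresh generator reproduces exactly the same z every time,
    so z never has to be stored alongside the parameters.
    """
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def mezo_step(params, loss_fn, lr, eps, seed):
    """One zeroth-order SGD step that uses two forward evaluations and no backprop."""
    perturb(params, seed, +eps)          # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2.0 * eps)    # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, seed, +eps)          # back to theta (up to float rounding)

    # Two-point (SPSA) estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # theta <- theta - lr * projected_grad * z, regenerating the same z from the seed.
    perturb(params, seed, -lr * projected_grad)
    return projected_grad


def demo(steps=1000, lr=0.01, eps=1e-3):
    """Toy check: minimize a quadratic using forward passes only (hypothetical setup)."""
    target = [1.0, -2.0, 0.5, 3.0]

    def loss_fn(p):
        return sum((a - b) ** 2 for a, b in zip(p, target))

    params = [0.0] * len(target)
    start = loss_fn(params)
    for step in range(steps):
        mezo_step(params, loss_fn, lr, eps, seed=step)  # fresh seed per step
    return start, loss_fn(params)


if __name__ == "__main__":
    start, end = demo()
    print(f"loss: {start:.4f} -> {end:.2e}")
```

On this toy quadratic the loss falls toward zero using only forward evaluations. The trade-off the abstract mentions shows up directly: each step yields only a noisy one-dimensional gradient estimate, so many more steps, and hence more wall-clock time, are typically needed than with BP.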

Related papers

The Curious Case of In-Training Compression of State Space Models [49.819321766705514]
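State space models (SSMs) offer both parallelizable training and fast inference. A key design challenge is striking the right balance between maximizing expressivity and limiting the computational burden. Our approach, CompreSSM, applies to linear time-invariant SSMs such as Linear Recurrent Units, but can also be extended to selective models.
Reference translation (metadata) (2025-10-03T09:02:33Z)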
Low-rank Momentum Factorization for Memory Efficient Training [13.464518325870444]
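Momentum Factorized SGD (MoFaSGD) maintains a dynamically updated low-rank SVD representation of the first-order momentum. MoFaSGD is shown to be effective on large language model benchmarks, achieving a competitive trade-off between memory reduction (compared with LoRA) and performance.
Reference translation (metadata) (2025-07-10T18:04:52Z)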
MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines [28.18421624702502]
This paper introduces MobiZO, a resource-efficient fine-tuning framework for large language models (LLMs) at the edge. MobiZO achieves substantial runtime speedups and memory savings while improving fine-tuning accuracy.
Reference translation (metadata) (2024-09-23T20:14:09Z)
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning [22.950914612765494]
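Fine-tuned large language models (LLMs) have achieved remarkable performance across a wide range of natural language processing tasks. The memory-efficient zeroth-order method (MeZO) fine-tunes LLMs using only forward passes, so no backpropagation graph is required. This paper proposes the AdaZeta (Adaptive Zeroth-order Tensor-Train Adaption) framework to improve the performance and convergence of ZO methods.
Reference translation (metadata) (2024-06-26T04:33:13Z)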
Block Selective Reprogramming for On-device Training of Vision Transformers [12.118303034660531]
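This paper proposes Block Selective Reprogramming (BSR), which fine-tunes only a small fraction of the blocks of a pre-trained model. Compared with existing alternatives, it reduces training memory by up to 1.4x and compute cost by up to 2x.
Reference translation (metadata) (2024-03-25T08:41:01Z)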
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
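We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance. Our work provides theoretical and experimental evidence that, in the context of tuning transformers, the proposed estimators exhibit lower variance than existing ones.
Reference translation (metadata) (2023-05-24T15:52:08Z)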
On-Device Training Under 256KB Memory [62.95579393237751]
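This paper proposes an algorithm-system co-design framework that enables on-device training with only 256KB of memory. The framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB of SRAM and 1MB of Flash.
Reference translation (metadata) (2022-06-30T17:59:08Z)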

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

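The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from its use.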