Fugu-MT paper translation (summary): Reversing Large Language Models for Efficient Training and Fine-Tuning

Paper summary: Reversing Large Language Models for Efficient Training and Fine-Tuning

arxiv url: http://arxiv.org/abs/2512.02056v1
Date: Thu, 27 Nov 2025 19:32:15 GMT
Status: Translation complete
Last updated in system: 2025-12-03 21:04:45.537441
Title: Reversing Large Language Models for Efficient Training and Fine-Tuning
Authors: Eshed Gal, Moshe Eliasof, Javier Turek, Uri Ascher, Eran Treister, Eldad Haber
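Abstract summary: Large language models (LLMs) are known for their expensive, time-consuming training. The authors propose memory-efficient, reversible LLM architectures inspired by symmetric and symplectic differential equations. The results show comparable or improved performance on several datasets and benchmarks.
Reference score (site-computed attention score): 24.232966507637673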
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, LLMs are often fine-tuned to address a specific task, starting from the weights of a pre-trained LLM that serves as a foundation model. In this work, we introduce memory-efficient, reversible architectures for LLMs, inspired by symmetric and symplectic differential equations, and investigate their theoretical properties. Unlike standard baseline architectures, which store all intermediate activations, the proposed models use time-reversible dynamics to retrieve hidden states during backpropagation, removing the need to store activations. This property drastically reduces memory consumption, allowing larger batch sizes to be processed with the same available memory and thereby improving throughput. In addition, we propose an efficient method for converting existing, non-reversible LLMs into reversible architectures through fine-tuning, making our approach practical for exploiting existing pre-trained models. Our results show comparable or improved performance on several datasets and benchmarks, across several LLMs, building a scalable and efficient path toward reducing the memory and computational costs of both training LLMs from scratch and fine-tuning them.
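To illustrate how time-reversible dynamics can stand in for stored activations, here is a minimal sketch of a generic symmetric two-step (leapfrog-style) residual update, y_{j+1} = 2 y_j - y_{j-1} + h^2 f(y_j), written in plain Python. This is an assumption for illustration, not the authors' architecture: the layer function `layer_fn`, the scalar per-layer `weights`, and the step size `h` are hypothetical stand-ins for real transformer blocks. Because the update can be solved for y_{j-1}, a backward pass can recompute each layer's input from the two final states instead of reading it from memory. In floating point the recovery is exact only up to rounding error.

```python
import math
from typing import List, Tuple

Vector = List[float]


def layer_fn(y: Vector, w: float) -> Vector:
    """Hypothetical per-layer nonlinearity f(y; w), standing in for an attention/MLP block."""
    return [math.tanh(w * v) for v in y]


def forward(y_prev: Vector, y_curr: Vector, weights: List[float], h: float) -> Tuple[Vector, Vector]:
    """Apply y_{j+1} = 2*y_j - y_{j-1} + h^2 * f(y_j) once per layer.

    Only the two most recent states are kept, so memory does not grow with depth.
    """
    for w in weights:
        f = layer_fn(y_curr, w)
        y_next = [2 * c - p + h * h * fv for p, c, fv in zip(y_prev, y_curr, f)]
        y_prev, y_curr = y_curr, y_next
    return y_prev, y_curr


def reverse(y_prev: Vector, y_curr: Vector, weights: List[float], h: float) -> Tuple[Vector, Vector]:
    """Run the same recurrence backwards: y_{j-1} = 2*y_j - y_{j+1} + h^2 * f(y_j).

    Recovers the earlier states (up to floating-point rounding) without having stored them.
    """
    for w in reversed(weights):
        f = layer_fn(y_prev, w)  # the forward step that produced y_curr evaluated f at y_prev
        y_before = [2 * p - c + h * h * fv for p, c, fv in zip(y_prev, y_curr, f)]
        y_prev, y_curr = y_before, y_prev
    return y_prev, y_curr


if __name__ == "__main__":
    # Hypothetical initial states and layer weights, chosen only for the demo.
    y0 = [0.1, -0.2, 0.3]
    y1 = [0.15, -0.1, 0.25]
    weights = [0.5, -1.2, 0.8, 2.0]
    h = 0.1

    top_prev, top_curr = forward(y0, y1, weights, h)
    rec0, rec1 = reverse(top_prev, top_curr, weights, h)

    # Largest deviation between the recomputed and original input states.
    err = max(abs(a - b) for a, b in zip(rec0 + rec1, y0 + y1))
    print(f"max reconstruction error: {err:.3e}")
```

In a real training loop, the recomputed states would feed each layer's local gradient computation, so activation memory stays roughly constant in depth at the cost of extra forward evaluations during backpropagation; the trade-off is similar in spirit to activation checkpointing.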


Related papers