Fugu-MT Paper Translation (Abstract): ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Paper abstract: ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

arxiv url: http://arxiv.org/abs/2605.21177v1
Date: Wed, 20 May 2026 13:44:44 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.70014
Title: ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
Title (reference translation): ChunkFT: Byte-Stream Optimization for Memory-Efficient Full Fine-Tuning
Authors: Yongkang Liu, Zijing Wang, Mengjie Zhao, Ercong Nie, Mingyang Wang, Qian Li, Feiliang Ren, Shi Feng, Daling Wang, Hinrich Schütze
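Abstract summary: ChunkFT is a memory-efficient fine-tuning framework. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture. ChunkFT consistently outperforms existing memory-efficient baselines.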
Reference score (independently computed attention score): 58.54940026861599
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is available at https://github.com/misonsky/chunk.
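Abstract (reference translation): This study presents ChunkFT, a memory-efficient fine-tuning framework that restructures full-parameter fine-tuning around a dynamically activated working set. ChunkFT makes it possible to compute gradients for arbitrary sub-tensors without changing the network architecture, and provides an algorithmic basis for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. The paper gives a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, ChunkFT is applied to Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length needs only 13.72GB of GPU memory. The results show the effectiveness of ChunkFT in memory usage, running time, and optimization quality. In addition, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. In particular, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. The repository is at https://github.com/misonsky/chunk.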
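
The working-set idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: it assumes a least-squares objective, a fixed cyclic schedule over contiguous chunks, and plain gradient steps, and every function name below (`residuals`, `chunk_gradient`, `chunked_descent`) is hypothetical. It only shows that every parameter can eventually be updated while the gradient buffer at any step covers one chunk rather than the full parameter vector.

```python
def residuals(A, w, b):
    """Forward pass: r_i = a_i . w - b_i for every row a_i of A."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) - b_i
            for row, b_i in zip(A, b)]


def loss(A, w, b):
    """Least-squares loss 0.5 * sum_i r_i^2 (an assumed toy objective)."""
    return 0.5 * sum(r * r for r in residuals(A, w, b))


def chunk_gradient(A, r, start, stop):
    """Gradient of the loss with respect to w[start:stop] only.

    The returned buffer has stop - start entries, so the dense gradient
    over all of w is never materialised.
    """
    return [sum(r_i * row[j] for r_i, row in zip(r, A))
            for j in range(start, stop)]


def chunked_descent(A, b, w, chunk_size, lr, sweeps):
    """Cycle through contiguous chunks of w, updating one chunk per step.

    The cyclic schedule is an assumption made for this sketch; the paper
    describes a dynamically activated working set.
    """
    w = list(w)
    n = len(w)
    for _ in range(sweeps):
        for start in range(0, n, chunk_size):
            stop = min(start + chunk_size, n)
            r = residuals(A, w, b)                 # forward pass at current w
            g = chunk_gradient(A, r, start, stop)  # working-set gradient only
            for offset, g_j in enumerate(g):
                w[start + offset] -= lr * g_j
    return w


# Toy problem: 3 samples, 4 parameters, split into 2 chunks of 2.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0, 0.0]]
b = [1.0, 2.0, 3.0]
w0 = [0.0, 0.0, 0.0, 0.0]

w_fit = chunked_descent(A, b, w0, chunk_size=2, lr=0.1, sweeps=500)
initial_loss = loss(A, w0, b)
final_loss = loss(A, w_fit, b)
```

In this toy the forward pass is recomputed in full at every step, so the sketch says nothing about running time; it only makes the chunk-sized gradient buffer concrete.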

Related papers