Fugu-MT Paper Translation (Summary): D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation

Paper summary: D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation

arxiv url: http://arxiv.org/abs/2604.16940v1
Date: Sat, 18 Apr 2026 09:52:18 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.250317
Title: D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation
Authors: Junlin Li, Shuangyong Song, Guodong Du, Ngai Wong, Xuebo Liu, Yongxiang Li, Min Zhang, Jing Li, Xuelong Li
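Abstract summary: Supervised Fine-Tuning (SFT) accelerates the development of task-specific large language models (LLMs), but the growing number of fine-tuned models substantially increases memory overhead. This paper proposes D-QRELO (Delta Compression via Quantization and Residual Low-Rank). It combines coarse-grained one-bit quantization to capture the dominant structure of the delta.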
Reference score (independently computed attention score): 78.32916244416033
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Supervised Fine-Tuning (SFT) accelerates the development of task-specific large language models (LLMs), but the resulting proliferation of fine-tuned models incurs substantial memory overhead. Delta compression addresses this by retaining a single pre-trained LLM with multiple compressed delta weights. However, existing methods fail on models fine-tuned with large-scale datasets. We find that larger SFT data scale amplifies delta parameter magnitude, singular values, and entropy, exacerbating compression errors. To tackle this, we propose D-QRELO (Delta Compression via Quantization and Residual Low-Rank), a novel training- and data-free delta compression method. It first applies coarse-grained one-bit quantization to capture the dominant structure of the delta, then a compensated residual low-rank approximation to recover fine-grained details from the smaller residual error. Experiments on various LLMs spanning dense and MoE architectures across multiple domains under this challenging setting demonstrate that D-QRELO outperforms existing methods. Moreover, we establish key design principles for delta compression through extensive empirical analysis, demonstrating how task difficulty, architecture, and layer positioning create predictable patterns that can guide optimal compression strategies in production systems.
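The two-stage idea in the abstract (one-bit quantization of the delta, then a low-rank fit to what is left over) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the quantization scale, the rank, or the "compensated" step, so the sketch assumes a single per-matrix scale equal to the mean absolute delta (the least-squares optimum for a fixed sign pattern), a plain truncated SVD for the residual, and no compensation. All function names here are hypothetical.

```python
import numpy as np


def one_bit_quantize(delta):
    """Approximate delta as scale * signs with signs in {-1, +1}.

    Assumption: one shared scale per matrix, set to mean(|delta|),
    which minimizes the squared error for the sign pattern of delta.
    """
    scale = np.abs(delta).mean()
    signs = np.where(delta >= 0, 1.0, -1.0)
    return scale, signs


def low_rank(residual, rank):
    """Best rank-`rank` approximation of residual via truncated SVD."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def compress_delta(w_base, w_ft, rank):
    """Compress w_ft - w_base into (scale, signs, a, b)."""
    delta = w_ft - w_base
    scale, signs = one_bit_quantize(delta)
    # The low-rank factors only need to model what quantization missed.
    residual = delta - scale * signs
    a, b = low_rank(residual, rank)
    return scale, signs, a, b


def reconstruct(w_base, scale, signs, a, b):
    """Rebuild an approximation of the fine-tuned weights."""
    return w_base + scale * signs + a @ b


# Toy data standing in for a base weight matrix and its fine-tuned version.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 48))
w_ft = w_base + 0.05 * rng.normal(size=(64, 48))

scale, signs, a, b = compress_delta(w_base, w_ft, rank=8)
delta = w_ft - w_base
err_quant_only = np.linalg.norm(delta - scale * signs)
err_full = np.linalg.norm(w_ft - reconstruct(w_base, scale, signs, a, b))
print(f"quantization only: {err_quant_only:.4f}, with low-rank residual: {err_full:.4f}")
```

Because the truncated SVD is the best low-rank fit in Frobenius norm, adding it to the quantized delta cannot increase the reconstruction error; how much it helps on real fine-tuned weights depends on the spectrum of the residual.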

Related papers