Fugu-MT Paper Translation (Abstract): Dynamic Low-Rank Sparse Adaptation for Large Language Models

Paper summary: Dynamic Low-Rank Sparse Adaptation for Large Language Models

arXiv URL: http://arxiv.org/abs/2502.14816v1
Date: Thu, 20 Feb 2025 18:37:32 GMT
Status: Translation complete
Last updated in system: 2025-02-21 22:18:11.81271
Title: Dynamic Low-Rank Sparse Adaptation for Large Language Models
Title (reference translation): Dynamic Low-Rank Sparse Adaptation for Large Language Models
Authors: Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji
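Abstract summary: Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into LLM sparsity. LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning. LoSA can efficiently boost the performance of sparse LLMs within a few hours, without introducing any additional inference burden.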
Reference score (independently computed attention score): 54.1231638555233
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Network sparsity effectively eases the deployment burden of Large Language Models (LLMs), but it causes significant performance degradation. Applying Low-Rank Adaptation (LoRA) to fine-tune sparse LLMs is an intuitive remedy, but it has two shortcomings: 1) the LoRA weights cannot be integrated into the sparse LLM after training, and 2) performance recovery is insufficient at high sparsity ratios. In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that seamlessly integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing inference latency. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, thus guaranteeing that the LoRA module can be integrated into the sparse LLM after training. In addition, LoSA uses Representation Mutual Information (RMI) as an indicator of layer importance, thereby efficiently determining the layer-wise sparsity rates during fine-tuning. Building on this, LoSA adjusts the rank of the LoRA module based on the variability in layer-wise reconstruction errors, allocating an appropriate amount of fine-tuning to each layer to reduce the output discrepancies between dense and sparse LLMs. Extensive experiments show that LoSA can efficiently boost the performance of sparse LLMs within a few hours, without introducing any additional inference burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and a 2.23× speedup on GPU, while requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. Code is available at https://github.com/wzhuang-xmu/LoSA.
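Abstract (reference translation): Although network sparsity is effective at easing the deployment burden of Large Language Models (LLMs), it suffers from a large drop in performance. Applying Low-Rank Adaptation (LoRA) to fine-tune sparse LLMs offers an intuitive way to address this problem, but it has shortcomings: 1) the LoRA weights cannot be merged into the sparse LLM after training, and 2) performance recovery at high sparsity ratios is insufficient. This paper proposes dynamic Low-rank Sparse Adaptation (LoSA), which seamlessly integrates low-rank adaptation into LLM sparsity. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, guaranteeing that the LoRA module can be merged into the sparse LLM. In addition, LoSA uses Representation Mutual Information (RMI) as an indicator of layer importance and efficiently determines the layer-wise sparsity rates during fine-tuning. Based on this, LoSA adjusts the rank of the LoRA module according to the variability in layer-wise reconstruction errors, allocating an appropriate amount of fine-tuning to each layer and reducing the output difference between dense and sparse LLMs. Extensive experiments show that LoSA can efficiently improve the performance of sparse LLMs within a few hours, without any extra inference burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73, increased zero-shot accuracy by 16.32%, and achieved a 2.60× speedup on CPU and a 2.23× speedup on GPU. Code is available at https://github.com/wzhuang-xmu/LoSA.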
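
The abstract's first point (a plain LoRA update cannot be merged into a sparse model, while a LoRA update sparsified with the weight's own mask can) can be illustrated with a small sketch. This is not the authors' implementation: the magnitude-pruning rule, the 50% sparsity, the matrix sizes, and the helper names `magnitude_mask`, `merge_masked_lora`, and `sparsity_of` are all assumptions made for illustration, and pure-Python lists stand in for tensors.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def magnitude_mask(w, sparsity):
    """Binary mask zeroing the smallest-magnitude entries (assumed pruning rule)."""
    flat = sorted(abs(v) for row in w for v in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[1.0 if abs(v) > threshold else 0.0 for v in row] for row in w]

def merge_masked_lora(w_sparse, mask, b, a):
    """Return W_sparse + mask * (B @ A); pruned positions stay exactly zero."""
    delta = matmul(b, a)
    return [[w + m * d for w, m, d in zip(w_row, m_row, d_row)]
            for w_row, m_row, d_row in zip(w_sparse, mask, delta)]

def sparsity_of(w):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in w)
    return sum(1 for row in w for v in row if v == 0.0) / total

random.seed(0)
out_dim, in_dim, rank = 8, 8, 2  # toy sizes, chosen only for illustration

# Dense weight, its pruning mask, and the resulting sparse weight.
w = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
mask = magnitude_mask(w, 0.5)
w_sparse = [[v * m for v, m in zip(w_row, m_row)] for w_row, m_row in zip(w, mask)]

# Low-rank factors standing in for a trained LoRA module.
b = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(out_dim)]
a = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(rank)]

# Plain LoRA merge: the dense product B @ A fills in the pruned positions.
delta = matmul(b, a)
naive = [[ws + d for ws, d in zip(ws_row, d_row)] for ws_row, d_row in zip(w_sparse, delta)]

# Masked merge: the update is sparsified with the same mask before merging.
masked = merge_masked_lora(w_sparse, mask, b, a)

print("sparse weight :", sparsity_of(w_sparse))
print("naive merge   :", sparsity_of(naive))
print("masked merge  :", sparsity_of(masked))
```

In this toy setting the masked merge keeps the zero pattern of the pruned weight, so the merged matrix can still be served by a sparse kernel, whereas the naive merge loses the sparsity. How LoSA chooses the mask and updates it during fine-tuning is not specified in the abstract and is not modeled here.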

Related papers