Fugu-MT Paper Translation (Abstract): Frustratingly Easy Task-aware Pruning for Large Language Models

Paper abstract: Frustratingly Easy Task-aware Pruning for Large Language Models

arxiv url: http://arxiv.org/abs/2510.22489v1
Date: Sun, 26 Oct 2025 02:09:22 GMT
Status: Translation complete
Last updated in system: 2025-10-28 19:54:32.523325
Title: Frustratingly Easy Task-aware Pruning for Large Language Models
Title (reference translation): Frustratingly Easy Task-aware Pruning for Large Language Models
Authors: Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song
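Abstract summary: We propose a simple yet effective pruning approach for large language models (LLMs). The framework computes importance scores using both general and task-specific calibration data. Experiments on widely used benchmarks show that the approach is effective and consistently outperforms the baselines.
Reference score (independently computed attention metric): 33.84349099489764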
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often ranks the importance of LLM parameters using their magnitudes and calibration-data activations and removes (or masks) the less important ones, accordingly reducing LLMs' size. However, these approaches primarily focus on preserving the LLM's ability to generate fluent sentences, while neglecting performance on specific domains and tasks. In this paper, we propose a simple yet effective pruning approach for LLMs that preserves task-specific capabilities while shrinking their parameter space. We first analyze how conventional pruning minimizes loss perturbation under general-domain calibration and extend this formulation by incorporating task-specific feature distributions into the importance computation of existing pruning algorithms. Thus, our framework computes separate importance scores using both general and task-specific calibration data, partitions parameters into shared and exclusive groups based on activation-norm differences, and then fuses their scores to guide the pruning process. This design enables our method to integrate seamlessly with various foundation pruning techniques and preserve the LLM's specialized abilities under compression. Experiments on widely used benchmarks demonstrate that our approach is effective and consistently outperforms the baselines with identical pruning ratios and different settings.
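Abstract (reference translation): Pruning offers a practical way to reduce the resources required to run large language models (LLMs). Research on LLM pruning typically ranks the importance of LLM parameters by their magnitudes and calibration-data activations, then removes (or masks) the less important ones to reduce model size. However, these approaches focus mainly on preserving the LLM's ability to generate fluent sentences while neglecting performance on specific domains and tasks. In this paper, we propose a simple yet effective pruning approach for LLMs that preserves task-specific capabilities while shrinking the parameter space. We first analyze how conventional pruning minimizes loss perturbation under general-domain calibration, then extend this formulation by incorporating task-specific feature distributions into the importance computation of existing pruning algorithms. The framework computes separate importance scores from general and task-specific calibration data, partitions parameters into shared and exclusive groups based on activation-norm differences, and fuses the scores to guide pruning. This design lets the method integrate seamlessly with various base pruning techniques and preserve the LLM's specialized abilities under compression. Experiments on widely used benchmarks show that the approach is effective and consistently outperforms the baselines at identical pruning ratios and across different settings.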
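
The pipeline described in the abstract (score with two calibration sets, split input features into shared and exclusive groups by activation-norm difference, fuse the scores, prune) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the base criterion (a Wanda-style score, weight magnitude times activation norm), the use of RMS rather than a raw L2 norm, the threshold `tau`, the blend weight `alpha`, the max rule for exclusive features, and all function names are assumptions made for illustration.

```python
import math

def feature_rms(acts):
    """Root-mean-square activation of each input feature (rows = samples).
    RMS is an assumption so that calibration sets of different sizes compare fairly."""
    n = len(acts)
    return [math.sqrt(sum(row[j] ** 2 for row in acts) / n)
            for j in range(len(acts[0]))]

def importance(weights, norms):
    """Assumed base criterion: |W_ij| * norm_j (Wanda-style)."""
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]

def fuse_scores(weights, general_acts, task_acts, tau=0.5, alpha=0.5):
    """Fuse general and task-specific importance scores (hypothetical rule)."""
    g, t = feature_rms(general_acts), feature_rms(task_acts)
    s_gen, s_task = importance(weights, g), importance(weights, t)
    fused = []
    for i, row in enumerate(weights):
        out = []
        for j in range(len(row)):
            # Relative activation-norm difference decides the group.
            diff = abs(g[j] - t[j]) / (max(g[j], t[j]) + 1e-12)
            if diff <= tau:
                # Shared feature: blend both scores.
                out.append(alpha * s_gen[i][j] + (1 - alpha) * s_task[i][j])
            else:
                # Exclusive feature: keep the larger of the two scores.
                out.append(max(s_gen[i][j], s_task[i][j]))
        fused.append(out)
    return fused

def prune_mask(scores, sparsity):
    """Per-row mask: 0 for the lowest-scoring `sparsity` fraction, 1 otherwise."""
    masks = []
    for row in scores:
        n_prune = int(len(row) * sparsity)
        pruned = set(sorted(range(len(row)), key=lambda j: row[j])[:n_prune])
        masks.append([0 if j in pruned else 1 for j in range(len(row))])
    return masks

# Toy data: feature 2 is active only on task data, feature 3 only on general data.
weights = [[0.5, -0.2, 0.4, 0.1]]
general_acts = [[1, 1, 0, 1], [1, 1, 0, 1]]
task_acts = [[1, 1, 3, 0], [1, 1, 3, 0]]

general_only = prune_mask(importance(weights, feature_rms(general_acts)), 0.5)
task_aware = prune_mask(fuse_scores(weights, general_acts, task_acts), 0.5)
print(general_only)  # [[1, 1, 0, 0]]
print(task_aware)    # [[1, 0, 1, 0]]
```

In this toy case, general-only scoring prunes the weight on feature 2 because that feature never fires on general data, while the fused score keeps it because it is strongly active on task data. Because the fusion only changes the scores, the same wrapper could in principle sit on top of other score-based pruning criteria.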

Related papers