Fugu-MT Paper Translation (Abstract): Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning

Paper abstract: Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning

arxiv url: http://arxiv.org/abs/2507.07064v1
Date: Wed, 09 Jul 2025 17:26:10 GMT
Status: Translation complete
Last updated in system: 2025-07-10 17:37:43.707003
Title: Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning
Authors: Shanle Zheng, Keqin Bao, Jizhi Zhang, Yang Zhang, Fuli Feng, Xiangnan He
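Abstract summary: This work explores pruning to improve efficiency while maintaining recommendation quality. It proposes a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. The proposed method achieves, on average, 88% of the original model's performance while pruning more than 95% of the non-embedding parameters.
Reference score (independently computed attention measure): 44.747749293948864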
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-based recommender systems have made significant progress; however, the deployment cost associated with the large parameter volume of LLMs still hinders their real-world applications. This work explores parameter pruning to improve parameter efficiency while maintaining recommendation quality, thereby enabling easier deployment. Unlike existing approaches that focus primarily on inter-layer redundancy, we uncover intra-layer redundancy within components such as self-attention and MLP modules. Building on this analysis, we propose a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. Specifically, we introduce a three-stage pruning strategy that progressively prunes parameters at different levels and parts of the model, moving from intra-layer to layer-wise pruning, or from width to depth. Each stage also includes a performance restoration step using distillation techniques, helping to strike a balance between performance and parameter efficiency. Empirical results demonstrate the effectiveness of our approach: across three datasets, our models achieve an average of 88% of the original model's performance while pruning more than 95% of the non-embedding parameters. This underscores the potential of our method to significantly reduce resource requirements without greatly compromising recommendation quality. Our code will be available at: https://github.com/zheng-sl/PruneRec
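The width-to-depth progression described in the abstract can be illustrated with a toy sketch. The abstract does not say how the three stages are divided or how importance is scored, so everything below is an assumption made for illustration: the split into attention heads, MLP hidden units, and whole layers; the L1-norm importance proxy; the keep-the-first-layers rule; and all function names. The `distill` function is only a placeholder for the restoration step and performs no training. This is not the authors' PruneRec code.

```python
# Toy sketch only. The stage split, scoring rule, and names are assumptions,
# not the paper's algorithm.

def l1(rows):
    """Assumed importance proxy: sum of absolute weights in a unit."""
    return sum(abs(w) for row in rows for w in row)


def keep_top(units, keep_ratio):
    """Keep the highest-scoring units, preserving their original order."""
    n_keep = max(1, round(len(units) * keep_ratio))
    ranked = sorted(range(len(units)), key=lambda i: l1(units[i]), reverse=True)
    return [units[i] for i in sorted(ranked[:n_keep])]


def distill(student, teacher):
    """Placeholder for restoration; a real step would train the student."""
    return student


def count_params(model):
    """Count scalar weights across all layers, parts, units, and rows."""
    return sum(len(row) for layer in model for part in layer.values()
               for unit in part for row in unit)


def make_toy_model(n_layers, n_heads, n_units, dim):
    """Build a deterministic toy model: each head has 2 rows, each MLP unit 1 row."""
    return [{"attn": [[[float(h + 1)] * dim] * 2 for h in range(n_heads)],
             "mlp": [[[float(u + 1)] * dim] for u in range(n_units)]}
            for _ in range(n_layers)]


def prune_three_stages(teacher, head_ratio, unit_ratio, n_layers):
    # Stage 1 (width): prune attention heads inside every layer.
    s1 = [{"attn": keep_top(layer["attn"], head_ratio), "mlp": layer["mlp"]}
          for layer in teacher]
    s1 = distill(s1, teacher)
    # Stage 2 (width): prune MLP hidden units inside every layer.
    s2 = [{"attn": layer["attn"], "mlp": keep_top(layer["mlp"], unit_ratio)}
          for layer in s1]
    s2 = distill(s2, teacher)
    # Stage 3 (depth): drop whole layers, keeping the first n_layers (assumed rule).
    s3 = distill(s2[:n_layers], teacher)
    return s1, s2, s3


teacher = make_toy_model(n_layers=8, n_heads=4, n_units=16, dim=4)
s1, s2, s3 = prune_three_stages(teacher, head_ratio=0.25, unit_ratio=0.125, n_layers=2)
for name, model in [("teacher", teacher), ("stage 1", s1), ("stage 2", s2), ("stage 3", s3)]:
    print(name, count_params(model))
```

The toy sizes are arbitrary and say nothing about the paper's results. The sketch only shows how pruning inside layers first and then removing layers compounds the reduction in parameter count.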
