Fugu-MT Paper Translation (Abstract): FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution

Paper summary: FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution

arxiv url: http://arxiv.org/abs/2510.16439v2
Date: Wed, 22 Oct 2025 04:39:03 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:11.818261
Title: FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution
Authors: Syed Rifat Raiyan, Md Farhan Ishmam, Abdullah Al Imran, Mohammad Ali Moni
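Abstract summary: Large language models (LLMs) owe much of their stellar performance to expansive input contexts, yet such verbosity inflates monetary costs, carbon footprint, and inference-time latency. This paper introduces FrugalPrompt, a novel prompt compression framework for LLMs. We evaluate the approach across four NLP tasks: Sentiment Analysis, Commonsense QA, Summarization, and Mathematical Reasoning.
Reference score (independently computed attention metric): 3.4666771782038652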
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) owe much of their stellar performance to expansive input contexts, yet such verbosity inflates monetary costs, carbon footprint, and inference-time latency. Much of this overhead manifests from the redundant low-utility tokens present in typical prompts, as only a fraction of tokens typically carries the majority of the semantic weight. We address this inefficiency by introducing FrugalPrompt, a novel prompt compression framework for LLMs, which retains only the most semantically significant tokens. Leveraging two state-of-the-art token attribution methods, GlobEnc and DecompX, we assign salience scores to every token in an input sequence, rank them to preserve the top-k% tokens in their original order, and obtain a sparse frugalized prompt. We evaluate the approach across four NLP tasks: Sentiment Analysis, Commonsense QA, Summarization, and Mathematical Reasoning, using a suite of frontier LLMs. For the first three tasks, a 20% prompt reduction incurs only a marginal loss in task performance, demonstrating that contemporary LLMs can reconstruct elided context from high-salience cues. In contrast, performance on mathematical reasoning deteriorates sharply, reflecting a stronger dependence on complete token continuity. Further analysis with bottom-k% and random-k% tokens reveals asymmetric performance patterns that may suggest potential task contamination effects, wherein models may resort to shallow memorized patterns from pretraining exposure for conventional NLP tasks. We posit that our work contributes to a more nuanced understanding of LLM behavior in performance-efficiency trade-offs, and delineate the boundary between tasks tolerant to contextual sparsity and those requiring exhaustive context. Our source code and models are available at: https://github.com/Starscream-11813/Frugal-ICL.
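The selection step described in the abstract (score every token, rank, keep the top-k% in original order) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frugalize`, the `mode` and `seed` parameters, the ceiling-based rounding, and the toy salience scores are all assumptions. In the paper the scores come from GlobEnc or DecompX, which are not reproduced here.

```python
import math
import random
from typing import List, Sequence


def frugalize(tokens: Sequence[str], scores: Sequence[float],
              keep_percent: float, mode: str = "top", seed: int = 0) -> List[str]:
    """Keep keep_percent% of the tokens, preserving their original order.

    mode="top" keeps the highest-scoring tokens; "bottom" and "random"
    mirror the bottom-k% and random-k% comparisons mentioned in the abstract.
    Rounding up the number of kept tokens is an assumption of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 <= keep_percent <= 100:
        raise ValueError("keep_percent must be between 0 and 100")

    n_keep = math.ceil(len(tokens) * keep_percent / 100)
    indices = list(range(len(tokens)))

    if mode == "top":
        chosen = sorted(indices, key=lambda i: scores[i], reverse=True)[:n_keep]
    elif mode == "bottom":
        chosen = sorted(indices, key=lambda i: scores[i])[:n_keep]
    elif mode == "random":
        chosen = random.Random(seed).sample(indices, n_keep)
    else:
        raise ValueError("mode must be 'top', 'bottom', or 'random'")

    # Re-sort the selected positions so the surviving tokens keep their order.
    return [tokens[i] for i in sorted(chosen)]


# Toy example with made-up salience scores (hypothetical values).
tokens = ["the", "movie", "was", "absolutely", "wonderful"]
scores = [0.05, 0.30, 0.10, 0.20, 0.35]

# A 20% reduction keeps 80% of the tokens.
print(" ".join(frugalize(tokens, scores, keep_percent=80)))
```

With these toy scores, keeping 80% drops only the lowest-scoring token ("the"). Sorting the chosen indices before rebuilding the prompt is what keeps the surviving tokens in their original positions relative to each other.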

Related papers