Fugu-MT Paper Translation (Abstract): Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Paper summary: Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

arxiv url: http://arxiv.org/abs/2605.17932v1
Date: Mon, 18 May 2026 06:39:10 GMT
Status: Translation complete
System update date: 2026-05-19 17:57:48.927502
Title: Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA
Authors: Sterling Huang, Abigayle Brown, Jiyoo Noh, Jiakang Xu, Wantong Huo, Kaung Myat Kyaw, Jonathan Chan
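Abstract summary: This study investigates whether prompt compression with LLMLingua-2 transfers effectively to diffusion large language models (DLLMs). Compression performance is evaluated on GSM8K, DUC2004, and ShareGPT using 250 prompts per dataset at an approximate 2$\times$ compression ratio.
Reference score (independently computed attention score): 0.8135412538980287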
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prompt compression reduces inference cost and context length in large language models, but prior evaluations focus primarily on autoregressive architectures. This study investigates whether prompt compression transfers effectively to diffusion large language models (DLLMs) using LLMLingua-2, specifically the 8B-parameter DLLM LLaDA. We evaluate compression performance on GSM8K, DUC2004, and ShareGPT using 250 prompts per dataset at an approximate 2$\times$ compression ratio, across mathematical reasoning, prompt reconstruction, and summarization tasks. Outputs generated from original prompts, compressed prompts, reconstructed prompts, and reconstructed-prompt reasoning were compared using exact-match accuracy, BLEU, ROUGE, and BERTScore. Results show that semantic preservation does not necessarily imply stable downstream behavior in diffusion models. Summarization tasks remained comparatively robust under compression, while mathematical reasoning degraded substantially despite high semantic similarity scores. Reconstruction experiments further showed that semantically similar prompts may still omit reasoning-critical information required for stable denoising. Across tasks, BERTScore recall was consistently lower than precision, suggesting that compression failures are primarily driven by information omission rather than semantic drift. These findings indicate that prompt compression methods designed for autoregressive models do not transfer uniformly to diffusion large language models and motivate the development of diffusion-aware compression strategies.
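The abstract's point that recall below precision signals omission rather than drift can be illustrated with a minimal sketch. It is not the paper's evaluation code: it uses plain token overlap as a crude stand-in for BERTScore (which compares contextual embeddings), counts whitespace tokens instead of model tokens, and runs on a made-up prompt. The function names `compression_ratio` and `overlap_precision_recall` are hypothetical.

```python
from collections import Counter
from typing import Tuple


def compression_ratio(original: str, compressed: str) -> float:
    """Whitespace-token count of the original divided by that of the compressed prompt."""
    return len(original.split()) / len(compressed.split())


def overlap_precision_recall(reference: str, candidate: str) -> Tuple[float, float]:
    """Token-overlap precision and recall (assumption: a simplified proxy, not BERTScore)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    matched = sum((ref & cand).values())      # tokens shared by both, with multiplicity
    precision = matched / sum(cand.values())  # share of candidate tokens found in the reference
    recall = matched / sum(ref.values())      # share of reference tokens kept in the candidate
    return precision, recall


# Hypothetical example prompt, not taken from GSM8K or the paper's data.
original = "Tom has 12 apples and then he gives 5 apples to Ann. How many are left?"
compressed = "Tom has apples gives apples Ann. many left?"

ratio = compression_ratio(original, compressed)
precision, recall = overlap_precision_recall(original, compressed)
print(ratio, precision, recall)
```

In this toy case every token of the compressed prompt appears in the original, so precision stays at 1.0, while recall drops to 0.5 because half of the original tokens, including both numbers, are gone. A compressed prompt can therefore look faithful while missing exactly the quantities a math problem depends on.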

Related papers