Fugu-MT Paper Translation (Abstract): AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents

Paper overview: AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents

arxiv url: http://arxiv.org/abs/2601.05191v2
Date: Mon, 12 Jan 2026 18:25:18 GMT
Status: Translation complete
Last updated in system: 2026-01-13 15:02:56.563359
Title: AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents
Authors: Zuhair Ahmed Khan Taha, Mohammed Mudassir Uddin, Shahnawaz Alam
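Abstract summary: A single session using a 70-billion-parameter model costs around $127 in cloud computing fees. The paper proposes AgentCompress, a framework that addresses this problem through task-aware dynamic compression. Computational costs fell by 68.3% while 96.2% of the original success rate was preserved.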
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models hold considerable promise for various applications, but their computational requirements create a barrier that many institutions cannot overcome. A single session using a 70-billion-parameter model can cost around $127 in cloud computing fees, which puts these tools out of reach for organizations operating on limited budgets. We present AgentCompress, a framework that tackles this problem through task-aware dynamic compression. The idea comes from a simple observation: not all tasks require the same computational effort. Complex reasoning, for example, is far more demanding than text reformatting, yet conventional compression applies the same reduction to both. Our approach uses a lightweight neural controller that looks at the first few tokens of each request, estimates how complex the task will be, and sends it to an appropriately quantized version of the model. This routing step adds only about 12 milliseconds of overhead. We tested the framework on 290 multi-stage workflows from domains including computer science, physics, chemistry, and biology. The results show a 68.3% reduction in computational costs while preserving 96.2% of the original success rate. These findings suggest that routing queries intelligently can make powerful language models substantially more affordable without sacrificing output quality.
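
The routing idea in the abstract (inspect the first few tokens, estimate task complexity, dispatch to a suitably quantized model) can be illustrated with a minimal sketch. This is not the authors' implementation: the tier names, bit widths, thresholds, and the keyword heuristic that stands in for the lightweight neural controller are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A hypothetical quantized variant of the same base model."""
    name: str
    bits: int


# Assumed tiers, ordered from most to least compressed (illustrative only).
TIERS = (Tier("int4", 4), Tier("int8", 8), Tier("fp16", 16))

# Stand-in for the learned controller: a few words that hint at reasoning.
REASONING_CUES = {"prove", "derive", "why", "optimize", "debug", "design"}


def estimate_complexity(prompt: str, prefix_tokens: int = 8) -> float:
    """Return a score in [0, 1] using only the first few whitespace tokens."""
    prefix = prompt.lower().split()[:prefix_tokens]
    hits = sum(1 for tok in prefix if tok.strip(".,:;?!") in REASONING_CUES)
    return min(1.0, hits / 2)


def route(prompt: str) -> Tier:
    """Pick a tier: low scores go to heavier compression, high scores to less."""
    score = estimate_complexity(prompt)
    if score < 0.25:
        return TIERS[0]
    if score < 0.75:
        return TIERS[1]
    return TIERS[2]


if __name__ == "__main__":
    for request in (
        "Reformat this list as a table",
        "Prove that the algorithm terminates",
        "Derive and prove the bound",
    ):
        print(f"{route(request).name:>5}  <-  {request}")
```

Because the decision depends only on a short prefix, the routing cost stays fixed regardless of request length, which fits the small per-request overhead the abstract reports. A real controller would replace `estimate_complexity` with a trained classifier.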

Related papers