Fugu-MT paper translation (abstract): LongCodeZip: Compress Long Context for Code Language Models

Paper overview: LongCodeZip: Compress Long Context for Code Language Models

arxiv url: http://arxiv.org/abs/2510.00446v1
Date: Wed, 01 Oct 2025 02:54:57 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.34193
Title: LongCodeZip: Compress Long Context for Code Language Models
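Title (reference translation): LongCodeZip: Long-Context Compression for Code Language Models
Authors: Yuling Shi, Yichun Qian, Hongyu Zhang, Beijun Shen, Xiaodong Gu
Abstract summary: LongCodeZip is a novel plug-and-play code compression framework designed for Large Language Models (LLMs). By effectively reducing context size while preserving essential information, LongCodeZip enables LLMs to scale to real-world, large-scale code scenarios.
Reference score (independently computed attention metric): 16.940525379087326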
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code generation under long contexts is becoming increasingly critical as Large Language Models (LLMs) are required to reason over extensive information in the codebase. While recent advances enable code LLMs to process long inputs, high API costs and generation latency remain substantial bottlenecks. Existing context pruning techniques, such as LLMLingua, achieve promising results for general text but overlook code-specific structures and dependencies, leading to suboptimal performance in programming tasks. In this paper, we propose LongCodeZip, a novel plug-and-play code compression framework designed specifically for code LLMs. LongCodeZip employs a dual-stage strategy: (1) coarse-grained compression, which identifies and ranks function-level chunks using conditional perplexity with respect to the instruction, retaining only the most relevant functions; and (2) fine-grained compression, which segments retained functions into blocks based on perplexity and selects an optimal subset under an adaptive token budget to maximize relevance. Evaluations across multiple tasks, including code completion, summarization, and question answering, show that LongCodeZip consistently outperforms baseline methods, achieving up to a 5.6x compression ratio without degrading task performance. By effectively reducing context size while preserving essential information, LongCodeZip enables LLMs to better scale to real-world, large-scale code scenarios, advancing the efficiency and capability of code intelligence applications.
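Abstract (reference translation): Code generation under long contexts is becoming increasingly important as Large Language Models (LLMs) are required to reason over extensive information in a codebase. Recent advances allow LLMs to process long inputs, but high API costs and generation latency remain serious bottlenecks. Existing context pruning techniques such as LLMLingua give promising results on general text, but they overlook code-specific structures and dependencies, which leads to suboptimal performance on programming tasks. This paper proposes LongCodeZip, a novel plug-and-play code compression framework designed specifically for LLMs. LongCodeZip adopts a two-stage strategy: (1) coarse-grained compression, which identifies and ranks function-level chunks using conditional perplexity with respect to the instruction and retains only the most relevant functions; and (2) fine-grained compression, which splits the retained functions into blocks based on perplexity and selects an optimal subset under an adaptive token budget to maximize relevance. Evaluations on multiple tasks, including code completion, summarization, and question answering, show that LongCodeZip consistently outperforms baseline methods, with a compression ratio of up to 5.6x and no degradation in task performance. By effectively reducing context size while preserving essential information, LongCodeZip lets LLMs scale to real-world, large-scale code scenarios and improves the efficiency and capability of code intelligence applications.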
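
The two-stage strategy described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, and several parts are assumptions made only so the example runs without a language model: `relevance` uses token overlap with the instruction as a stand-in for conditional perplexity, `split_blocks` splits on blank lines instead of using perplexity-based segmentation, the token budget is a fixed number rather than an adaptive one, and the "optimal subset" is chosen with a 0/1 knapsack, which is one plausible reading of the abstract. All function names here are hypothetical.

```python
import re


def tokens(text):
    # Crude tokenizer (assumption): identifiers and integers only.
    return re.findall(r"[A-Za-z_]\w*|\d+", text)


def relevance(chunk, instruction):
    # Stand-in for the perplexity-based score (assumption): the fraction of
    # instruction tokens that also occur in the chunk.
    query = set(tokens(instruction))
    if not query:
        return 0.0
    return len(query & set(tokens(chunk))) / len(query)


def coarse_select(functions, instruction, keep):
    # Stage 1: rank function-level chunks by relevance, keep the top `keep`,
    # and return them in their original order.
    ranked = sorted(
        range(len(functions)),
        key=lambda i: relevance(functions[i], instruction),
        reverse=True,
    )
    return [functions[i] for i in sorted(ranked[:keep])]


def split_blocks(function_source):
    # Stand-in for perplexity-based segmentation (assumption): blank lines.
    return [b for b in function_source.split("\n\n") if b.strip()]


def fine_select(blocks, instruction, budget):
    # Stage 2: 0/1 knapsack over blocks, where a block's weight is its token
    # count and its value is its relevance to the instruction.
    weights = [len(tokens(b)) for b in blocks]
    values = [relevance(b, instruction) for b in blocks]
    # best[cap] = (total value, chosen block indices) using at most `cap` tokens
    best = [(0.0, ())] * (budget + 1)
    for i, (w, v) in enumerate(zip(weights, values)):
        # Iterate capacities downward so each block is used at most once.
        for cap in range(budget, w - 1, -1):
            candidate = best[cap - w][0] + v
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - w][1] + (i,))
    return [blocks[i] for i in best[budget][1]]


def compress(functions, instruction, keep, budget):
    # Run both stages and join the surviving blocks in source order.
    kept = coarse_select(functions, instruction, keep)
    blocks = [b for f in kept for b in split_blocks(f)]
    return "\n\n".join(fine_select(blocks, instruction, budget))
```

Calling `compress(functions, instruction, keep=2, budget=13)` with a list of function source strings returns a shortened context containing only blocks from the two highest-ranked functions, limited to 13 tokens as counted by the toy tokenizer. Swapping `relevance` for a model-based conditional perplexity score would bring the sketch closer to the method the abstract describes.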

Related papers