Fugu-MT Paper Translation (Abstract): The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget

Paper overview: The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget

arxiv url: http://arxiv.org/abs/2508.13666v1
Date: Tue, 19 Aug 2025 09:13:48 GMT
Status: Translation complete
Last updated in system: 2025-08-20 15:36:31.871966
Title: The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget
Title (reference translation): The hidden cost of readability: how code formatting quietly eats into your LLM budget
Authors: Dangfeng Pan, Zhensu Sun, Cenyuan Zhang, David Lo, Xiaoning Du
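Abstract summary: This paper evaluates the impact of code formatting on the performance and efficiency of large language models (LLMs). The key findings indicate that LLMs maintain performance across formatted and unformatted code, with an average input token reduction of 24.5%. The authors develop a bidirectional code transformation tool for format processing that can be integrated seamlessly into existing inference workflows.
Reference score (independently computed attention score): 13.419222464653425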
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Source code is usually formatted with elements like indentation and newlines to improve readability for human developers. However, these visual aids do not seem to be beneficial for large language models (LLMs) in the same way since the code is processed as a linear sequence of tokens. Furthermore, these additional tokens can lead to increased computational costs and longer response times for LLMs. If such formatting elements are non-essential to LLMs, we can reduce such costs by removing them from the code. To figure out the role played by formatting elements, we conduct a comprehensive empirical study to evaluate the impact of code formatting on LLM performance and efficiency. Through large-scale experiments on Fill-in-the-Middle Code Completion tasks across four programming languages (Java, Python, C++, C#) and ten LLMs (including both commercial and open-source models), we systematically analyze token count and performance when formatting elements are removed. Key findings indicate that LLMs can maintain performance across formatted code and unformatted code, achieving an average input token reduction of 24.5% with negligible output token reductions. This makes code format removal a practical optimization strategy for improving LLM efficiency. Further exploration reveals that both prompting and fine-tuning LLMs can lead to significant reductions (up to 36.1%) in output code length without compromising correctness. To facilitate practical applications, we develop a bidirectional code transformation tool for format processing, which can be seamlessly integrated into existing LLM inference workflows, ensuring both human readability and LLM efficiency.
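Abstract (reference translation): Source code is normally formatted with elements such as indentation and newlines to make it more readable for human developers. These visual aids, however, do not appear to help large language models (LLMs) in the same way, because the code is processed as a linear sequence of tokens. These extra tokens can also increase computational cost and lengthen LLM response times. If such formatting elements are not essential to LLMs, those costs can be reduced by removing them from the code. To clarify the role formatting elements play, the authors conduct a comprehensive empirical study evaluating the impact of code formatting on LLM performance and efficiency. Through large-scale experiments on Fill-in-the-Middle Code Completion tasks across four programming languages (Java, Python, C++, C#) and ten LLMs (including both commercial and open-source models), they systematically analyze token counts and performance when formatting elements are removed. The key findings indicate that LLMs maintain performance across formatted and unformatted code, with an average input token reduction of 24.5% and a negligible reduction in output tokens. This makes removing code formatting a practical optimization strategy for improving LLM efficiency. Further investigation shows that both prompting and fine-tuning can substantially reduce output code length (by up to 36.1%) without compromising correctness. To support practical use, the authors develop a bidirectional code transformation tool for format processing that can be integrated seamlessly into existing LLM inference workflows, preserving both human readability and LLM efficiency.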
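The idea of measuring token savings from format removal can be illustrated with a minimal sketch. This is not the authors' tool: `strip_formatting`, `proxy_token_count`, and `reduction_percent` are hypothetical helpers, the regex is only a crude stand-in for a real LLM tokenizer, and the stripping step assumes a brace-delimited language (such as Java) with no line comments or multi-line string literals. It would not be safe for Python, where indentation is syntax.

```python
import re

# Crude proxy for a tokenizer (an assumption, not a real LLM vocabulary):
# a newline, an indentation run of 2+ spaces/tabs, a word, or a single
# punctuation character each count as one token. Single spaces are ignored.
TOKEN_RE = re.compile(r"\n|[ \t]{2,}|\w+|[^\w\s]")


def strip_formatting(code: str) -> str:
    """Drop indentation, blank lines, and newlines from brace-delimited code.

    Assumes no line comments and no multi-line string literals.
    """
    lines = [line.strip() for line in code.splitlines()]
    return " ".join(line for line in lines if line)


def proxy_token_count(text: str) -> int:
    """Count proxy tokens according to TOKEN_RE."""
    return len(TOKEN_RE.findall(text))


def reduction_percent(original: str, stripped: str) -> float:
    """Percentage of proxy tokens removed by stripping."""
    before = proxy_token_count(original)
    after = proxy_token_count(stripped)
    return 100.0 * (before - after) / before


SAMPLE = """public class Hello {
    public static void main(String[] args) {
        System.out.println("hi");
    }
}
"""

if __name__ == "__main__":
    compact = strip_formatting(SAMPLE)
    print(compact)
    print(f"proxy token reduction: {reduction_percent(SAMPLE, compact):.1f}%")
```

The figure this prints depends entirely on the proxy tokenizer and the sample, so it should not be compared with the 24.5% reported in the abstract. The reverse direction (restoring readable formatting) would require a language-aware formatter and is not sketched here.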

Related papers