Fugu-MT Paper Translation (Abstract): Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

Paper abstract: Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

arxiv url: http://arxiv.org/abs/2604.03048v1
Date: Fri, 03 Apr 2026 13:56:39 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.488951
Title: Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
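Title (reference translation): Combining static code analysis with large language models improves the correctness and performance of algorithm recognition
Authors: Denis Neumüller, Sebastian Boll, David Schüler, Matthias Tichy
Abstract summary: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms. We compare this combined approach against the standalone performance of LLMs under various prompting strategies. LLMs do not depend solely on name information, since they can still identify most algorithm implementations when identifiers are obfuscated.
Reference score (independently computed attention score): 0.27998963147546146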
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context: Since it is well-established that developers spend a substantial portion of their time understanding source code, the ability to automatically identify algorithms within source code presents a valuable opportunity. This capability can support program comprehension, facilitate maintenance, and enhance overall software quality. Objective: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms, while also evaluating their standalone performance and dependence on identifier names. Method: We perform multiple experiments evaluating the combination of LLMs with static analysis using different filter patterns. We compare this combined approach against their standalone performance under various prompting strategies and investigate the impact of systematic identifier obfuscation on classification performance and runtime. Results: The combination of LLMs with lightweight static analysis performs surprisingly well, reducing required LLM calls by 72.39-97.50% depending on the filter pattern. This not only lowers runtime significantly but also improves F1-scores by up to 12 percentage points (pp) compared to the baseline. Regarding the different prompting strategies, in-context learning with two examples provides an effective trade-off between classification performance and runtime efficiency, achieving F1-scores of 75-77% with only a modest increase in inference time. Lastly, we find that LLMs are not solely dependent on name-information as they are still able to identify most algorithm implementations when identifiers are obfuscated. Conclusion: By combining LLMs with static analysis, we achieve substantial reductions in runtime while simultaneously improving F1-scores, underscoring the value of a hybrid approach.
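Abstract (reference translation): Context: Because it is well established that developers spend a considerable amount of time understanding source code, the ability to automatically identify algorithms within source code is a valuable opportunity. This capability can support program comprehension, ease maintenance, and improve overall software quality. Objective: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms, and we also evaluate their standalone performance and their dependence on identifier names. Method: We run multiple experiments that evaluate the combination of LLMs and static analysis using different filter patterns. We compare this combined approach against standalone performance under various prompting strategies and examine how systematic identifier obfuscation affects classification performance and runtime. Results: The combination of LLMs with lightweight static analysis works surprisingly well, reducing the required LLM calls by 72.39-97.50% depending on the filter pattern. This not only lowers runtime substantially but also improves F1-scores by up to 12 percentage points (pp) over the baseline. Among the prompting strategies, in-context learning with two examples offers an effective trade-off between classification performance and runtime efficiency, reaching F1-scores of 75-77% with only a modest increase in inference time. Finally, we find that LLMs do not rely solely on name information, since they can still identify most algorithm implementations when identifiers are obfuscated. Conclusion: Combining LLMs with static analysis yields substantial runtime reductions while simultaneously improving F1-scores, underscoring the value of a hybrid approach.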
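
The hybrid idea in the abstract, where a cheap static filter decides which code fragments are worth sending to the LLM, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and not the authors' implementation: the filter pattern (`has_nested_loop`), the helper `recognize`, the stand-in `stub_classifier` that replaces a real LLM call, and the choice of Python as the analyzed language.

```python
import ast
from typing import Callable

LOOPS = (ast.For, ast.While)


def has_nested_loop(func: ast.FunctionDef) -> bool:
    """Hypothetical filter pattern: the function contains a loop inside a loop."""
    for outer in ast.walk(func):
        if isinstance(outer, LOOPS):
            for inner in ast.walk(outer):
                if inner is not outer and isinstance(inner, LOOPS):
                    return True
    return False


def stub_classifier(code: str) -> str:
    """Placeholder for an LLM call; a real system would send a prompt here."""
    return "sorting-candidate"


def recognize(source: str, classify: Callable[[str], str]):
    """Classify only the functions that pass the static filter.

    Returns (labels, classifier_calls, total_functions).
    """
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    labels = {}
    calls = 0
    for func in functions:
        if has_nested_loop(func):
            snippet = ast.get_source_segment(source, func) or ""
            labels[func.name] = classify(snippet)
            calls += 1
    return labels, calls, len(functions)


# Toy input; the sorting function has a deliberately uninformative name.
SAMPLE = '''
def add(a, b):
    return a + b

def greet(name):
    return "hi " + name

def f(xs):
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
'''

if __name__ == "__main__":
    labels, calls, total = recognize(SAMPLE, stub_classifier)
    print(labels)
    print(f"classifier calls avoided: {1 - calls / total:.1%}")
```

In this toy input only one of the four functions reaches the classifier, which shows where savings in LLM calls can come from. The reduction figures reported in the abstract come from the paper's own filter patterns and dataset, not from a sketch like this one.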

Related papers