Fugu-MT Paper Translation (Abstract): Reducing Hallucinations in LLM-Generated Code via Semantic Triangulation

Paper overview: Reducing Hallucinations in LLM-Generated Code via Semantic Triangulation

arxiv url: http://arxiv.org/abs/2511.12288v2
Date: Sat, 22 Nov 2025 03:27:30 GMT
Status: Translation complete
Last updated in system: 2025-11-25 13:28:09.592989
Title: Reducing Hallucinations in LLM-Generated Code via Semantic Triangulation
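Title (reference translation): Reducing hallucinations in LLM-generated code through semantic triangulation
Authors: Yihan Dai, Sijie Liang, Haotian Xu, Peichu Xie, Sergey Mechtaev
Abstract summary: We introduce semantic triangulation, which transforms a programming problem in a way that preserves an exact, verifiable mapping between solutions. On the LiveCodeBench and CodeElo benchmarks, semantic triangulation increases the reliability of generated code by 21%. It is also the only approach to consistently form true consensus on tasks with multiple valid but non-equivalent solutions.
Reference score (independently computed attention measure): 2.8646222242803643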
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When generating code from natural language prompts, an LLM samples programs from a probability distribution, many of which might be incorrect. Sample consensus techniques - such as majority voting or validation against generated tests or specifications - aim to identify a correct program in the sample or abstain if none is valid. However, existing methods often fail to select a correct solution when its sampling probability is low, or when the problem permits multiple valid but non-equivalent solutions. Additionally, they often fail to abstain when no correct solution is present in the sample. To overcome these limitations, we introduce semantic triangulation, which transforms a programming problem in a way that non-trivially alters its semantics while preserving an exact, verifiable mapping between solutions before and after transformation. We theoretically establish that verifying consistency across such problem transformations increases confidence that generated programs reflect accurate generalization rather than spurious statistical correlations, enabling more reliable sample consensus and abstention. On the LiveCodeBench and CodeElo benchmarks, using GPT-4o and DeepSeek-V3 models, semantic triangulation increases reliability of generated code by 21% compared to the method that selects only high-confidence solutions with the probability threshold 0.5, while being able to pinpoint correct solutions at sampling probabilities as low as 0.14. Apart from that, it is also the only approach to consistently form true consensus on tasks with multiple valid but non-equivalent solutions.
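Abstract (reference translation): When code is generated from natural language prompts, an LLM samples programs from a probability distribution, and many of those programs may be incorrect. Sample consensus techniques, such as majority voting or validation against generated tests or specifications, aim to identify a correct program in the sample, or to abstain if none is valid. However, existing methods often fail to select a correct solution when its sampling probability is low, or when the problem permits multiple valid but non-equivalent solutions. They also often fail to abstain when the sample contains no correct solution. To overcome these limitations, we introduce semantic triangulation, which transforms a programming problem in a way that non-trivially changes its semantics while preserving an exact, verifiable mapping between solutions before and after the transformation. We establish theoretically that verifying consistency across such problem transformations increases confidence that generated programs reflect accurate generalization rather than spurious statistical correlations, which enables more reliable sample consensus and abstention. On the LiveCodeBench and CodeElo benchmarks, using the GPT-4o and DeepSeek-V3 models, semantic triangulation increases the reliability of generated code by 21% compared with the method that selects only high-confidence solutions with a probability threshold of 0.5, and it can pinpoint correct solutions at sampling probabilities as low as 0.14. It is also the only approach to consistently form true consensus on tasks with multiple valid but non-equivalent solutions.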
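
The baseline that the abstract compares against (majority voting that keeps only solutions above a probability threshold of 0.5, and otherwise abstains) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `majority_vote` and `_run`, the choice to group programs by their outputs on a fixed list of probe inputs, and the strict `>` comparison against the threshold are all assumptions made for the example. It does not implement semantic triangulation itself.

```python
from collections import defaultdict
from typing import Callable, Hashable, Optional, Sequence

# Assumption: a sampled "program" is modelled as a Python callable on ints.
Program = Callable[[int], Hashable]


def _run(program: Program, x: int):
    """Run one program on one input, turning exceptions into comparable values."""
    try:
        return ("ok", program(x))
    except Exception as exc:  # a crashing sample is just another behaviour
        return ("error", type(exc).__name__)


def majority_vote(
    programs: Sequence[Program],
    inputs: Sequence[int],
    threshold: float = 0.5,
) -> Optional[Program]:
    """Return a program from the largest behavioural cluster, or None to abstain.

    Programs are grouped by their outputs on `inputs`. The largest group is
    accepted only if its share of the sample is strictly above `threshold`
    (the strictness is an assumption of this sketch).
    """
    if not programs:
        return None
    clusters = defaultdict(list)
    for program in programs:
        signature = tuple(_run(program, x) for x in inputs)
        clusters[signature].append(program)
    best = max(clusters.values(), key=len)
    if len(best) / len(programs) > threshold:
        return best[0]
    return None  # abstain: no behaviour reaches the confidence threshold


# Hypothetical sample for "square a number": three of four samples agree.
samples = [lambda x: x * x, lambda x: x ** 2, lambda x: x * 2, lambda x: x * x]
chosen = majority_vote(samples, inputs=[0, 1, 2, 3])
```

With this sample, the squaring behaviour holds a 0.75 share, so `chosen` is one of the squaring programs. If the correct behaviour appeared in only one of four samples, the function would abstain, which is the low-probability failure mode the abstract describes for existing methods.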

Related papers