Fugu-MT paper translation (abstract): When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions

Paper summary: When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions

arxiv url: http://arxiv.org/abs/2605.19369v1
Date: Tue, 19 May 2026 05:04:42 GMT
Status: translation complete
System update date: 2026-05-20 15:03:09.134783
Title: When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions
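Title (reference translation): When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions
Authors: Ravishka Rathnasuriya, Wei Yang
Abstract summary: This work introduces a unified framework that integrates uncertainty estimation, model calibration, and tool-based abstention handling for code models. The proposed design enables models to assign reliable correctness probabilities, abstain under uncertainty, and invoke lightweight program analysis procedures to process abstained cases.
Reference score (independently computed attention metric): 11.136449698197174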
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code language models are increasingly adopted for both understanding and generative tasks. Despite their success, these models frequently produce overconfident incorrect predictions and underconfident correct predictions, undermining their reliability in deployment. Practical deployment demands three capabilities: accurately estimating the likelihood of correctness, abstaining on uncertain predictions, and invoking external mechanisms to validate or repair abstained outputs. Existing calibration and uncertainty estimation methods, primarily developed for natural language tasks, do not readily transfer to code. Notably, post-hoc calibration techniques often reduce probability misalignment but fail to improve the ranking of predictions by correctness likelihood, a requirement for selective prediction under partial coverage. Furthermore, most approaches treat uncertainty as a passive indicator rather than an actionable signal. This work introduces a unified framework that integrates uncertainty estimation, model calibration, and tool-based abstention handling for code models. The proposed design enables models to assign reliable correctness probabilities, abstain under uncertainty, and invoke lightweight program analysis procedures to process abstained cases. By combining these components within a single deployment-oriented workflow, this framework supports risk-aware, coverage-controlled use of code models across both classification and generation settings.
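Abstract (reference translation): Code language models are adopted for both understanding and generation tasks. Despite their success, these models often produce overconfident incorrect predictions and underconfident correct predictions, which undermines their reliability in deployment. Practical deployment requires three capabilities: accurately estimating the likelihood of correctness, abstaining on uncertain predictions, and invoking external mechanisms to validate or repair the abstained outputs. Existing calibration and uncertainty estimation methods were developed mainly for natural language tasks and do not transfer easily to code. In particular, post-hoc calibration techniques often reduce probability misalignment but fail to improve the ranking of predictions by correctness likelihood, which is a requirement for selective prediction under partial coverage. In addition, most approaches treat uncertainty as a passive indicator rather than an actionable signal. This work introduces a unified framework that integrates uncertainty estimation, model calibration, and tool-based abstention handling for code models. The proposed design enables models to assign reliable correctness probabilities, abstain under uncertainty, and run lightweight program analysis procedures to process abstained cases. By combining these components in a single deployment-oriented workflow, the framework supports risk-aware, coverage-controlled use of code models across both classification and generation settings.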
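
The abstract's point that post-hoc calibration can reduce probability misalignment without improving the ranking of predictions can be illustrated with a small sketch. It is not the authors' method: temperature scaling is used here only as one familiar post-hoc technique, the confidences and correctness labels are invented toy data, and the names `temperature_scale`, `ece`, and `selective_accuracy` are hypothetical helpers written for this example.

```python
import math

def temperature_scale(p, T):
    """Rescale a binary confidence p (strictly between 0 and 1) by dividing its logit by T."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / T))

def ece(confs, correct, n_bins=5):
    """Expected calibration error with equal-width bins (lo, hi]."""
    n = len(confs)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confs) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confs[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(avg_conf - acc)
    return total

def ranking(confs):
    """Indices sorted from most to least confident."""
    return sorted(range(len(confs)), key=lambda i: confs[i], reverse=True)

def selective_accuracy(confs, correct, coverage):
    """Accuracy on the most confident `coverage` fraction; the rest are deferred."""
    k = max(1, round(coverage * len(confs)))
    kept = ranking(confs)[:k]
    return sum(correct[i] for i in kept) / k

# Invented toy data: an overconfident model that is right half the time.
confs = [0.99, 0.97, 0.95, 0.93, 0.90, 0.88, 0.85, 0.80]
correct = [1, 0, 1, 1, 0, 1, 0, 0]

# T = 3.0 is an arbitrary illustrative temperature, not a fitted value.
scaled = [temperature_scale(p, 3.0) for p in confs]

ece_before, ece_after = ece(confs, correct), ece(scaled, correct)
order_before, order_after = ranking(confs), ranking(scaled)
acc_before = selective_accuracy(confs, correct, coverage=0.5)
acc_after = selective_accuracy(scaled, correct, coverage=0.5)

print(f"ECE before: {ece_before:.3f}, after: {ece_after:.3f}")
print(f"Ranking unchanged: {order_before == order_after}")
print(f"Accuracy at 50% coverage before: {acc_before}, after: {acc_after}")
```

Because dividing the logit by a positive temperature is a monotone transformation, the confidence ordering is unchanged. The set of predictions kept at any fixed coverage level is therefore the same before and after scaling, even though the calibration error on this toy data drops.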

Related papers