Fugu-MT Paper Translation (Summary): SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models

Paper overview: SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models

arxiv url: http://arxiv.org/abs/2510.13836v1
Date: Fri, 10 Oct 2025 17:22:53 GMT
Status: Translation complete
Last updated (in system): 2025-10-17 21:15:14.460268
Title: SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models
Authors: Debarun Bhattacharjya, Balaji Ganesan, Junkyu Lee, Radu Marinescu, Katsiaryna Mirylenka, Michael Glass, Xiao Shou
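Abstract summary: Uncertainty quantification (UQ) provides measures of uncertainty. Black-box UQ methods do not require access to internal model information. This paper proposes a high-level, non-verbalized, similarity-based aggregation framework.
Reference score (independently calculated attention score): 17.805673311465295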
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When does a large language model (LLM) know what it does not know? Uncertainty quantification (UQ) provides measures of uncertainty, such as an estimate of the confidence in an LLM's generated output, and is therefore increasingly recognized as a crucial component of trusted AI systems. Black-box UQ methods do not require access to internal model information from the generating LLM and therefore have numerous real-world advantages, such as robustness to system changes, adaptability to choice of LLM, reduced costs, and computational tractability. In this paper, we investigate the effectiveness of UQ techniques that are primarily but not necessarily entirely black-box, where the consistency between a generated output and other sampled generations is used as a proxy for confidence in its correctness. We propose a high-level non-verbalized similarity-based aggregation framework that subsumes a broad swath of UQ approaches suitable for complex generative tasks, as well as introduce specific novel techniques from the framework that train confidence estimation models using small training sets. Through an empirical study with datasets spanning the diverse tasks of question answering, summarization, and text-to-SQL, we demonstrate that our proposed similarity-based methods can yield better calibrated confidences than baselines.
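To make the consistency-as-confidence idea in the abstract concrete, here is a minimal Python sketch, assuming token-level Jaccard similarity as the similarity measure and a plain arithmetic mean as the aggregator. Both choices, and the function names `jaccard_similarity` and `consistency_confidence`, are hypothetical illustrations rather than the SIMBA UQ method itself; the paper's framework also covers aggregation models trained on small training sets, which this sketch does not attempt.

```python
from typing import List


def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two generations.

    Hypothetical choice of similarity: lowercase, split on whitespace,
    and compare the resulting token sets.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty generations are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_confidence(output: str, samples: List[str]) -> float:
    """Confidence in `output` as its mean similarity to other sampled generations.

    Only the generated text is used (black-box): no logits or internal
    model state are required.
    """
    if not samples:
        raise ValueError("need at least one other sampled generation")
    sims = [jaccard_similarity(output, s) for s in samples]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    output = "Paris is the capital of France"
    samples = [
        "Paris is the capital of France",
        "The capital of France is Paris",
        "Lyon",
    ]
    conf = consistency_confidence(output, samples)
    print(round(conf, 3))  # prints 0.667
```

Two of the three samples agree with the output, so the mean similarity is 2/3. A trained aggregator would replace the mean with a model that maps several similarity statistics to a calibrated confidence.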

Related papers