Fugu-MT Paper Translation (Abstract): Uncertainty Quantification for LLM Function-Calling

Paper overview: Uncertainty Quantification for LLM Function-Calling

arxiv url: http://arxiv.org/abs/2604.22985v1
Date: Fri, 24 Apr 2026 19:56:39 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.08426
Title: Uncertainty Quantification for LLM Function-Calling
Title (reference translation): Quantifying Uncertainty in LLM Function Calling
Authors: Zihuiwen Ye, Lukas Aichberger, Michael Kirchhof, Sinead Williamson, Luca Zappella, Yarin Gal, Arno Blaas, Adam Golinski
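Abstract summary: Large Language Models (LLMs) are increasingly deployed to autonomously solve real-world tasks. Uncertainty Quantification (UQ) methods can quantify an LLM's confidence in a function call and help prevent potentially incorrect calls.
Reference score (attention score computed independently by this site): 29.49069464022948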
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly deployed to autonomously solve real-world tasks. A key ingredient for this is the LLM Function-Calling paradigm, a widely used approach for equipping LLMs with tool-use capabilities. However, an LLM calling functions incorrectly can have severe implications, especially when their effects are irreversible, e.g., transferring money or deleting data. Hence, it is of paramount importance to consider the LLM's confidence that a function call solves the task correctly prior to executing it. Uncertainty Quantification (UQ) methods can be used to quantify this confidence and prevent potentially incorrect function calls. In this work, we present what is, to our knowledge, the first evaluation of UQ methods for LLM Function-Calling (FC). While multi-sample UQ methods, such as Semantic Entropy, show strong performance for natural language Q&A tasks, we find that in the FC setting, they offer no clear advantage over simple single-sample UQ methods. Additionally, we find that the particularities of FC outputs can be leveraged to improve the performance of existing UQ methods in this setting. Specifically, multi-sample UQ methods benefit from clustering FC outputs based on their abstract syntax tree parsing, while single-sample UQ methods can be improved by selecting only semantically meaningful tokens when calculating logit-based uncertainty scores.
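The multi-sample idea mentioned in the abstract, clustering sampled function calls by their abstract syntax tree before computing an entropy, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes function calls are emitted as Python-style call strings, and the helper names `canonical_call` and `cluster_entropy` are hypothetical. It computes a plain entropy over the empirical cluster frequencies, which is only one simple variant of a Semantic-Entropy-style score.

```python
import ast
import math
from collections import Counter


def canonical_call(call_str):
    """Return a hashable key for a Python-style call string, or None if it
    does not parse as a single function call. (Hypothetical helper.)"""
    try:
        tree = ast.parse(call_str.strip(), mode="eval")
    except SyntaxError:
        return None
    call = tree.body
    if not isinstance(call, ast.Call):
        return None
    func = ast.dump(call.func)
    args = tuple(ast.dump(a) for a in call.args)
    # Keyword order does not change the call, so sort by argument name.
    kwargs = tuple(sorted((kw.arg or "**", ast.dump(kw.value)) for kw in call.keywords))
    return (func, args, kwargs)


def cluster_entropy(samples):
    """Entropy (in nats) of the empirical distribution over AST-equivalence
    clusters. Unparsable samples each fall back to their raw string."""
    keys = [canonical_call(s) or ("UNPARSED", s) for s in samples]
    counts = Counter(keys)
    n = len(keys)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Three spellings of the same call plus one call with a different amount.
samples = [
    'transfer(amount=100, to="bob")',
    "transfer(to='bob', amount=100)",
    'transfer( amount = 100 , to = "bob" )',
    'transfer(amount=1000, to="bob")',
]
print(cluster_entropy(samples[:3]))  # one cluster, so zero entropy
print(cluster_entropy(samples))      # two clusters of sizes 3 and 1
```

Parsing before comparing means that differences in whitespace, quote style, and keyword order do not inflate the uncertainty score, while a changed argument value still lands in a separate cluster.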
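Abstract (reference translation): Large Language Models (LLMs) are increasingly deployed to autonomously solve real-world tasks. The LLM Function-Calling paradigm is a widely used approach for equipping LLMs with tool-use capabilities. However, an LLM that calls functions incorrectly can have severe consequences, especially when the effects are irreversible, such as transferring money or deleting data. It is therefore of paramount importance to consider the LLM's confidence that a function call correctly solves the task before executing it. Uncertainty Quantification (UQ) methods can be used to quantify this confidence and prevent potentially incorrect function calls. This work presents what is, to the authors' knowledge, the first evaluation of UQ methods for LLM Function-Calling (FC). Multi-sample UQ methods such as Semantic Entropy perform strongly on natural language Q&A tasks, but in the FC setting they offer no clear advantage over simple single-sample UQ methods. Furthermore, the particularities of FC outputs can be leveraged to improve the performance of existing UQ methods. Specifically, multi-sample UQ methods benefit from clustering FC outputs based on abstract syntax tree parsing, while single-sample UQ methods can be improved by selecting only semantically meaningful tokens when computing logit-based uncertainty scores.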
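The single-sample idea, restricting a logit-based score to semantically meaningful tokens, might look roughly like the following sketch. The abstract does not say how the paper selects tokens, so the rule here (skip tokens made up only of brackets, commas, quotes, equals signs, colons, and whitespace) is purely an assumption, and the names `is_meaningful` and `mean_nll` as well as the log-probabilities are hypothetical.

```python
import math

# Assumed set of "structural" characters; the paper's actual rule is not given.
STRUCTURAL_CHARS = set("(){}[],:=\"' \n\t")


def is_meaningful(token):
    """Assumed heuristic: keep a token if it has any non-structural character."""
    return any(ch not in STRUCTURAL_CHARS for ch in token)


def mean_nll(tokens, logprobs, select=None):
    """Average negative log-probability over tokens, optionally restricted to
    the tokens for which `select(token)` is true. Returns NaN if none remain."""
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have equal length")
    kept = [lp for tok, lp in zip(tokens, logprobs) if select is None or select(tok)]
    if not kept:
        return math.nan
    return -sum(kept) / len(kept)


# Made-up tokenisation and log-probabilities, for illustration only.
tokens = ["f", "(", "x", ")"]
logprobs = [-1.0, 0.0, -3.0, 0.0]
print(mean_nll(tokens, logprobs))                        # averages over all 4 tokens
print(mean_nll(tokens, logprobs, select=is_meaningful))  # averages over "f" and "x"
```

Structural tokens tend to be predicted with near certainty, so including them can dilute the average; filtering them out lets the score reflect the function name and argument values instead.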
