Fugu-MT Paper Translation (Summary): It's LIT! Reliability-Optimized LLMs with Inspectable Tools

Paper summary: It's LIT! Reliability-Optimized LLMs with Inspectable Tools

arxiv url: http://arxiv.org/abs/2511.14903v1
Date: Tue, 18 Nov 2025 20:41:58 GMT
Status: Translation complete
Last updated in system: 2025-11-20 15:51:28.526102
Title: It's LIT! Reliability-Optimized LLMs with Inspectable Tools
Title (reference translation): Reliability-Optimized LLMs Using Inspectable Tools
Authors: Ruixin Zhang, Jon Donnelly, Zhicheng Guo, Ghazal Khalighinejad, Haiyang Huang, Alina Jade Barnett, Cynthia Rudin
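Abstract summary: Large language models (LLMs) have demonstrated remarkable capabilities across a variety of domains. However, LLMs often follow an opaque reasoning process, which limits their usefulness in high-stakes domains. This paper presents a framework built on the tool-calling capabilities of existing LLMs.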
Reference score (attention score, computed in-house): 33.53798264548128
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have exhibited remarkable capabilities across various domains. The ability to call external tools further expands their capability to handle real-world tasks. However, LLMs often follow an opaque reasoning process, which limits their usefulness in high-stakes domains where solutions need to be trustworthy to end users. LLMs can choose solutions that are unreliable and difficult to troubleshoot, even if better options are available. We address this issue by forcing LLMs to use external -- more reliable -- tools to solve problems when possible. We present a framework built on the tool-calling capabilities of existing LLMs to enable them to select the most reliable and easy-to-troubleshoot solution path, which may involve multiple sequential tool calls. We refer to this framework as LIT (LLMs with Inspectable Tools). In order to support LIT, we introduce a new and challenging benchmark dataset of 1,300 questions and a customizable set of reliability cost functions associated with a collection of specialized tools. These cost functions summarize how reliable each tool is and how easy it is to troubleshoot. For instance, a calculator is reliable across domains, whereas a linear prediction model is not reliable if there is distribution shift, but it is easy to troubleshoot. A tool that constructs a random forest is neither reliable nor easy to troubleshoot. These tools interact with the Harvard USPTO Patent Dataset and a new dataset of NeurIPS 2023 papers to solve mathematical, coding, and modeling problems of varying difficulty levels. We demonstrate that LLMs can achieve more reliable and informed problem-solving while maintaining task performance using our framework.
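The abstract describes per-tool reliability cost functions and the selection of the most reliable, easiest-to-troubleshoot path, which may chain several tool calls. The sketch below illustrates that idea under its own assumptions: each tool's cost is a made-up number split into an unreliability term (which rises under distribution shift) and a troubleshooting-difficulty term, a path's cost is the sum of its tools' costs, and every candidate path is assumed to solve the task correctly. The names `ToolCost`, `path_cost`, and `cheapest_path` and all numeric values are hypothetical; they are not taken from the LIT paper or its code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ToolCost:
    """Hypothetical per-tool cost; all numbers below are illustrative, not from the paper."""
    name: str
    unreliability: float            # cost when inputs resemble the tool's fitting conditions
    shift_unreliability: float      # cost when there is distribution shift
    troubleshoot_difficulty: float  # how hard a wrong output is to diagnose

    def cost(self, distribution_shift: bool) -> float:
        # Pick the reliability term for the current setting, then add troubleshooting cost.
        reliability = self.shift_unreliability if distribution_shift else self.unreliability
        return reliability + self.troubleshoot_difficulty


# Tools echoing the abstract's examples (assumed values):
# a calculator is reliable everywhere and easy to check;
# a linear model is easy to troubleshoot but unreliable under shift;
# a random forest is neither reliable nor easy to troubleshoot.
CALCULATOR = ToolCost("calculator", 0.0, 0.0, 1.0)
LINEAR_MODEL = ToolCost("linear_model", 1.0, 5.0, 1.0)
RANDOM_FOREST = ToolCost("random_forest", 3.0, 5.0, 5.0)


def path_cost(path: Sequence[ToolCost], distribution_shift: bool) -> float:
    """Assumed additive cost of a sequence of tool calls."""
    return sum(tool.cost(distribution_shift) for tool in path)


def cheapest_path(
    candidates: Sequence[Tuple[ToolCost, ...]], distribution_shift: bool
) -> Tuple[ToolCost, ...]:
    """Return the candidate path (assumed to solve the task) with the lowest total cost."""
    return min(candidates, key=lambda path: path_cost(path, distribution_shift))


if __name__ == "__main__":
    candidates = [(CALCULATOR, LINEAR_MODEL), (RANDOM_FOREST,)]
    for shift in (False, True):
        best = cheapest_path(candidates, shift)
        print(shift, [tool.name for tool in best], path_cost(best, shift))
```

With these illustrative numbers, the calculator-plus-linear-model path wins both without distribution shift (3.0 vs. 8.0) and with it (7.0 vs. 10.0). Additive costs are only the simplest aggregation to demonstrate; the paper's actual cost functions and path-selection procedure may differ.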
