Fugu-MT Paper Translation (Abstract): Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks

Paper overview: Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks

arxiv url: http://arxiv.org/abs/2508.09958v2
Date: Sun, 17 Aug 2025 17:37:34 GMT
Status: Translation complete
Last updated in system: 2025-08-19 12:43:44.898234
Title: Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks
Title (reference translation): Neural Bandit-Based Optimal LLM Selection for a Task Pipeline
Authors: Baran Atalar, Eddie Zhang, Carlee Joe-Wong
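Abstract summary: This paper proposes a neural contextual bandit-based algorithm that trains neural networks to model LLM success on each subtask in an online manner. Experiments on telecommunications question answering and medical diagnosis prediction datasets indicate the effectiveness of the proposed approach.
Reference score (independently calculated attention measure): 11.389019661082415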
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the increasing popularity of large language models (LLMs) for a variety of tasks, there has been a growing interest in strategies that can predict which out of a set of LLMs will yield a successful answer at low cost. This problem promises to become more and more relevant as providers like Microsoft allow users to easily create custom LLM "assistants" specialized to particular types of queries. However, some tasks (i.e., queries) may be too specialized and difficult for a single LLM to handle alone. These applications often benefit from breaking down the task into smaller subtasks, each of which can then be executed by an LLM expected to perform well on that specific subtask. For example, in extracting a diagnosis from medical records, one can first select an LLM to summarize the record, select another to validate the summary, and then select another, possibly different, LLM to extract the diagnosis from the summarized record. Unlike existing LLM selection or routing algorithms, this setting requires that we select a sequence of LLMs, with the output of each LLM feeding into the next and potentially influencing its success. Thus, unlike single LLM selection, the quality of each subtask's output directly affects the inputs, and hence the cost and success rate, of downstream LLMs, creating complex performance dependencies that must be learned and accounted for during selection. We propose a neural contextual bandit-based algorithm that trains neural networks that model LLM success on each subtask in an online manner, thus learning to guide the LLM selections for the different subtasks, even in the absence of historical LLM performance data. Experiments on telecommunications question answering and medical diagnosis prediction datasets illustrate the effectiveness of our proposed approach compared to other LLM selection algorithms.
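Abstract (reference translation): With the spread of large language models (LLMs) across a variety of tasks, interest is growing in strategies that can predict which LLM in a given set will produce a successful answer at low cost. This problem is likely to become increasingly relevant as providers such as Microsoft let users easily create custom LLM "assistants" specialized to particular types of queries. However, some tasks (i.e., queries) are too specialized and difficult for a single LLM to handle alone. Such applications often benefit from splitting the task into smaller subtasks, each executed by an LLM expected to perform well on that specific subtask. For example, when extracting a diagnosis from medical records, one can first select an LLM to summarize the record, select another to validate the summary, and then select another, possibly different, LLM to extract the diagnosis from the summarized record. Unlike existing LLM selection or routing algorithms, this setting requires selecting a sequence of LLMs, where the output of each LLM feeds into the next and can influence its success. Therefore, unlike single-LLM selection, the quality of each subtask's output directly affects the inputs of downstream LLMs, and hence their cost and success rate, creating complex performance dependencies that must be learned and accounted for during selection. We propose a neural contextual bandit-based algorithm that trains neural networks to model LLM success on each subtask online, thereby learning to guide LLM selection for the different subtasks even when no historical LLM performance data is available. Experiments on telecommunications question answering and medical diagnosis prediction datasets show the effectiveness of the proposed approach compared with other LLM selection algorithms.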
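To make the selection loop in the abstract concrete, here is a minimal sketch of online, per-subtask model selection in a two-stage pipeline. It is not the authors' algorithm: a small logistic predictor stands in for the paper's neural networks, epsilon-greedy exploration stands in for the paper's bandit exploration rule, and it assumes success feedback is observed after every stage. All class names, success probabilities, costs, and the rule that a failed upstream output halves downstream success are hypothetical values chosen for illustration.

```python
import math
import random


class OnlineLogisticArm:
    """Online success predictor for one (subtask, LLM) pair (stand-in for a neural net)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, reward):
        # One stochastic gradient step on the logistic loss.
        err = self.predict(x) - reward
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


class PipelineSelector:
    """Epsilon-greedy choice of one model per pipeline stage."""

    def __init__(self, n_stages, n_models, dim, costs,
                 cost_weight=0.1, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.arms = [[OnlineLogisticArm(dim) for _ in range(n_models)]
                     for _ in range(n_stages)]
        self.costs = costs
        self.cost_weight = cost_weight
        self.epsilon = epsilon

    def select(self, stage, x):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms[stage]))
        # Score = predicted success minus a penalty for the model's cost.
        scores = [arm.predict(x) - self.cost_weight * c
                  for arm, c in zip(self.arms[stage], self.costs)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, stage, model, x, reward):
        self.arms[stage][model].update(x, reward)


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical ground truth: true_p[stage][model] is the success probability.
    true_p = [[0.9, 0.5], [0.4, 0.8]]
    selector = PipelineSelector(n_stages=2, n_models=2, dim=2,
                                costs=[0.2, 0.1], seed=seed)
    picks = [[0, 0], [0, 0]]
    for _ in range(rounds):
        upstream_ok = 1.0  # the first stage sees the raw query
        for stage in range(2):
            x = [1.0, upstream_ok]  # context includes the upstream outcome
            m = selector.select(stage, x)
            # Assumption: a failed upstream output halves downstream success.
            p = true_p[stage][m] * (1.0 if upstream_ok else 0.5)
            reward = 1.0 if rng.random() < p else 0.0
            selector.update(stage, m, x, reward)
            picks[stage][m] += 1
            upstream_ok = reward
    return picks


if __name__ == "__main__":
    print(simulate())
```

Feeding the upstream outcome into the downstream context is the point of the sketch: each stage's predictor can learn how its success depends on what the previous stage produced, which is the dependency the abstract says must be accounted for during selection.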


Related papers