Fugu-MT Paper Translation (Abstract): Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

Paper overview: Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

arxiv url: http://arxiv.org/abs/2402.14259v2
Date: Mon, 18 Nov 2024 09:19:25 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:30.945318
Title: Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond
Authors: Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Yue Zhang, Ren Wang, Xiaoshuang Shi, Kaidi Xu
Abstract summary: This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels. WSE is compared with six baseline methods on five free-form medical QA datasets, using seven popular large language models (LLMs).
Reference score (attention score calculated independently by this site): 52.246494389096654
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence (AI) interaction systems, particularly in the domain of healthcare engineering. However, a robust and general uncertainty measure for free-form answers has not been well-established in open-ended medical question-answering (QA) tasks, where generative inequality introduces a large number of irrelevant words and sequences within the generated set for uncertainty quantification (UQ), which can lead to biases. This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels, considering semantic relevance. WSE quantifies uncertainty in a way that is more closely aligned with the reliability of LLMs during uncertainty quantification (UQ). We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs). Experimental results demonstrate that WSE exhibits superior performance in UQ under two standard criteria for correctness evaluation. Additionally, in terms of real-world medical QA applications, the performance of LLMs is significantly enhanced (e.g., a 6.36% improvement in model accuracy on the COVID-QA dataset) by employing responses with lower uncertainty that are identified by WSE as final answers, without any additional task-specific fine-tuning or architectural modifications.
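The abstract describes two ideas that a small example can make concrete: weighting uncertainty by semantic relevance, and returning the sampled response with the lowest uncertainty as the final answer. The sketch below is illustrative only. It is not the WSE formula, which the abstract does not give. It assumes that per-token log-probabilities are available for each sampled response and that some external model supplies a non-negative relevance score per token. The function names and the `relevance` input are hypothetical.

```python
import math


def relevance_weighted_uncertainty(token_logprobs, relevance):
    """Relevance-weighted negative log-likelihood of one response.

    token_logprobs: log p(token_i | prefix) for each generated token.
    relevance: non-negative score per token (hypothetical input, e.g. from
        a semantic-similarity model); normalized here to sum to 1.
    """
    if len(token_logprobs) != len(relevance):
        raise ValueError("token_logprobs and relevance must have equal length")
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must have a positive sum")
    # Tokens judged irrelevant contribute little to the uncertainty score.
    return sum((r / total) * -lp for lp, r in zip(token_logprobs, relevance))


def select_most_confident(responses):
    """Return the text of the response with the lowest uncertainty.

    responses: list of (text, token_logprobs, relevance) tuples.
    """
    best = min(
        responses,
        key=lambda item: relevance_weighted_uncertainty(item[1], item[2]),
    )
    return best[0]


# Toy data (made up for illustration, not taken from the paper).
samples = [
    ("answer A", [math.log(0.9), math.log(0.5)], [1.0, 1.0]),
    ("answer B", [math.log(0.9), math.log(0.1)], [3.0, 1.0]),
]
final_answer = select_most_confident(samples)
```

With uniform relevance scores the function reduces to the length-normalized negative log-likelihood, so the relevance weights are the only component that departs from a standard likelihood-based baseline.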

Related papers

Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models [5.6672926445919165]
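Large language models (LLMs) have transformed natural language processing, but reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods often lack a probabilistic foundation. This paper proposes a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output.
Paper reference translation (metadata) (2025-06-11T13:02:17Z)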
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey [11.737403011836532]
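Large language models (LLMs) excel at text generation, reasoning, and decision-making in high-stakes domains such as healthcare, law, and transportation. Uncertainty quantification (UQ) improves reliability by estimating the confidence of outputs, enabling risk mitigation and selective prediction. The survey proposes a new taxonomy that categorizes UQ methods by computational efficiency and uncertainty dimension.
Paper reference translation (metadata) (2025-03-20T05:04:29Z)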
Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering [0.0]
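Large language models (LLMs) are increasingly deployed in real-world question-answering (QA) applications. LLMs have been shown to produce hallucinations and nonfactual information, which undermines their trustworthiness in high-stakes medical tasks. This work applies the conformal prediction (CP) framework to medical multiple-choice question answering (MCQA) tasks.
Paper reference translation (metadata) (2025-03-07T15:22:10Z)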
Legitimate ground-truth-free metrics for deep uncertainty classification scoring [3.9599054392856483]
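The use of uncertainty quantification (UQ) methods in production remains limited. This limitation is further exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. This paper examines such metrics and proves that they are theoretically well-behaved and are in fact tied to the uncertainty ground truth.
Paper reference translation (metadata) (2024-10-30T14:14:32Z)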
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness [106.52630978891054]
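The paper presents a taxonomy of uncertainty specific to vision-language AI systems. It also introduces a new metric, confidence-weighted accuracy, that correlates well with both accuracy and calibration error.
Paper reference translation (metadata) (2024-07-02T04:23:54Z)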
Uncertainty Quantification in Table Structure Recognition [6.328777177761948]
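This paper proposes an uncertainty quantification (UQ) method for table structure recognition (TSR). The key idea is to enrich and diversify table representations in order to spotlight cells with high recognition uncertainty. Cell complexity quantification measures the uncertainty of each cell through its topological relation to neighboring cells.
Paper reference translation (metadata) (2024-07-01T19:03:55Z)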
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
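The paper introduces a new uncertainty measure based on self-consistency theory. It then formulates a conformal uncertainty criterion by incorporating an uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations show that the uncertainty measure outperforms prior state-of-the-art methods.
Paper reference translation (metadata) (2024-06-29T17:33:07Z)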
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
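Uncertainty quantification for large language models (LLMs) is essential for applications where safety and reliability matter. The paper proposes Kernel Language Entropy (KLE), an uncertainty estimation method for white-box and black-box LLMs.
Paper reference translation (metadata) (2024-05-30T12:42:05Z)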
Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
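This paper proposes an LLM conversion approach that produces uncertainty-aware LLMs. The approach is model- and data-agnostic, computationally efficient, and does not rely on external models or systems.
Paper reference translation (metadata) (2023-11-26T22:47:54Z)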
Towards Clear Expectations for Uncertainty Estimation [64.20262246029286]
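Uncertainty quantification (UQ) is essential for achieving trustworthy machine learning (ML). Most UQ methods suffer from disparate and inconsistent evaluation protocols. This opinion paper offers a new perspective by specifying these requirements through five downstream tasks.
Paper reference translation (metadata) (2022-07-27T07:50:57Z)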

The related papers list is generated automatically from the titles and abstracts of papers on this site.
