Fugu-MT Paper Translation (Abstract): Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

Paper overview: Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

arxiv url: http://arxiv.org/abs/2402.14259v1
Date: Thu, 22 Feb 2024 03:46:08 GMT
Status: Translation complete
Last updated in system: 2024-02-23 16:33:14.352691
Title: Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond
Authors: Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Huaxiu Yao, Yue Zhang, Ren Wang, Kaidi Xu, Xiaoshuang Shi
Abstract summary: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems. This paper proposes Word-Sequence Entropy (WSE). We show that WSE exhibits superior performance on accurate uncertainty measurement under two standard criteria for correctness evaluation.
Reference score (attention score computed by this site): 63.969531254692725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on accurate uncertainty measurement under two standard criteria for correctness evaluation (e.g., WSE outperforms existing state-of-the-art method by 3.23% AUROC on the MedQA dataset). Additionally, in terms of the potential for real-world medical QA applications, we achieve a significant enhancement in the performance of LLMs when employing sequences with lower uncertainty, identified by WSE, as final answers (e.g., +6.36% accuracy improvement on the COVID-QA dataset), without requiring any additional task-specific fine-tuning or architectural modifications.
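The abstract describes weighting uncertainty by semantic relevance at both the word level and the sequence level, and then using the lowest-uncertainty sequence as the final answer. The following is a minimal Python sketch of that general idea only, not the authors' formulation. It assumes that per-word negative log-probabilities and relevance scores are already available; the function names, the normalized weighted-average form, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def _weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average of `values` using `weights` normalized to sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def word_level_uncertainty(neg_log_probs: Sequence[float],
                           word_relevance: Sequence[float]) -> float:
    """Uncertainty of one sampled answer.

    Assumption: each word's negative log-probability is weighted by its
    relevance score, so keywords dominate and filler words contribute little.
    """
    return _weighted_mean(neg_log_probs, word_relevance)


def sequence_level_uncertainty(per_answer_uncertainty: Sequence[float],
                               seq_relevance: Sequence[float]) -> float:
    """Uncertainty for a question, aggregated over several sampled answers.

    Assumption: answers judged more relevant receive more weight.
    """
    return _weighted_mean(per_answer_uncertainty, seq_relevance)


def pick_final_answer(answers: Sequence[str],
                      per_answer_uncertainty: Sequence[float]) -> str:
    """Return the sampled answer with the lowest estimated uncertainty."""
    if len(answers) != len(per_answer_uncertainty):
        raise ValueError("one uncertainty value per answer is required")
    best = min(range(len(answers)), key=lambda i: per_answer_uncertainty[i])
    return answers[best]


if __name__ == "__main__":
    # Toy inputs (hypothetical): three sampled answers to one question.
    answers = ["answer A", "answer B", "answer C"]
    neg_log_probs: List[List[float]] = [[0.2], [1.5, 0.2, 0.4], [2.0, 1.8, 0.9]]
    word_relevance: List[List[float]] = [[1.0], [0.1, 1.0, 0.5], [0.2, 0.2, 1.0]]
    seq_relevance = [0.9, 0.8, 0.3]

    per_answer = [word_level_uncertainty(n, r)
                  for n, r in zip(neg_log_probs, word_relevance)]
    print("per-answer uncertainty:", per_answer)
    print("question-level uncertainty:",
          sequence_level_uncertainty(per_answer, seq_relevance))
    print("selected answer:", pick_final_answer(answers, per_answer))
```

In practice the relevance scores would come from some semantic similarity measure, and the negative log-probabilities from the language model's own token scores; both are left as inputs here because the abstract does not specify how they are computed.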

Related papers

Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models [5.6672926445919165]
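Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods often lack a probabilistic foundation. This paper proposes a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output.
Paper reference translation (metadata) (2025-06-11T13:02:17Z)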
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey [11.737403011836532]
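Large language models (LLMs) excel at text generation, reasoning, and decision-making in high-stakes domains such as healthcare, law, and transportation. Uncertainty quantification (UQ) improves reliability by estimating the confidence of outputs, enabling risk mitigation and selective prediction. The survey proposes a new taxonomy that categorizes UQ methods by computational efficiency and uncertainty dimensions.
Paper reference translation (metadata) (2025-03-20T05:04:29Z)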
Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering [0.0]
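Large language models (LLMs) are increasingly deployed in real-world question-answering (QA) applications. LLMs have been shown to produce hallucinations and nonfactual information, which undermines their trustworthiness in high-stakes medical tasks. This study applies the conformal prediction (CP) framework to medical multiple-choice question answering (MCQA) tasks.
Paper reference translation (metadata) (2025-03-07T15:22:10Z)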
Legitimate ground-truth-free metrics for deep uncertainty classification scoring [3.9599054392856483]
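The use of uncertainty quantification (UQ) methods in production remains limited. This limitation is further exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. This paper investigates these metrics and proves that they are theoretically well-behaved and actually tied to an uncertainty ground truth.
Paper reference translation (metadata) (2024-10-30T14:14:32Z)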
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness [106.52630978891054]
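We propose a taxonomy of uncertainty specific to vision-language AI systems. We also introduce a new metric, confidence-weighted accuracy, that correlates well with both accuracy and calibration error.
Paper reference translation (metadata) (2024-07-02T04:23:54Z)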
Uncertainty Quantification in Table Structure Recognition [6.328777177761948]
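This paper proposes an uncertainty quantification (UQ) method for table structure recognition (TSR). Our key idea is to enrich and diversify the table representations in order to spotlight cells with high recognition uncertainty. Cell complexity quantification measures the uncertainty of each cell through its topological relationships with neighboring cells.
Paper reference translation (metadata) (2024-07-01T19:03:55Z)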
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
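We introduce a novel uncertainty measure based on self-consistency theory. We then develop a conformal uncertainty criterion by integrating an uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations show that our uncertainty measure outperforms prior state-of-the-art methods.
Paper reference translation (metadata) (2024-06-29T17:33:07Z)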
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
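Uncertainty quantification for large language models (LLMs) is essential for applications where safety and reliability are important. We propose Kernel Language Entropy (KLE), an uncertainty estimation method for white-box and black-box LLMs.
Paper reference translation (metadata) (2024-05-30T12:42:05Z)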
Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
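This paper proposes an LLM conversion approach that produces uncertainty-aware LLMs. Our approach is model- and data-agnostic, computationally efficient, and does not rely on external models or systems.
Paper reference translation (metadata) (2023-11-26T22:47:54Z)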
Towards Clear Expectations for Uncertainty Estimation [64.20262246029286]
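Uncertainty quantification (UQ) is essential for achieving trustworthy machine learning (ML). Most UQ methods suffer from disparate and inconsistent evaluation protocols. This opinion paper offers a new perspective by specifying these requirements through five downstream tasks.
Paper reference translation (metadata) (2022-07-27T07:50:57Z)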

The related papers list is generated automatically from the titles and abstracts of papers on this site.
