Fugu-MT Paper Translation (Summary): Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations

Paper summary: Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations

arxiv url: http://arxiv.org/abs/2510.17256v1
Date: Mon, 20 Oct 2025 07:43:53 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.026568
Title: Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations
Title (reference translation): Explainability of Large Language Models: Opportunities and Challenges for Producing Trustworthy Explanations
Authors: Shahin Atakishiyev, Housam K. B. Babiker, Jiayi Dai, Nawshad Farruque, Teruaki Hayashi, Nafisa Sadaf Hriti, Md Abed Rahman, Iain Smith, Mi-Young Kim, Osmar R. Zaïane, Randy Goebel
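Abstract summary: How a language model predicts the next token and generates content is generally not understandable to humans. This paper investigates local explainability and mechanistic interpretability in Transformer-based large language models.
Reference score (independently computed attention measure): 5.676319658620339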
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models have exhibited impressive performance across a broad range of downstream tasks in natural language processing. However, how a language model predicts the next token and generates content is not generally understandable by humans. Furthermore, these models often make errors in prediction and reasoning, known as hallucinations. These errors underscore the urgent need to better understand and interpret the intricate inner workings of language models and how they generate predictive outputs. Motivated by this gap, this paper investigates local explainability and mechanistic interpretability within Transformer-based large language models to foster trust in such models. In this regard, our paper aims to make three key contributions. First, we present a review of local explainability and mechanistic interpretability approaches and insights from relevant studies in the literature. Furthermore, we describe experimental studies on explainability and reasoning with large language models in two critical domains -- healthcare and autonomous driving -- and analyze the trust implications of such explanations for explanation receivers. Finally, we summarize current unaddressed issues in the evolving landscape of LLM explainability and outline the opportunities, critical challenges, and future directions toward generating human-aligned, trustworthy LLM explanations.
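Abstract (reference translation): Large language models have shown impressive performance across a wide range of downstream tasks in natural language processing. However, how a language model predicts the next token and generates content is generally not understandable to humans. Moreover, these models often produce errors in prediction and reasoning, known as hallucinations. These errors highlight the urgent need to better understand and interpret the complex inner workings of language models and how they generate predictive outputs. This paper investigates local explainability and mechanistic interpretability in Transformer-based large language models in order to foster trust in such models. In this regard, the paper aims to make three key contributions. First, it reviews approaches to local explainability and mechanistic interpretability, together with findings from related studies in the literature. Next, it describes experimental studies on explainability and reasoning with large language models in two critical domains, healthcare and autonomous driving, and analyzes the trust implications of such explanations for explanation receivers. Finally, it summarizes currently unaddressed issues in the evolving landscape of LLM explainability and outlines the opportunities, key challenges, and future directions for generating human-aligned, trustworthy LLM explanations.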

Related papers