Fugu-MT Paper Translation (Abstract): Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

Paper overview: Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

arxiv url: http://arxiv.org/abs/2405.14383v1
Date: Thu, 23 May 2024 10:00:14 GMT
Status: Translation complete
Last updated in system: 2024-05-24 15:54:01.894405
Title: Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering
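Title (reference translation): Perceiving the Knowledge Boundary of Large Language Models through Semi-open-ended Question Answering
Authors: Zhihua Wen, Zhiliang Tian, Zexin Jian, Zhen Huang, Pei Ke, Yifu Gao, Minlie Huang, Dongsheng Li
Abstract summary: Large language models (LLMs) are widely used for knowledge-seeking but suffer from hallucinations. This paper perceives the knowledge boundary (KB) of LLMs through semi-open-ended questions (SoeQ). GPT-4 performs poorly on SoeQ and is often unaware of its KB. The auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.
Reference score (attention score computed by this site): 67.94354589215637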
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations. The knowledge boundary (KB) of an LLM limits its factual understanding, beyond which it may begin to hallucinate. Investigating the perception of LLMs' KB is crucial for detecting hallucinations and LLMs' reliable generation. Current studies perceive LLMs' KB on questions with a concrete answer (close-ended questions) while paying limited attention to semi-open-ended questions (SoeQ) that correspond to many potential answers. Some researchers achieve it by judging whether the question is answerable or not. However, this paradigm is unsuitable for SoeQ, which are usually partially answerable, containing both answerable and ambiguous (unanswerable) answers. Ambiguous answers are essential for knowledge-seeking, but they may go beyond the KB of LLMs. In this paper, we perceive the LLMs' KB with SoeQ by discovering more ambiguous answers. First, we apply an LLM-based approach to construct SoeQ and obtain answers from a target LLM. Unfortunately, the output probabilities of mainstream black-box LLMs are inaccessible to sample for low-probability ambiguous answers. Therefore, we apply an open-sourced auxiliary model to explore ambiguous answers for the target LLM. We calculate the nearest semantic representation for existing answers to estimate their probabilities, with which we reduce the generation probability of high-probability answers to achieve a more effective generation. Finally, we compare the results from the RAG-based evaluation and LLM self-evaluation to categorize four types of ambiguous answers that are beyond the KB of the target LLM. Following our method, we construct a dataset to perceive the KB for GPT-4. We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB. Besides, our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.
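Abstract (reference translation): Large language models (LLMs) are widely used for knowledge-seeking but suffer from hallucinations. The knowledge boundary (KB) of an LLM limits its factual understanding, and beyond that boundary the model may begin to hallucinate. Investigating how the KB of LLMs is perceived is essential for detecting hallucinations and for reliable generation. Existing studies perceive the KB of LLMs on questions with a concrete answer (close-ended questions) and pay limited attention to semi-open-ended questions (SoeQ), which correspond to many potential answers. Some researchers do this by judging whether a question is answerable. However, this paradigm is unsuitable for SoeQ, which are usually only partially answerable and contain both answerable and ambiguous (unanswerable) answers. Ambiguous answers are essential for knowledge-seeking, but they may lie beyond the KB of LLMs. In this paper, we perceive the KB of LLMs with SoeQ by discovering more ambiguous answers. First, we apply an LLM-based approach to construct SoeQ and obtain answers from a target LLM. Unfortunately, the output probabilities of mainstream black-box LLMs are inaccessible, so low-probability ambiguous answers cannot be sampled from them. We therefore use an open-source auxiliary model to explore ambiguous answers for the target LLM. We compute the nearest semantic representation of the existing answers to estimate their probabilities, and use these estimates to reduce the generation probability of high-probability answers, which makes generation more effective. Finally, we compare the results of RAG-based evaluation and LLM self-evaluation to categorize four types of ambiguous answers that are beyond the KB of the target LLM. Following this method, we construct a dataset to perceive the KB of GPT-4. We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB. In addition, our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.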
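The step in the abstract that lowers the generation probability of answers already found can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `alpha` penalty strength, the two-dimensional embeddings, and the rule "subtract `alpha` times the cosine similarity to the nearest existing answer from the logit" are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_distribution(candidates, existing, alpha=2.0):
    """Down-weight candidates that lie close to answers already collected.

    candidates: list of (name, logit, embedding) from a hypothetical auxiliary model.
    existing:   list of embeddings of answers the target model already gave.
    alpha:      hypothetical penalty strength (assumption, not from the paper).
    """
    adjusted = []
    for _name, logit, emb in candidates:
        # Similarity to the nearest existing answer (0.0 if there are none).
        nearest = max((cosine(emb, e) for e in existing), default=0.0)
        adjusted.append(logit - alpha * max(nearest, 0.0))
    probs = softmax(adjusted)
    return {name: p for (name, _, _), p in zip(candidates, probs)}

# Toy data: one candidate duplicates an existing answer, one is unrelated.
candidates = [
    ("common_answer", 2.0, [1.0, 0.0]),
    ("rare_answer", 0.0, [0.0, 1.0]),
]
existing = [[1.0, 0.0]]

before = penalized_distribution(candidates, existing, alpha=0.0)
after = penalized_distribution(candidates, existing, alpha=2.0)
print(before)
print(after)
```

With the penalty switched on, probability mass moves from the candidate that repeats an existing answer toward the unrelated one, which is the qualitative effect the abstract describes.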

Related papers

Inside-Out: Hidden Factual Knowledge in LLMs [50.79758420289131]
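This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than they express in their outputs. It first gives a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs in which the correct answer is ranked higher. It then applies the framework to three popular open-weight LLMs in a case study on a closed-book QA setup.
Paper reference translation (metadata) (2025-03-19T15:21:48Z)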
Are LLMs Aware that Some Questions are not Open-ended? [58.93124686141781]
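This paper investigates whether large language models recognize that some questions have a limited set of answers and need to be answered more deterministically. A lack of question awareness in LLMs leads to two phenomena: (1) being too casual when answering non-open-ended questions, and (2) being too boring when answering open-ended questions.
Paper reference translation (metadata) (2024-10-01T06:07:00Z)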
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
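This paper proposes SlimPLM, a novel collaborative approach that uses a slim proxy model to detect missing knowledge in large language models (LLMs). A proxy model with far fewer parameters is employed, and its answers are taken as heuristic answers. The heuristic answers are then used to predict the knowledge required to answer the user's question, as well as the known and unknown knowledge within the LLM.
Paper reference translation (metadata) (2024-02-19T11:11:08Z)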
Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
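Large language models (LLMs) show strong reasoning ability when prompted to generate chain-of-thought (CoT) explanations alongside their answers. This paper proposes a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
Paper reference translation (metadata) (2024-02-17T05:22:56Z)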
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism [0.0]
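Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities. However, these models are not flawless and often produce responses that contain errors or misinformation. This paper proposes a refusal mechanism that instructs LLMs to refuse to answer difficult questions in order to avoid errors.
Paper reference translation (metadata) (2023-11-02T07:20:49Z)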
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method [36.24876571343749]
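Large language models (LLMs) have shown great potential in natural language processing (NLP) tasks. Recent literature reveals that LLMs intermittently generate nonfactual responses. This work proposes a novel self-detection method to detect which questions an LLM does not know and is therefore prone to answering nonfactually.
Paper reference translation (metadata) (2023-10-27T06:22:14Z)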
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
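The paper shows that large language models (LLMs) hold unwavering confidence in their own knowledge and cannot handle conflicts between internal and external knowledge well. Retrieval augmentation proves to be an effective approach for enhancing LLMs' awareness of their knowledge boundaries. The paper also proposes a simple method for dynamically utilizing supporting documents.
Paper reference translation (metadata) (2023-07-20T16:46:10Z)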
Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering [7.888547093390469]
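Large language models (LLMs) can perform zero-shot closed-book question answering tasks. The authors propose augmenting the knowledge directly in the input of LLMs. Their framework, KAPING (Knowledge-Augmented language model PromptING), requires no model training and is therefore completely zero-shot.
Paper reference translation (metadata) (2023-06-07T04:15:21Z)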
Statistical Knowledge Assessment for Large Language Models [79.07989821512128]
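Given varying prompts about a factoid question, can a large language model (LLM) reliably generate factually correct answers? This paper proposes KaRR, a statistical approach to assessing the factual knowledge of LLMs. The results show that the knowledge of LLMs sharing the same backbone architecture follows a scaling law, and that tuning on instruction-following data can sometimes compromise a model's ability to reliably generate factually correct text.
Paper reference translation (metadata) (2023-05-17T18:54:37Z)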
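
The pairwise definition of knowledge mentioned in the Inside-Out entry in the list above (the fraction of correct-incorrect answer pairs in which the correct answer is ranked higher) can be written out as a short sketch. The function name and the example scores are hypothetical, and counting ties as "not ranked higher" is an assumption; the entry itself does not say how scores are obtained or how ties are handled.

```python
from itertools import product

def knowledge_score(correct_scores, incorrect_scores):
    """Fraction of (correct, incorrect) pairs where the correct answer scores higher.

    Both arguments are lists of numeric scores assigned to candidate answers
    by some hypothetical scoring method. Ties count as not ranked higher
    (an assumption made for this sketch).
    """
    pairs = list(product(correct_scores, incorrect_scores))
    if not pairs:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1 for c, w in pairs if c > w)
    return wins / len(pairs)

# Two correct answers and two incorrect answers give four pairs in total.
score = knowledge_score([0.9, 0.4], [0.5, 0.1])
print(score)
```

A score of 1.0 would mean every correct answer outranks every incorrect one for that question, while 0.5 corresponds roughly to chance-level ranking.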

The related papers list is generated automatically from the titles and abstracts of papers on this site.
