Fugu-MT paper translation (abstract): Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Paper summary: Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

arxiv url: http://arxiv.org/abs/2308.05374v1
Date: Thu, 10 Aug 2023 06:43:44 GMT
Status: translation complete
Last updated in system: 2023-08-11 13:29:07.050609
Title: Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Title (reference translation): Trustworthy LLMs: a survey and guidelines for evaluating the alignment of large language models
Authors: Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li
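Abstract summary: This paper presents a comprehensive survey of the key factors to consider when evaluating large language models (LLMs). The survey covers seven major categories: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. The results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness.
Reference score (independently computed attention metric): 15.663618713626386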
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
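Abstract (reference translation): Ensuring alignment, which refers to models behaving in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For example, OpenAI spent six months iteratively aligning GPT-4 before its release [3]. However, a major challenge facing practitioners is the lack of clear guidance for evaluating whether LLM outputs conform to social norms, values, and regulations. This obstacle hinders the systematic iteration and deployment of LLMs. This paper presents a comprehensive survey of the key dimensions that matter when evaluating LLM trustworthiness. The survey covers seven major categories: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, for a total of 29 sub-categories. In addition, a subset of 8 sub-categories is selected for further investigation, and corresponding measurement studies are designed and conducted on several widely used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the trustworthiness categories considered. This underscores the importance of finer-grained analysis, testing, and continuous improvement of LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns is important for achieving reliable and ethically sound deployment of LLMs across various applications.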

Related papers