Fugu-MT Paper Translation (Abstract): The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

Paper summary: The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

arxiv url: http://arxiv.org/abs/2506.11094v1
Date: Fri, 06 Jun 2025 05:50:50 GMT
Status: Translation complete
System update date: 2025-06-16 17:50:49.451591
Title: The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs
Title (reference translation): The Scales of Justitia: A Comprehensive Survey on the Safety Evaluation of LLMs
Authors: Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu
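Abstract summary: Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP). LLM-generated content has occasionally exhibited unsafe elements such as toxicity and bias, particularly in adversarial scenarios. This survey aims to provide a comprehensive and systematic overview of recent advances in LLM safety evaluation.
Reference score (independently computed attention score): 42.57873562187369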
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation. However, their widespread deployment has also raised significant safety concerns. In recent years, LLM-generated content has occasionally exhibited unsafe elements such as toxicity and bias, particularly in adversarial scenarios, which has garnered extensive attention from both academia and industry. While numerous efforts have been made to evaluate the safety risks associated with LLMs, there remains a lack of systematic reviews summarizing these research endeavors. This survey aims to provide a comprehensive and systematic overview of recent advancements in LLM safety evaluation, focusing on several key aspects: (1) "Why evaluate", which explores the background of LLM safety evaluation, how it differs from general LLM evaluation, and the significance of such evaluation; (2) "What to evaluate", which examines and categorizes existing safety evaluation tasks based on key capabilities, including dimensions such as toxicity, robustness, ethics, bias and fairness, and truthfulness; (3) "Where to evaluate", which summarizes the evaluation metrics, datasets, and benchmarks currently used in safety evaluations; (4) "How to evaluate", which reviews existing evaluation toolkits and categorizes mainstream evaluation methods based on the roles of the evaluators. Finally, we identify the challenges in LLM safety evaluation and propose potential research directions to promote further advancement in this field. We emphasize the importance of prioritizing LLM safety evaluation to ensure the safe deployment of these models in real-world applications.
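Abstract (reference translation): With the rapid progress of artificial intelligence technology, Large Language Models (LLMs) have shown remarkable potential in the field of Natural Language Processing (NLP), including content generation, human-computer interaction, machine translation, and code generation. However, their widespread deployment has also raised serious safety concerns. In recent years, LLM-generated content has occasionally exhibited unsafe elements such as toxicity and bias, particularly in adversarial scenarios, drawing wide attention from both academia and industry. Although many efforts have been made to evaluate the safety risks of LLMs, systematic reviews summarizing this research are still lacking. This survey aims to give a comprehensive and systematic overview of recent advances in LLM safety evaluation, focusing on several key aspects: (1) "Why evaluate", which explores the background of LLM safety evaluation, how it differs from general LLM evaluation, and why such evaluation matters; (2) "What to evaluate", which examines and categorizes existing safety evaluation tasks by key capability, including toxicity, robustness, ethics, bias and fairness, and truthfulness; (3) "Where to evaluate", which summarizes the metrics, datasets, and benchmarks currently used in safety evaluation; (4) "How to evaluate", which reviews existing evaluation toolkits and categorizes mainstream evaluation methods by the role of the evaluator. Finally, we identify the challenges in LLM safety evaluation and propose potential research directions to promote further progress in this field. We emphasize the importance of prioritizing LLM safety evaluation so that these models can be deployed safely in real-world applications.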

Related papers