Fugu-MT Paper Translation (Abstract): Improving Fairness in LLMs Through Testing-Time Adversaries

Paper summary: Improving Fairness in LLMs Through Testing-Time Adversaries

arxiv url: http://arxiv.org/abs/2505.12100v1
Date: Sat, 17 May 2025 17:56:53 GMT
Status: Translation complete
Last updated in system: 2025-05-20 14:57:11.038547
Title: Improving Fairness in LLMs Through Testing-Time Adversaries
Title (reference translation): Improving the Fairness of LLMs with Test-Time Adversaries
Authors: Isabela Pereira Gregio, Ian Pons, Anna Helena Reali Costa, Artur Jordão
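Abstract summary: Large language models (LLMs) push the boundaries of natural language processing and generative AI. This work proposes a simple, user-friendly, and practical method to mitigate bias in LLM responses. The method generates multiple variations of a given sentence by modifying specific attributes and evaluates the corresponding prediction behavior.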
Reference score (independently computed attention score): 1.7811840395202343
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) push the boundaries in natural language processing and generative AI, driving progress across various aspects of modern society. Unfortunately, the pervasive issue of bias in LLM responses (i.e., predictions) poses a significant and open challenge, hindering their application in tasks involving ethical sensitivity and responsible decision-making. In this work, we propose a straightforward, user-friendly and practical method to mitigate such biases, enhancing the reliability and trustworthiness of LLMs. Our method creates multiple variations of a given sentence by modifying specific attributes and evaluates the corresponding prediction behavior compared to the original, unaltered sentence and its prediction. The idea behind this process is that critical ethical predictions often exhibit notable inconsistencies, indicating the presence of bias. Unlike previous approaches, our method relies solely on forward passes (i.e., testing-time adversaries), eliminating the need for training, fine-tuning, or prior knowledge of the training data distribution. Through extensive experiments on the popular Llama family, we demonstrate the effectiveness of our method in improving various fairness metrics, focusing on the reduction of disparities in how the model treats individuals from different racial groups. Specifically, using standard metrics, we improve fairness in Llama3 by up to 27 percentage points. Overall, our approach significantly enhances fairness, equity, and reliability in LLM-generated results without parameter tuning or training data modifications, confirming its effectiveness in practical scenarios. We believe our work establishes an important step toward enabling the use of LLMs in tasks that require ethical considerations and responsible decision-making.
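Abstract (reference translation): Large language models (LLMs) push the boundaries of natural language processing and generative AI and advance many aspects of modern society. Unfortunately, the widespread problem of bias in LLM responses (i.e., predictions) remains a significant open challenge that hinders their application to tasks involving ethical sensitivity and responsible decision-making. This work proposes a simple, user-friendly, and practical method that mitigates these biases and increases the reliability and trustworthiness of LLMs. The proposed method generates multiple variations of a given sentence by changing specific attributes, then evaluates the resulting prediction behavior against the original, unmodified sentence and its prediction. The idea behind this process is that critical ethical predictions often show notable inconsistencies, which indicate the presence of bias. Unlike previous approaches, the method relies only on forward passes (i.e., testing-time adversaries) and removes the need for training, fine-tuning, or prior knowledge of the training data distribution. Extensive experiments on the popular Llama family demonstrate that the method improves various fairness metrics, with a focus on reducing disparities in how the model treats individuals from different racial groups. Specifically, using standard metrics, it improves the fairness of Llama3 by up to 27 percentage points. Overall, the method significantly improves the fairness, equity, and reliability of LLM-generated results without parameter tuning or training data modifications, confirming its effectiveness in practical scenarios. The authors believe this work establishes an important step toward enabling the use of LLMs in tasks that require ethical considerations and responsible decision-making.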
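
The abstract describes the procedure only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' implementation. It swaps one attribute term in a sentence, runs a classifier on the original and on each variant (forward passes only), and reports whether the predictions agree. The term list `RACE_TERMS`, the helper names, the `predict` callable, and the majority-vote fallback are all assumptions made for illustration; the abstract does not say how inconsistent predictions are resolved.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical attribute values. The abstract only says that
# "specific attributes" are modified; this list is an assumption.
RACE_TERMS = ["White", "Black", "Asian", "Hispanic"]


def make_variants(sentence: str, terms: List[str] = RACE_TERMS) -> List[str]:
    """Return copies of `sentence` with its attribute term swapped for each other term."""
    present = [t for t in terms if re.search(rf"\b{re.escape(t)}\b", sentence)]
    if not present:
        return []  # nothing to perturb
    original = present[0]
    pattern = re.compile(rf"\b{re.escape(original)}\b")
    return [pattern.sub(t, sentence) for t in terms if t != original]


def consistency_check(sentence: str, predict: Callable[[str], str]) -> Dict[str, object]:
    """Compare the prediction on `sentence` with predictions on its variants.

    `predict` stands in for a single forward pass of any model; no training
    or fine-tuning is involved. The majority vote is an assumed aggregation
    rule, not something stated in the abstract.
    """
    original_pred = predict(sentence)
    variant_preds = [predict(v) for v in make_variants(sentence)]
    votes = Counter([original_pred] + variant_preds)
    return {
        "original": original_pred,
        "variants": variant_preds,
        "consistent": all(p == original_pred for p in variant_preds),
        "majority": votes.most_common(1)[0][0],
    }


def biased_stub(text: str) -> str:
    """Toy stand-in for a biased model, used only to exercise the check."""
    return "deny" if "Black" in text else "approve"


if __name__ == "__main__":
    sentence = "The applicant is a Black engineer with ten years of experience."
    print(consistency_check(sentence, biased_stub))
```

With the toy `biased_stub`, the original sentence and its three variants receive different labels, so the check reports an inconsistency. In practice `predict` would wrap a call to an actual LLM, and the attribute list and aggregation rule would need to follow the paper itself.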

Related papers