Fugu-MT Paper Translation (Summary): Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

Paper summary: Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution

arxiv url: http://arxiv.org/abs/2508.07111v1
Date: Sat, 09 Aug 2025 22:24:40 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.705214
Title: Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution
Authors: Falaah Arif Khan, Nivedha Sivakumar, Yinong Oliver Wang, Katherine Metcalf, Cezanne Camacho, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
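Abstract summary: Large language models (LLMs) have achieved impressive performance and have been widely adopted as decision-support tools in resource-constrained contexts such as hiring and admissions. However, there is scientific consensus that AI systems can reflect and exacerbate societal biases, raising concerns about identity-based harm when they are used in critical social contexts. This work extends single-axis fairness evaluations to examine intersectional bias, recognizing that when multiple axes of discrimination intersect, they create distinct patterns of disadvantage.
Reference score (in-house attention score): 5.061421107401101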
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) have achieved impressive performance, leading to their widespread adoption as decision-support tools in resource-constrained contexts like hiring and admissions. There is, however, scientific consensus that AI systems can reflect and exacerbate societal biases, raising concerns about identity-based harm when used in critical social contexts. Prior work has laid a solid foundation for assessing bias in LLMs by evaluating demographic disparities in different language reasoning tasks. In this work, we extend single-axis fairness evaluations to examine intersectional bias, recognizing that when multiple axes of discrimination intersect, they create distinct patterns of disadvantage. We create a new benchmark called WinoIdentity by augmenting the WinoBias dataset with 25 demographic markers across 10 attributes, including age, nationality, and race, intersected with binary gender, yielding 245,700 prompts to evaluate 50 distinct bias patterns. Focusing on harms of omission due to underrepresentation, we investigate bias through the lens of uncertainty and propose a group (un)fairness metric called Coreference Confidence Disparity which measures whether models are more or less confident for some intersectional identities than others. We evaluate five recently published LLMs and find confidence disparities as high as 40% along various demographic attributes including body type, sexual orientation and socio-economic status, with models being most uncertain about doubly-disadvantaged identities in anti-stereotypical settings. Surprisingly, coreference confidence decreases even for hegemonic or privileged markers, indicating that the recent impressive performance of LLMs is more likely due to memorization than logical reasoning. Notably, these are two independent failures in value alignment and validity that can compound to cause social harm.
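To make the two quantitative ideas in the abstract concrete (intersecting demographic markers with binary gender, and comparing model confidence across the resulting groups), here is a minimal sketch. It is not the authors' implementation. The marker names are hypothetical placeholders, and the disparity is assumed to be the gap between the highest and lowest mean coreference confidence across groups. The paper's actual definition of Coreference Confidence Disparity, including how per-prompt confidence is obtained and normalized, may differ.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Hypothetical placeholders: the benchmark uses 25 markers across 10 attributes
# (e.g. age, nationality, race), but their actual names are not reproduced here.
MARKERS = [f"marker_{i}" for i in range(25)]
GENDERS = ["female", "male"]

# Each intersectional identity is one (marker, gender) pair: 25 * 2 = 50 groups,
# matching the 50 bias patterns mentioned in the abstract.
GROUPS = list(product(MARKERS, GENDERS))


def coreference_confidence_disparity(records):
    """Assumed definition (not taken from the paper): the gap between the
    highest and lowest mean confidence across groups.

    `records` is an iterable of (group, confidence) pairs, where confidence is
    assumed to be the probability the model assigns to the correct referent.
    Returns (disparity, per_group_means).
    """
    by_group = defaultdict(list)
    for group, confidence in records:
        by_group[group].append(confidence)
    group_means = {g: mean(v) for g, v in by_group.items()}
    disparity = max(group_means.values()) - min(group_means.values())
    return disparity, group_means


if __name__ == "__main__":
    print(len(GROUPS))  # 50
    # Toy, made-up confidences for two intersectional groups.
    toy = [
        (("marker_0", "female"), 0.5),
        (("marker_0", "female"), 0.7),
        (("marker_1", "male"), 0.9),
        (("marker_1", "male"), 0.9),
    ]
    gap, means = coreference_confidence_disparity(toy)
    print(round(gap, 2))  # 0.3
```

Under this assumed definition, a disparity of 0.3 means the least-confident group's mean confidence is 30 percentage points below the most-confident group's. Whether the paper's "as high as 40%" figure is an absolute gap of this kind or a relative one is not stated in the abstract.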

