Fugu-MT Paper Translation (Abstract): BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

Paper overview: BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

arxiv url: http://arxiv.org/abs/2605.09041v1
Date: Sat, 09 May 2026 16:26:49 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.038882
Title: BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
Title (reference translation): BiAxisAudit: A New Framework for Evaluating LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
Authors: Jialing Gan, Junhao Dong, Songze Li
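Abstract summary: Bias audits of large language models now operate within governance frameworks such as the EU AI Act. The BiAxisAudit protocol reports each bias score together with a reliability estimate on two axes.
Reference score (independently computed attention score): 22.315546054051143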
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bias audits of large language models now operate within governance frameworks such as the EU AI Act, making benchmark reliability a security concern in its own right. Many current benchmarks, however, collapse bias into a single scalar from one prompt format and one surface label. This design misses two failure modes that can be exploited without changing model weights. Across prompts, meaning-preserving format changes shift bias endorsement by more than 0.7 on a fixed statement pool. Within a response, the discrete Selection and free-text Elaboration can take opposing stances, so an apparently clean aggregate may hide substantial internal inconsistency (a "cancellation trap"). Selection-only and elaboration-only rankings are therefore nearly uncorrelated across eight LLMs (Spearman $ρ = 0.238$, $p = 0.570$): LLaMA3-70B ranks in the middle under selection-only scoring but highest under elaboration-only scoring on the same responses. We introduce BiAxisAudit, a protocol that reports each bias score together with a reliability estimate on two orthogonal axes. The across-prompt axis evaluates each statement under a factorial grid of task format, perspective, role, and sentiment, treating bias as a distribution rather than a point estimate. The within-response axis uses Split Coding to recover Selection and Elaboration as separate signals, measured by the Inconsistency Rate and Divergence Net Imbalance. Across eight LLMs with 80,200 coded responses each, task format alone explains as much variance as model choice; 63.6% of pooled bias signals (up to 85.2% per model) appear in only one coding layer, and prompt-dimension interactions exceed main effects. The instrument also separates real bias reductions from apparent reductions caused by cross-layer redistribution: some prompt configurations reduce both BER and IR, whereas others suppress only selection-layer bias.
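Abstract (reference translation): Bias audits of large language models are now carried out within governance frameworks such as the EU AI Act. Many current benchmarks, however, collapse bias into a single scalar obtained from one prompt format and one surface label. This design overlooks two failure modes that can be exploited without changing the model weights. Across prompts, format changes that preserve meaning shift bias endorsement by more than 0.7 on a fixed statement pool. Within a response, the discrete Selection and the free-text Elaboration can take opposing stances, so an apparently clean aggregate may conceal substantial internal inconsistency (the "cancellation trap"). As a result, selection-only and elaboration-only rankings are nearly uncorrelated across eight LLMs (Spearman $ρ = 0.238$, $p = 0.570$): LLaMA3-70B sits mid-table under selection-only scoring but ranks highest under elaboration-only scoring. The paper introduces BiAxisAudit, a protocol that reports each bias score together with a reliability estimate on two orthogonal axes. The across-prompt axis evaluates each statement under a factorial grid of task format, perspective, role, and sentiment, treating bias as a distribution rather than a point estimate. The within-response axis uses Split Coding to recover Selection and Elaboration as separate signals, measured by the Inconsistency Rate and Divergence Net Imbalance. Across eight LLMs with 80,200 coded responses each, task format alone explains as much variance as model choice, and 63.6% of pooled bias signals (up to 85.2% per model) appear in only one coding layer. The instrument also distinguishes real bias reductions from apparent reductions caused by cross-layer redistribution: some prompt configurations reduce both BER and IR, while others suppress only selection-layer bias.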
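
The two axes described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the level names in `GRID`, the +1/-1/0 stance coding, and the formulas in `inconsistency_rate` and `pooled_score` are all assumptions made for illustration, since the abstract does not define them. The sketch enumerates a full factorial prompt grid and shows how a pooled score of zero can coexist with total disagreement between the Selection and Elaboration layers, which is the "cancellation trap" the abstract describes.

```python
from itertools import product

# Hypothetical levels for the four prompt dimensions named in the abstract.
# The paper's actual levels are not given there; these are placeholders.
GRID = {
    "task_format": ["multiple_choice", "agree_disagree"],
    "perspective": ["first_person", "third_person"],
    "role": ["none", "expert"],
    "sentiment": ["neutral", "positive"],
}


def prompt_configs(grid):
    """Enumerate every combination of levels (a full factorial design)."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def inconsistency_rate(coded):
    """Share of responses whose Selection and Elaboration codes differ.

    `coded` is a list of (selection, elaboration) pairs, each code being
    +1 (endorses the statement), -1 (rejects it) or 0 (neutral).
    This definition is an assumption, not the paper's formula.
    """
    if not coded:
        raise ValueError("no coded responses")
    return sum(1 for s, e in coded if s != e) / len(coded)


def pooled_score(coded):
    """Single-scalar score that averages both layers together (assumed form)."""
    if not coded:
        raise ValueError("no coded responses")
    return sum((s + e) / 2 for s, e in coded) / len(coded)


configs = prompt_configs(GRID)

# Toy data: every response takes opposite stances in its two layers.
coded = [(+1, -1), (-1, +1), (+1, -1), (-1, +1)]

print(len(configs))               # number of prompt configurations in the grid
print(pooled_score(coded))        # pooled scalar looks clean
print(inconsistency_rate(coded))  # yet the layers disagree on every response
```

With two levels per dimension the grid has 16 configurations. In the toy data the pooled score is 0.0 while the assumed inconsistency rate is 1.0, so reporting the two layers separately exposes a signal that a single scalar would hide.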

Related papers