Fugu-MT Paper Translation (Abstract): AgentFairBench: Do LLM Agents Discriminate When They Act?

Paper abstract: AgentFairBench: Do LLM Agents Discriminate When They Act?

arxiv url: http://arxiv.org/abs/2606.16723v1
Date: Mon, 15 Jun 2026 13:50:26 GMT
Status: Translation complete
System update date: 2026-06-16 16:21:34.586568
Title: AgentFairBench: Do LLM Agents Discriminate When They Act?
Authors: Triveni Morla, Rohith Reddy Bellibaltu, Manpreet Singh, Manmeet Singh Kapoor
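Abstract summary: AgentFairBench is a cheap, reproducible, multi-domain benchmark for demographic disparity in the actions of LLM agents. It spans three regulator-anchored domains: hiring, lending, and medical triage. A NumPy-only harness computes counterfactual flip rate, mean absolute score difference (MASD), action-rate disparity, and tool-invocation disparity.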
Reference score (independently computed attention score): 2.3004655342211078
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) agents increasingly take actions (screening applicants, recommending credit, triaging patients), yet fairness for LLMs is still measured by grading answers. We introduce AgentFairBench, a cheap, reproducible, multi-domain benchmark for demographic disparity in the actions of LLM agents. Grounded in a companion framework, the Bias Conduction Framework (BCF, restated here), it spans three regulator-anchored domains: hiring, lending, and medical triage. Synthetic, demographic-neutral profiles are evaluated in counterfactual matched sets that vary only a name-coded race × gender signal (in the Bertrand-Mullainathan tradition), under four agent scaffolds of increasing agency (direct, chain-of-thought, multi-agent deliberation, tool-augmented). A NumPy-only harness computes counterfactual flip rate, mean absolute score difference (MASD), action-rate disparity, and tool-invocation disparity, with bootstrap confidence intervals, paired tests, and false-discovery-rate control, for single-digit dollars per model. A live leaderboard with a held-out private split and a contamination canary admits external models by submission. Our pilot (864 decisions plus a test-retest replication) carries a methodological lesson: comparing a six-group score spread against a two-run noise difference overstates disparity by about 2.4x through statistic arity alone. Against an arity-matched noise floor and an omnibus group test, claude-haiku-4-5 shows no demographic effect above sampling noise (0 of 120 pairwise and 0 of 9 omnibus contrasts survive correction); a planted-bias test confirms the instrument detects disparity when present. The contribution is a sound, sensitive, adoption-ready instrument, the arity-matched null methodology, and open artifacts to scale it. Code, data, and harness are released under open licenses, with an anonymized review artifact.
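The abstract names its metrics without defining them, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' harness. It assumes a matched set "flips" when its demographic variants do not all receive the same binary action, that MASD averages absolute score differences over all unordered group pairs, that confidence intervals come from a percentile bootstrap over matched sets, and that false-discovery-rate control uses the Benjamini-Hochberg procedure. All function names are hypothetical. The last function illustrates the arity point: under pure noise, the spread (max minus min) of six values is systematically larger than the absolute difference of two, so the two statistics are not directly comparable.

```python
import numpy as np


def flip_rate(decisions):
    """Fraction of matched sets whose variants disagree (assumed definition).

    decisions: (n_sets, n_groups) array of 0/1 actions.
    """
    decisions = np.asarray(decisions)
    return float(np.mean(decisions.max(axis=1) != decisions.min(axis=1)))


def masd(scores):
    """Mean absolute score difference over unordered group pairs (assumed definition).

    scores: (n_sets, n_groups) array of numeric scores.
    """
    scores = np.asarray(scores, dtype=float)
    i, j = np.triu_indices(scores.shape[1], k=1)  # all pairs with i < j
    return float(np.mean(np.abs(scores[:, i] - scores[:, j])))


def bootstrap_ci(stat, data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling matched sets (rows) with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    draws = [stat(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def benjamini_hochberg(pvals, q=0.05):
    """Boolean rejection mask under Benjamini-Hochberg FDR control at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def arity_inflation(n_groups=6, n_sets=20000, seed=0):
    """Ratio of mean n-group spread to mean two-value difference under Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_sets, n_groups))
    spread = noise.max(axis=1) - noise.min(axis=1)
    two_run = np.abs(noise[:, 0] - noise[:, 1])
    return float(spread.mean() / two_run.mean())


if __name__ == "__main__":
    toy = [[1, 1, 1, 1, 1, 1], [1, 0, 1, 1, 1, 1],
           [0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 1, 1]]
    print("flip rate:", flip_rate(toy))
    print("95% CI:", bootstrap_ci(flip_rate, toy))
    print("arity inflation (Gaussian noise):", arity_inflation())
```

The inflation ratio printed here depends on the assumed noise distribution and is not expected to match the paper's reported figure exactly. The point is only that the ratio exceeds 1 even when no group effect exists, which is why a six-group spread should be compared against a noise floor computed with the same statistic.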
