Fugu-MT paper translation (abstract): Auditing demographic bias in AI-based emergency police dispatch: a cross-lingual evaluation of eleven large language models

Paper abstract: Auditing demographic bias in AI-based emergency police dispatch: a cross-lingual evaluation of eleven large language models

arxiv url: http://arxiv.org/abs/2605.01451v1
Date: Sat, 02 May 2026 14:00:57 GMT
Status: translation complete
Last updated in system: 2026-05-05 20:33:49.78044
Title: Auditing demographic bias in AI-based emergency police dispatch: a cross-lingual evaluation of eleven large language models
Authors: William Guey, Wei Zhang, Pierrick Bougault, Yi Wang, Bertan Ucar, Vitor D. de Moura, José O. Gomes
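Abstract summary: This paper proposes a cross-lingual audit framework that operationalizes the Police Priority Dispatch System. Demographic bias is found to emerge systematically when incident severity is ambiguous. Gender bias is substantially amplified in Mandarin Chinese, whereas race bias is more pronounced in English.
Reference score (independently computed attention score): 3.85700834433791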
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are rapidly being integrated into high-stakes public safety systems, including emergency call triage and dispatch decision support, yet their demographic fairness in this context remains largely untested. Here we introduce a cross-lingual audit framework that operationalizes the Police Priority Dispatch System as a five-level ordinal classification task and applies a controlled minimal-pair design to isolate the effect of demographic cues. Across 19,800 model outputs spanning 11 frontier models, 15 scenario pairs, three demographic categories (religious appearance, gender, and race), and two languages (English and Mandarin Chinese), we find that demographic bias emerges systematically when incident severity is ambiguous but largely disappears when the operational priority is clearly determined by call content. Bias magnitude varies by demographic axis, with the largest effects observed for religious appearance, followed by gender and race. Critically, bias does not transfer consistently across languages: gender bias is substantially amplified in Mandarin Chinese, whereas race bias is more pronounced in English, revealing cross-lingual asymmetries that aggregate analyses obscure. In several scenarios, demographic cues produce counter-directional effects, challenging simple stereotype-amplification accounts of model behavior. These findings suggest that bias in LLM-based dispatch is not a fixed property of models alone, but arises from the interaction between demographic signals, contextual ambiguity, and language. Beyond these empirical results, the proposed framework provides a scalable audit infrastructure that enables deploying agencies to evaluate candidate models on jurisdiction-relevant scenarios prior to real-world adoption.
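The minimal-pair design described in the abstract can be illustrated with a short sketch. This is not the authors' code: the record layout, the variant labels "A" and "B", the function name `paired_gap`, and the sample values are all hypothetical, and the direction of the five-level priority scale is an assumption. The sketch only shows how a gap between paired variants could be computed separately per language and demographic category, so that cross-lingual asymmetries are not averaged away.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout (not taken from the paper):
#   (model, language, category, scenario_id, variant, priority)
# "A" and "B" are the two halves of a minimal pair that differ only in
# the demographic cue; priority is an ordinal level from 1 to 5.


def paired_gap(records):
    """Mean priority of variant B minus mean priority of variant A,
    computed separately for each (language, category) group."""
    buckets = defaultdict(lambda: {"A": [], "B": []})
    for _model, language, category, _scenario, variant, priority in records:
        if not 1 <= priority <= 5:
            raise ValueError(f"priority out of range: {priority}")
        buckets[(language, category)][variant].append(priority)

    gaps = {}
    for key, by_variant in buckets.items():
        # Skip groups where one side of the pair is missing.
        if by_variant["A"] and by_variant["B"]:
            gaps[key] = mean(by_variant["B"]) - mean(by_variant["A"])
    return gaps


# Made-up sample outputs, for illustration only.
example_records = [
    ("model-1", "en", "gender", 1, "A", 2),
    ("model-1", "en", "gender", 1, "B", 2),
    ("model-1", "zh", "gender", 1, "A", 2),
    ("model-1", "zh", "gender", 1, "B", 4),
    ("model-1", "en", "race", 1, "A", 3),
    ("model-1", "en", "race", 1, "B", 2),
]

gaps = paired_gap(example_records)
for (language, category), gap in sorted(gaps.items()):
    print(language, category, gap)
```

A gap of zero means the demographic cue did not shift the assigned priority in that group; a negative gap corresponds to the kind of counter-directional effect the abstract mentions. Keeping language in the grouping key is the design choice that matters here, since pooling across languages would hide opposite-signed gaps.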


Related papers