Fugu-MT Paper Translation (Abstract): Open-DeBias: Toward Mitigating Open-Set Bias in Language Models

Paper abstract: Open-DeBias: Toward Mitigating Open-Set Bias in Language Models

arxiv url: http://arxiv.org/abs/2509.23805v1
Date: Sun, 28 Sep 2025 11:08:39 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.459058
Title: Open-DeBias: Toward Mitigating Open-Set Bias in Language Models
Title (reference translation): Open-DeBias: Mitigating Open-Set Bias in Language Models
Authors: Arti Rani, Shweta Singh, Nihar Ranjan Sahoo, Gaurav Kumar Nayak
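Abstract summary: We tackle the novel problem of open-set bias detection and mitigation in text-based question answering tasks. OpenBiasBench is a benchmark designed to evaluate biases across a variety of categories and subgroups. We also propose Open-DeBias, a novel data-efficient and parameter-efficient debiasing method.
Reference score (independently computed attention measure): 6.958242323649994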
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have achieved remarkable success on question answering (QA) tasks, yet they often encode harmful biases that compromise fairness and trustworthiness. Most existing bias mitigation approaches are restricted to predefined categories, limiting their ability to address novel or context-specific emergent biases. To bridge this gap, we tackle the novel problem of open-set bias detection and mitigation in text-based QA. We introduce OpenBiasBench, a comprehensive benchmark designed to evaluate biases across a wide range of categories and subgroups, encompassing both known and previously unseen biases. Additionally, we propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method that leverages adapter modules to mitigate existing social and stereotypical biases while generalizing to unseen ones. Compared to the state-of-the-art BMBI method, Open-DeBias improves QA accuracy on the BBQ dataset by nearly $48\%$ on ambiguous subsets and $6\%$ on disambiguated ones, using adapters fine-tuned on just a small fraction of the training data. Remarkably, the same adapters, in a zero-shot transfer to Korean BBQ, achieve $84\%$ accuracy, demonstrating robust language-agnostic generalization. Through extensive evaluation, we also validate the effectiveness of Open-DeBias across a broad range of NLP tasks, including StereoSet and CrowS-Pairs, highlighting its robustness, multilingual strength, and suitability for general-purpose, open-domain bias mitigation. The project page is available at: https://sites.google.com/view/open-debias25
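Abstract (reference translation): Large language models (LLMs) have achieved remarkable success on question answering (QA) tasks, but they often encode harmful biases that undermine fairness and trustworthiness. Most existing bias mitigation approaches are restricted to predefined categories, which limits their ability to address novel or context-specific emergent biases. To bridge this gap, we tackle the novel problem of open-set bias detection and mitigation in text-based QA. OpenBiasBench is a comprehensive benchmark designed to evaluate biases across a wide range of categories and subgroups, covering both known and previously unseen biases. In addition, we propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method. Compared to the state-of-the-art BMBI method, Open-DeBias improves QA accuracy on the BBQ dataset by nearly $48\%$ on ambiguous subsets and $6\%$ on disambiguated ones. Remarkably, the same adapters, in a zero-shot transfer to Korean BBQ, achieve $84\%$ accuracy, demonstrating robust language-agnostic generalization. We also validate the effectiveness of Open-DeBias across a broad range of NLP tasks, including StereoSet and CrowS-Pairs, highlighting its robustness, multilingual strength, and suitability for general-purpose, open-domain bias mitigation. The project page is available at: https://sites.google.com/view/open-debias25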

Related papers