Fugu-MT paper translation (abstract): When Robots Say No: The Empathic Ethical Disobedience Benchmark

Paper summary: When Robots Say No: The Empathic Ethical Disobedience Benchmark

arxiv url: http://arxiv.org/abs/2512.18474v1
Date: Sat, 20 Dec 2025 19:35:08 GMT
Status: translation complete
Last updated in system: 2026-03-23 08:17:40.450611
Title: When Robots Say No: The Empathic Ethical Disobedience Benchmark
Title (reference translation): When Robots Say No: A Benchmark for Empathic Ethical Disobedience
Authors: Dmytro Kuzmenko, Nadiya Shvai
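Abstract summary: We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust.
Reference score (attention score computed independently by the site): 2.1127261244588156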
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Robots must balance compliance with safety and social expectations as blind obedience can cause harm, while over-refusal erodes trust. Existing safe reinforcement learning (RL) benchmarks emphasize physical hazards, while human-robot interaction trust studies are small-scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Agents weigh risk, affect, and trust when choosing to comply, refuse (with or without explanation), clarify, or propose safer alternatives. EED Gym provides different scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust. Constructive styles are rated most trustworthy, empathic styles -- most empathic, and safe RL methods improve robustness but also make agents more prone to overly cautious behavior. We release code, configurations, and reference policies to enable reproducible evaluation and systematic human-robot interaction research on refusal and trust. At submission time, we include an anonymized reproducibility package with code and configs, and we commit to open-sourcing the full repository after the paper is accepted.
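Abstract (reference translation): Robots have to balance compliance against safety and social expectations: blind obedience can cause harm, while refusing too often erodes trust. Existing safe reinforcement learning (RL) benchmarks focus on physical hazards, and trust studies in human-robot interaction are small in scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates the safety and the social acceptability of refusals. Agents weigh risk, affect, and trust when deciding whether to comply, refuse (with or without an explanation), ask for clarification, or propose a safer alternative. EED Gym provides a range of scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance and that explanatory refusals help sustain trust. Constructive styles are rated the most trustworthy and empathic styles the most empathic; safe RL methods improve robustness but also make agents more prone to overly cautious behavior. We release code, configurations, and reference policies to enable reproducible evaluation and systematic human-robot interaction research on refusal and trust. At submission time, we include an anonymized reproducibility package with code and configs, and we commit to open-sourcing the full repository after the paper is accepted.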
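
The abstract reports that action masking eliminates unsafe compliance. As a rough illustration of the general technique only, the sketch below masks a "comply" action whenever an estimated risk exceeds a cutoff and renormalizes the policy over the remaining actions. The action names, the `RISK_THRESHOLD` value, and every function name here are hypothetical and chosen for illustration; none of them is taken from the EED Gym code, which may implement masking differently.

```python
import math
import random
from typing import List

# Hypothetical action set mirroring the choices named in the abstract.
ACTIONS = [
    "comply",
    "refuse",
    "refuse_with_explanation",
    "clarify",
    "propose_alternative",
]

RISK_THRESHOLD = 0.7  # assumed cutoff for illustration, not a value from the paper


def action_mask(risk: float, threshold: float = RISK_THRESHOLD) -> List[bool]:
    """Return one flag per action; False means the action is forbidden."""
    return [not (name == "comply" and risk >= threshold) for name in ACTIONS]


def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over allowed actions only; masked actions get probability 0."""
    allowed = [logit for logit, ok in zip(logits, mask) if ok]
    if not allowed:
        raise ValueError("mask forbids every action")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits: List[float], risk: float, rng: random.Random) -> str:
    """Sample an action from the masked policy distribution."""
    probs = masked_softmax(logits, action_mask(risk))
    return rng.choices(ACTIONS, weights=probs, k=1)[0]
```

Under these assumptions, a policy that strongly prefers "comply" still cannot select it in a high-risk state, because its probability is forced to zero before sampling. This matches the reported effect on unsafe compliance, and it also hints at why hard constraints can push an agent toward more cautious behavior.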

Related papers