Fugu-MT Paper Translation (Summary): PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

Paper summary: PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

arxiv url: http://arxiv.org/abs/2605.05682v1
Date: Thu, 07 May 2026 05:19:51 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.526035
Title: PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Title (reference translation): PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Authors: Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim, Akshita Jha, Lauren Wilcox, Kenneth Holstein, Motahhare Eslami, Leon A. Gatys
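Abstract summary: We develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process. We then instantiate it as PersonaTeaming Playground. In a user study with 11 industry practitioners, PersonaTeaming Playground enabled diverse red-teaming strategies and outputs that practitioners perceived as useful.
Reference score (independently computed attention score): 28.268811472721996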
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work, we explore persona-driven red-teaming to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. Compared to RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack success rates while maintaining prompt diversity. However, since automated personas only approximate real human perspectives, we further instantiate PersonaTeaming Workflow as PersonaTeaming Playground, a user-facing interface that enables red-teamers to author their own personas and collaborate with AI to mutate and refine prompts. In a user study with 11 industry practitioners, we found that PersonaTeaming Playground enabled diverse red-teaming strategies and outputs that practitioners perceived as useful, and that AI-generated suggestions in the PersonaTeaming Playground encouraged out-of-the-box thinking even when practitioners did not follow them strictly. Together, our work advances both automated and human-in-the-loop approaches to red-teaming, while shedding light on interaction patterns and design insights for supporting human-AI collaboration in generative AI red-teaming.
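Abstract (reference translation): Recent AI safety research has called for red-teaming methods that effectively surface the potential risks of generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they discover. Automated red-teaming approaches promise to complement human red-teaming through large-scale exploration, but existing automated approaches do not account for human identities and rarely incorporate human input. In this study, we explore persona-driven red-teaming as a way to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider range of adversarial strategies. Compared with RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack success rates while maintaining prompt diversity. However, because automated personas only approximate real human perspectives, we further instantiate PersonaTeaming Workflow as PersonaTeaming Playground, a user-facing interface in which red-teamers author their own personas and collaborate with AI to mutate and refine prompts. In a user study with 11 industry practitioners, we found that PersonaTeaming Playground enabled diverse red-teaming strategies and outputs that practitioners perceived as useful, and that its AI-generated suggestions encouraged out-of-the-box thinking even when practitioners did not follow them strictly. Our work advances both automated and human-in-the-loop approaches to red-teaming, while shedding light on interaction patterns and design insights for supporting human-AI collaboration in generative AI red-teaming.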
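
The abstract names two quantities, attack success rate and prompt diversity, without defining them on this page. The Python sketch below is only an illustration of how such quantities could be computed, not the paper's method. It assumes that success is a boolean judgement per prompt and that diversity is the mean pairwise Jaccard distance over word sets. The function names, the diversity measure, and the template-based `mutate_with_persona` helper (a stand-in for whatever model-based mutation the authors actually use) are all hypothetical.

```python
from itertools import combinations


def attack_success_rate(judgements):
    """Fraction of prompts judged successful (assumed: one boolean per prompt)."""
    if not judgements:
        return 0.0
    return sum(1 for j in judgements if j) / len(judgements)


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty prompts are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def mean_pairwise_diversity(prompts):
    """Average Jaccard distance over all unordered prompt pairs (assumed metric)."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def mutate_with_persona(seed_prompt, persona):
    """Hypothetical placeholder: a real system would call a language model here."""
    return f"As {persona}, {seed_prompt}"


if __name__ == "__main__":
    personas = ["a school teacher", "a security auditor"]
    prompts = [mutate_with_persona("ask about account recovery", p) for p in personas]
    print(prompts)
    print(mean_pairwise_diversity(prompts))
    print(attack_success_rate([True, False, True, True]))
```

Under these assumptions, a method that raises the success rate while keeping the mean pairwise distance roughly constant would match the trade-off the abstract describes. The measures actually used are defined in the paper itself.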

