Fugu-MT Paper Translation (Abstract): ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

Paper summary: ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

arxiv url: http://arxiv.org/abs/2604.11103v1
Date: Mon, 13 Apr 2026 07:20:20 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.393121
Title: ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
Authors: Xi Chen, Wei Xue, Yike Guo
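Abstract summary: Speech role-playing enables models to deliver spontaneous responses with personalized verbal traits based on their role, the scene, and spoken dialogue. ActorMind is an off-the-shelf, multi-agent, chain-of-thought style reasoning framework that emulates how human actors perform in theaters.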
Reference score (independently computed attention score): 33.79520698647648
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Role-playing has garnered rising attention as it provides a strong foundation for human-machine interaction and facilitates sociological research. However, current work is confined to textual modalities, neglecting speech, which plays a predominant role in daily life, thus limiting genuine role-playing. To bridge this gap, we conceptualize and benchmark speech role-playing through ActorMindBench, and we present a corresponding reasoning framework, called ActorMind. Specifically, (1) Speech Role-Playing enables models to deliver spontaneous responses with personalized verbal traits based on their role, the scene, and spoken dialogue. (2) ActorMindBench is a hierarchical benchmark comprising Utterance-Level content with 7,653 utterances, Scene-Level content with 313 scenes, and Role-Level content with 6 roles. (3) ActorMind is an off-the-shelf, multi-agent, chain-of-thought style reasoning framework that emulates how human actors perform in theaters. Concretely, ActorMind first reads its assigned role description via Eye Agent, then comprehends emotional cues within contextual spoken dialogues through Ear Agent. Subsequently, Brain Agent generates a descriptive emotional state, and finally, Mouth Agent delivers the scripts infused with the corresponding emotional state. Experimental results demonstrate the effectiveness of ActorMind in enhancing speech role-playing.
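
The four-agent chain described in the abstract (Eye, Ear, Brain, Mouth) can be pictured as a simple sequential pipeline. The sketch below is only an illustration of that ordering, not the authors' implementation. Every name in it (`Turn`, `Performance`, `eye_agent`, `ear_agent`, `brain_agent`, `mouth_agent`, `act`) is hypothetical, and each agent is a plain-text stub standing in for what would presumably be a language, speech-understanding, or speech-synthesis model in the real system. In particular, `emotion_hint` is an assumed stand-in for cues a real Ear Agent would extract from audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One contextual dialogue turn (hypothetical structure)."""
    speaker: str
    text: str
    emotion_hint: str  # assumption: stands in for cues heard in the audio


@dataclass
class Performance:
    """What the final stage hands to a (hypothetical) speech synthesizer."""
    script: str
    emotion_state: str


def eye_agent(role_description: str) -> str:
    # Stub: "reads" the role description by normalizing whitespace.
    return " ".join(role_description.split())


def ear_agent(dialogue: List[Turn]) -> List[str]:
    # Stub: collects one emotional cue per contextual turn.
    return [f"{turn.speaker} sounds {turn.emotion_hint}" for turn in dialogue]


def brain_agent(role: str, cues: List[str]) -> str:
    # Stub: combines role and cues into a descriptive emotional state.
    heard = "; ".join(cues) if cues else "no prior dialogue"
    return f"Role: {role}. Context: {heard}."


def mouth_agent(script: str, emotion_state: str) -> Performance:
    # Stub: pairs the script with the emotional state for delivery.
    return Performance(script=script, emotion_state=emotion_state)


def act(role_description: str, dialogue: List[Turn], script: str) -> Performance:
    """Run the chain in the order the abstract gives: Eye, Ear, Brain, Mouth."""
    role = eye_agent(role_description)
    cues = ear_agent(dialogue)
    emotion_state = brain_agent(role, cues)
    return mouth_agent(script, emotion_state)


if __name__ == "__main__":
    context = [Turn("Friend", "You forgot my birthday again.", "hurt")]
    result = act("  a   sarcastic detective ", context, "I was working a case.")
    print(result.emotion_state)
    print(result.script)
```

Keeping each stage as a separate function mirrors the chain-style description: each agent consumes only the previous stage's output, so any stub could be swapped for a model call without changing `act`.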
