Fugu-MT paper translation (abstract): Real-time body pose non-verbal communication with a consistency-based reliability measure

Paper overview: Real-time body pose non-verbal communication with a consistency-based reliability measure

arxiv url: http://arxiv.org/abs/2606.09390v1
Date: Mon, 08 Jun 2026 12:05:23 GMT
Status: translation complete
Last updated in system: 2026-06-09 14:42:06.9645
Title: Real-time body pose non-verbal communication with a consistency-based reliability measure
Title (reference translation): Real-time body pose for non-verbal communication with a consistency-based reliability measure
Authors: Alina Marcu, Dragos Costea, Cristina Lazar, Marius Leordeanu
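Abstract summary: This work studies the recognition of communicative intent from 2D body pose alone. The authors argue that body motion is a reliable signal, especially in scenarios that require real-time, low-cost on-device communication. Affective corpora combine body, face, voice and text, while skeleton action-recognition benchmarks label the action performed rather than the message conveyed.
Reference score (independently computed attention score): 6.623088068354071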
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Body movement communicates intent at distances and in conditions where neither the face nor speech can be captured. We study the recognition of communicative intent from 2D body pose alone. We argue that body motion is a reliable signal, especially in scenarios that require real-time, low-cost, on-device person-to-robot communication in long-distance environments, such as rescue missions. However, existing resources do not isolate this signal. Affective corpora combine body, face, voice and text, while skeleton action-recognition benchmarks label the action performed rather than the message conveyed. We release a dataset of real frames of full-body pose covering ten communicative intents, and we compare it against other real (IPC) and synthetic (MotionLCM, VEO3.1, Kimodo) datasets that span a range of difficulty. We target systems that can run on a robot's limited onboard hardware. We benchmark multiple models, from skeleton graph classifiers to joint motion-forecasting networks, and report performance metrics together with frame rate on an embedded GPU (NVIDIA Orin Nano), since speed matters as much as accuracy in our scenario. Finally, we show that a model's own autoregressive self-consistency works as an unsupervised reliability signal. We give a short proof that bounds the probability that a self-consistent prediction is correct, show that this probability grows with the number of consistent steps, and identify the conditions under which a confident prediction can still be false, benchmarked against industry-standard metrics.
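Abstract (reference translation): Body movement conveys intent at distances and in conditions where the face and speech cannot be captured. This work studies the recognition of communicative intent from 2D body pose alone. The authors argue that body motion is a reliable signal, particularly for real-time, on-device person-to-robot communication in long-distance environments such as rescue missions. However, existing resources do not isolate this signal. Affective corpora combine body, face, voice and text, while skeleton action-recognition benchmarks label the action performed rather than the message conveyed. The authors release a dataset of real full-body pose frames covering ten communicative intents and compare it against other real (IPC) and synthetic (MotionLCM, VEO3.1, Kimodo) datasets that span a range of difficulty. The target is systems that can run on a robot's limited onboard hardware. Multiple models are benchmarked, from skeleton graph classifiers to joint motion-forecasting networks, and performance metrics are reported together with frame rate on an embedded GPU (NVIDIA Orin Nano). Finally, the authors show that a model's own autoregressive self-consistency works as an unsupervised reliability signal. They bound the probability that a self-consistent prediction is correct, show that this probability grows with the number of consistent steps, identify the conditions under which a confident prediction can still be false, and benchmark against industry-standard metrics.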
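
The idea of using self-consistency as a reliability signal can be illustrated with a small sketch. The abstract does not give the paper's actual procedure or bound, so everything below is an assumption made for illustration: the function names, the use of per-step class labels, and the toy Bayes model (conditionally independent steps with agreement probabilities `a` and `b`) are hypothetical and are not the authors' proof.

```python
from typing import Sequence


def consistent_run_length(step_labels: Sequence[int]) -> int:
    """Count the trailing steps that agree with the final predicted label.

    `step_labels` is a hypothetical list of intent labels, one per
    autoregressive step of some model.
    """
    if not step_labels:
        return 0
    final = step_labels[-1]
    run = 0
    for label in reversed(step_labels):
        if label != final:
            break
        run += 1
    return run


def toy_posterior_correct(k: int, prior: float, a: float, b: float) -> float:
    """Toy Bayes posterior that a prediction is correct after k consistent steps.

    Assumption (not from the paper): steps are conditionally independent,
    each agreeing with probability `a` when the prediction is correct and
    with probability `b` when it is wrong.
    """
    num = prior * a ** k
    den = num + (1.0 - prior) * b ** k
    return num / den


labels = [3, 7, 7, 7, 7]
k = consistent_run_length(labels)
print(k, toy_posterior_correct(k, prior=0.5, a=0.9, b=0.5))
```

In this toy model the posterior rises with `k` only when `a > b`. If a wrong prediction is as self-consistent as a correct one (`a == b`), the posterior stays at the prior, which is one simple way a confident, consistent prediction can still be false.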

Related papers