Fugu-MT paper translation (abstract): Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model

Paper abstract: Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model

arxiv url: http://arxiv.org/abs/2604.23532v1
Date: Sun, 26 Apr 2026 04:56:45 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.414526
Title: Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model
Authors: Jingni Huang, Peter Bloodsworth
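Abstract summary: Short-term human pose prediction plays a crucial role in interactive systems, assistive robots, and emotion-aware human-computer interaction. This paper investigates whether facial expression-derived emotion embeddings can provide auxiliary conditioning signals for short-term pose prediction.
Reference score (independently computed attention score): 0.0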
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Short-term human pose prediction plays a crucial role in interactive systems, assistive robots, and emotion-aware human-computer interaction[1-3]. While current trajectory prediction models rely primarily on geometric motion cues, they often overlook the underlying emotional signals that influence human motion dynamics[4-5]. This paper investigates whether facial expression-derived emotion embeddings can provide auxiliary conditioning signals for short-term pose prediction. To further evaluate multimodal conditioning in a recursive prediction setting, we propose a lightweight autoregressive predictive world model that performs 15-step rolling pose prediction. The framework combines pose keypoints with emotion embeddings through a learnable gating mechanism and performs autoregressive rollout prediction using a recurrent sequence model based on a two-layer LSTM architecture. Experiments were conducted on two small-scale pose-emotion video datasets: controlled motion sequences with minimal facial expression changes, and natural emotion-driven motion sequences with considerable facial expression changes. The results show that simple multimodal fusion does not consistently improve prediction accuracy, whereas normalized gating fusion significantly improves performance on emotion-driven motion sequences. Furthermore, counterfactual perturbation experiments demonstrate that the predicted trajectory exhibits measurable sensitivity to changes in the multimodal input, suggesting that facial expression embeddings act as auxiliary conditioning signals rather than redundant features. In summary, these results indicate that incorporating facial expression-derived emotion embeddings into emotion-conditioned short-term pose prediction, based on a lightweight predictive world model architecture, is a feasible approach.
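Illustrative sketch (not the authors' code): the abstract names three mechanisms, namely a gate that fuses pose keypoints with a normalized emotion embedding, a 15-step autoregressive rollout, and a counterfactual perturbation check. The Python below is one possible reading of those ideas under loud assumptions. The scalar sigmoid gate, the L2 normalization, the function names (`gated_fusion`, `rollout`, `toy_step`, `mean_displacement`), and all weights are hypothetical, and `toy_step` is a constant-velocity stand-in that replaces the paper's two-layer LSTM so the example runs with the standard library alone.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def l2_normalize(v: Sequence[float], eps: float = 1e-8) -> Vector:
    """Scale v to (approximately) unit length; a zero vector stays zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def gated_fusion(pose: Sequence[float], emotion: Sequence[float],
                 gate_weights: Sequence[float], gate_bias: float) -> Vector:
    """Assumed fusion: a scalar gate g in (0, 1) scales the normalized
    emotion embedding before it is concatenated with the pose vector."""
    emotion_n = l2_normalize(emotion)
    features = list(pose) + emotion_n
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, features)) + gate_bias)
    return list(pose) + [g * e for e in emotion_n]


def toy_step(window: List[Vector], emotion: Sequence[float]) -> Vector:
    """Stand-in for the two-layer LSTM (assumption): constant-velocity
    extrapolation plus a small offset driven by the gated emotion features.
    Requires at least two poses in the window."""
    last, prev = window[-1], window[-2]
    d = len(last)
    weights = [0.1] * (d + len(emotion))  # hypothetical, untrained gate weights
    fused = gated_fusion(last, emotion, weights, 0.0)
    drift = sum(fused[d:]) / len(emotion)
    return [2 * a - b + 0.01 * drift for a, b in zip(last, prev)]


def rollout(history: Sequence[Sequence[float]], emotion: Sequence[float],
            step_fn: Callable[[List[Vector], Sequence[float]], Vector],
            horizon: int = 15) -> List[Vector]:
    """Autoregressive rollout: each predicted pose is fed back as input."""
    window = [list(p) for p in history]
    predictions: List[Vector] = []
    for _ in range(horizon):
        next_pose = step_fn(window, emotion)
        predictions.append(next_pose)
        window = window[1:] + [next_pose]  # slide the context window forward
    return predictions


def mean_displacement(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Average Euclidean distance between two equally long trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    history = [[0.0, 0.0], [1.0, 1.0]]                  # two flattened 2-D poses
    baseline = rollout(history, [0.0, 0.0], toy_step)   # emotion zeroed out
    perturbed = rollout(history, [1.0, 0.0], toy_step)  # counterfactual emotion
    print("steps:", len(baseline))
    print("sensitivity:", mean_displacement(baseline, perturbed))
```

Comparing the rollout under a zeroed emotion vector with the rollout under a changed one is a simple way to read the counterfactual perturbation idea: a nonzero mean displacement indicates that the predictor actually uses the emotion input. In a real model the gate weights would be learned and the step function would be the trained recurrent network.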


Related papers