Fugu-MT Paper Translation (Overview): HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching

Paper overview: HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching

arxiv url: http://arxiv.org/abs/2604.10836v1
Date: Sun, 12 Apr 2026 22:06:11 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.240539
Title: HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching
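Title (reference translation): HO-Flow: Generalizable Hand-Object Interaction Generation Using Latent Flow Matching
Authors: Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Jiankang Deng, Cordelia Schmid, Stefanos Zafeiriou
Abstract summary: HO-Flow is a framework for synthesizing realistic hand-object motion sequences from text and canonical 3D objects. It first uses an interaction-aware variational autoencoder to encode hand and object motion sequences into a unified latent manifold. It then uses a masked flow matching model that combines autoregressive temporal reasoning with continuous latent generation.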
Reference score (independently computed attention score): 113.81911881001905
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating realistic 3D hand-object interactions (HOI) is a fundamental challenge in computer vision and robotics, requiring both temporal coherence and high-fidelity physical plausibility. Existing methods remain limited in their ability to learn expressive motion representations for generation and perform temporal reasoning. In this paper, we present HO-Flow, a framework for synthesizing realistic hand-object motion sequences from texts and canonical 3D objects. HO-Flow first employs an interaction-aware variational autoencoder to encode sequences of hand and object motions into a unified latent manifold by incorporating hand and object kinematics, enabling the representation to capture rich interaction dynamics. It then leverages a masked flow matching model that combines auto-regressive temporal reasoning with continuous latent generation, improving temporal coherence. To further enhance generalization, HO-Flow predicts object motions relative to the initial frame, enabling effective pre-training on large-scale synthetic data. Experiments on the GRAB, OakInk, and DexYCB benchmarks demonstrate that HO-Flow achieves state-of-the-art performance in both physical plausibility and motion diversity for interaction motion synthesis.
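The abstract does not give HO-Flow's exact flow-matching formulation. The sketch below illustrates only the continuous latent-generation component and assumes the common linear (rectified-flow) probability path. It shows a regression loss that trains a velocity field to carry Gaussian noise to target latents, plus an Euler sampler that integrates that field. `velocity_fn` is a hypothetical stand-in for the learned network. The masking, the auto-regressive ordering over time, and the text and object conditioning described in the abstract are deliberately left out.

```python
import numpy as np


# Minimal flow-matching sketch (assumed rectified-flow path, not HO-Flow's exact loss).
# `velocity_fn(x, t)` is a hypothetical stand-in for the learned network.
def flow_matching_loss(velocity_fn, x1, rng):
    """Mean squared error between predicted and target velocities for latents x1 of shape (B, D)."""
    x0 = rng.standard_normal(x1.shape)          # Gaussian noise, same shape as the latents
    t = rng.uniform(size=(x1.shape[0], 1))      # one time in [0, 1) per sample
    xt = (1.0 - t) * x0 + t * x1                # point on the straight noise-to-data path
    target = x1 - x0                            # velocity of that straight path
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 (noise) to t = 1 with explicit Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0], 1), i * dt)    # current time, one value per sample
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check: if every target latent equals c, the ideal velocity is (c - x) / (1 - t).
c = np.array([[1.0, -2.0, 0.5]])


def ideal_velocity(x, t):
    return (c - x) / (1.0 - t)


rng = np.random.default_rng(0)
print(flow_matching_loss(ideal_velocity, np.repeat(c, 4, axis=0), rng))  # close to 0.0
print(euler_sample(ideal_velocity, rng.standard_normal((4, 3)), num_steps=20))  # rows close to c
```

For this one-point toy "dataset", the ideal velocity drives the loss to zero. It also makes the Euler sampler land on c up to floating-point error, because each step shrinks the remaining distance by exactly the factor the straight path requires. This makes it a quick sanity check for both functions before a real network is substituted.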
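Abstract (reference translation): Generating realistic 3D hand-object interactions (HOI) is a fundamental challenge in computer vision and robotics, requiring both temporal coherence and high-fidelity physical plausibility. Existing methods are limited in their ability to learn expressive motion representations for generation and to perform temporal reasoning. This paper proposes HO-Flow, a framework that synthesizes realistic hand-object motion sequences from text and canonical 3D objects. HO-Flow first encodes hand and object motion sequences into a unified latent manifold by incorporating hand and object kinematics, so that the representation can capture rich interaction dynamics. It then uses a masked flow matching model that combines autoregressive temporal reasoning with continuous latent generation to improve temporal coherence. To further promote generalization, HO-Flow predicts object motions relative to the initial frame, which enables effective pre-training on large-scale synthetic data. Experiments on the GRAB, OakInk, and DexYCB benchmarks show that HO-Flow achieves state-of-the-art performance in both physical plausibility and motion diversity for interaction motion synthesis.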
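The abstract states that HO-Flow predicts object motions relative to the initial frame, but it does not specify the parameterization. The minimal sketch below assumes that object poses are given as per-frame rotation matrices and translations. Under that assumption, it expresses every pose in the coordinate frame of the first pose and can map the result back onto any chosen starting pose. Both function names are hypothetical.

```python
import numpy as np


# Hypothetical helpers; the pose parameterization is an assumption, not the paper's.
def to_initial_frame(rotations, translations):
    """Express poses (rotations: (T, 3, 3), translations: (T, 3)) relative to frame 0."""
    r0_inv = rotations[0].T                                  # rotation inverse = transpose
    rel_rot = np.einsum("ij,tjk->tik", r0_inv, rotations)    # R_0^T R_t
    rel_trans = (translations - translations[0]) @ r0_inv.T  # R_0^T (p_t - p_0), row-vector form
    return rel_rot, rel_trans


def from_initial_frame(rel_rot, rel_trans, r0, p0):
    """Place a relative trajectory at an arbitrary starting pose (r0, p0)."""
    rot = np.einsum("ij,tjk->tik", r0, rel_rot)              # R_0 (R_0^T R_t) = R_t
    trans = rel_trans @ r0.T + p0                            # R_0 q_t + p_0, row-vector form
    return rot, trans
```

Applying the same rigid transform to every frame of a trajectory leaves this relative representation unchanged. As a result, trajectories that differ only in global placement map to the same prediction target. This is one plausible reason such a representation could ease pre-training on diverse synthetic data, though the abstract does not spell out the mechanism.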


List of related papers