Fugu-MT Paper Translation (Abstract): Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions

Paper overview: Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions

arxiv url: http://arxiv.org/abs/2604.12929v1
Date: Tue, 14 Apr 2026 16:19:12 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.557465
Title: Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions
Title (reference translation): Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions
Authors: Ayce Idil Aytekin, Xu Chen, Zhengyang Shen, Thabo Beeler, Helge Rhodin, Rishabh Dabral, Christian Theobalt
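Abstract summary: We present Grasp in Gaussians (GraG), a robust method for reconstructing dynamic 3D hand-object interactions from a single monocular video. Our key insight is that accurate and temporally stable hand-object motion can be recovered using a compact Sum-of-Gaussians (SoG) representation. In experiments on public benchmarks, GraG reconstructs temporally coherent hand-object interactions 6.4x faster than prior work.
Reference score (attention score computed independently by this site): 66.64605515970584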
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Grasp in Gaussians (GraG), a fast and robust method for reconstructing dynamic 3D hand-object interactions from a single monocular video. Unlike recent approaches that optimize heavy neural representations, our method focuses on tracking the hand and the object efficiently, once initialized from pretrained large models. Our key insight is that accurate and temporally stable hand-object motion can be recovered using a compact Sum-of-Gaussians (SoG) representation, revived from classical tracking literature and integrated with generative Gaussian-based initializations. We initialize object pose and geometry using a video-adapted SAM3D pipeline, then convert the resulting dense Gaussian representation into a lightweight SoG via subsampling. This compact representation enables efficient and fast tracking while preserving geometric fidelity. For the hand, we adopt a complementary strategy: starting from off-the-shelf monocular hand pose initialization, we refine hand motion using simple yet effective 2D joint and depth alignment losses, avoiding per-frame refinement of a detailed 3D hand appearance model while maintaining stable articulation. Extensive experiments on public benchmarks demonstrate that GraG reconstructs temporally coherent hand-object interactions on long sequences 6.4x faster than prior work while improving object reconstruction by 13.4% and reducing hand's per-joint position error by over 65%.
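Abstract (reference translation): We present Grasp in Gaussians (GraG), a fast and robust method for reconstructing dynamic 3D hand-object interactions from a single monocular video. Unlike recent approaches that optimize heavy neural representations, this method focuses on tracking the hand and the object efficiently once it has been initialized from pretrained large models. The key insight is that accurate and temporally stable hand-object motion can be recovered with a compact Sum-of-Gaussians (SoG) representation, which is revived from the classical tracking literature and integrated with generative Gaussian-based initializations. Object pose and geometry are initialized with a video-adapted SAM3D pipeline, and the resulting dense Gaussian representation is converted into a lightweight SoG by subsampling. This compact representation enables efficient, fast tracking while preserving geometric fidelity. For the hand, a complementary strategy is adopted: starting from an off-the-shelf monocular hand pose initialization, hand motion is refined with simple yet effective 2D joint and depth alignment losses, which avoids per-frame refinement of a detailed 3D hand appearance model while maintaining stable articulation. Extensive experiments on public benchmarks show that GraG reconstructs temporally coherent hand-object interactions on long sequences 6.4x faster than prior work, while improving object reconstruction by 13.4% and reducing the hand's per-joint position error by over 65%.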
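
The abstract does not say how the SoG is built or scored beyond "subsampling". As a rough illustration only, the sketch below assumes uniform random subsampling of dense Gaussian centers and isotropic Gaussians with a shared, hand-picked sigma, compared through the closed-form overlap integral familiar from classical SoG tracking. The function names, the sampling rule, and the sigma value are all hypothetical and are not the authors' implementation.

```python
import math
import random


def subsample_gaussians(centers, k, seed=0):
    """Keep k of the dense Gaussian centers, chosen uniformly at random.

    Assumption: uniform sampling stands in for whatever rule the paper uses.
    """
    rng = random.Random(seed)
    return rng.sample(centers, k)


def gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b):
    """Integral over R^3 of the product of two unnormalized isotropic Gaussians."""
    s = sigma_a ** 2 + sigma_b ** 2
    d2 = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    norm = (2.0 * math.pi * sigma_a ** 2 * sigma_b ** 2 / s) ** 1.5
    return norm * math.exp(-d2 / (2.0 * s))


def sog_similarity(model, target):
    """Sum of pairwise overlaps between two SoGs, each a list of (center, sigma)."""
    return sum(
        gaussian_overlap(mu_a, s_a, mu_b, s_b)
        for mu_a, s_a in model
        for mu_b, s_b in target
    )


# Toy data: a dense 5x5x5 grid of centers, reduced to a 20-component SoG.
dense = [(float(x), float(y), float(z))
         for x in range(5) for y in range(5) for z in range(5)]
sigma = 0.5  # hypothetical shared standard deviation
sog = [(c, sigma) for c in subsample_gaussians(dense, 20)]

# A rigidly shifted copy overlaps less with the original than the original
# does with itself, which is the signal a tracker could maximize over pose.
shifted = [((x + 0.8, y, z), s) for (x, y, z), s in sog]
aligned_score = sog_similarity(sog, sog)
shifted_score = sog_similarity(sog, shifted)
```

Because every pairwise term has a closed form, the cost of scoring a pose grows with the product of the two component counts, which is why reducing a dense Gaussian set to a small SoG would make tracking cheaper under these assumptions.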


Related papers