Fugu-MT Paper Translation (Abstract): Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Paper overview: Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

arxiv url: http://arxiv.org/abs/2605.19786v1
Date: Tue, 19 May 2026 12:51:50 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.33893
Title: Fast 4D Mesh Generation by Spatio-Temporal Attention Chains
Authors: Dvir Samuel, Yuval Atzmon, Gal Chechik, Yoni Kasten
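Abstract summary: This work proposes a training-free approach that accelerates 4D mesh generation while improving temporal correspondence quality. It relies on a general framework called Spatio-Temporal Attention Chain, which propagates information across space and time. Compared to the state of the art, the method generates a 4D mesh in 9 seconds, a $13\times$ speedup, while producing higher-quality results.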
Reference score (independently computed attention metric): 46.88232446844325
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: 4D mesh generation has recently emerged as a powerful paradigm for recovering dynamic 3D structure from videos, but existing methods remain slow, computationally expensive, and difficult to scale to longer sequences. We introduce a training-free approach that accelerates 4D mesh generation while improving temporal correspondence quality. Our key observation is that temporal correspondences emerge inside a 4D backbone long before its generated meshes become visually accurate. We exploit this with a general framework we call Spatio-Temporal Attention Chain which propagates information across space and time. Starting from vertices on an anchor mesh, the chain maps vertices to latent tokens. It then follows temporal correspondences in latent space, and recovers frame-specific vertices through latent-to-vertex attention. This design avoids expensive explicit matching while preserving anchor mesh details and thereby improving dynamic mesh geometry and temporal consistency. Compared to state-of-the-art, our method generates a 4D mesh in 9 seconds, achieving a $13\times$ speedup while producing higher-quality results. Moreover, our approach scales to videos up to $16\times$ longer without degrading mesh quality. Beyond generation, the improved correspondences enable competitive zero-shot performance on two downstream tasks: 2D object tracking and 4D tracking. We further show that our framework enables reliable camera estimation, a capability not supported by prior 4D mesh generation methods.
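The chain described in the abstract (anchor vertices to latent tokens, latent tokens across time, latent tokens back to frame-specific vertices) can be pictured as a product of three attention matrices. The following is a minimal sketch under that assumption and is not the authors' implementation: the function names, the dot-product attention with a temperature, the use of candidate surface points in the target frame, and the toy features are all hypothetical choices made for illustration.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so each row becomes an attention distribution."""
    out = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def attention(queries, keys, temperature=0.1):
    """Dot-product attention weights from each query to every key (assumed form)."""
    scores = [[sum(q * k for q, k in zip(qv, kv)) / temperature for kv in keys]
              for qv in queries]
    return softmax_rows(scores)


def chain_vertices(anchor_vertex_feats, anchor_latents, frame_latents,
                   frame_point_feats, frame_point_pos):
    """Hypothetical three-hop chain: vertex -> latent -> latent(t) -> point(t)."""
    vertex_to_latent = attention(anchor_vertex_feats, anchor_latents)
    latent_to_latent = attention(anchor_latents, frame_latents)  # temporal hop
    latent_to_point = attention(frame_latents, frame_point_feats)
    # Composing row-stochastic matrices gives, for every anchor vertex,
    # a distribution over candidate points in the target frame.
    chain = matmul(matmul(vertex_to_latent, latent_to_latent), latent_to_point)
    # Each anchor vertex is placed at the expected candidate position.
    return matmul(chain, frame_point_pos), chain


# Toy data (made up): two anchor vertices, two latent tokens per frame,
# and two candidate points in the target frame listed in swapped order.
anchor_vertex_feats = [[1.0, 0.0], [0.0, 1.0]]
anchor_latents = [[1.0, 0.0], [0.0, 1.0]]
frame_latents = [[1.0, 0.0], [0.0, 1.0]]
frame_point_feats = [[0.0, 1.0], [1.0, 0.0]]
frame_point_pos = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

positions, chain = chain_vertices(anchor_vertex_feats, anchor_latents,
                                  frame_latents, frame_point_feats,
                                  frame_point_pos)
for vertex_id, pos in enumerate(positions):
    print(vertex_id, [round(c, 3) for c in pos])
```

Because the chain is a plain matrix product, no explicit nearest-neighbour matching between frames is needed in this sketch, and the anchor mesh's vertex ordering (and hence its connectivity) carries over to the target frame unchanged. How the paper actually extracts and combines these attention maps is not specified in the abstract.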

Related papers