Fugu-MT Paper Translation (Abstract): 4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

Paper summary: 4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

arxiv url: http://arxiv.org/abs/2603.10125v1
Date: Tue, 10 Mar 2026 18:01:10 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.642463
Title: 4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video
Title (reference translation): 4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video
Authors: Jin Lyu, Liang An, Pujin Cheng, Yebin Liu, Xiaoying Tang
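Abstract summary: This work proposes a novel framework called 4DEquine that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, a simple yet effective spatio-temporal transformer with a post-optimization stage regresses smooth, pixel-aligned pose and shape sequences from video. For appearance, a feed-forward network reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image.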
Reference score (independently computed attention score): 40.23548336607091
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 4D reconstruction of the equine family (e.g., horses) from monocular video is important for animal welfare. Previous mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over a whole video, which is time-consuming and sensitive to incomplete observation. In this work, we propose a novel framework called 4DEquine that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage to regress smooth and pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To assist training, we create a large-scale synthetic motion dataset, VarenPoser, which features high-quality surface motions and diverse camera trajectories, as well as a synthetic appearance dataset, VarenTex, comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the superiority of 4DEquine and our new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/.
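Abstract (reference translation): Reconstructing horses and other equines in 4D from monocular video is important for animal welfare. Earlier mainstream 4D animal reconstruction methods must jointly optimize motion and appearance over the entire video, which is time-consuming and sensitive to incomplete observation. This work proposes a new framework called 4DEquine that splits the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, it proposes a simple spatio-temporal transformer with a post-optimization stage that recovers smooth, pixel-aligned pose and shape sequences from video. For appearance, it designs a feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from a single image. To support training, the authors create VarenPoser, a large-scale synthetic motion dataset featuring high-quality surface motions and diverse camera trajectories, and VarenTex, a synthetic appearance dataset of realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the superiority of 4DEquine and the new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/.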

Related papers