Fugu-MT paper translation (summary): Flux4D: Flow-based Unsupervised 4D Reconstruction

Paper summary: Flux4D: Flow-based Unsupervised 4D Reconstruction

arxiv url: http://arxiv.org/abs/2512.03210v1
Date: Tue, 02 Dec 2025 20:28:45 GMT
Status: translation complete
Last updated in system: 2025-12-04 20:02:55.012379
Title: Flux4D: Flow-based Unsupervised 4D Reconstruction
Authors: Jingkang Wang, Henry Che, Yun Chen, Ze Yang, Lily Goli, Sivabalan Manivasagam, Raquel Urtasun
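Abstract summary: Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision. The paper introduces Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. The approach reconstructs dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments.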
Reference score (attention level, computed independently by this site): 30.764886648248222
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
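The abstract names two training signals, photometric losses and an "as static as possible" regularization, without giving their exact form. The following is a minimal sketch of how such an objective could be combined, not the authors' implementation. Everything in it is an assumption made for illustration: the constant-velocity motion model, the L1 photometric term, the mean velocity-norm penalty, the weight value, and all function names.

```python
import numpy as np


def advect_means(means, velocities, dt):
    """Shift Gaussian centers by a constant per-Gaussian velocity.

    Assumption: linear motion over the time offset dt. The abstract only
    says that motion dynamics are predicted, not how they are parameterized.
    """
    return means + velocities * dt


def photometric_l1(rendered, observed):
    """Mean absolute difference between a rendered and an observed image."""
    return float(np.mean(np.abs(rendered - observed)))


def static_prior(velocities):
    """Hypothetical "as static as possible" term: mean velocity magnitude.

    Penalizing speed pushes Gaussians to stay still unless motion is
    needed to explain the observations.
    """
    return float(np.mean(np.linalg.norm(velocities, axis=-1)))


def total_loss(rendered, observed, velocities, weight=0.1):
    """Photometric term plus the weighted static prior (weight is illustrative)."""
    return photometric_l1(rendered, observed) + weight * static_prior(velocities)


if __name__ == "__main__":
    means = np.zeros((2, 3))
    velocities = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
    print(advect_means(means, velocities, dt=0.5))
    print(total_loss(np.ones((2, 2, 3)), np.zeros((2, 2, 3)), velocities))
```

In this sketch, the photometric term alone cannot tell a moving object from a changing static one, so the velocity penalty is what biases the solution toward a static background with a small set of moving Gaussians.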

Related papers