ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation
- URL: http://arxiv.org/abs/2506.05317v2
- Date: Fri, 06 Jun 2025 22:04:04 GMT
- Title: ProJo4D: Progressive Joint Optimization for Sparse-View Inverse Physics Estimation
- Authors: Daniel Rho, Jun Myeong Choi, Biswadip Dey, Roni Sengupta
- Abstract summary: The inverse problem of estimating physics from visual data still remains challenging. We propose ProJo4D, a progressive joint optimization framework that gradually increases the set of jointly optimized parameters guided by their sensitivity. We show that ProJo4D outperforms prior work in 4D future state prediction, novel view rendering of future state, and material parameter estimation.
- Score: 4.818571559544214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural rendering has made significant strides in 3D reconstruction and novel view synthesis. With the integration with physics, it opens up new applications. The inverse problem of estimating physics from visual data, however, still remains challenging, limiting its effectiveness for applications like physically accurate digital twin creation in robotics and XR. Existing methods that incorporate physics into neural rendering frameworks typically require dense multi-view videos as input, making them impractical for scalable, real-world use. When presented with sparse multi-view videos, the sequential optimization strategy used by existing approaches introduces significant error accumulation, e.g., poor initial 3D reconstruction leads to bad material parameter estimation in subsequent stages. Instead of sequential optimization, directly optimizing all parameters at the same time also fails due to the highly non-convex and often non-differentiable nature of the problem. We propose ProJo4D, a progressive joint optimization framework that gradually increases the set of jointly optimized parameters guided by their sensitivity, leading to fully joint optimization over geometry, appearance, physical state, and material property. Evaluations on PAC-NeRF and Spring-Gaus datasets show that ProJo4D outperforms prior work in 4D future state prediction, novel view rendering of future state, and material parameter estimation, demonstrating its effectiveness in physically grounded 4D scene understanding. For demos, please visit the project webpage: https://daniel03c1.github.io/ProJo4D/
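The progressive schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic loss, the group names (`state`, `material`, `geometry`), the use of the initial gradient norm as the sensitivity measure, and the most-sensitive-first ordering are all assumptions made only for illustration.

```python
import numpy as np

def rank_by_sensitivity(loss_grad, params):
    """Order parameter groups by gradient norm at the initial point (assumed proxy for sensitivity)."""
    grads = loss_grad(params)
    return sorted(params, key=lambda k: -np.linalg.norm(grads[k]))

def progressive_joint_optimize(loss_grad, params, order, steps_per_stage=200, lr=0.1):
    """Gradient descent in which the jointly optimized set grows by one group per stage."""
    params = {k: np.asarray(v, dtype=float).copy() for k, v in params.items()}
    active = []
    for name in order:
        active.append(name)              # enlarge the active set
        for _ in range(steps_per_stage):
            grads = loss_grad(params)
            for k in active:             # groups not yet active stay frozen
                params[k] -= lr * grads[k]
    return params

# Hypothetical toy problem: weighted squared distance to known targets.
targets = {
    "state": np.array([1.0, -2.0]),
    "material": np.array([3.0]),
    "geometry": np.array([0.5, 0.5, 0.5]),
}
weights = {"state": 4.0, "material": 1.0, "geometry": 0.5}

def loss_grad(params):
    """Gradient of sum_k weights[k] * ||params[k] - targets[k]||^2 for each group."""
    return {k: 2.0 * weights[k] * (params[k] - targets[k]) for k in params}

init = {k: np.zeros_like(v) for k, v in targets.items()}
order = rank_by_sensitivity(loss_grad, init)
result = progressive_joint_optimize(loss_grad, init, order)
```

In the final stage every group is active, so the schedule ends in fully joint optimization, which matches the behavior the abstract describes; how the paper actually measures sensitivity and schedules the stages is not specified here.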
Related papers
- FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization [56.17833729527066]
We propose FastPhysGS, a framework for physics-based dynamic 3DGS simulation. FastPhysGS achieves high-fidelity physical simulation in 1 minute using only 7 GB runtime memory.
arXiv Detail & Related papers (2026-02-02T07:00:42Z)
- EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis [43.898895514609286]
EvolSplat4D is a feed-forward framework that moves beyond existing per-pixel paradigms by unifying volume-based and pixel-based Gaussian prediction. We show that EvolSplat4D reconstructs both static and dynamic environments with superior accuracy and consistency, outperforming both per-scene optimization and state-of-the-art feed-forward baselines.
arXiv Detail & Related papers (2026-01-22T13:39:29Z)
- Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding [54.859943475818234]
We present Motion4D, a novel framework that integrates 2D priors from foundation models into a unified 4D Gaussian Splatting representation. Our method features a two-part iterative optimization framework: 1) Sequential optimization, which updates motion and semantic fields in consecutive stages to maintain local consistency, and 2) Global optimization, which jointly refines all attributes for long-term coherence. Our method significantly outperforms both 2D foundation models and existing 3D-based approaches across diverse scene understanding tasks, including point-based tracking, video object segmentation, and novel view synthesis.
arXiv Detail & Related papers (2025-12-03T09:32:56Z)
- Flux4D: Flow-based Unsupervised 4D Reconstruction [30.764886648248222]
Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision. We introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments.
arXiv Detail & Related papers (2025-12-02T20:28:45Z)
- Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z)
- PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis [37.21119648359889]
PhysGM is a feed-forward framework that jointly predicts a 3D Gaussian representation and its physical properties from a single image. Our method effectively generates high-fidelity 4D simulations from a single image in one minute.
arXiv Detail & Related papers (2025-08-19T15:10:30Z)
- E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models [78.1674905950243]
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs). GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters. We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains. All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z)
- QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization [69.50126552763157]
Surface reconstruction is fundamental to computer vision and graphics, enabling applications in 3D modeling, mixed reality, robotics, and more. Existing approaches based on rendering obtain promising results, but optimize on a per-scene basis, resulting in a slow optimization that can struggle to model textureless regions. We introduce QuickSplat, which learns data-driven priors to generate dense initializations for 2D Gaussian splatting optimization of large-scale indoor scenes.
arXiv Detail & Related papers (2025-05-08T18:43:26Z)
- Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding [26.65605206605145]
We introduce Predict-Optimize-Distill (POD), a self-improving framework that interleaves prediction and optimization. POD iteratively trains a neural network to predict local part poses from RGB frames. We evaluate POD on 14 real-world and 5 synthetic objects with various joint types.
arXiv Detail & Related papers (2025-04-24T11:03:15Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos [39.384910552854926]
We present RoDyGS, an optimization pipeline for dynamic Gaussian Splatting from casual videos. It effectively learns motion and underlying geometry of scenes by separating dynamic and static primitives. We also introduce a comprehensive benchmark, Kubric-MRig, that provides extensive camera and object motion along with simultaneous multi-view captures.
arXiv Detail & Related papers (2024-12-04T07:02:49Z)
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting [14.759265492381509]
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure. Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
arXiv Detail & Related papers (2024-06-03T06:52:35Z)
- EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation. Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z)
- InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.