Related papers: Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization

URL: http://arxiv.org/abs/2602.20150v1
Date: Mon, 23 Feb 2026 18:58:24 GMT
Title: Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization
Authors: Wei-Cheng Huang, Jiaheng Han, Xiaohan Ye, Zherong Pan, Kris Hauser
Abstract summary: Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Existing methods struggle in cluttered environments. We propose a unified optimization-based formulation for real-to-sim scene estimation.
Score: 27.083888910311984
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Regretfully, existing methods struggle in cluttered environments, often exhibiting prohibitive computational cost, poor robustness, and restricted generality when scaling to multiple interacting objects. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage the recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end real-to-sim scene estimation pipeline that integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses.

Related papers

ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals [18.50624014637526]
ArtPro is a novel self-supervised framework that introduces adaptive integration of proposals. We show that ArtPro achieves robust reconstruction of complex multi-part objects, significantly outperforming existing methods in accuracy and stability.
arXiv Detail & Related papers (2026-02-26T06:35:23Z)
Mirage2Matter: A Physically Grounded Gaussian World Model from Video [87.9732484393686]
We present Simulate Anything, a graphics-driven world modeling and simulation framework. Our approach reconstructs real-world environments into a photorealistic scene representation using 3D Gaussian Splatting (3DGS). We then leverage generative models to recover a physically realistic representation and integrate it into a simulation environment via a precision calibration target.
arXiv Detail & Related papers (2026-01-24T07:43:57Z)
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z)
GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects [4.717906057951389]
We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts. We show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types.
arXiv Detail & Related papers (2025-08-20T17:59:08Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation [81.11585774044848]
We present DeepSimHO, a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization.
arXiv Detail & Related papers (2023-10-11T05:34:36Z)
UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling [7.626461564400769]
We propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling. Our system showcases the potential application of object perception in complex dynamic scenes.
arXiv Detail & Related papers (2023-09-29T07:50:09Z)
Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution [7.14576106770047]
We present a neural network-based simulation framework that can efficiently and realistically enhance a facial performance produced by a low-cost, realtime physics-based simulation. We use face animation as an exemplar of such a simulation domain, where creating this semantic congruence is achieved by simply dialing in the same muscle actuation controls and skeletal pose in the two simulators. Our proposed neural network super-resolution framework generalizes from this training set to unseen expressions, compensates for modeling discrepancies between the two simulations due to limited resolution or cost-cutting approximations in the real-time variant, and does not require any semantic descriptors or parameters to
arXiv Detail & Related papers (2023-05-05T00:09:24Z)
Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach combines an extensive list of favorable properties, namely it is marker-less. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
Scalable Differentiable Physics for Learning and Control [99.4302215142673]
Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions.
arXiv Detail & Related papers (2020-07-04T19:07:51Z)
Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.