Related papers: Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling

URL: http://arxiv.org/abs/2602.08058v1
Date: Sun, 08 Feb 2026 17:04:54 GMT
Title: Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
Authors: Xihang Yu, Rajat Talak, Lorenzo Shaikewitz, Luca Carlone
Abstract summary: We present Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. We propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations.
Score: 16.06956036371399
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.

Related papers

Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization [27.083888910311984]
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Existing methods struggle in cluttered environments. We propose a unified optimization-based formulation for real-to-sim scene estimation.
arXiv Detail & Related papers (2026-02-23T18:58:24Z)
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction [21.80091691062415]
We present DecoupledGaussian, a novel system that decouples static objects from their contacted surfaces captured in in-the-wild videos. We validate DecoupledGaussian through a comprehensive user study and quantitative benchmarks. This system enhances digital interaction with objects and scenes in real-world environments, benefiting industries such as VR, robotics, and autonomous driving.
arXiv Detail & Related papers (2025-03-07T14:54:54Z)
DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects [59.51874686414509]
Existing approaches typically predict 3D translation utilizing the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. We present a Deep Voxel Matching Network (DVMNet++) that computes the relative object pose in a single pass. Our approach delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation [81.11585774044848]
We present DeepSimHO, a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization.
arXiv Detail & Related papers (2023-10-11T05:34:36Z)
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes [79.00228778543553]
This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels. We present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion.
arXiv Detail & Related papers (2023-06-29T13:09:31Z)
Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
GeoSim: Photorealistic Image Simulation with Geometry-Aware Composition [81.24107630746508]
We present GeoSim, a geometry-aware image composition process that synthesizes novel urban driving scenes. We first build a diverse bank of 3D objects with both realistic geometry and appearance from sensor data. The resulting synthetic images are photorealistic, traffic-aware, and geometrically consistent, allowing image simulation to scale to complex use cases.
arXiv Detail & Related papers (2021-01-16T23:00:33Z)
Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
