DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
- URL: http://arxiv.org/abs/2503.22677v2
- Date: Thu, 28 Aug 2025 01:47:51 GMT
- Title: DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
- Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
- Abstract summary: Most 3D object generators prioritize aesthetic quality, often neglecting the physical constraints necessary for practical applications. Previous approaches to generating stable 3D objects relied on differentiable physics simulators to optimize geometry at test time. This framework leverages feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator directly outputs stable 3D objects.
- Score: 79.4785166021062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most 3D object generators prioritize aesthetic quality, often neglecting the physical constraints necessary for practical applications. One such constraint is that a 3D object should be self-supporting, i.e., remain balanced under gravity. Previous approaches to generating stable 3D objects relied on differentiable physics simulators to optimize geometry at test time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models with external feedback, we propose Direct Simulation Optimization (DSO). This framework leverages feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator directly outputs stable 3D objects. We construct a dataset of 3D objects labeled with stability scores obtained from the physics simulator. This dataset enables fine-tuning of the 3D generator using the stability score as an alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO) - a novel objective we introduce to align diffusion models without requiring pairwise preferences. Our experiments demonstrate that the fine-tuned feed-forward generator, using either the DPO or DRO objective, is significantly faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework functions even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.
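The pipeline the abstract describes, scoring generator outputs with a simulator, pairing them by stability, and fine-tuning with a DPO objective, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `stability_score` heuristic (a toy 2D block whose stability depends on where its center of mass sits over its support base) stands in for a real rigid-body simulator, and the per-pair DPO loss is computed on scalar log-probabilities rather than a diffusion model.

```python
import math

def stability_score(com_x, base_half_width):
    # Stand-in for a physics simulator: a toy 2D block is "stable" in
    # proportion to how far its center of mass lies inside its support base.
    if base_half_width <= 0:
        return 0.0
    margin = base_half_width - abs(com_x)
    return max(0.0, min(1.0, margin / base_half_width))

def make_preference_pairs(samples):
    # Label candidate generations with simulated stability and pair them:
    # for each prompt, the more stable of the two candidates is the "winner".
    pairs = []
    for a, b in samples:  # two candidate generations per prompt
        sa, sb = stability_score(*a), stability_score(*b)
        if sa == sb:
            continue  # equal scores give no preference signal
        win, lose = (a, b) if sa > sb else (b, a)
        pairs.append((win, lose))
    return pairs

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO loss for one preference pair:
    # -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
    logits = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Example: one prompt, two candidates, each given as (center-of-mass x, base half-width).
candidates = [((0.1, 1.0), (0.9, 1.0))]
pairs = make_preference_pairs(candidates)
print(len(pairs))  # 1 preference pair: the centered block beats the off-center one
```

Note how no ground-truth 3D data is needed: both candidates come from the generator itself, and the simulator alone supplies the preference labels, which is what allows the self-improvement loop the abstract mentions. The DRO objective introduced in the paper instead uses per-sample stability scores directly, without forming pairs.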
Related papers
- FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization [56.17833729527066]
We propose FastPhysGS, a framework for physics-based dynamic 3DGS simulation. FastPhysGS achieves high-fidelity physical simulation in 1 minute using only 7 GB runtime memory.
arXiv Detail & Related papers (2026-02-02T07:00:42Z)
- Particulate: Feed-Forward 3D Object Articulation [89.78788418174946]
Particulate is a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds.
arXiv Detail & Related papers (2025-12-12T18:59:51Z)
- 3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization [53.40777496077015]
Inverse design aims to design the input variables of a physical system to optimize a specified objective function. In 3D domains, the design space grows exponentially, rendering exhaustive grid-based searches infeasible. We propose a 3D Inverse Design framework that directly navigates the 3D design space by coupling a continuous latent representation with a physics-aware optimization strategy.
arXiv Detail & Related papers (2025-12-06T13:09:03Z)
- PAT3D: Physics-Augmented Text-to-3D Scene Generation [47.18949891825537]
PAT3D generates 3D objects, infers their spatial relations, and organizes them into a hierarchical scene tree. A differentiable rigid-body simulator ensures realistic object interactions under gravity. Experiments demonstrate that PAT3D substantially outperforms prior approaches in physical plausibility, semantic consistency, and visual quality.
arXiv Detail & Related papers (2025-11-26T23:23:58Z)
- PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis [37.21119648359889]
PhysGM is a feed-forward framework that jointly predicts a 3D Gaussian representation and its physical properties from a single image. Our method effectively generates high-fidelity 4D simulations from a single image in one minute.
arXiv Detail & Related papers (2025-08-19T15:10:30Z)
- R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation [78.26308457952636]
This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome limitations in autonomous driving simulation. It enables realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. We show that R3D2 significantly enhances the realism of inserted assets, enabling use-cases like text-to-3D asset insertion and cross-scene/dataset object transfer.
arXiv Detail & Related papers (2025-06-09T14:50:19Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication [50.541882834405946]
We introduce Atlas3D, an automatic and easy-to-implement text-to-3D method.
Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization.
We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
arXiv Detail & Related papers (2024-05-28T18:33:18Z)
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
arXiv Detail & Related papers (2024-03-19T03:39:43Z)
- L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects [53.4874127399702]
We propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation.
We develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender.
Our approach surpasses the standard GPT-4 and other language agents for 3D mesh generation on ShapeNet.
arXiv Detail & Related papers (2024-02-14T09:51:05Z)
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors [16.93758384693786]
Bidirectional Diffusion(BiDiff) is a unified framework that incorporates both a 3D and a 2D diffusion process.
Our model achieves high-quality, diverse, and scalable 3D generation.
arXiv Detail & Related papers (2023-12-07T10:00:04Z)
- Amodal 3D Reconstruction for Robotic Manipulation via Stability and Connectivity [3.359622001455893]
Learning-based 3D object reconstruction enables single- or few-shot estimation of 3D object models.
Existing 3D reconstruction techniques optimize for visual reconstruction fidelity, typically measured by chamfer distance or voxel IOU.
We propose ARM, an amodal 3D reconstruction system that introduces (1) a stability prior over object shapes, (2) a connectivity prior, and (3) a multi-channel input representation.
arXiv Detail & Related papers (2020-09-28T08:52:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.