Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
- URL: http://arxiv.org/abs/2404.01223v1
- Date: Mon, 1 Apr 2024 16:31:04 GMT
- Title: Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
- Authors: Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang
- Abstract summary: We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics.
Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians.
Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator.
- Score: 11.46530458561589
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision-language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, which enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline to illustrate the challenges and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. Project website: https://feature-splatting.github.io/
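The abstract describes selecting parts of a scene, and assigning material properties, by comparing text queries against features stored on each Gaussian. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each Gaussian already carries a feature vector and each text query has already been embedded into the same space. The function names, the toy 3-dimensional vectors, the material labels, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_gaussians(features, query_embedding, threshold=0.5):
    """Return indices of Gaussians whose feature matches the query embedding."""
    return [
        i for i, feat in enumerate(features)
        if cosine(feat, query_embedding) >= threshold
    ]


def assign_materials(features, material_queries, threshold=0.5):
    """Label each Gaussian with the best-matching material name, or None.

    material_queries maps a (hypothetical) material label to a query embedding.
    """
    labels = []
    for feat in features:
        best_name, best_score = None, threshold
        for name, emb in material_queries.items():
            score = cosine(feat, emb)
            if score >= best_score:
                best_name, best_score = name, score
        labels.append(best_name)
    return labels


# Toy per-Gaussian features (assumption: real features would be high-dimensional).
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

selected = select_gaussians(features, [1.0, 0.0, 0.0])
materials = assign_materials(
    features,
    {"rigid": [1.0, 0.0, 0.0], "elastic": [0.0, 1.0, 0.0]},
)
```

In this toy setup the first two Gaussians match the first query, and the last Gaussian matches neither material query, so it stays unlabeled. A simulator would then receive per-Gaussian material labels rather than raw features.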
Related papers
- GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs [21.3615403516602]
Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics.
We introduce GaussianProperty, a training-free framework that assigns physical properties of materials to 3D Gaussians.
We demonstrate that 3D Gaussians with physical property annotations enable applications in physics-based dynamic simulation and robotic grasping.
arXiv Detail & Related papers (2024-12-15T17:44:10Z)
- Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting [22.40115216094332]
We present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics.
Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception.
We also simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling.
arXiv Detail & Related papers (2024-11-19T12:52:21Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
- Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians [23.572267290979045]
Spring-Gaus is a 3D physical object representation for reconstructing and simulating elastic objects from videos of the object from multiple viewpoints.
We develop and integrate a 3D Spring-Mass model into 3D Gaussian kernels, enabling the reconstruction of the visual appearance, shape, and physical dynamics of the object.
We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects.
arXiv Detail & Related papers (2024-03-14T14:25:10Z)
- CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting [57.14748263512924]
CG3D is a method for compositionally generating scalable 3D assets.
Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes.
arXiv Detail & Related papers (2023-11-29T18:55:38Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)