DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation
- URL: http://arxiv.org/abs/2312.00583v2
- Date: Fri, 30 Aug 2024 15:16:43 GMT
- Title: DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation
- Authors: Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Jenny Seidenschwarz, Mike Zheng Shou, Deva Ramanan, Shuran Song, Stan Birchfield, Bowen Wen, Jeffrey Ichnowski,
- Abstract summary: DeformGS is an approach to recover scene flow in highly deformable scenes using simultaneous video captures of a dynamic scene from multiple cameras.
DeformGS improves 3D tracking by an average of 55.8% compared to the state-of-the-art.
With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a cloth of 1.5 x 1.5 m in area.
- Score: 66.7719069053058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Teaching robots to fold, drape, or reposition deformable objects such as cloth will unlock a variety of automation applications. While remarkable progress has been made for rigid object manipulation, manipulating deformable objects poses unique challenges, including frequent occlusions, infinite-dimensional state spaces and complex dynamics. Just as object pose estimation and tracking have aided robots for rigid manipulation, dense 3D tracking (scene flow) of highly deformable objects will enable new applications in robotics while aiding existing approaches, such as imitation learning or creating digital twins with real2sim transfer. We propose DeformGS, an approach to recover scene flow in highly deformable scenes, using simultaneous video captures of a dynamic scene from multiple cameras. DeformGS builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel-view synthesis. DeformGS learns a deformation function to project a set of Gaussians with canonical properties into world space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on conservation of momentum and isometry, which leads to trajectories with smaller trajectory errors. We also leverage existing foundation models SAM and XMEM to produce noisy masks, and learn a per-Gaussian mask for better physics-inspired regularization. DeformGS achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. In experiments, DeformGS improves 3D tracking by an average of 55.8% compared to the state-of-the-art. With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a cloth of 1.5 x 1.5 m in area. Website: https://deformgs.github.io
Related papers
- Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling [10.247075501610492]
We introduce a framework to learn object dynamics directly from multi-view RGB videos.
We train a particle-based dynamics model using Graph Neural Networks.
Our method can predict object motions under varying initial configurations and unseen robot actions.
arXiv Detail & Related papers (2024-10-24T17:02:52Z) - MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting [56.785233997533794]
We propose a novel deformable 3D Gaussian splatting framework called MotionGS.
MotionGS explores explicit motion priors to guide the deformation of 3D Gaussians.
Experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-10-10T08:19:47Z) - LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field [13.815932949774858]
Cinemagraph is a form of visual media that combines elements of still photography and subtle motion to create a captivating experience.
We propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling.
Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.
arXiv Detail & Related papers (2024-04-13T11:07:53Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling [71.87807614875497]
We propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures.
We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art synthesis of novel view and novel pose.
arXiv Detail & Related papers (2024-03-18T09:03:56Z) - Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos [33.779636707618785]
We introduce Rig3DGS to create controllable 3D human portraits from casual smartphone videos.
Key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model.
We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments.
arXiv Detail & Related papers (2024-02-06T05:40:53Z) - MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts.
In the endeavor to register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings that encodes 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
arXiv Detail & Related papers (2023-04-17T13:49:04Z) - Animatable Implicit Neural Representations for Creating Realistic
Avatars from Videos [63.16888987770885]
This paper addresses the challenge of reconstructing an animatable human model from a multi-view video.
We introduce a pose-driven deformation field based on the linear blend skinning algorithm.
We show that our approach significantly outperforms recent human modeling methods.
arXiv Detail & Related papers (2022-03-15T17:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.