Segment Any 4D Gaussians
- URL: http://arxiv.org/abs/2407.04504v2
- Date: Fri, 12 Jul 2024 12:06:25 GMT
- Title: Segment Any 4D Gaussians
- Authors: Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang
- Abstract summary: We propose Segment Any 4D Gaussians (SA4D) to segment anything in the 4D digital world based on 4D Gaussians.
SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks.
- Score: 69.53172192552508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.
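To make the abstract's central mechanism concrete, the sketch below shows one plausible shape for a temporal identity feature field: each Gaussian carries a learnable identity embedding, a small time-conditioned MLP compensates for Gaussian drifting, and a classifier maps the result to per-object logits. All names, shapes, and layers here are assumptions for illustration, not the released SA4D code.

```python
import torch
import torch.nn as nn

class TemporalIdentityField(nn.Module):
    """Sketch of a temporal identity feature field for 4D Gaussians.
    Names, shapes, and layers are illustrative assumptions, not the
    actual SA4D implementation."""
    def __init__(self, num_gaussians: int, feat_dim: int = 32, num_objects: int = 16):
        super().__init__()
        # One learnable identity embedding per Gaussian.
        self.identity = nn.Embedding(num_gaussians, feat_dim)
        # Time-conditioned refinement to compensate for Gaussian drifting.
        self.temporal_mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        # Maps each identity feature to per-object logits.
        self.classifier = nn.Linear(feat_dim, num_objects)

    def forward(self, gaussian_ids: torch.Tensor, t: float) -> torch.Tensor:
        feat = self.identity(gaussian_ids)                      # (N, F)
        t_col = torch.full((feat.shape[0], 1), t)
        feat = feat + self.temporal_mlp(torch.cat([feat, t_col], dim=-1))
        return self.classifier(feat)                            # (N, num_objects)

# Assign each Gaussian to an object at time t; low-confidence Gaussians
# could then be pruned in a refinement pass to remove artifacts.
field = TemporalIdentityField(num_gaussians=10_000)
object_ids = field(torch.arange(10_000), t=0.3).argmax(dim=-1)
```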
Related papers
- GenXD: Generating Any 3D and 4D Scenes [137.5455092319533]
We propose to jointly investigate general 3D and 4D generation by leveraging camera and object movements commonly observed in daily life.
Using all the 3D and 4D data, we develop our framework, GenXD, which allows us to produce any 3D or 4D scene.
arXiv Detail & Related papers (2024-11-04T17:45:44Z)
- Disco4D: Disentangled 4D Human Generation and Animation from a Single Image [49.188657545633475]
Disco4D is a novel framework for 4D human generation and animation from a single image.
It disentangles clothing from the human body (with the SMPL-X model).
It supports 4D human animation with vivid dynamics.
arXiv Detail & Related papers (2024-09-25T18:46:06Z)
- 4D Panoptic Scene Graph Generation [102.22082008976228]
We introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding.
Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations.
We propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs.
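As a concrete picture of this node-edge representation, here is a minimal data-structure sketch; the field names and schema are assumptions for illustration, not taken from the PSG-4D codebase.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """An entity tracked through the 4D scene (hypothetical schema)."""
    node_id: int
    category: str                      # e.g. "person", "dog"
    # Per-frame state: time index -> (x, y, z) centroid of the entity's mask.
    trajectory: dict[int, tuple[float, float, float]] = field(default_factory=dict)

@dataclass
class Edge:
    """A relation between two entities over a time interval."""
    subject_id: int
    object_id: int
    relation: str                      # e.g. "chasing"
    t_start: int
    t_end: int

@dataclass
class SceneGraph4D:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

# Example: "person 0 is chasing dog 1" between frames 10 and 42.
graph = SceneGraph4D(
    nodes=[Node(0, "person"), Node(1, "dog")],
    edges=[Edge(0, 1, "chasing", t_start=10, t_end=42)],
)
```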
arXiv Detail & Related papers (2024-05-16T17:56:55Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- DreamGaussian4D: Generative 4D Gaussian Splatting [56.49043443452339]
We introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS).
Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation.
Video generation methods have the potential to offer valuable spatial-temporal priors, enhancing high-quality 4D generation.
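The key insight above, warping static Gaussians with an explicitly modeled spatial transformation, can be sketched briefly. The module below is a hypothetical illustration (names and architecture assumed, not DG4D's actual implementation):

```python
import torch
import torch.nn as nn

class DeformField(nn.Module):
    """Time-conditioned deformation of static Gaussian centers
    (illustrative only; not the actual DG4D architecture)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),   # input: (x, y, z, t)
            nn.Linear(hidden, 3),              # output: displacement (dx, dy, dz)
        )

    def forward(self, xyz: torch.Tensor, t: float) -> torch.Tensor:
        # xyz: (N, 3) static Gaussian centers; t: normalized time in [0, 1].
        t_col = torch.full((xyz.shape[0], 1), t, device=xyz.device)
        return xyz + self.mlp(torch.cat([xyz, t_col], dim=-1))

# The static Gaussians stay fixed; only their positions are warped per timestep,
# which keeps the representation compact while still capturing motion.
static_xyz = torch.randn(1000, 3)
deform = DeformField()
xyz_at_t = deform(static_xyz, t=0.5)   # centers rendered at time t
```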
arXiv Detail & Related papers (2023-12-28T17:16:44Z)
- 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering [103.32717396287751]
We propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes.
A neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build features from 4D neural voxels.
Our 4D-GS method achieves real-time rendering at high resolutions: 82 FPS at 800×800 resolution on an RTX 3090 GPU.
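As a rough picture of a HexPlane-inspired encoding, the sketch below factorizes a 4D volume into six learnable 2D feature planes and fuses bilinearly sampled features from each. It is an assumption-laden illustration, not the 4D-GS source.

```python
import torch
import torch.nn.functional as F

class HexPlaneEncoder(torch.nn.Module):
    """Minimal HexPlane-style feature field (illustrative, not the 4D-GS code).
    A 4D point (x, y, z, t) is projected onto six 2D planes; features are
    bilinearly interpolated from each plane and multiplied together."""
    def __init__(self, resolution: int = 64, feat_dim: int = 16):
        super().__init__()
        # One learnable feature grid per coordinate pair.
        self.planes = torch.nn.ParameterList(
            torch.nn.Parameter(torch.randn(1, feat_dim, resolution, resolution) * 0.1)
            for _ in range(6)
        )
        self.pairs = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # xy, xz, yz, xt, yt, zt

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        # xyzt: (N, 4) with all coordinates normalized to [-1, 1].
        feats = None
        for plane, (i, j) in zip(self.planes, self.pairs):
            uv = xyzt[:, [i, j]].view(1, -1, 1, 2)            # grid_sample layout
            f = F.grid_sample(plane, uv, align_corners=True)  # (1, C, N, 1)
            f = f.squeeze(0).squeeze(-1).t()                  # (N, C)
            feats = f if feats is None else feats * f         # Hadamard fusion
        return feats
```

Multiplicative fusion mirrors the tensor-factorization view behind HexPlane; the fused feature would then be decoded, e.g. into per-Gaussian deformations.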
arXiv Detail & Related papers (2023-10-12T17:21:41Z)
- Learning to Generate Customized Dynamic 3D Facial Expressions [47.5220752079009]
We study 3D image-to-video translation with a particular focus on 4D facial expressions.
We employ a deep mesh-decoder-like architecture to synthesize realistic high-resolution facial expressions.
We trained our model using a high-resolution dataset with 4D scans of six facial expressions from 180 subjects.
arXiv Detail & Related papers (2020-07-19T22:38:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.