Related papers: SADG: Segment Any Dynamic Gaussian Without Object Trackers

URL: http://arxiv.org/abs/2411.19290v1
Date: Thu, 28 Nov 2024 17:47:48 GMT
Title: SADG: Segment Any Dynamic Gaussian Without Object Trackers
Authors: Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers
Abstract summary: SADG, Segment Any Dynamic Gaussian Without Object Trackers, is a novel approach that combines a dynamic Gaussian Splatting representation with semantic information without relying on object IDs. We learn semantically-aware features by leveraging masks generated by the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. We evaluate SADG on the proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes.
Score: 39.77468734311312
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding dynamic 3D scenes is fundamental for various applications, including extended reality (XR) and autonomous driving. Effectively integrating semantic information into 3D reconstruction enables a holistic representation that opens opportunities for immersive and interactive applications. We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs. In contrast to existing works, we do not rely on supervision based on object identities to enable consistent segmentation of dynamic 3D objects. To this end, we propose to learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. The learned Gaussian features can be effectively clustered without further post-processing. This enables fast computation for further object-level editing, such as object removal, composition, and style transfer by manipulating the Gaussians in the scene. We further extend several dynamic novel-view datasets with segmentation benchmarks to enable testing of learned feature fields from unseen viewpoints. We evaluate SADG on the proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes along with its effectiveness for further downstream editing tasks.
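
The abstract names two mechanisms: a contrastive objective on SAM masks with hard pixel mining, and clustering of the learned per-Gaussian features for object-level editing such as removal. The NumPy sketch below illustrates both ideas under stated assumptions; it is not SADG's implementation. The input layout (a rendered `(H, W, D)` feature map, an `(H, W)` map of SAM mask ids, a dict of per-Gaussian arrays), the triplet-style margin loss with hardest-positive/hardest-negative selection, the `margin` and `num_samples` values, the use of k-means as the clusterer, and all function names are hypothetical choices made for illustration. Because positives and negatives are defined only by whether two pixels share a mask within one view, the mask ids never need to match across frames; this is one way to avoid cross-frame object IDs, not necessarily the paper's.

```python
import numpy as np


def hard_pixel_contrastive_loss(feats, mask_ids, num_samples=256, margin=0.5, seed=0):
    """Triplet-style loss with hardest-positive/hardest-negative mining (illustrative).

    feats:    (H, W, D) feature map rendered from the Gaussians (assumed input layout).
    mask_ids: (H, W) integer SAM mask id per pixel, compared only within this one view.
    """
    rng = np.random.default_rng(seed)
    f = feats.reshape(-1, feats.shape[-1]).astype(np.float64)
    f /= np.maximum(np.linalg.norm(f, axis=1, keepdims=True), 1e-8)  # unit length
    ids = mask_ids.reshape(-1)
    idx = rng.choice(len(ids), size=min(num_samples, len(ids)), replace=False)
    f, ids = f[idx], ids[idx]  # random subset of anchor pixels

    sim = f @ f.T  # pairwise cosine similarity
    same = ids[:, None] == ids[None, :]
    np.fill_diagonal(same, False)  # a pixel is not its own positive
    diff = ids[:, None] != ids[None, :]

    # Hardest positive: least similar pixel in the same mask.
    # Hardest negative: most similar pixel in a different mask.
    hardest_pos = np.where(same, sim, np.inf).min(axis=1)
    hardest_neg = np.where(diff, sim, -np.inf).max(axis=1)
    valid = same.any(axis=1) & diff.any(axis=1)  # anchors that have both
    if not valid.any():
        return 0.0
    loss = np.maximum(0.0, hardest_neg[valid] - hardest_pos[valid] + margin)
    return float(loss.mean())


def kmeans(x, k, iters=50):
    """Plain k-means with farthest-point initialization (stand-in clusterer)."""
    x = np.asarray(x, dtype=np.float64)
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def remove_object(gaussians, labels, target):
    """Object removal: drop every Gaussian assigned to cluster `target`."""
    keep = labels != target
    return {name: arr[keep] for name, arr in gaussians.items()}


# Toy usage: 100 hypothetical Gaussians whose features form two groups.
rng = np.random.default_rng(1)
gauss_feats = np.concatenate([rng.normal(3.0, 0.1, (50, 8)), rng.normal(-3.0, 0.1, (50, 8))])
scene = {"means": rng.normal(size=(100, 3)), "features": gauss_feats}
labels = kmeans(scene["features"], k=2)
edited = remove_object(scene, labels, target=labels[0])
print(len(edited["means"]))  # 50 Gaussians remain
```

In an actual training pipeline the loss would be computed on differentiable renders (for example in an autodiff framework) so that gradients reach the per-Gaussian features; NumPy is used here only to show how the values are computed. Editing then reduces to masking the per-Gaussian arrays by cluster label, which is why clean clusters make operations such as removal cheap.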

Related papers

ODG: Occupancy Prediction Using Dual Gaussians [38.9869091446875]
Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment. Existing methods either adopt dense grids as the scene representation or learn the entire scene using a single set of sparse queries. We present ODG, a hierarchical dual sparse Gaussian representation to effectively capture complex scene dynamics.
arXiv (2025-06-11T06:03:03Z)
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos [43.906631899750906]
We propose an innovative framework that can analyze 3D mobility from monocular videos in a zero-shot manner. This framework can precisely parse motion parts and motion attributes using only a monocular video, completely eliminating the need for annotated training data. Building on this, we introduce an end-to-end dynamic scene optimization algorithm specifically designed for articulated objects.
arXiv (2025-05-17T06:21:05Z)
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv (2025-04-09T12:36:48Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv (2025-02-26T10:25:32Z)
Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM [12.934788858420752]
Go-SLAM is a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments. Our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions.
arXiv (2024-09-25T13:56:08Z)
DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments [0.0]
We propose DENSER, a framework that significantly enhances the representation of dynamic objects. The proposed approach outperforms state-of-the-art methods by a wide margin.
arXiv (2024-09-16T07:11:58Z)
GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation [60.33467489955188]
This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations.
arXiv (2024-06-21T07:37:17Z)
LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks. We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv (2023-08-22T07:27:09Z)
Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation. The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv (2023-06-07T17:57:45Z)
Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile part parameters. The network's predictions yield a large set of 3D objects with pseudo-labeled mobility information.
arXiv (2023-03-31T02:37:36Z)
"What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Our method depends neither on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv (2020-11-06T10:55:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.