SAD: Segment Any RGBD
- URL: http://arxiv.org/abs/2305.14207v1
- Date: Tue, 23 May 2023 16:26:56 GMT
- Title: SAD: Segment Any RGBD
- Authors: Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei,
Lingdong Kong, Ziwei Liu, Qifeng Chen
- Abstract summary: The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.
We propose the Segment Any RGBD (SAD) model, which is specifically designed to extract geometry information directly from images.
- Score: 54.24917975958583
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Segment Anything Model (SAM) has demonstrated its effectiveness in
segmenting any part of 2D RGB images. However, SAM exhibits a stronger emphasis
on texture information while paying less attention to geometry information when
segmenting RGB images. To address this limitation, we propose the Segment Any
RGBD (SAD) model, which is specifically designed to extract geometry
information directly from images. Inspired by the natural ability of humans to
identify objects through the visualization of depth maps, SAD utilizes SAM to
segment the rendered depth map, thus providing cues with enhanced geometry
information and mitigating the issue of over-segmentation. We further
incorporate open-vocabulary semantic segmentation into our framework, so that
3D panoptic segmentation is achieved. The project is available at
https://github.com/Jun-CEN/SegmentAnyRGBD.
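The core idea is simple enough to sketch: colorize the depth map into an RGB rendering and feed it to SAM's automatic mask generator. A minimal sketch, assuming the official `segment_anything` package and a locally downloaded SAM checkpoint; the checkpoint path, depth file, and colormap choice are illustrative, not from the paper:

```python
# Colorize a depth map into an RGB rendering, then let SAM segment it so the
# resulting masks follow geometry rather than texture.
import numpy as np
from matplotlib import colormaps
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def render_depth(depth: np.ndarray) -> np.ndarray:
    """Normalize a depth map and colorize it into a uint8 RGB image."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return (colormaps["viridis"](d)[..., :3] * 255).astype(np.uint8)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # hypothetical path
mask_generator = SamAutomaticMaskGenerator(sam)

depth = np.load("scene_depth.npy")  # hypothetical H x W depth map
masks = mask_generator.generate(render_depth(depth))  # geometry-driven masks
```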
Related papers
- Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image, which is then selectively diffused across the depth map to enrich its structure.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
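The general recipe can be illustrated in a few lines. A hedged sketch, not the paper's learned pipeline: hand-crafted Sobel gradients stand in for the extracted texture features, a multiplicative blend stands in for the diffusion step, and all file names are assumptions:

```python
# Build a texture cue from image gradients, enrich the depth map with it,
# then stack RGB and enriched depth into one joint input.
import cv2
import numpy as np

rgb = cv2.imread("frame.png")                          # hypothetical RGB frame
depth = np.load("frame_depth.npy").astype(np.float32)  # hypothetical depth map

gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
texture = cv2.magnitude(cv2.Sobel(gray, cv2.CV_32F, 1, 0),
                        cv2.Sobel(gray, cv2.CV_32F, 0, 1))
texture /= texture.max() + 1e-8                   # normalize edge strength

enriched_depth = depth * (1.0 + 0.5 * texture)    # sharpen depth at boundaries

# Joint feature embedding, here simply channel concatenation for a network.
joint = np.dstack([rgb.astype(np.float32) / 255.0, enriched_depth])
```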
arXiv Detail & Related papers (2024-08-17T04:55:03Z)
- MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis [27.703204488877038]
MeshSegmenter is a framework designed for zero-shot 3D semantic segmentation.
It delivers accurate 3D segmentation across diverse meshes and segment descriptions.
arXiv Detail & Related papers (2024-07-18T16:50:59Z)
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
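A hedged sketch of the output side only: in the paper the per-point features come from a field learned inside the NeRF, whereas here random vectors stand in; the clustering calls are standard SciPy, and cutting the merge tree at different heights yields the coarse-to-fine hierarchy:

```python
# Agglomerative clustering of per-point features: the merge heights define an
# ultrametric, so different cut levels give coarser or finer segmentations
# over the same set of points.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

features = np.random.rand(500, 16)          # stand-in for learned point features
tree = linkage(features, method="average")  # hierarchical merge tree

coarse = fcluster(tree, t=5, criterion="maxclust")   # object-level grouping
fine = fcluster(tree, t=40, criterion="maxclust")    # part-level grouping
```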
arXiv Detail & Related papers (2024-05-30T04:14:58Z)
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
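The contrastive idea can be sketched as a small loss function; an InfoNCE-style stand-in for the paper's objective, where the loss form, temperature, and input shapes are all assumptions:

```python
# Pull rendered pixel features from the same 2D part mask together and push
# features from different parts apart.
import torch
import torch.nn.functional as F

def part_contrastive_loss(feats: torch.Tensor, part_ids: torch.Tensor,
                          tau: float = 0.1) -> torch.Tensor:
    """feats: (N, D) rendered pixel features; part_ids: (N,) 2D part labels."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                 # pairwise similarities
    pos = part_ids[:, None] == part_ids[None, :]  # same-part pairs
    pos.fill_diagonal_(False)                     # a pixel is not its own positive
    log_prob = F.log_softmax(sim, dim=1)
    return -log_prob[pos].mean()                  # maximize same-part similarity

loss = part_contrastive_loss(torch.randn(128, 32), torch.randint(0, 6, (128,)))
```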
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
- A One Stop 3D Target Reconstruction and multilevel Segmentation Method [0.0]
We propose an open-source, one-stop 3D target reconstruction and multilevel segmentation framework (OSTRA).
OSTRA performs segmentation on 2D images, tracks multiple instances with segmentation labels in the image sequence, and then reconstructs labelled 3D objects or multiple parts with Multi-View Stereo (MVS) or RGBD-based 3D reconstruction methods.
Our method opens up a new avenue for reconstructing 3D targets embedded with rich multi-scale segmentation information in complex scenes.
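The tracking step in the middle of that pipeline can be sketched with a simple IoU matcher; a hedged stand-in, since OSTRA's actual tracker is not specified here, and all function names and thresholds are assumptions:

```python
# Track instances across frames by carrying a label to the highest-IoU mask in
# the next frame; masks are boolean H x W arrays, e.g. produced by SAM.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def propagate_labels(prev_masks, prev_labels, curr_masks, next_id, thresh=0.5):
    """Give each current mask its best predecessor's label, or a fresh id."""
    labels = []
    for m in curr_masks:
        scores = [iou(m, p) for p in prev_masks]
        if scores and max(scores) >= thresh:
            labels.append(prev_labels[int(np.argmax(scores))])
        else:
            labels.append(next_id)
            next_id += 1
    return labels, next_id
```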
arXiv Detail & Related papers (2023-08-14T07:12:31Z)
- TomoSAM: a 3D Slicer extension using SAM for tomography segmentation [62.997667081978825]
TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer.
SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner.
The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques.
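The promptable interface TomoSAM builds on looks like this in code; a minimal sketch in which the checkpoint path and slice file are assumptions, while the `segment_anything` calls are the package's real API:

```python
# Prompt SAM with a foreground click on a single tomography slice.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # hypothetical path
predictor = SamPredictor(sam)

slice_gray = np.load("slice_042.npy")                    # hypothetical 2D slice
slice_rgb = np.stack([slice_gray] * 3, axis=-1).astype(np.uint8)
predictor.set_image(slice_rgb)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[120, 85]]),  # one (x, y) click inside the object
    point_labels=np.array([1]),          # 1 = foreground, 0 = background
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]     # highest-scoring proposal
```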
arXiv Detail & Related papers (2023-06-14T16:13:27Z)
- 3D Instance Segmentation of MVS Buildings [5.2517244720510305]
We present a novel framework for instance segmentation of 3D buildings from Multi-view Stereo (MVS) urban scenes.
The emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model.
arXiv Detail & Related papers (2021-12-18T11:12:38Z)
- Panoptic 3D Scene Reconstruction From a Single RGB Image [24.960786016915105]
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as for robotics, motion planning, or augmented reality.
Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction.
We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches.
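How semantic and instance outputs merge into one panoptic labeling can be shown in two lines; the id encoding below is the common panoptic convention, an assumption about this paper's internals, with random volumes as stand-ins:

```python
# Encode (semantic class, instance) pairs into one panoptic id per voxel.
import numpy as np

semantic = np.random.randint(0, 20, size=(64, 64, 64))  # stand-in semantic ids
instance = np.random.randint(0, 50, size=(64, 64, 64))  # stand-in instance ids

panoptic = semantic * 1000 + instance  # unique id per (class, instance) pair
```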
arXiv Detail & Related papers (2021-11-03T18:06:38Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
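The bottom-level fusion step can be sketched as a tiny module; a hedged illustration of language-to-heatmap fusion in general, not the paper's architecture, with all dimensions and names assumed:

```python
# Fuse a sentence embedding with an RGBD feature map by a per-location dot
# product to get a relevance heatmap.
import torch
import torch.nn as nn

class BottomUpHeatmap(nn.Module):
    def __init__(self, vis_dim: int = 256, lang_dim: int = 300):
        super().__init__()
        self.proj = nn.Linear(lang_dim, vis_dim)  # map language into visual space

    def forward(self, vis_feats: torch.Tensor, lang_emb: torch.Tensor):
        """vis_feats: (B, C, H, W) RGBD features; lang_emb: (B, L) sentence code."""
        q = self.proj(lang_emb)                            # (B, C)
        heat = torch.einsum("bchw,bc->bhw", vis_feats, q)  # score per location
        return torch.sigmoid(heat)                         # (B, H, W) heatmap

heatmap = BottomUpHeatmap()(torch.randn(2, 256, 60, 80), torch.randn(2, 300))
```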
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.