Related papers: SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

URL: http://arxiv.org/abs/2407.11564v1
Date: Tue, 16 Jul 2024 10:17:28 GMT
Title: SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation
Authors: Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau
Abstract summary: This paper introduces a novel method, named SGIFormer, for 3D instance segmentation. It is composed of the Semantic-guided Mix Query (SMQ) and the Geometric-enhanced Interleaving Transformer (GIT) decoder. It attains state-of-the-art performance on ScanNet V2, ScanNet200, and the challenging high-fidelity ScanNet++ benchmark.
Score: 14.214197948110115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. This paper introduces a novel method, named SGIFormer, for 3D instance segmentation, which is composed of the Semantic-guided Mix Query (SMQ) initialization and the Geometric-enhanced Interleaving Transformer (GIT) decoder. Specifically, the principle of our SMQ initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate scene-aware queries, yielding an adequate scene prior and compensating for the learnable query set. Subsequently, we feed the resulting overall query set into our GIT decoder, which alternately refines the instance queries and the global scene features, capturing fine-grained information while reducing design complexity. To emphasize geometric properties, we treat bias estimation as an auxiliary task and progressively integrate an embedding of the shifted point coordinates to reinforce instance localization. SGIFormer attains state-of-the-art performance on the ScanNet V2 and ScanNet200 datasets and on the challenging high-fidelity ScanNet++ benchmark, striking a balance between accuracy and efficiency. The code, weights, and demo videos are publicly available at https://rayyoh.github.io/sgiformer.
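The SMQ and GIT descriptions above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' released code: the helper names (`mix_queries`, `interleaved_layer`), treating class 0 as background, top-k selection by semantic confidence, single-head attention without learned projections, and a linear embedding of the shifted coordinates. It only shows the data flow. Confident voxels become scene-aware queries that are concatenated with a learnable query set. Each layer then adds an embedding of the offset-shifted coordinates and lets queries and scene features attend to each other in turn.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # Single-head scaled dot-product attention without learned projections.
    # q: (M, D) attends over kv: (N, D) and returns (M, D).
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ kv

def mix_queries(voxel_feats, sem_logits, learnable_queries, k):
    # Hypothetical SMQ-style initialization: pick the k voxels with the highest
    # foreground semantic probability (class 0 assumed to be background) and
    # use their features as scene-aware queries, then append the learnable set.
    probs = softmax(sem_logits, axis=-1)
    fg_conf = probs[:, 1:].max(axis=1)
    top = np.argsort(-fg_conf)[:k]
    return np.concatenate([voxel_feats[top], learnable_queries], axis=0)

def interleaved_layer(queries, scene_feats, coords, offsets, pos_proj):
    # Hypothetical GIT-style layer: shift coordinates by predicted offsets
    # (the auxiliary bias estimate), add a linear embedding of them to the
    # scene features, then update queries and scene features alternately.
    scene = scene_feats + (coords + offsets) @ pos_proj
    queries = queries + cross_attention(queries, scene)  # queries read the scene
    scene = scene + cross_attention(scene, queries)      # scene reads the queries
    return queries, scene

rng = np.random.default_rng(0)
N, C, D, L = 200, 5, 16, 8  # voxels, classes, feature dim, learnable queries
feats = rng.normal(size=(N, D))
logits = rng.normal(size=(N, C))          # stand-in for predicted voxel semantics
learnable = rng.normal(size=(L, D))
coords = rng.uniform(size=(N, 3))
offsets = 0.01 * rng.normal(size=(N, 3))  # stand-in for predicted offsets
pos_proj = rng.normal(size=(3, D))

q = mix_queries(feats, logits, learnable, k=12)
for _ in range(3):  # stack a few layers
    q, feats = interleaved_layer(q, feats, coords, offsets, pos_proj)
print(q.shape, feats.shape)  # (20, 16) (200, 16)
```

Alternating the two attention directions refines the scene features alongside the queries instead of keeping them fixed. The actual SGIFormer decoder layers may be designed quite differently.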

Related papers

Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation [33.58208166717537]
3D instance segmentation aims to predict a set of object instances in a scene and represent them as binary foreground masks with corresponding semantic labels. Transformer-based methods are gaining increasing attention due to their elegant pipelines, reduced manual selection of geometric properties, and superior performance.
arXiv Detail & Related papers (2025-02-06T15:19:48Z)
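The task definition in the entry above (instances as binary foreground masks with semantic labels) is commonly realized in query-based pipelines by comparing each query embedding with per-point features. The sketch below is a generic illustration under assumptions, not taken from the listed paper: dot-product mask logits, a 0.5 sigmoid threshold, and an instance score equal to class confidence times mean in-mask probability. The function name `decode_instances` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_instances(query_embed, point_feats, class_logits, thr=0.5):
    # query_embed: (M, D) per-query embeddings
    # point_feats: (N, D) per-point features
    # class_logits: (M, C) per-query class scores
    mask_prob = sigmoid(query_embed @ point_feats.T)  # (M, N)
    masks = mask_prob > thr                            # binary foreground masks
    shifted = np.exp(class_logits - class_logits.max(axis=1, keepdims=True))
    cls_prob = shifted / shifted.sum(axis=1, keepdims=True)
    labels = cls_prob.argmax(axis=1)                   # one semantic label per instance
    # Assumed scoring heuristic: class confidence times mean in-mask probability.
    in_mask = (mask_prob * masks).sum(axis=1) / np.maximum(masks.sum(axis=1), 1)
    scores = cls_prob.max(axis=1) * in_mask
    return masks, labels, scores

q = np.array([[4.0, 0.0], [0.0, 4.0]])                # two queries
f = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])    # three points
logits = np.array([[2.0, 0.0], [0.0, 2.0]])
masks, labels, scores = decode_instances(q, f, logits)
print(masks.tolist())   # [[True, True, False], [False, False, True]]
print(labels.tolist())  # [0, 1]
```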
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images. Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance in unsupervised domain adaptation (UDA) for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
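One generic way to transfer "relational" knowledge is to match the pairwise similarity structure of a batch between a teacher and a student rather than matching raw features. The following is a hedged sketch of such a loss, assuming cosine-similarity relations and a mean-squared penalty. It is not the RPD objective, whose exact form is not given here, and the function names are hypothetical.

```python
import numpy as np

def cosine_relation(x):
    # Pairwise cosine-similarity matrix (B, B) for a batch of embeddings (B, D).
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def relational_distill_loss(student, teacher):
    # Mean squared difference between student and teacher relation matrices.
    # Only (B, B) matrices are compared, so the feature widths may differ.
    diff = cosine_relation(student) - cosine_relation(teacher)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 64))  # stand-in for frozen 2D-transformer features
student = rng.normal(size=(8, 32))  # stand-in for point-cloud encoder features
print(relational_distill_loss(student, teacher))
print(relational_distill_loss(3.0 * teacher, teacher))  # ~0: cosine relations ignore scale
```

Comparing relation matrices rather than features lets a 3D student with a different embedding width learn from a 2D teacher without a projection layer.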
SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene. SAGS achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans [41.17467024268349]
Making sense of 3D environments requires fine-grained scene understanding. We propose to predict instance segmentations for 3D scenes in an unsupervised way. Our approach attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline.
arXiv Detail & Related papers (2024-03-24T22:53:16Z)
SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach. Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations. Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
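SAI3D's idea of progressively merging geometric primitives into instances can be sketched as region merging with a union-find structure. In this hypothetical version, the primitive-to-primitive affinities are given as inputs (SAI3D's own way of computing them is not shown). Region-level affinity is the mean of the primitive edges between two regions, and each round applies a lower threshold.

```python
from collections import defaultdict

def find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def progressive_merge(num_prims, edges, thresholds):
    # edges: (i, j, affinity) between adjacent primitives (hypothetical inputs).
    # thresholds: decreasing cutoffs, one merging round per cutoff.
    parent = list(range(num_prims))
    for t in thresholds:
        # Average primitive-level affinities into region-level affinities.
        sums, counts = defaultdict(float), defaultdict(int)
        for i, j, a in edges:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                key = (min(ri, rj), max(ri, rj))
                sums[key] += a
                counts[key] += 1
        # Merge region pairs whose mean affinity passes t, strongest first.
        for key in sorted(sums, key=lambda k: -sums[k] / counts[k]):
            if sums[key] / counts[key] >= t:
                ri, rj = find(parent, key[0]), find(parent, key[1])
                if ri != rj:
                    parent[rj] = ri
    # Label every primitive with its final region root.
    return [find(parent, i) for i in range(num_prims)]

edges = [(0, 1, 0.9), (1, 2, 0.6), (2, 3, 0.9), (0, 2, 0.3)]
print(progressive_merge(4, edges, [0.8, 0.5]))  # [0, 0, 2, 2]
print(progressive_merge(4, edges, [0.8, 0.4]))  # [0, 0, 0, 0]
```

Because affinities are re-averaged after every round, a weak edge can outvote a stronger one between the same two regions. Here the second-round affinity between regions {0, 1} and {2, 3} is (0.6 + 0.3) / 2 = 0.45, so it merges at threshold 0.4 but not at 0.5.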
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments [20.890476387720483]
MoRE is a novel approach for multi-object relocalization and reconstruction in evolving environments. We view these environments as "living scenes" and consider the problem of transforming scans taken at different points in time into a 3D reconstruction of the object instances.
arXiv Detail & Related papers (2023-12-14T17:09:57Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, the paper notes that it is challenging for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism. We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies with awareness of 3D structure. We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances. The proposed method achieves promising results on both ScanNetV2 and S3DIS. It also improves inference speed by more than 25% over the current state of the art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z)
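DyCo3D's dynamic convolution generates kernels in response to each instance. A minimal sketch under stated assumptions follows. A hypothetical linear "controller" maps a pooled instance feature to a flat parameter vector. That vector is sliced into the weights of a small per-point MLP (equivalent to stacked 1x1 convolutions), which is applied to point features concatenated with coordinates relative to the instance center. Layer widths, the controller, the center choice, and all function names are illustrative, not the paper's configuration.

```python
import numpy as np

def num_params(dims):
    # Total weights + biases of an MLP with layer widths `dims`.
    return sum(i * o + o for i, o in zip(dims[:-1], dims[1:]))

def split_params(theta, dims):
    # Slice a flat parameter vector into per-layer (weight, bias) pairs.
    params, k = [], 0
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        w = theta[k:k + d_in * d_out].reshape(d_in, d_out)
        k += d_in * d_out
        b = theta[k:k + d_out]
        k += d_out
        params.append((w, b))
    return params

def dynamic_mask_logits(point_feats, coords, center, theta, dims):
    # Run the instance-specific MLP (i.e. stacked 1x1 convolutions) on each
    # point's features concatenated with its offset from the instance center.
    x = np.concatenate([point_feats, coords - center], axis=1)
    layers = split_params(theta, dims)
    for idx, (w, b) in enumerate(layers):
        x = x @ w + b
        if idx < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU between hidden layers
    return x[:, 0]  # one mask logit per point

rng = np.random.default_rng(0)
N, F, E = 100, 8, 32               # points, point-feature dim, instance-feature dim
dims = [F + 3, 16, 8, 1]           # hypothetical kernel widths
feats = rng.normal(size=(N, F))
coords = rng.uniform(size=(N, 3))
inst_feat = rng.normal(size=E)     # pooled feature of one candidate instance
W_ctrl = 0.1 * rng.normal(size=(E, num_params(dims)))  # hypothetical controller
theta = inst_feat @ W_ctrl         # kernels generated for this instance
mask = dynamic_mask_logits(feats, coords, coords[0], theta, dims) > 0
print(num_params(dims), mask.shape)  # 337 (100,)
```

Since the kernels are produced per instance, the same point features yield a different mask for every candidate instance without a separate network per object.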

This list is automatically generated from the titles and abstracts of the papers on this site.