MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
- URL: http://arxiv.org/abs/2411.01781v3
- Date: Mon, 11 Nov 2024 10:48:05 GMT
- Title: MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
- Authors: Duc Dang Trung Tran, Byeongkeun Kang, Yeejin Lee
- Abstract summary: MSTA3D is a novel framework for superpoint-based 3D instance segmentation.
It exploits multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively.
Our approach surpasses state-of-the-art 3D instance segmentation methods.
- Score: 7.400926717561454
- Abstract: Recently, transformer-based techniques incorporating superpoints have become prevalent in 3D instance segmentation. However, they often encounter an over-segmentation problem, especially noticeable with large objects. Additionally, unreliable mask predictions stemming from superpoint mask prediction further compound this issue. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
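To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of how a twin-attention decoder with a box query could look. Everything here is an assumption for illustration: the module names, the two-scale setup, the 6-D axis-aligned box parameterization, and the dot-product mask head are not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class TwinAttentionSketch(nn.Module):
    """Two parallel cross-attention branches over superpoint features at
    two scales, fused per query, plus a box head as a spatial constraint.
    All names, shapes, and the 6-D box parameterization are illustrative
    assumptions, not the authors' implementation."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # One cross-attention branch per feature scale ("twin" branches).
        self.attn_fine = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_coarse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        # Hypothetical box head: center (3) + size (3) per query.
        self.box_head = nn.Linear(dim, 6)

    def forward(self, queries, feats_fine, feats_coarse):
        # queries:      (B, Q, dim) semantic instance queries
        # feats_fine:   (B, N, dim) fine-scale superpoint features
        # feats_coarse: (B, M, dim) coarse-scale superpoint features
        q_fine, _ = self.attn_fine(queries, feats_fine, feats_fine)
        q_coarse, _ = self.attn_coarse(queries, feats_coarse, feats_coarse)
        q = self.fuse(torch.cat([q_fine, q_coarse], dim=-1))
        boxes = self.box_head(q)  # complementary spatial constraint
        # Superpoint mask logits from query/feature similarity.
        masks = torch.einsum("bqd,bnd->bqn", q, feats_fine)
        return masks, boxes
```

A box regularizer in this spirit would penalize mask logits assigned to superpoints outside the predicted box, tying the semantic and spatial branches together.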
Related papers
- Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking [6.599971425078935]
Existing 3D instance segmentation methods frequently encounter issues with over-segmentation, leading to redundant and inaccurate 3D proposals that complicate downstream tasks.
This challenge arises from their unsupervised merging approach, where dense 2D masks are lifted across frames into point clouds to form 3D candidate proposals without direct supervision.
We propose a 3D-Aware 2D Mask Tracking module that uses robust 3D priors from a 2D mask segmentation and tracking foundation model (SAM-2) to ensure consistent object masks across video frames.
arXiv Detail & Related papers (2024-11-25T08:26:31Z)
- SA3DIP: Segment Any 3D Instance with Potential 3D Priors [41.907914881608995]
We propose SA3DIP, a novel method for Segmenting Any 3D Instances by exploiting Potential 3D Priors.
Specifically, on one hand, we generate complementary 3D primitives based on both geometric and textural priors.
On the other hand, we introduce supplemental constraints from the 3D space by using a 3D detector to guide a further merging process.
arXiv Detail & Related papers (2024-11-06T10:39:00Z)
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require an agent to understand the 3D scene it is in while simultaneously exploring it.
An online, real-time, fine-grained, and highly generalizable 3D perception model is therefore urgently needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance [49.14140194332482]
We introduce Open3DIS, a novel solution designed to tackle the problem of open-vocabulary instance segmentation within 3D scenes.
Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task.
arXiv Detail & Related papers (2023-12-17T10:07:03Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving [13.08936676096554]
We propose a novel framework called Multimodal Weakly Supervised Instance Segmentation (MWSIS).
MWSIS incorporates various fine-grained label generation and correction modules for both 2D and 3D modalities.
With only 5% of the fully supervised annotations, it outperforms fully supervised instance segmentation.
arXiv Detail & Related papers (2023-12-12T05:12:22Z)
- Mask-Attention-Free Transformer for 3D Instance Segmentation [68.29828726317723]
Transformer-based methods have recently dominated 3D instance segmentation, where mask attention is commonly involved.
We develop a series of position-aware designs to overcome the low-recall issue and perform cross-attention by imposing a positional prior.
Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets.
arXiv Detail & Related papers (2023-09-04T16:09:28Z)
- Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales (see the sketch after this list).
Mask3D sets a new state of the art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP), and ScanNet200 test (+12.4 mAP).
arXiv Detail & Related papers (2022-10-06T17:55:09Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
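As referenced in the Mask3D entry above, several of these transformer pipelines share one core loop: instance queries are refined by repeatedly cross-attending to point-cloud features at multiple scales. A minimal PyTorch sketch of that loop follows; the layer layout, round count, and dot-product mask head are assumptions for illustration, not the released Mask3D code.

```python
import torch
import torch.nn as nn


class IterativeQueryDecoderSketch(nn.Module):
    """Queries cross-attend to features scale by scale, over several rounds,
    then produce mask logits at the finest scale. Illustrative only."""

    def __init__(self, dim: int = 256, heads: int = 8, rounds: int = 3):
        super().__init__()
        self.rounds = rounds
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, queries, feats_per_scale):
        # queries: (B, Q, dim); feats_per_scale: coarse-to-fine list of (B, N_s, dim)
        for _ in range(self.rounds):
            for feats in feats_per_scale:
                attended, _ = self.cross_attn(queries, feats, feats)
                queries = queries + attended           # pull in scene evidence
                refined, _ = self.self_attn(queries, queries, queries)
                queries = queries + refined            # let queries interact
                queries = queries + self.ffn(queries)  # per-query update
        # Mask logits: similarity between queries and finest-scale features.
        return torch.einsum("bqd,bnd->bqn", queries, feats_per_scale[-1])
```

In a Mask3D-style setup, these mask logits would then be matched to ground-truth instances (e.g. via Hungarian matching) for training; that part is omitted here.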