BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation
- URL: http://arxiv.org/abs/2510.12182v1
- Date: Tue, 14 Oct 2025 06:23:18 GMT
- Title: BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation
- Authors: Youngju Yoo, Seho Kim, Changick Kim
- Abstract summary: 3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations. Box-level annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. We propose BEEP3D: Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation.
- Score: 28.97274092946373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations, resulting in substantial annotation costs and labor overhead. To mitigate this, box-level annotations have been explored as a weaker but more scalable form of supervision. However, box annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. However, such two-stage pipelines often increase overall training time and complexity, and hinder end-to-end optimization. To overcome these challenges, we propose BEEP3D: Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation. BEEP3D adopts a student-teacher framework, where the teacher model serves as a pseudo-labeler and is updated by the student model via an Exponential Moving Average. To better guide the teacher model to generate precise pseudo-masks, we introduce an instance center-based query refinement that enhances position query localization and leverages features near instance centers. Additionally, we design two novel losses, a query consistency loss and a masked feature consistency loss, to align semantic and geometric signals between predictions and pseudo-masks. Extensive experiments on ScanNetV2 and S3DIS datasets demonstrate that BEEP3D achieves competitive or superior performance compared to state-of-the-art weakly supervised methods while remaining computationally efficient.
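The abstract states that the teacher is updated from the student via an Exponential Moving Average. The following is a minimal sketch of that kind of update, not the authors' implementation: the parameter layout (a dict of float lists), the function name `ema_update`, and the momentum values are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> None:
    """Update the teacher in place: t <- momentum * t + (1 - momentum) * s.

    The default momentum is an assumed value, not one taken from the paper.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            t_values[i] = momentum * t_values[i] + (1.0 - momentum) * s


# Toy usage: the teacher starts as a copy of the student.
student: Params = {"w": [1.0, 2.0], "b": [0.5]}
teacher: Params = {name: list(values) for name, values in student.items()}

# Pretend an optimizer step changed one student weight.
student["w"] = [3.0, 2.0]

# The teacher moves only a small fraction of the way toward the student.
ema_update(teacher, student, momentum=0.9)
```

Because the teacher receives no gradients and only follows the student slowly, its pseudo-masks change more smoothly than the student's predictions, which is the usual motivation for this kind of averaging.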
Related papers
- UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning [6.502142457981839]
3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis.
Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmentation for better scene understanding.
A key challenge is the inconsistency of 2D instance labels across views, leading to poor 3D predictions.
We propose a unified framework that merges these steps, reducing training time and improving performance by introducing a learnable feature embedding for segmentation in Gaussian primitives.
arXiv Detail & Related papers (2025-12-31T10:20:01Z)
- DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Instance Segmentation [12.044632781901088]
Weakly supervised 3D instance segmentation is essential for 3D scene understanding.
Existing methods rely on two forms of weak supervision: one-thing-one-click annotations and bounding box annotations.
We propose DBGroup, a two-stage weakly supervised 3D instance segmentation framework.
arXiv Detail & Related papers (2025-11-13T06:12:13Z)
- Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking [10.223105883919278]
We introduce a Granularity-Consistent automatic 2D Mask Tracking approach that maintains temporal correspondences across frames.
Our method effectively generated consistent and accurate 3D segmentations.
arXiv Detail & Related papers (2025-11-02T03:52:42Z)
- SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing [20.383892902000976]
We propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D).
For semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometry primitives.
For geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features.
arXiv Detail & Related papers (2025-09-05T14:37:31Z)
- Ambiguity-aware Point Cloud Segmentation by Adaptive Margin Contrastive Learning [65.94127546086156]
We propose an adaptive margin contrastive learning method for semantic segmentation on point clouds.
We first design AMContrast3D, a method that incorporates contrastive learning into an ambiguity estimation framework.
Inspired by the insight of joint training, we propose AMContrast3D++, which integrates two branches trained in parallel.
arXiv Detail & Related papers (2025-07-09T07:00:32Z)
- SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts [13.349110509879312]
Sparsely-supervised 3D object detection has gained great attention, achieving performance close to fully-supervised 3D detectors.
We propose a boosting strategy, termed SP3D, to give the 3D detector robust feature discrimination capability under sparse annotation settings.
Experiments have validated that SP3D can enhance the performance of sparsely supervised detectors by a large margin under meager labeling conditions.
arXiv Detail & Related papers (2025-03-09T06:08:04Z)
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
Densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection [108.672972439282]
We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD.
Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels.
We also present a Depth Gradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels.
arXiv Detail & Related papers (2024-03-26T05:12:18Z)
- AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans [41.17467024268349]
Making sense of 3D environments requires fine-grained scene understanding.
We propose to predict instance segmentations for 3D scenes in an unsupervised way.
Our approach attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline.
arXiv Detail & Related papers (2024-03-24T22:53:16Z)
- Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we include noises to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z)
- Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.