Related papers: BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation

URL: http://arxiv.org/abs/2510.12182v1
Date: Tue, 14 Oct 2025 06:23:18 GMT
Title: BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation
Authors: Youngju Yoo, Seho Kim, Changick Kim
Abstract summary: 3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations. Box-level annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. We propose BEEP3D: Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation.
Score: 28.97274092946373
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations, resulting in substantial annotation costs and labor overhead. To mitigate this, box-level annotations have been explored as a weaker but more scalable form of supervision. However, box annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. However, such two-stage pipelines often increase overall training time and complexity, and they hinder end-to-end optimization. To overcome these challenges, we propose BEEP3D: Box-supervised End-to-End Pseudo-mask generation for 3D instance segmentation. BEEP3D adopts a student-teacher framework, where the teacher model serves as a pseudo-labeler and is updated by the student model via an Exponential Moving Average. To better guide the teacher model to generate precise pseudo-masks, we introduce an instance center-based query refinement that enhances position query localization and leverages features near instance centers. Additionally, we design two novel losses, a query consistency loss and a masked feature consistency loss, to align semantic and geometric signals between predictions and pseudo-masks. Extensive experiments on the ScanNetV2 and S3DIS datasets demonstrate that BEEP3D achieves competitive or superior performance compared to state-of-the-art weakly supervised methods while remaining computationally efficient.
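The abstract states only that the teacher is updated from the student via an Exponential Moving Average. As a rough illustration of that general technique (not the authors' implementation), the sketch below applies the usual rule `teacher = decay * teacher + (1 - decay) * student` to toy parameters. The function name `ema_update`, the dictionary-of-lists parameter layout, and the decay value are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> None:
    """Update teacher weights in place: t <- decay * t + (1 - decay) * s."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            t_values[i] = decay * t_values[i] + (1.0 - decay) * s


# Toy example: the teacher starts as a copy of the student.
student: Params = {"w": [1.0, 2.0]}
teacher: Params = {"w": [1.0, 2.0]}

# Pretend one optimizer step moved the student weights.
student["w"] = [2.0, 4.0]

# An illustrative decay of 0.9 moves the teacher 10% of the way to the student.
ema_update(teacher, student, decay=0.9)
print(teacher["w"])
```

Because the teacher receives no gradient updates of its own in this scheme, it changes more slowly than the student, which is the usual motivation for using it as a source of pseudo-labels. A decay close to 1 makes the teacher smoother but slower to follow the student.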

Related papers

UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning [6.502142457981839]
3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis. Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmentation for better scene understanding. A key challenge is the inconsistency of 2D instance labels across views, leading to poor 3D predictions. We propose a unified framework that merges these steps, reducing training time and improving performance by introducing a learnable feature embedding for segmentation in Gaussian primitives.
arXiv Detail & Related papers (2025-12-31T10:20:01Z)
DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Instance Segmentation [12.044632781901088]
Weakly supervised 3D instance segmentation is essential for 3D scene understanding. Existing methods rely on two forms of weak supervision: one-thing-one-click annotations and bounding box annotations. We propose DBGroup, a two-stage weakly supervised 3D instance segmentation framework.
arXiv Detail & Related papers (2025-11-13T06:12:13Z)
Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking [10.223105883919278]
We introduce a Granularity-Consistent automatic 2D Mask Tracking approach that maintains temporal correspondences across frames. Our method effectively generated consistent and accurate 3D segmentations.
arXiv Detail & Related papers (2025-11-02T03:52:42Z)
SGS-3D: High-Fidelity 3D Instance Segmentation via Reliable Semantic Mask Splitting and Growing [20.383892902000976]
We propose splitting and growing reliable semantic masks for high-fidelity 3D instance segmentation (SGS-3D). For semantic guidance, we introduce a mask filtering strategy that leverages the co-occurrence of 3D geometry primitives. For the geometric refinement, we construct fine-grained object instances by exploiting both spatial continuity and high-level features.
arXiv Detail & Related papers (2025-09-05T14:37:31Z)
Ambiguity-aware Point Cloud Segmentation by Adaptive Margin Contrastive Learning [65.94127546086156]
We propose an adaptive margin contrastive learning method for semantic segmentation on point clouds. We first design AMContrast3D, a method that incorporates contrastive learning into an ambiguity estimation framework. Inspired by the insight of joint training, we propose AMContrast3D++, which integrates two branches trained in parallel.
arXiv Detail & Related papers (2025-07-09T07:00:32Z)
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts [13.349110509879312]
Sparsely-supervised 3D object detection has gained great attention, achieving performance close to fully-supervised 3D detectors. We propose a boosting strategy, termed SP3D, to boost the 3D detector with robust feature discrimination capability under sparse annotation settings. Experiments have validated that SP3D can enhance the performance of sparsely supervised detectors by a large margin under meager labeling conditions.
arXiv Detail & Related papers (2025-03-09T06:08:04Z)
Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision. Densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection [108.672972439282]
We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD. Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels. We also present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels.
arXiv Detail & Related papers (2024-03-26T05:12:18Z)
AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans [41.17467024268349]
Making sense of 3D environments requires fine-grained scene understanding. We propose to predict instance segmentations for 3D scenes in an unsupervised way. Our approach attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline.
arXiv Detail & Related papers (2024-03-24T22:53:16Z)
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection. Specifically, we include noises to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs. We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z)
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point. With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information. Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z)
SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data. Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.