Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking
- URL: http://arxiv.org/abs/2511.00785v1
- Date: Sun, 02 Nov 2025 03:52:42 GMT
- Title: Class-agnostic 3D Segmentation by Granularity-Consistent Automatic 2D Mask Tracking
- Authors: Juan Wang, Yasutomo Kawanishi, Tomo Miyazaki, Zhijie Wang, Shinichiro Omachi
- Abstract summary: We introduce a Granularity-Consistent automatic 2D Mask Tracking approach that maintains temporal correspondences across frames. Our method effectively generates consistent and accurate 3D segmentations.
- Score: 10.223105883919278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D instance segmentation is an important task for real-world applications. To avoid costly manual annotations, existing methods have explored generating pseudo labels by transferring 2D masks from foundation models to 3D. However, this approach is often suboptimal because the video frames are processed independently, which causes inconsistent segmentation granularity and conflicting 3D pseudo labels, degrading the accuracy of the final segmentation. To address this, we introduce a Granularity-Consistent automatic 2D Mask Tracking approach that maintains temporal correspondences across frames, eliminating conflicting pseudo labels. Combined with a three-stage curriculum learning framework, our approach progressively trains from fragmented single-view data to unified multi-view annotations, and ultimately to globally coherent full-scene supervision. This structured learning pipeline progressively exposes the model to pseudo-labels of increasing consistency, allowing it to robustly distill a consistent 3D representation from initially fragmented and contradictory 2D priors. Experimental results demonstrate that our method generates consistent and accurate 3D segmentations; it also achieves state-of-the-art results on standard benchmarks and exhibits open-vocabulary capability.
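The temporal-correspondence idea behind the mask tracking can be illustrated with a plain greedy IoU matcher that propagates mask ids from one frame to the next. This is a simplified, hypothetical stand-in for the paper's granularity-consistent tracker, not the authors' method; the function name and threshold are assumptions.

```python
# Hedged sketch: greedy IoU-based mask tracking across frames.
# A simplified stand-in for granularity-consistent 2D mask tracking.
import numpy as np

def track_masks(prev_masks, curr_masks, prev_ids, next_id, iou_thresh=0.5):
    """Assign each current-frame boolean mask the id of the best-IoU
    previous-frame mask (if IoU >= threshold), otherwise a fresh id.
    Returns (curr_ids, next_id)."""
    curr_ids = []
    used = set()  # previous masks already matched this frame
    for cm in curr_masks:
        best_iou, best_j = 0.0, -1
        for j, pm in enumerate(prev_masks):
            if j in used:
                continue
            inter = np.logical_and(cm, pm).sum()
            union = np.logical_or(cm, pm).sum()
            iou = inter / union if union else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thresh:
            curr_ids.append(prev_ids[best_j])  # propagate the old id
            used.add(best_j)
        else:
            curr_ids.append(next_id)           # unmatched: new instance
            next_id += 1
    return curr_ids, next_id
```

Propagating ids this way is what prevents the same object from receiving conflicting instance labels in different frames, which is the failure mode the abstract attributes to per-frame processing.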
Related papers
- UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning [6.502142457981839]
3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis. Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmentation for better scene understanding. A key challenge is the inconsistency of 2D instance labels across views, leading to poor 3D predictions. We propose a unified framework that merges these steps, reducing training time and improving performance by introducing a learnable feature embedding for segmentation in Gaussian primitives.
arXiv Detail & Related papers (2025-12-31T10:20:01Z) - BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation [28.97274092946373]
3D instance segmentation is crucial for understanding complex 3D environments, yet fully supervised methods require dense point-level annotations. Box-level annotations inherently introduce ambiguity in overlapping regions, making accurate point-to-instance assignment challenging. Recent methods address this ambiguity by generating pseudo-masks through training a dedicated pseudo-labeler in an additional training stage. We propose BEEP3D, a box-supervised end-to-end pseudo-mask generation method for 3D instance segmentation.
arXiv Detail & Related papers (2025-10-14T06:23:18Z) - Integrating SAM Supervision for 3D Weakly Supervised Point Cloud Segmentation [66.65719382619538]
Current methods for 3D semantic segmentation propose training models with limited annotations to address the difficulty of annotating large, irregular, and unordered 3D point cloud data. We present a novel approach that maximizes the utility of sparsely available 3D annotations by incorporating segmentation masks generated by 2D foundation models.
arXiv Detail & Related papers (2025-08-27T14:13:01Z) - 3D Can Be Explored In 2D: Pseudo-Label Generation for LiDAR Point Clouds Using Sensor-Intensity-Based 2D Semantic Segmentation [3.192308005611312]
We introduce a new 3D semantic segmentation pipeline that leverages aligned scenes and state-of-the-art 2D segmentation methods. Our approach generates 2D views from LiDAR scans colored by sensor intensity and applies 2D semantic segmentation to these views. The segmented 2D outputs are then back-projected onto the 3D points using a simple voting-based estimator.
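The back-projection-and-voting step described above can be sketched as follows. This is a generic reimplementation under assumed conventions (3x4 pinhole projection matrices, -1 for unlabeled pixels), not the authors' code; the function name and interfaces are hypothetical.

```python
# Hedged sketch: voting-based back-projection of 2D labels onto 3D points.
import numpy as np

def backproject_labels(points, views, num_classes):
    """Accumulate per-point class votes from multiple segmented 2D views.

    points:      (N, 3) array of 3D points.
    views:       list of (K, label_map) pairs, where K is a 3x4 camera
                 projection matrix and label_map is an (H, W) int array
                 of per-pixel class ids (-1 = unlabeled).
    num_classes: number of semantic classes.
    Returns the majority-vote label per point (-1 if no view saw it).
    """
    votes = np.zeros((len(points), num_classes), dtype=np.int64)
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    for K, label_map in views:
        proj = homo @ K.T                      # (N, 3) homogeneous pixels
        z = proj[:, 2]
        valid = z > 1e-6                       # in front of the camera
        uv = np.full((len(points), 2), -1, dtype=np.int64)
        uv[valid] = np.round(proj[valid, :2] / z[valid, None]).astype(np.int64)
        h, w = label_map.shape
        inside = (valid & (uv[:, 0] >= 0) & (uv[:, 0] < w)
                  & (uv[:, 1] >= 0) & (uv[:, 1] < h))
        labels = label_map[uv[inside, 1], uv[inside, 0]]
        seen = labels >= 0                     # drop unlabeled pixels
        idx = np.flatnonzero(inside)[seen]
        votes[idx, labels[seen]] += 1          # one vote per view
    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = -1           # never observed
    return out
```

Majority voting across views is what smooths out per-view segmentation errors before the labels land on the point cloud.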
arXiv Detail & Related papers (2025-05-06T08:31:32Z) - OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging [36.9859733771263]
We propose an efficient method for lifting 2D masks into unified 3D instances using a hashing technique. By employing voxel hashing for efficient 3D scene querying, our approach reduces the time complexity of costly spatial overlap queries. Our approach achieves state-of-the-art performance in online, zero-shot 3D instance segmentation with leading efficiency.
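A minimal version of the voxel-hashing idea: hash each lifted mask's points into a voxel grid so that overlap queries between masks reduce to hash-table lookups instead of pairwise geometric tests. Class and method names here are hypothetical; this sketches the data structure only, not OnlineAnySeg's implementation.

```python
# Hedged sketch: voxel-hash index for fast mask overlap queries.
from collections import defaultdict

class VoxelHashIndex:
    """Map each occupied voxel to the set of mask ids whose lifted 3D
    points fall inside it, so spatial overlap becomes a hash lookup."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.table = defaultdict(set)  # voxel key -> mask ids

    def _key(self, p):
        # Quantize a 3D point to its integer voxel coordinates.
        return tuple(int(c // self.voxel_size) for c in p)

    def insert(self, mask_id, points):
        for p in points:
            self.table[self._key(p)].add(mask_id)

    def overlap_counts(self, points):
        """Count, per stored mask id, how many distinct query voxels it
        shares with the given point set."""
        counts = defaultdict(int)
        for key in {self._key(p) for p in points}:
            for mid in self.table.get(key, ()):
                counts[mid] += 1
        return dict(counts)
```

Merging two masks then amounts to checking whether their shared-voxel count clears a threshold, which is constant-time per voxel rather than quadratic in the number of points.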
arXiv Detail & Related papers (2025-03-03T08:48:06Z) - Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z) - Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision, but densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z) - Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection [108.672972439282]
We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD.
Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels.
We also present a Depth Gradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels.
arXiv Detail & Related papers (2024-03-26T05:12:18Z) - Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving due to its ease of application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method that can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z) - Image Understands Point Cloud: Weakly Supervised 3D Semantic
Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
Basically, we design a dual-branch network equipped with an active labeling strategy, to maximize the power of tiny parts of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
arXiv Detail & Related papers (2022-09-16T07:59:04Z) - 3D Guided Weakly Supervised Semantic Segmentation [27.269847900950943]
We propose a weakly supervised 2D semantic segmentation model by incorporating sparse bounding box labels with available 3D information.
We manually labeled a subset of the 2D-3D Semantics (2D-3D-S) dataset with bounding boxes, and introduce our 2D-3D inference module to generate accurate pixel-wise segment proposal masks.
arXiv Detail & Related papers (2020-12-01T03:34:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.