MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box
Annotations for Autonomous Driving
- URL: http://arxiv.org/abs/2312.06988v4
- Date: Sun, 17 Dec 2023 07:06:56 GMT
- Title: MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box
Annotations for Autonomous Driving
- Authors: Guangfeng Jiang, Jun Liu, Yuzhi Wu, Wenlong Liao, Tao He, Pai Peng
- Abstract summary: We propose a novel framework called Multimodal Weakly Supervised Instance Segmentation (MWSIS).
MWSIS incorporates various fine-grained label generation and correction modules for both 2D and 3D modalities.
It outperforms fully supervised instance segmentation with only 5% fully supervised annotations.
- Score: 13.08936676096554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance segmentation is a fundamental research topic in computer vision,
especially in autonomous driving. However, manual mask annotation for instance
segmentation is quite time-consuming and costly. To address this problem, some
prior works attempt to apply a weakly supervised approach by exploiting 2D or 3D
boxes. However, no prior work has successfully segmented 2D and 3D instances
simultaneously using only 2D box annotations, which could further reduce the
annotation cost by an order of magnitude. Thus, we propose a novel framework
called Multimodal Weakly Supervised Instance Segmentation (MWSIS), which
incorporates various fine-grained label generation and correction modules for
both 2D and 3D modalities to improve the quality of pseudo labels, along with a
new multimodal cross-supervision approach, named Consistency Sparse Cross-modal
Supervision (CSCS), to reduce the inconsistency of multimodal predictions by
response distillation. In particular, transferring the pretrained 3D backbone to
downstream tasks not only improves the performance of 3D detectors, but also
outperforms fully supervised instance segmentation while using only 5% of the
fully supervised annotations. On the Waymo dataset, the proposed framework
demonstrates significant improvements over the baseline, especially achieving
2.59% mAP and 12.75% mAP increases for 2D and 3D instance segmentation tasks,
respectively. The code is available at
https://github.com/jiangxb98/mwsis-plugin.
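The abstract describes CSCS only at a high level; as a minimal sketch (not the authors' implementation from the linked repository), the snippet below illustrates one plausible form of response distillation between paired 2D and 3D per-point predictions. The function name, tensor shapes, and the symmetric KL formulation are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def cscs_consistency_loss(logits_2d: torch.Tensor,
                          logits_3d: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical response-distillation consistency loss between modalities.

    Both inputs are per-point class logits for the same set of LiDAR points:
      logits_2d: (N, C) responses gathered from the 2D branch, e.g. by
                 projecting each point into the image and sampling the
                 prediction map at that pixel.
      logits_3d: (N, C) responses from the 3D (point cloud) branch.
    The loss pulls the two response distributions toward each other, matching
    the stated goal of CSCS: reducing cross-modal prediction inconsistency.
    """
    log_p2d = F.log_softmax(logits_2d / temperature, dim=-1)
    log_p3d = F.log_softmax(logits_3d / temperature, dim=-1)
    # Symmetric KL so neither modality acts as a fixed teacher.
    kl_2d_to_3d = F.kl_div(log_p3d, log_p2d.exp(), reduction="batchmean")
    kl_3d_to_2d = F.kl_div(log_p2d, log_p3d.exp(), reduction="batchmean")
    return 0.5 * (kl_2d_to_3d + kl_3d_to_2d)


if __name__ == "__main__":
    # Toy usage: 1024 LiDAR points, 3 classes.
    loss = cscs_consistency_loss(torch.randn(1024, 3), torch.randn(1024, 3))
    print(float(loss))
```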
Related papers
- MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation [7.400926717561454]
MSTA3D is a novel framework for superpoint-based 3D instance segmentation.
It exploits multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively.
Our approach surpasses state-of-the-art 3D instance segmentation methods.
arXiv Detail & Related papers (2024-11-04T04:14:39Z)
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation [50.51125319374404]
We propose a novel self-training network InsTeacher3D to explore and exploit pure instance knowledge from unlabeled data.
Experimental results on multiple large-scale datasets show that the InsTeacher3D significantly outperforms prior state-of-the-art semi-supervised approaches.
arXiv Detail & Related papers (2024-06-24T16:35:58Z)
- SAM-guided Graph Cut for 3D Instance Segmentation [60.75119991853605]
This paper addresses the challenge of 3D instance segmentation by simultaneously leveraging 3D geometric and multi-view image information.
We introduce a novel 3D-to-2D query framework to effectively exploit 2D segmentation models for 3D instance segmentation.
Our method achieves robust segmentation performance and can generalize across different types of scenes.
arXiv Detail & Related papers (2023-12-13T18:59:58Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving due to its ease of application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving [34.119642131912485]
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds, together with the 3D boxes, as natural weak supervision for training 2D image instance segmentation models.
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of dense 2D masks.
arXiv Detail & Related papers (2022-12-07T08:08:01Z)
- ICM-3D: Instantiated Category Modeling for 3D Instance Segmentation [19.575077449759377]
We propose ICM-3D, a single-step method to segment 3D instances via instantiated categorization.
We conduct extensive experiments to verify the effectiveness of ICM-3D and show that it obtains inspiring performance across multiple frameworks, backbones and benchmarks.
arXiv Detail & Related papers (2021-08-26T13:08:37Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)