MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic
Segmentation
- URL: http://arxiv.org/abs/2309.11839v1
- Date: Thu, 21 Sep 2023 07:30:21 GMT
- Title: MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic
Segmentation
- Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua
Xie
- Abstract summary: Multi-modal unsupervised domain adaptation (MM-UDA) is a practical solution to embed semantic understanding in autonomous systems without expensive point-wise annotations.
Previous MM-UDA methods suffer from significant class-imbalanced performance, restricting their adoption in real applications.
We propose Multi-modal Prior Aided (MoPA) domain adaptation to improve the performance of rare objects.
- Score: 38.42077782990957
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic
segmentation is a practical solution to embed semantic understanding in
autonomous systems without expensive point-wise annotations. While previous
MM-UDA methods can achieve overall improvement, they suffer from significant
class-imbalanced performance, restricting their adoption in real applications.
This imbalanced performance is mainly caused by: 1) self-training with
imbalanced data and 2) the lack of pixel-wise 2D supervision signals. In this
work, we propose Multi-modal Prior Aided (MoPA) domain adaptation to improve
the performance of rare objects. Specifically, we develop Valid Ground-based
Insertion (VGI) to rectify the imbalance supervision signals by inserting prior
rare objects collected from the wild while avoiding introducing artificial
artifacts that lead to trivial solutions. Meanwhile, our SAM consistency loss
leverages the 2D prior semantic masks from SAM as pixel-wise supervision
signals to encourage consistent predictions for each object in the semantic
mask. The knowledge learned from modal-specific prior is then shared across
modalities to achieve better rare object segmentation. Extensive experiments
show that our method achieves state-of-the-art performance on the challenging
MM-UDA benchmark. Code will be available at https://github.com/AronCao49/MoPA.
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to meet the demands of 3D occupancy prediction and flow estimation tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our architecture framework, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection [40.14197775884804]
MonoASRH is a novel monocular 3D detection framework composed of an Efficient Hybrid Feature Aggregation Module (EH-FAM) and an Adaptive Scale-Aware 3D Regression Head (ASRH).
EH-FAM employs multi-head attention with a global receptive field to extract semantic features for small-scale objects.
ASRH encodes 2D bounding box dimensions and then fuses scale features with the semantic features aggregated by EH-FAM.
arXiv Detail & Related papers (2024-11-05T02:33:25Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition [18.38295403066007]
HDANet integrates feature disentanglement and alignment into a unified framework.
The proposed method demonstrates impressive robustness across nine operating conditions in the MSTAR dataset.
arXiv Detail & Related papers (2023-04-07T09:11:29Z)
- ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation [38.66509154973051]
We propose an Active-and-Adaptive (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model.
ADAS performs an active sampling operation to select a maximally-informative subset from both source and target domains for effective adaptation.
ADAS is verified to be effective in many cross-domain settings including: 1) Unsupervised Domain Adaptation (UDA), which means that all samples from target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA), which means that only a few unlabeled samples are available in the unlabeled target domain.
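The summary says ADAS actively selects a maximally-informative subset for adaptation but gives no selection criterion. A common instantiation of such active sampling — offered here purely as an assumed illustration, not ADAS's actual method — is to rank unlabeled samples by predictive entropy and keep the most uncertain ones:

```python
import numpy as np

def entropy_based_selection(probs, k):
    """Hypothetical active-sampling sketch: pick the k most uncertain samples.

    probs: (num_samples, num_classes) predicted class distributions.
    Returns indices of the k highest-entropy (most informative) samples.
    """
    eps = 1e-12                                        # avoid log(0)
    ent = -np.sum(probs * np.log(probs + eps), axis=1) # per-sample entropy
    return np.argsort(-ent)[:k]                        # descending by entropy
```

Samples with near-uniform predictions (high entropy) are selected first, since the model is least certain about them and labeling or adapting on them is expected to be most informative.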
arXiv Detail & Related papers (2022-12-20T16:17:40Z)
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
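The teacher-student paradigm with adaptive pseudo labels mentioned above can be sketched in a few lines. The EMA momentum update and the confidence threshold below are standard self-training ingredients assumed for illustration; the summary does not state STMono3D's exact recipe:

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of teacher weights (dict of arrays/scalars)."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k] for k in teacher}

def pseudo_labels(teacher_probs, threshold=0.9):
    """Keep only the teacher's confident predictions as pseudo labels.

    teacher_probs: (N, C) class probabilities on unlabeled target data.
    Returns (labels_for_kept_samples, boolean keep mask).
    """
    conf = teacher_probs.max(axis=1)      # confidence of the argmax class
    labels = teacher_probs.argmax(axis=1) # hard pseudo labels
    keep = conf >= threshold              # filter out uncertain predictions
    return labels[keep], keep
```

The student trains on the kept pseudo labels while the teacher is updated as a slow EMA of the student, which stabilizes the labels across iterations.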
arXiv Detail & Related papers (2022-04-25T12:23:07Z)
- Plugging Self-Supervised Monocular Depth into Unsupervised Domain Adaptation for Semantic Segmentation [19.859764556851434]
We propose to exploit self-supervised monocular depth estimation to improve UDA for semantic segmentation.
Our full proposal achieves state-of-the-art performance (58.8 mIoU) on the GTA5->CS benchmark.
arXiv Detail & Related papers (2021-10-13T12:48:51Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
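The 3D Max Filtering idea — keeping a prediction only where it is the local maximum of the score volume, which makes NMS unnecessary — can be sketched as a peak test over a small window. The window size and pure-numpy loop below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def max_filter_3d(scores, k=3):
    """Max-filter a (S, H, W) score volume with a k x k x k window.

    S indexes scales (or feature levels), H and W are spatial dims.
    """
    S, H, W = scores.shape
    r = k // 2
    # Pad with -inf so border windows ignore out-of-range positions.
    pad = np.pad(scores, r, mode="constant", constant_values=-np.inf)
    out = np.empty_like(scores)
    for s in range(S):
        for i in range(H):
            for j in range(W):
                out[s, i, j] = pad[s:s + k, i:i + k, j:j + k].max()
    return out

def local_peaks(scores, k=3):
    """A position survives only if it equals the local maximum (NMS-like)."""
    return scores >= max_filter_3d(scores, k)
```

Suppressing everything except local maxima across scale and space plays the role that NMS plays in conventional detectors.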
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.