MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
- URL: http://arxiv.org/abs/2204.12667v1
- Date: Wed, 27 Apr 2022 02:28:12 GMT
- Title: MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
- Authors: Inkyu Shin, Yi-Hsuan Tsai, Bingbing Zhuang, Samuel Schulter, Buyu Liu,
Sparsh Garg, In So Kweon, Kuk-Jin Yoon
- Abstract summary: We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
- Score: 104.48766162008815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation approaches have recently emerged as a practical solution
for handling domain shift without access to the source domain data. In this
paper, we propose and explore a new multi-modal extension of test-time
adaptation for 3D semantic segmentation. We find that directly applying
existing methods usually results in performance instability at test time
because multi-modal input is not considered jointly. To design a framework that
can take full advantage of multi-modality, where each modality provides
regularized self-supervisory signals to other modalities, we propose two
complementary modules within and across the modalities. First, Intra-modal
Pseudo-label Generation (Intra-PG) is introduced to obtain reliable pseudo
labels within each modality by aggregating information from two models that are
both pre-trained on source data but updated with target data at different
paces. Second, Inter-modal Pseudo-label Refinement (Inter-PR) adaptively
selects more reliable pseudo labels from different modalities based on a
proposed consistency scheme. Experiments demonstrate that our regularized
pseudo labels produce stable self-learning signals in numerous multi-modal
test-time adaptation scenarios for 3D semantic segmentation. Visit our project
website at https://www.nec-labs.com/~mas/MM-TTA.
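The two modules described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the probability-averaging rule for Intra-PG and the dot-product agreement measure used for Inter-PR's consistency scheme are assumptions, as are all function names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_pg(fast_logits, slow_logits):
    """Intra-modal Pseudo-label Generation (sketch): aggregate the
    predictions of two source-pretrained models that are updated on
    target data at different paces (here, by averaging probabilities)."""
    return 0.5 * (softmax(fast_logits) + softmax(slow_logits))

def consistency(fast_logits, slow_logits):
    """Per-point agreement between the fast and slow models of one
    modality (here, the dot product of their class probabilities;
    higher means more consistent)."""
    p, q = softmax(fast_logits), softmax(slow_logits)
    return (p * q).sum(axis=-1)

def inter_pr(fast_2d, slow_2d, fast_3d, slow_3d):
    """Inter-modal Pseudo-label Refinement (sketch): for each point,
    select the pseudo label from the modality whose fast/slow models
    agree more.  Inputs are per-point class logits, shape (N, C)."""
    probs_2d = intra_pg(fast_2d, slow_2d)
    probs_3d = intra_pg(fast_3d, slow_3d)
    pick_2d = consistency(fast_2d, slow_2d) >= consistency(fast_3d, slow_3d)
    fused = np.where(pick_2d[:, None], probs_2d, probs_3d)
    return fused.argmax(axis=-1)  # pseudo label per point
```

The fused pseudo labels would then supervise both modality branches during test-time self-training; the key point is that each modality's label for a point is only trusted where its own slow and fast models agree.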
Related papers
- Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities [8.517830626176641]
Any2Seg is a novel framework that can achieve robust segmentation from any combination of modalities in any visual conditions.
Experiments on two benchmarks with four modalities demonstrate that Any2Seg achieves the state-of-the-art under the multi-modal setting.
arXiv Detail & Related papers (2024-07-16T03:34:38Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Adaptive Test-Time Personalization for Federated Learning [51.25437606915392]
We introduce a novel setting called test-time personalized federated learning (TTPFL).
In TTPFL, clients locally adapt a global model in an unsupervised way without relying on any labeled data at test time.
We propose a novel algorithm called ATP that adaptively learns the adaptation rate for each module in the model from distribution shifts among source domains.
arXiv Detail & Related papers (2023-10-28T20:42:47Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation [27.23513712371972]
We propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion.
We also propose M3L: Multi-modal Teacher for Masked Modality Learning.
Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines.
arXiv Detail & Related papers (2023-04-21T05:52:50Z)
- Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation [26.674085603033742]
Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain is dynamic over time rather than stationary.
In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation.
arXiv Detail & Related papers (2023-03-18T16:51:19Z)
- Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport [24.930976128926314]
We propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN).
M3DN considers M3 learning in an end-to-end multi-modal deep network and utilizes a consistency principle among the bag-level predictions of different modalities.
Thereby M3DNS can better predict labels and exploit label correlations simultaneously.
arXiv Detail & Related papers (2021-04-17T09:18:28Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve model generalization in few-shot settings.
We perform empirical comparisons on 10 public NER datasets with varying proportions of labeled data.
We establish new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets [102.62639692656458]
This paper treats the task as a multi-source domain adaptation and label unification problem.
Our method consists of a partially-supervised adaptation stage and a fully-supervised adaptation stage.
We verify the method on three tasks: image classification, 2D semantic image segmentation, and joint 2D-3D semantic segmentation.
arXiv Detail & Related papers (2020-12-15T15:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.