Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
- URL: http://arxiv.org/abs/2001.09691v2
- Date: Thu, 19 Mar 2020 16:16:01 GMT
- Title: Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
- Authors: Jonathan Munro and Dima Damen
- Abstract summary: We exploit the correspondence of modalities as a self-supervised alignment approach for UDA.
We show that multi-modal self-supervision alone improves the performance over source-only training by 2.4% on average.
We then combine adversarial training with multi-modal self-supervision, showing that our approach outperforms other UDA methods by 3%.
- Score: 35.22906271819216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained action recognition datasets exhibit environmental bias, where
multiple video sequences are captured from a limited number of environments.
Training a model in one environment and deploying in another results in a drop
in performance due to an unavoidable domain shift. Unsupervised Domain
Adaptation (UDA) approaches have frequently utilised adversarial training
between the source and target domains. However, these approaches have not
explored the multi-modal nature of video within each domain. In this work we
exploit the correspondence of modalities as a self-supervised alignment
approach for UDA in addition to adversarial alignment.
We test our approach on three kitchens from our large-scale dataset,
EPIC-Kitchens, using two modalities commonly employed for action recognition:
RGB and Optical Flow. We show that multi-modal self-supervision alone improves
the performance over source-only training by 2.4% on average. We then combine
adversarial training with multi-modal self-supervision, showing that our
approach outperforms other UDA methods by 3%.
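The training signal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "correspondence of modalities" means a binary task of predicting whether an RGB feature and an Optical Flow feature come from the same clip, and that the adversarial term is added to the objective with its sign handled elsewhere (for example by a gradient-reversal layer); the abstract specifies neither. The function names and the weights `lambda_corr` and `lambda_adv` are hypothetical.

```python
import math
import random


def make_correspondence_pairs(rgb_feats, flow_feats, rng):
    """Build self-supervised pairs (assumed formulation, not from the paper).

    Each RGB feature is paired either with the Flow feature of the same clip
    (label 1) or with the Flow feature of a different clip (label 0).
    No action labels are needed, so the pairs can be built in both domains.
    """
    n = len(rgb_feats)
    pairs = []
    for i in range(n):
        if n > 1 and rng.random() < 0.5:
            # Negative pair: Flow taken from another clip in the batch.
            j = rng.choice([k for k in range(n) if k != i])
            pairs.append((rgb_feats[i], flow_feats[j], 0))
        else:
            # Positive pair: RGB and Flow from the same clip.
            pairs.append((rgb_feats[i], flow_feats[i], 1))
    return pairs


def binary_cross_entropy(prob, label, eps=1e-7):
    """Cross-entropy for one predicted correspondence probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def combined_loss(cls_loss, corr_loss, adv_loss, lambda_corr=1.0, lambda_adv=1.0):
    """Weighted sum of the three terms (hypothetical weights).

    cls_loss:  supervised action classification loss on labelled source clips
    corr_loss: modality-correspondence loss on source and target clips
    adv_loss:  domain-discriminator loss; sign handling is assumed to be done
               elsewhere, e.g. by a gradient-reversal layer
    """
    return cls_loss + lambda_corr * corr_loss + lambda_adv * adv_loss
```

Because the correspondence labels are derived from the data itself, the same pairing routine could be applied to unlabelled target-domain clips, which is what would make it usable as an alignment signal for UDA under these assumptions.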
Related papers
- Multimodal 3D Object Detection on Unseen Domains [37.142470149311904]
Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem.
We propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection.
We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
arXiv Detail & Related papers (2024-04-17T21:47:45Z)
- CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection [14.063365469339812]
LiDAR-based 3D Object Detection methods often do not generalize well to target domains outside the source (or training) data distribution.
We introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which leverages visual semantic cues from an image modality.
We also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features.
arXiv Detail & Related papers (2024-03-06T14:12:38Z)
- Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z)
- Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation [3.367755441623275]
Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain.
We propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA).
This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains.
arXiv Detail & Related papers (2023-07-26T09:40:19Z)
- Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation [86.02485817444216]
We introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
MPA denoises the learned prompts through an auto-encoding process and aligns them by maximizing the agreement of all the reconstructed prompts.
Experiments show that MPA achieves state-of-the-art results on three popular datasets with an impressive average accuracy of 54.1% on DomainNet.
arXiv Detail & Related papers (2022-09-30T03:40:10Z)
- Multi-Source domain adaptation via supervised contrastive learning and confident consistency regularization [0.0]
Multi-Source Unsupervised Domain Adaptation (multi-source UDA) aims to learn a model from several labeled source domains.
We propose Contrastive Multi-Source Domain Adaptation (CMSDA) for multi-source UDA that addresses this limitation.
arXiv Detail & Related papers (2021-06-30T14:39:15Z)
- Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency [93.89773386634717]
Visual domain adaptation involves learning to classify images from a target visual domain using labels available in a different source domain.
We show that in the presence of a few target labels, simple techniques like self-supervision (via rotation prediction) and consistency regularization can be effective without any adversarial alignment to learn a good target classifier.
Our Pretraining and Consistency (PAC) approach can achieve state-of-the-art accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.
arXiv Detail & Related papers (2021-01-29T18:40:17Z)
- Multi-Domain Adversarial Feature Generalization for Person Re-Identification [52.835955258959785]
We propose a multi-dataset feature generalization network (MMFA-AAE).
It is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to 'unseen' camera systems.
It also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2020-11-25T08:03:15Z)
- FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation [26.929772844572213]
We introduce a fixed ratio-based mixup to augment multiple intermediate domains between the source and target domain.
We train the source-dominant model and the target-dominant model that have complementary characteristics.
Through our proposed methods, the models gradually transfer domain knowledge from the source to the target domain.
arXiv Detail & Related papers (2020-11-18T11:58:19Z)
- Multi-path Neural Networks for On-device Multi-domain Visual Classification [55.281139434736254]
This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices.
The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space.
The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths.
arXiv Detail & Related papers (2020-10-10T05:13:49Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.