Learning Cross-modal Contrastive Features for Video Domain Adaptation
- URL: http://arxiv.org/abs/2108.11974v1
- Date: Thu, 26 Aug 2021 18:14:18 GMT
- Title: Learning Cross-modal Contrastive Features for Video Domain Adaptation
- Authors: Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff,
Kate Saenko, Manmohan Chandraker
- Abstract summary: We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
- Score: 138.75196499580804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning transferable and domain adaptive feature representations from videos
is important for video-related tasks such as action recognition. Existing
video domain adaptation methods mainly rely on adversarial feature alignment,
which has been derived from the RGB image space. However, video data is usually
associated with multi-modal information, e.g., RGB and optical flow, and thus
it remains a challenge to design a better method that considers the cross-modal
inputs under the cross-domain adaptation setting. To this end, we propose a
unified framework for video domain adaptation, which simultaneously regularizes
cross-modal and cross-domain feature representations. Specifically, we treat
each modality in a domain as a view and leverage the contrastive learning
technique with properly designed sampling strategies. As a result, our
objectives regularize feature spaces, which originally lack the connection
across modalities or have less alignment across domains. We conduct experiments
on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and
EPIC-Kitchens, and demonstrate the effectiveness of our components against
state-of-the-art algorithms.
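A minimal sketch of the cross-modal contrastive objective described in the abstract: the RGB and optical-flow embeddings of the same clip are treated as two views (a positive pair), with other clips in the batch as negatives. Function and variable names are illustrative, and the paper's designed sampling strategies (e.g., cross-domain pairing) are not reproduced here.

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(rgb_feat, flow_feat, temperature=0.07):
    """rgb_feat, flow_feat: (B, D) clip embeddings from the two modalities."""
    rgb = F.normalize(rgb_feat, dim=1)
    flow = F.normalize(flow_feat, dim=1)
    logits = rgb @ flow.t() / temperature            # (B, B) similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric InfoNCE: match RGB -> flow and flow -> RGB.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage: loss = cross_modal_nce(rgb_encoder(x_rgb), flow_encoder(x_flow))
```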
Related papers
- Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment [17.086123737443714]
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, which is crucial for hazard detection in autonomous driving systems.
While existing methods demonstrate noteworthy results on synthetic data, they often fail to consider the disparity between synthetic and real-world data domains.
We introduce the Multi-Granularity Cross-Domain Alignment framework, tailored to harmonize features across domains at both the scene and individual sample levels.
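As a hedged illustration of alignment at two granularities (one possible reading of "scene" and "individual sample" levels; these are not the paper's actual losses):

```python
import torch

def scene_level_loss(f_syn, f_real):
    # Coarse alignment: match the batch-mean feature of the two domains.
    return (f_syn.mean(dim=0) - f_real.mean(dim=0)).pow(2).sum()

def sample_level_loss(f_syn, f_real):
    # Fine alignment: pull each synthetic sample toward its nearest real one.
    d = torch.cdist(f_syn, f_real)       # (N_syn, N_real) pairwise distances
    return d.min(dim=1).values.mean()

# total = scene_level_loss(fs, fr) + lam * sample_level_loss(fs, fr)
```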
arXiv Detail & Related papers (2023-08-16T22:54:49Z)
- Cross-Modality Domain Adaptation for Freespace Detection: A Simple yet Effective Baseline [21.197212665408262]
Freespace detection aims at classifying each pixel of the image captured by the camera as drivable or non-drivable.
We develop a cross-modality domain adaptation framework which exploits both RGB images and surface normal maps generated from depth images.
To better bridge the domain gap between source domain (synthetic data) and target domain (real-world data), we also propose a Selective Feature Alignment (SFA) module.
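One common way to generate surface normal maps from depth is via local depth gradients; the sketch below ignores camera intrinsics for simplicity, and the paper's exact procedure may differ.

```python
import torch
import torch.nn.functional as F

def depth_to_normals(depth):
    """depth: (B, 1, H, W) -> unit surface normal map (B, 3, H, W)."""
    dzdx = depth[:, :, :, 1:] - depth[:, :, :, :-1]   # horizontal gradient
    dzdy = depth[:, :, 1:, :] - depth[:, :, :-1, :]   # vertical gradient
    dzdx = F.pad(dzdx, (0, 1, 0, 0))                  # restore width
    dzdy = F.pad(dzdy, (0, 0, 0, 1))                  # restore height
    ones = torch.ones_like(depth)
    n = torch.cat([-dzdx, -dzdy, ones], dim=1)        # un-normalized normals
    return F.normalize(n, dim=1)
```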
arXiv Detail & Related papers (2022-10-06T15:31:49Z)
- Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing [55.73722120043086]
We introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative invariant feature representations for unsupervised video domain adaptation.
First, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds.
Second, we propose a novel extension to the temporal contrastive loss by using background mixing that allows additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains.
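A sketch of the two-speed positive pair (the background-mixing step is omitted); strides and clip length are illustrative choices, not CoMix's settings.

```python
import torch
import torch.nn.functional as F

def two_speed_clips(frames, fast_stride=1, slow_stride=2, length=8):
    # frames: (T, C, H, W); sample a fast and a slow clip from the same video.
    fast = frames[:length * fast_stride:fast_stride]
    slow = frames[:length * slow_stride:slow_stride]
    return fast, slow

def speed_contrastive(z_fast, z_slow, temperature=0.1):
    # Embeddings of the same video at two speeds form the positive pair.
    z_fast = F.normalize(z_fast, dim=1)
    z_slow = F.normalize(z_slow, dim=1)
    logits = z_fast @ z_slow.t() / temperature
    targets = torch.arange(z_fast.size(0), device=z_fast.device)
    return F.cross_entropy(logits, targets)
```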
arXiv Detail & Related papers (2021-10-28T14:03:29Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
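Domain-adversarial training is commonly implemented with a gradient reversal layer (GRL), sketched below; the intermediate-domain image generation that AFAN adds is not shown, and the names are illustrative.

```python
import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips (and scales) gradients backward,
    # so the backbone learns to confuse the domain classifier.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# domain_logits = domain_classifier(grad_reverse(features))
```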
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Adaptive Intermediate Representations for Video Understanding [50.64187463941215]
We introduce a new way to leverage semantic segmentation as an intermediate representation for video understanding.
We propose a general framework which learns the intermediate representations (optical flow and semantic segmentation) jointly with the final video understanding task.
We obtain more powerful visual representations for videos, leading to performance gains over the state of the art.
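A sketch of the joint setup: a shared backbone feeds intermediate flow and segmentation heads alongside the final recognition head. Module names, head designs, and loss weighting are assumptions, not the paper's architecture.

```python
import torch.nn as nn

class JointVideoModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes, num_seg_classes):
        super().__init__()
        self.backbone = backbone                                  # shared features
        self.flow_head = nn.Conv2d(feat_dim, 2, kernel_size=1)    # (dx, dy) flow
        self.seg_head = nn.Conv2d(feat_dim, num_seg_classes, kernel_size=1)
        self.cls_head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        f = self.backbone(x)                          # (B, feat_dim, H, W)
        logits = self.cls_head(f.mean(dim=(2, 3)))    # pooled for recognition
        return logits, self.flow_head(f), self.seg_head(f)

# loss = cls_loss + w_flow * flow_loss + w_seg * seg_loss
```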
arXiv Detail & Related papers (2021-04-14T21:37:23Z)
- Variational Interaction Information Maximization for Cross-domain Disentanglement [34.08140408283391]
Cross-domain disentanglement is the problem of learning representations partitioned into domain-invariant and domain-specific representations.
We cast the simultaneous learning of domain-invariant and domain-specific representations as a joint objective of multiple information constraints.
We show that our model achieves the state-of-the-art performance in the zero-shot sketch based image retrieval task.
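For reference, the interaction information the title alludes to can be written as follows (one common sign convention; the paper's variational objective bounds such terms and is not reproduced here):

```latex
% Interaction information among inputs X, Y and a representation Z:
% it measures how much of the X--Y dependence is captured by Z.
I(X; Y; Z) = I(X; Y) - I(X; Y \mid Z)
```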
arXiv Detail & Related papers (2020-12-08T07:11:35Z)
- Channel-wise Alignment for Adaptive Object Detection [66.76486843397267]
Generic object detection has been greatly advanced by the development of deep convolutional neural networks.
Existing methods for this task usually focus on high-level alignment based on the whole image or the object of interest.
In this paper, we realize adaptation from a fundamentally different perspective, i.e., channel-wise alignment.
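One simple instantiation of channel-wise alignment is to match per-channel feature statistics across domains, sketched below (the paper's actual alignment criterion may differ):

```python
import torch

def channel_align_loss(f_src, f_tgt):
    """f_src, f_tgt: (B, C, H, W) intermediate feature maps."""
    mu_s = f_src.mean(dim=(0, 2, 3))     # per-channel means
    mu_t = f_tgt.mean(dim=(0, 2, 3))
    var_s = f_src.var(dim=(0, 2, 3))     # per-channel variances
    var_t = f_tgt.var(dim=(0, 2, 3))
    return (mu_s - mu_t).pow(2).mean() + (var_s - var_t).pow(2).mean()
```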
arXiv Detail & Related papers (2020-09-07T02:42:18Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent visual domain adaptation works that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
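A hedged sketch of a source-target bipartite graph: edges connect only cross-domain pairs, and each node aggregates features from the other domain. ABG's actual graph construction and adversarial training are more involved; this only illustrates the cross-domain message passing.

```python
import torch
import torch.nn.functional as F

def bipartite_aggregate(f_src, f_tgt, temperature=0.1):
    """f_src: (Ns, D), f_tgt: (Nt, D) -> cross-domain aggregated features."""
    a = F.normalize(f_src, dim=1) @ F.normalize(f_tgt, dim=1).t()  # (Ns, Nt)
    w_st = F.softmax(a / temperature, dim=1)       # source -> target edges
    w_ts = F.softmax(a.t() / temperature, dim=1)   # target -> source edges
    return w_st @ f_tgt, w_ts @ f_src              # messages across domains
```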
arXiv Detail & Related papers (2020-07-31T03:48:41Z)