Simplifying Open-Set Video Domain Adaptation with Contrastive Learning
- URL: http://arxiv.org/abs/2301.03322v1
- Date: Mon, 9 Jan 2023 13:16:50 GMT
- Title: Simplifying Open-Set Video Domain Adaptation with Contrastive Learning
- Authors: Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo
Rota, Elisa Ricci
- Abstract summary: Unsupervised video domain adaptation methods have been proposed to adapt a predictive model from a labelled dataset to an unlabelled dataset.
We address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains "unknown" semantic categories that are not shared with the source.
We propose a video-oriented temporal contrastive loss that enables our method to better cluster the feature space by exploiting the freely available temporal information in video data.
- Score: 16.72734794723157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an effort to reduce annotation costs in action recognition, unsupervised
video domain adaptation methods have been proposed that aim to adapt a
predictive model from a labelled dataset (i.e., source domain) to an unlabelled
dataset (i.e., target domain). In this work we address a more realistic
scenario, called open-set video domain adaptation (OUVDA), where the target
dataset contains "unknown" semantic categories that are not shared with the
source. The challenge lies in aligning the shared classes of the two domains
while separating the shared classes from the unknown ones. In this work we
propose to address OUVDA with a unified contrastive learning framework that
learns discriminative and well-clustered features. We also propose a
video-oriented temporal contrastive loss that enables our method to better
cluster the feature space by exploiting the freely available temporal
information in video data. We show that a discriminative feature space
facilitates better separation of the unknown classes, and thereby allows us to
use a simple similarity-based score to identify them. We conduct a thorough
experimental evaluation on multiple OUVDA benchmarks and show the effectiveness
of our proposed method against the prior art.
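As a rough illustration of the two ingredients above, here is a minimal PyTorch sketch: an InfoNCE-style temporal contrastive loss that treats two clips from the same video as a positive pair, and a cosine-similarity score against source class prototypes for flagging unknowns. All names, shapes, and the 0.5 threshold are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: z_a[i] and z_b[i] embed two clips sampled from
    the same video (a free temporal positive pair); all other videos in
    the batch serve as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def unknown_score(z_target, prototypes):
    """Similarity-based rejection: a target feature whose best cosine
    similarity to the source class prototypes is low is likely 'unknown'."""
    sims = F.normalize(z_target, dim=1) @ F.normalize(prototypes, dim=1).t()
    return sims.max(dim=1).values                   # low value => unknown

# Toy usage with random features.
B, D, C = 8, 128, 10
loss = temporal_contrastive_loss(torch.randn(B, D), torch.randn(B, D))
scores = unknown_score(torch.randn(B, D), torch.randn(C, D))
known_mask = scores > 0.5                           # threshold is a free parameter
```

The temporal positives come for free from the video itself, which is what lets the contrastive objective cluster the feature space without any target labels.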
Related papers
- Uncertainty-guided Open-Set Source-Free Unsupervised Domain Adaptation with Target-private Class Segregation [22.474866164542302]
UDA approaches commonly assume that source and target domains share the same label space.
This paper considers the more challenging Source-Free Open-set Domain Adaptation (SF-OSDA) setting.
We propose a novel approach for SF-OSDA that exploits the granularity of target-private categories by segregating their samples into multiple unknown classes.
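One plausible reading of "segregating target-private samples into multiple unknown classes" is to cluster the rejected target features rather than lump them into a single unknown bin. A hypothetical sketch, where the rejection rule and the choice of k-means are assumptions, not the paper's procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def segregate_unknowns(features, known_scores, threshold=0.5, n_unknown=3):
    """Reject target samples whose known-ness score falls below a threshold,
    then split the rejected set into several pseudo unknown classes so the
    model can exploit their internal structure."""
    unknown_idx = np.where(known_scores < threshold)[0]
    if len(unknown_idx) < n_unknown:                # too few samples to cluster
        return unknown_idx, np.zeros(len(unknown_idx), dtype=int)
    pseudo = KMeans(n_clusters=n_unknown, n_init=10).fit_predict(features[unknown_idx])
    return unknown_idx, pseudo                      # pseudo-labels for unknowns

# Toy usage.
idx, pseudo = segregate_unknowns(np.random.randn(100, 32), np.random.rand(100))
```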
arXiv Detail & Related papers (2024-04-16T13:52:00Z)
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection [75.38704117155909]
We propose a novel domain adaptive action detection approach and a new adaptation protocol.
Self-training combined with cross-domain mixed sampling has shown remarkable performance gains in the UDA context.
We name our proposed framework domain-adaptive action instance mixing (DA-AIM).
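As a hedged, frame-level illustration of cross-domain mixed sampling (pasting labelled source regions into unlabelled target frames): the box coordinates and shapes below are hypothetical, and this is not DA-AIM's actual detection pipeline.

```python
import torch

def instance_mix(source_frames, target_frames, boxes):
    """Paste labelled source regions (e.g., action instances) into target
    frames, producing mixed samples whose labels come from the source.
    `boxes` holds one (x1, y1, x2, y2) pixel box per batch item."""
    mixed = target_frames.clone()
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        mixed[i, :, y1:y2, x1:x2] = source_frames[i, :, y1:y2, x1:x2]
    return mixed

# Toy usage: batch of 2 RGB frames, 64x64.
src, tgt = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
mixed = instance_mix(src, tgt, [(4, 4, 32, 32), (10, 10, 50, 50)])
```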
arXiv Detail & Related papers (2022-09-28T22:03:25Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
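A minimal sketch of the two-factor disentanglement idea, assuming per-frame features and illustrative latent sizes; this is not the TranSVAE architecture itself, which is a sequential VAE:

```python
import torch
import torch.nn as nn

class TwoFactorEncoder(nn.Module):
    """Encode a video into a static latent (one vector per video, intended
    to absorb domain information) and dynamic latents (one vector per
    frame, intended to carry the action semantics)."""
    def __init__(self, feat_dim=64, d_static=16, d_dynamic=16):
        super().__init__()
        self.static_head = nn.Linear(feat_dim, d_static)
        self.dynamic_head = nn.Linear(feat_dim, d_dynamic)

    def forward(self, frame_feats):                        # (T, feat_dim)
        z_static = self.static_head(frame_feats.mean(dim=0))   # (d_static,)
        z_dynamic = self.dynamic_head(frame_feats)             # (T, d_dynamic)
        return z_static, z_dynamic

enc = TwoFactorEncoder()
z_s, z_d = enc(torch.randn(16, 64))
```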
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing [55.73722120043086]
We introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative invariant feature representations for unsupervised video domain adaptation.
First, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds.
Second, we propose a novel extension to the temporal contrastive loss by using background mixing that allows additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains.
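A hedged sketch of the background-mixing idea, assuming a pre-extracted background frame; the blending weight and shapes are illustrative, and CoMix computes backgrounds differently before plugging the mixed clip into the temporal contrastive loss as an extra positive.

```python
import torch

def background_mix(clip, background, lam=0.75):
    """Blend every frame of a clip with a static background frame taken
    from the other domain. The action content is preserved, so the mixed
    clip can serve as an additional positive for the same anchor."""
    return lam * clip + (1.0 - lam) * background.unsqueeze(0)

# Toy usage: clip (T, C, H, W), background frame (C, H, W).
clip = torch.rand(16, 3, 32, 32)
bg = torch.rand(3, 32, 32)
extra_positive = background_mix(clip, bg)
```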
arXiv Detail & Related papers (2021-10-28T14:03:29Z)
- Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection [86.98573522894961]
We propose a framework that generalizes the components commonly used by Unsupervised Domain Adaptation (UDA) algorithms for detection.
Specifically, we propose a novel UDA algorithm, ViSGA, that leverages the best design choices and introduces a simple but effective method to aggregate features at instance-level.
We show that both similarity-based grouping and adversarial training allow our model to focus on coarsely aligning feature groups, without being forced to match all instances across loosely aligned domains.
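A toy sketch of similarity-based grouping, using a greedy centroid rule that is an assumption rather than ViSGA's actual aggregation scheme:

```python
import torch
import torch.nn.functional as F

def similarity_group_means(instance_feats, sim_threshold=0.8):
    """Greedily assign each instance feature to the first group whose
    centroid is similar enough, else start a new group; return the group
    means, which could then be aligned adversarially instead of raw instances."""
    groups = []
    for f in F.normalize(instance_feats, dim=1):
        for g in groups:
            centroid = F.normalize(torch.stack(g).mean(0), dim=0)
            if (f * centroid).sum() > sim_threshold:
                g.append(f)
                break
        else:
            groups.append([f])
    return torch.stack([torch.stack(g).mean(0) for g in groups])

means = similarity_group_means(torch.randn(20, 32))
```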
arXiv Detail & Related papers (2021-10-04T13:09:56Z)
- Conditional Extreme Value Theory for Open Set Video Domain Adaptation [17.474956295874797]
We propose an open-set video domain adaptation approach to mitigate the domain discrepancy between the source and target data.
To alleviate the negative transfer issue, samples are weighted in adversarial learning by the distance of their prediction entropy from a threshold.
The proposed method has been thoroughly evaluated on both small-scale and large-scale cross-domain video datasets.
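A minimal sketch of entropy-distance weighting, assuming softmax predictions and an entropy threshold tau; the exact weighting function is an assumption, not the paper's formula.

```python
import torch

def entropy_weights(probs, tau):
    """Weight each target sample by how far its prediction entropy lies
    from a threshold tau: confidently known or confidently unknown samples
    get large weights, ambiguous ones near the boundary get small weights."""
    eps = 1e-8
    H = -(probs * (probs + eps).log()).sum(dim=1)   # per-sample entropy
    return (H - tau).abs()                          # distance to threshold

probs = torch.softmax(torch.randn(8, 10), dim=1)
w = entropy_weights(probs, tau=1.0)                 # plug into adversarial loss
```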
arXiv Detail & Related papers (2021-09-01T10:51:50Z)
- On Universal Black-Box Domain Adaptation [53.7611757926922]
We study an arguably least restrictive setting of domain adaptation in the sense of practical deployment: only the interface of the source model is available to the target domain, and the label-space relations between the two domains are allowed to be different and unknown.
We propose to unify them into a self-training framework, regularized by consistency of predictions in local neighborhoods of target samples.
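A hedged sketch of neighborhood-consistency regularization for self-training, assuming precomputed target features and predictions; the value of k and the KL form are assumptions.

```python
import torch
import torch.nn.functional as F

def neighborhood_consistency(features, probs, k=5):
    """Encourage each target sample's predicted distribution to agree with
    those of its k nearest neighbours in feature space."""
    normed = F.normalize(features, dim=1)
    sims = normed @ normed.t()
    sims.fill_diagonal_(-float("inf"))                 # exclude self-matches
    nn_idx = sims.topk(k, dim=1).indices               # (N, k) neighbour ids
    neighbour_probs = probs[nn_idx]                    # (N, k, C)
    log_p = probs.clamp_min(1e-8).log().unsqueeze(1)   # (N, 1, C)
    return F.kl_div(log_p.expand_as(neighbour_probs),
                    neighbour_probs, reduction="batchmean")

feats = torch.randn(32, 64)
probs = torch.softmax(torch.randn(32, 10), dim=1)
loss = neighborhood_consistency(feats, probs)
```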
arXiv Detail & Related papers (2021-04-10T02:21:09Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)