EXTERN: Leveraging Endo-Temporal Regularization for Black-box Video
Domain Adaptation
- URL: http://arxiv.org/abs/2208.05187v1
- Date: Wed, 10 Aug 2022 07:09:57 GMT
- Title: EXTERN: Leveraging Endo-Temporal Regularization for Black-box Video
Domain Adaptation
- Authors: Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
- Abstract summary: Black-box Video Domain Adaptation (BVDA) is a more realistic yet challenging scenario where the source video model is provided only as a black-box predictor.
We propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN) by applying mask-to-mix strategies and video-tailored regularizations.
- Score: 36.8236874357225
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To enable video models to be applied seamlessly across video tasks in
different environments, various Video Unsupervised Domain Adaptation (VUDA)
methods have been proposed to improve the robustness and transferability of
video models. Despite improvements made in model robustness, these VUDA methods
require access to both source data and source model parameters for adaptation,
raising serious data privacy and model portability issues. To cope with these
concerns, this paper first formulates Black-box Video Domain Adaptation (BVDA)
as a more realistic yet challenging scenario where the source video model is
provided only as a black-box predictor. While a few Black-box Domain Adaptation
(BDA) methods have been proposed for the image domain, they cannot be directly
applied to the video domain, since the video modality contains more complicated
temporal features that are harder to align. To address BVDA, we propose a novel
Endo and
eXo-TEmporal Regularized Network (EXTERN) by applying mask-to-mix strategies
and video-tailored regularizations: endo-temporal regularization and
exo-temporal regularization, performed across both clip and temporal features,
while distilling knowledge from the predictions obtained from the black-box
predictor. Empirical results demonstrate the state-of-the-art performance of
EXTERN across various cross-domain closed-set and partial-set action
recognition benchmarks, where EXTERN even surpasses most existing video domain
adaptation methods that have access to source data.
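Because the source model is available only through its predictions, black-box adaptation of this kind boils down to self-training the target video model on the black-box soft outputs while regularizing its temporal behavior. The sketch below illustrates that general setup in PyTorch, under stated assumptions: the `black_box_predict` interface, the two-output `target_model`, the clip-mean temporal regularizer, and the loss weighting are illustrative stand-ins, not EXTERN's actual mask-to-mix strategy or endo-/exo-temporal regularizations.

```python
# Minimal sketch of a black-box video adaptation step: distill the frozen
# black-box predictor's soft outputs into the target model and add a simple
# temporal regularizer across clips of the same video. Interfaces and the
# regularizer are assumptions for illustration only.
import torch
import torch.nn.functional as F

def bvda_step(target_model, black_box_predict, clips, optimizer,
              lambda_temporal=0.1):
    """One adaptation step on a batch of unlabeled target videos.

    clips: tensor of shape (B, K, C, T, H, W) -- K clips sampled per video.
    black_box_predict: callable returning only class probabilities; no
        parameters or intermediate features are accessible (black-box setting).
    target_model: assumed to return (logits, clip_features) per clip.
    """
    B, K = clips.shape[:2]
    flat = clips.flatten(0, 1)                      # (B*K, C, T, H, W)

    with torch.no_grad():
        soft_labels = black_box_predict(flat)       # (B*K, num_classes) probabilities

    logits, clip_feats = target_model(flat)
    log_probs = F.log_softmax(logits, dim=1)

    # Knowledge distillation from the black-box predictions (KL divergence).
    loss_kd = F.kl_div(log_probs, soft_labels, reduction="batchmean")

    # A simple temporal regularizer: pull clip features of the same video
    # toward their video-level mean (a stand-in for the paper's more
    # elaborate endo-/exo-temporal regularization).
    feats = clip_feats.reshape(B, K, -1)
    video_mean = feats.mean(dim=1, keepdim=True)
    loss_temporal = F.mse_loss(feats, video_mean.expand_as(feats))

    loss = loss_kd + lambda_temporal * loss_temporal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice this step would be looped over batches of unlabeled target videos; only the target model's parameters are updated, since the source predictor is never accessible beyond its outputs.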
Related papers
- Harnessing Large Language Models for Training-free Video Anomaly Detection [34.76811491190446]
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video.
Training-based methods tend to be domain-specific, making them costly for practical deployment.
We propose LAnguage-based VAD (LAVAD), a method tackling VAD in a novel, training-free paradigm.
arXiv Detail & Related papers (2024-04-01T09:34:55Z)
- MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance [10.457759140533168]
This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow.
We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores.
arXiv Detail & Related papers (2023-08-19T17:59:12Z)
- Boost Video Frame Interpolation via Motion Adaptation [73.42573856943923]
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability.
We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
arXiv Detail & Related papers (2023-06-24T10:44:02Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation comes from that the temporal boundary of the query-guided activity should be consistently predicted.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that better generalises to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z)
- Learning Temporal Consistency for Source-Free Video Domain Adaptation [16.230405375192262]
In real-world applications, subjects and scenes in the source video domain should be irrelevant to those in the target video domain.
To cope with this concern, a more practical domain adaptation scenario is formulated as Source-Free Video-based Domain Adaptation (SFVDA).
We propose a novel Attentive Temporal Consistent Network (ATCoN) to address SFVDA by learning temporal consistency.
arXiv Detail & Related papers (2022-03-09T07:33:36Z)
- Learning Cross-modal Contrastive Features for Video Domain Adaptation [138.75196499580804]
We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
arXiv Detail & Related papers (2021-08-26T18:14:18Z)
- Generative Adversarial Networks for Video-to-Video Domain Adaptation [32.670977389990306]
We propose a novel generative adversarial network (GAN), namely VideoGAN, to transfer video-based data across different domains.
As the frames of a video may have similar content and imaging conditions, the proposed VideoGAN has an X-shape generator to preserve the intra-video consistency.
Two colonoscopic datasets from different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the performance of our VideoGAN.
arXiv Detail & Related papers (2020-04-17T04:16:37Z)