Generative Adversarial Networks for Video-to-Video Domain Adaptation
- URL: http://arxiv.org/abs/2004.08058v1
- Date: Fri, 17 Apr 2020 04:16:37 GMT
- Title: Generative Adversarial Networks for Video-to-Video Domain Adaptation
- Authors: Jiawei Chen, Yuexiang Li, Kai Ma, Yefeng Zheng
- Abstract summary: We propose a novel generative adversarial network (GAN), namely VideoGAN, to transfer the video-based data across different domains.
As the frames of a video may have similar content and imaging conditions, the proposed VideoGAN has an X-shape generator to preserve the intra-video consistency.
Two colonoscopic datasets from different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the performance of our VideoGAN.
- Score: 32.670977389990306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Endoscopic videos from multiple centres often have different imaging
conditions, e.g., color and illumination, which cause models trained on one
domain to generalize poorly to another. Domain adaptation is one potential
solution to this problem. However, few existing works have focused on the
translation of video-based data. In this work, we propose a
novel generative adversarial network (GAN), namely VideoGAN, to transfer the
video-based data across different domains. As the frames of a video may have
similar content and imaging conditions, the proposed VideoGAN has an X-shape
generator to preserve the intra-video consistency during translation.
Furthermore, a loss function, namely color histogram loss, is proposed to tune
the color distribution of each translated frame. Two colonoscopic datasets from
different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the
domain adaptation performance of our VideoGAN. Experimental results demonstrate
that the adapted colonoscopic videos generated by our VideoGAN can significantly
boost the segmentation accuracy of colorectal polyps on multicentre datasets,
i.e., an improvement of 5%. As our VideoGAN is a general network
architecture, we also evaluate its performance with the CamVid driving video
dataset on the cloudy-to-sunny translation task. Comprehensive experiments show
that the domain gap can be substantially narrowed by our VideoGAN.
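The abstract names a color histogram loss for tuning the color distribution of each translated frame, but does not give its exact formulation. The snippet below is a minimal sketch of one plausible implementation, assuming a soft (Gaussian-kernel) histogram so the loss stays differentiable with respect to the generator output; the function names, bin count, and bandwidth are illustrative assumptions, not definitions from the paper.

```python
import torch

def soft_histogram(img, bins=64, sigma=0.02):
    """Differentiable per-channel color histogram.

    img: tensor of shape (C, H, W) with values in [0, 1].
    Returns a (C, bins) tensor of normalized soft bin counts.
    """
    centers = torch.linspace(0.0, 1.0, bins, device=img.device)    # (bins,)
    pixels = img.reshape(img.shape[0], -1, 1)                      # (C, H*W, 1)
    # Soft-assign every pixel to every bin center with a Gaussian kernel,
    # so the histogram is differentiable w.r.t. the pixel values.
    weights = torch.exp(-0.5 * ((pixels - centers) / sigma) ** 2)  # (C, H*W, bins)
    hist = weights.sum(dim=1)                                      # (C, bins)
    return hist / (hist.sum(dim=1, keepdim=True) + 1e-8)

def color_histogram_loss(translated, reference):
    """L1 distance between the soft color histograms of two frames."""
    return torch.abs(soft_histogram(translated) - soft_histogram(reference)).mean()
```

A soft assignment is used instead of hard binning (e.g., torch.histc) because hard binning has zero gradient almost everywhere and could not steer the generator's color distribution. Here `reference` stands for a frame whose color distribution the translated frame should match, and the loss would be weighted and added to the adversarial objective; both the pairing and the weighting are assumptions, since the paper's exact setup is not stated in the abstract.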
Related papers
- Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval [58.17315970207874]
Video Moment Retrieval (VMR) requires precise modelling of fine-grained moment-text associations to capture intricate visual-language relationships.
Existing methods resort to joint training on both source and target domain videos for cross-domain applications.
We explore generative video diffusion for fine-grained editing of source videos controlled by the target sentences.
arXiv Detail & Related papers (2024-01-24T09:45:40Z)
- Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation [74.51546366251753]
Video topic segmentation unveils the coarse-grained semantic structure underlying videos.
We introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames.
Our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability.
arXiv Detail & Related papers (2023-11-30T21:59:05Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- EXTERN: Leveraging Endo-Temporal Regularization for Black-box Video Domain Adaptation [36.8236874357225]
Black-box Video Domain Adaptation (BVDA) is a more realistic yet challenging scenario where the source video model is provided only as a black-box predictor.
We propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN) by applying mask-to-mix strategies and video-tailored regularizations.
arXiv Detail & Related papers (2022-08-10T07:09:57Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Group Contextualization for Video Recognition [80.3842253625557]
Group contextualization (GC) can boost the performance of 2D-CNN (e.g., TSN) and TSM.
GC embeds feature with four different kinds of contexts in parallel.
Group contextualization can boost the performance of 2D-CNNs (e.g., TSN) to a level comparable to state-of-the-art video networks.
arXiv Detail & Related papers (2022-03-18T01:49:40Z)
- Domain Adaptive Video Segmentation via Temporal Consistency Regularization [32.77436219094282]
This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR).
The first is cross-domain TCR that guides the prediction of target frames to have similar temporal consistency as that of source frames (learnt from annotated source data) via adversarial learning.
The second is intra-domain TCR that guides unconfident predictions of target frames to have similar temporal consistency as confident predictions of target frames.
arXiv Detail & Related papers (2021-07-23T02:50:42Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation, which leverage adversarial learning to unify source and target video representations, are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)