Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive
Survey
- URL: http://arxiv.org/abs/2211.10412v2
- Date: Mon, 21 Nov 2022 04:57:54 GMT
- Title: Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive
Survey
- Authors: Yuecong Xu, Haozhi Cao, Zhenghua Chen, Xiaoli Li, Lihua Xie, Jianfei
Yang
- Abstract summary: Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare.
Video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications.
Video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain.
- Score: 32.526118672614345
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Video analysis tasks such as action recognition have received increasing
research interest with growing applications in fields such as smart healthcare,
thanks to the introduction of large-scale datasets and deep learning-based
representations. However, video models trained on existing datasets suffer from
significant performance degradation when deployed directly to real-world
applications, due to domain shifts between the public video datasets used for
training (source video domains) and real-world videos (target video domains).
Further, given the high cost of video annotation, it is more practical to use
unlabeled videos for training. To tackle performance degradation while avoiding
this annotation cost, video unsupervised domain adaptation (VUDA) is introduced
to adapt video models from the labeled source domain to the unlabeled target
domain by alleviating video domain shift, improving the generalizability and
portability of video models. This paper surveys recent progress in VUDA with
deep learning. We begin with the motivation for VUDA, followed by its
definition, recent progress of methods for both closed-set VUDA and VUDA under
different scenarios, and current benchmark datasets for VUDA research. Finally,
future directions are provided to promote further VUDA research.
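As a concrete illustration of the closed-set setting described above, the
sketch below shows one widely used recipe, DANN-style adversarial alignment,
applied to clip-level video features. This is a minimal sketch in PyTorch, not
any particular method from the survey; the feature_net, classifier, and
domain_disc modules are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass, negated
    (scaled) gradient on the backward pass, as in DANN-style adaptation."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lambd, None

def vuda_step(feature_net, classifier, domain_disc,
              src_clips, src_labels, tgt_clips, lambd=1.0):
    """One closed-set VUDA training step: a supervised loss on labeled
    source clips plus a domain-confusion loss that needs no target labels."""
    f_src = feature_net(src_clips)  # (B, D) clip-level features
    f_tgt = feature_net(tgt_clips)

    # Action-recognition loss on the labeled source domain only.
    cls_loss = nn.functional.cross_entropy(classifier(f_src), src_labels)

    # The discriminator tries to tell the domains apart; the reversed
    # gradient pushes feature_net toward domain-invariant features.
    feats = torch.cat([f_src, f_tgt], dim=0)
    dom_logits = domain_disc(GradReverse.apply(feats, lambd)).squeeze(1)
    dom_labels = torch.cat([torch.zeros(len(f_src)),
                            torch.ones(len(f_tgt))]).to(dom_logits.device)
    dom_loss = nn.functional.binary_cross_entropy_with_logits(dom_logits, dom_labels)

    return cls_loss + dom_loss
```

The key property is that no target labels are used: only the domain
discriminator ever sees target clips, which is what makes the adaptation
unsupervised on the target side.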
Related papers
- Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation [22.097165083633175]
Video Unsupervised Domain Adaptation (VUDA) has been introduced to improve the robustness and transferability of video models.
We consider a more realistic Few-Shot Video-based Domain Adaptation (FSVDA) scenario, where we adapt video models with only a few target video samples.
We propose SSA2lign, a novel approach that addresses FSVDA at the snippet level, where the target domain is expanded through a simple snippet-level augmentation (a generic sketch is given after this list).
arXiv Detail & Related papers (2023-03-18T16:33:56Z) - Exploring Domain Incremental Video Highlights Detection with the
LiveFood Benchmark [12.151826076159134]
We propose a novel video highlights detection method named Global Prototype Encoding (GPE) that learns incrementally to adapt to new domains via parameterized prototypes.
To the best of our knowledge, this is the first work to explore video highlights detection in the incremental learning setting.
arXiv Detail & Related papers (2022-09-12T11:51:08Z) - Unsupervised Domain Adaptation for Video Transformers in Action
Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that generalises better to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z) - Self-Supervised Learning for Videos: A Survey [70.37277191524755]
Self-supervised learning has shown promise in both image and video domains.
In this survey, we provide a review of existing approaches to self-supervised learning, focusing on the video domain.
arXiv Detail & Related papers (2022-06-18T00:26:52Z) - VRAG: Region Attention Graphs for Content-Based Video Retrieval [85.54923500208041]
Our Region Attention Graph Networks (VRAG) improve on state-of-the-art video-level methods.
VRAG represents videos at a finer granularity via region-level features and encodes video spatio-temporal dynamics through region-level relations.
We show that the performance gap between video-level and frame-level methods can be reduced by segmenting videos into shots and using shot embeddings for video retrieval.
arXiv Detail & Related papers (2022-05-18T16:50:45Z) - Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to the real world.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods on both image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z) - CUPID: Adaptive Curation of Pre-training Data for Video-and-Language
Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z)