Exploring Domain Incremental Video Highlights Detection with the
LiveFood Benchmark
- URL: http://arxiv.org/abs/2209.05166v4
- Date: Tue, 12 Dec 2023 10:42:26 GMT
- Title: Exploring Domain Incremental Video Highlights Detection with the
LiveFood Benchmark
- Authors: Sen Pei, Shixiong Xu, and Xiaojie Jin
- Abstract summary: We propose a novel video highlights detection method named Global Prototype Encoding (GPE) that learns incrementally to adapt to new domains via parameterized prototypes.
To the best of our knowledge, this is the first work to explore video highlights detection in the incremental learning setting.
- Score: 12.151826076159134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video highlights detection (VHD) is an active research field in computer
vision, aiming to locate the most user-appealing clips given raw video inputs.
However, most VHD methods are based on the closed world assumption, i.e., a
fixed number of highlight categories is defined in advance and all training
data are available beforehand. Consequently, existing methods have poor
scalability with respect to increasing highlight domains and training data. To
address the above issues, we propose a novel video highlights detection method
named Global Prototype Encoding (GPE) to learn incrementally for adapting to
new domains via parameterized prototypes. To facilitate this new research
direction, we collect a finely annotated dataset termed LiveFood, including
over 5,100 live gourmet videos that consist of four domains: ingredients,
cooking, presentation, and eating. To the best of our knowledge, this is the
first work to explore video highlights detection in the incremental learning
setting, opening up new ground for applying VHD to practical scenarios where both
the concerned highlight domains and training data increase over time. We
demonstrate the effectiveness of GPE through extensive experiments. Notably,
GPE surpasses popular domain incremental learning methods on LiveFood,
achieving significant mAP improvements on all domains. On the classic
datasets, GPE also achieves performance comparable to prior methods. The code is
available at: https://github.com/ForeverPs/IncrementalVHD_GPE.
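The abstract describes GPE only at a high level. As a rough illustration of what per-domain parameterized prototypes can look like in a domain-incremental setting, the sketch below keeps one learnable prototype vector per highlight domain and scores clips by cosine similarity to each prototype. It is a minimal sketch assuming a PyTorch environment; the class name PrototypeBank, its methods, and the dimensions are hypothetical and are not taken from the released GPE code.

# Minimal sketch of a parameterized-prototype head for domain-incremental
# highlight scoring. Hypothetical names; not the released GPE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeBank(nn.Module):
    """Keeps one learnable prototype per highlight domain.

    New domains are appended over time; clips are scored by cosine
    similarity between their embedding and each domain prototype.
    """

    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim
        self.prototypes = nn.ParameterList()  # one (embed_dim,) vector per domain

    def add_domain(self) -> int:
        """Register a prototype for a newly arrived domain; returns its index."""
        proto = nn.Parameter(torch.randn(self.embed_dim) * 0.02)
        self.prototypes.append(proto)
        return len(self.prototypes) - 1

    def forward(self, clip_embeddings: torch.Tensor) -> torch.Tensor:
        """clip_embeddings: (batch, embed_dim) -> similarity logits (batch, num_domains)."""
        protos = torch.stack(list(self.prototypes), dim=0)  # (num_domains, embed_dim)
        clips = F.normalize(clip_embeddings, dim=-1)
        protos = F.normalize(protos, dim=-1)
        return clips @ protos.t()  # cosine similarity per domain


if __name__ == "__main__":
    bank = PrototypeBank(embed_dim=128)
    for _ in range(4):  # e.g. ingredients, cooking, presentation, eating
        bank.add_domain()
    scores = bank(torch.randn(8, 128))
    print(scores.shape)  # torch.Size([8, 4])

In a real incremental setting, training on a new domain would also need some mechanism (e.g. replay or regularization) to keep earlier prototypes from drifting, which is the forgetting problem that domain incremental methods such as GPE target.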
Related papers
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey [42.22801056661226]
Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare.
Video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications.
Video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain.
arXiv Detail & Related papers (2022-11-17T05:05:42Z)
- Extending Temporal Data Augmentation for Video Action Recognition [1.3807859854345832]
We propose novel techniques to strengthen the relationship between the spatial and temporal domains.
Our techniques outperform their respective variants in Top-1 and Top-5 accuracy on the UCF-101 and HMDB-51 datasets.
arXiv Detail & Related papers (2022-11-09T13:49:38Z)
- Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that generalises better to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z)
- Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.
It jointly learns the distinction of target-category videos and the characteristics of highlight moments in the source video category.
It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to the real world.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z)
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter out the potential noise in these queried videos, we propose to convert it into useful supervision signals.
We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z)