Exploring Domain Incremental Video Highlights Detection with the
LiveFood Benchmark
- URL: http://arxiv.org/abs/2209.05166v4
- Date: Tue, 12 Dec 2023 10:42:26 GMT
- Title: Exploring Domain Incremental Video Highlights Detection with the
LiveFood Benchmark
- Authors: Sen Pei, Shixiong Xu, and Xiaojie Jin
- Abstract summary: We propose a novel video highlights detection method named Global Prototype Encoding (GPE) that learns incrementally to adapt to new domains via parameterized prototypes.
To the best of our knowledge, this is the first work to explore video highlights detection in the incremental learning setting.
- Score: 12.151826076159134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video highlights detection (VHD) is an active research field in computer
vision, aiming to locate the most user-appealing clips given raw video inputs.
However, most VHD methods are based on the closed world assumption, i.e., a
fixed number of highlight categories is defined in advance and all training
data are available beforehand. Consequently, existing methods have poor
scalability with respect to increasing highlight domains and training data. To
address the above issues, we propose a novel video highlights detection method
named Global Prototype Encoding (GPE) to learn incrementally for adapting to
new domains via parameterized prototypes. To facilitate this new research
direction, we collect a finely annotated dataset termed LiveFood, including
over 5,100 live gourmet videos that consist of four domains: ingredients,
cooking, presentation, and eating. To the best of our knowledge, this is the
first work to explore video highlights detection in the incremental learning
setting, opening up new ground for applying VHD to practical scenarios where both
the concerned highlight domains and training data increase over time. We
demonstrate the effectiveness of GPE through extensive experiments. Notably,
GPE surpasses popular domain incremental learning methods on LiveFood,
achieving significant mAP improvements on all domains. On the classic
datasets, GPE also achieves performance comparable to prior methods. The code is
available at: https://github.com/ForeverPs/IncrementalVHD_GPE.
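The abstract describes GPE only at a high level. As a rough illustration of what per-domain parameterized prototypes can look like in a domain-incremental setting, the sketch below keeps one learnable prototype vector per highlight domain and scores clips by cosine similarity to each prototype. It is a minimal sketch assuming a PyTorch environment; the class name PrototypeBank, its methods, and the dimensions are hypothetical and are not taken from the released GPE code.

# Minimal sketch of a parameterized-prototype head for domain-incremental
# highlight scoring. Hypothetical names; not the released GPE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeBank(nn.Module):
    """Keeps one learnable prototype per highlight domain.

    New domains are appended over time; clips are scored by cosine
    similarity between their embedding and each domain prototype.
    """

    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim
        self.prototypes = nn.ParameterList()  # one (embed_dim,) vector per domain

    def add_domain(self) -> int:
        """Register a prototype for a newly arrived domain; returns its index."""
        proto = nn.Parameter(torch.randn(self.embed_dim) * 0.02)
        self.prototypes.append(proto)
        return len(self.prototypes) - 1

    def forward(self, clip_embeddings: torch.Tensor) -> torch.Tensor:
        """clip_embeddings: (batch, embed_dim) -> similarity logits (batch, num_domains)."""
        protos = torch.stack(list(self.prototypes), dim=0)  # (num_domains, embed_dim)
        clips = F.normalize(clip_embeddings, dim=-1)
        protos = F.normalize(protos, dim=-1)
        return clips @ protos.t()  # cosine similarity per domain


if __name__ == "__main__":
    bank = PrototypeBank(embed_dim=128)
    for _ in range(4):  # e.g. ingredients, cooking, presentation, eating
        bank.add_domain()
    scores = bank(torch.randn(8, 128))
    print(scores.shape)  # torch.Size([8, 4])

In a real incremental setting, training on a new domain would also need some mechanism (e.g. replay or regularization) to keep earlier prototypes from drifting, which is the forgetting problem that domain incremental methods such as GPE target.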
Related papers
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey [42.22801056661226]
Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare.
Video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications.
Video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain.
arXiv Detail & Related papers (2022-11-17T05:05:42Z)
- Extending Temporal Data Augmentation for Video Action Recognition [1.3807859854345832]
We propose novel techniques to strengthen the relationship between the spatial and temporal domains.
Our techniques outperform their respective variants in Top-1 and Top-5 accuracy on the UCF-101 and HMDB-51 datasets.
arXiv Detail & Related papers (2022-11-09T13:49:38Z)
- Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that generalises better to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z)
- Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.
It jointly learns the distinction of target-category videos and the characteristics of highlight moments in the source video category.
It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to the real world.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z)
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter out the potential noise in these queried videos, we propose to convert it into useful supervision signals.
We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z)