Reliable Shot Identification for Complex Event Detection via
Visual-Semantic Embedding
- URL: http://arxiv.org/abs/2110.08063v1
- Date: Tue, 12 Oct 2021 11:46:56 GMT
- Title: Reliable Shot Identification for Complex Event Detection via
Visual-Semantic Embedding
- Authors: Minnan Luo and Xiaojun Chang and Chen Gong
- Abstract summary: We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic-net regularization term to start training the classifier with instances of high reliability.
An alternative optimization algorithm is developed to solve the proposed challenging non-convex non-smooth problem.
- Score: 72.9370352430965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimedia event detection is the task of detecting a specific event of
interest in a user-generated video on websites. The most fundamental challenge
facing this task lies in the enormously varying quality of the video as well as
the inherently high-level semantic abstraction of events. In this paper, we
decompose the video into several segments and intuitively model the task of
complex event detection as a multiple instance learning problem by representing
each video as a "bag" of segments in which each segment is referred to as an
instance. Instead of treating the instances equally, we associate each instance
with a reliability variable to indicate its importance and then select reliable
instances for training. To measure the reliability of the varying instances
precisely, we propose a visual-semantic guided loss by exploiting low-level
features from visual information together with high-level semantic features
based on instance-event similarity. Motivated by curriculum learning, we introduce a
negative elastic-net regularization term to start training the classifier with
instances of high reliability and gradually take the instances with
relatively low reliability into consideration. An alternative optimization
algorithm is developed to solve the proposed challenging non-convex non-smooth
problem. Experimental results on standard datasets, i.e., TRECVID MEDTest 2013
and TRECVID MEDTest 2014, demonstrate the effectiveness and superiority of the
proposed method over the baseline algorithms.
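The abstract's training scheme (reliability weights per instance, with the classifier and the weights updated in turn) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' formulation: the function names are hypothetical, a plain logistic loss stands in for the visual-semantic guided loss, and a simple self-paced regularizer with a closed-form weight update stands in for the negative elastic-net term.

```python
import numpy as np

def reliability_weights(losses, lam):
    # Minimiser over v in [0, 1] of  sum_i v_i * loss_i + lam * sum_i (0.5 * v_i**2 - v_i).
    # Stand-in regulariser (assumption), not the paper's negative elastic-net term.
    return np.clip(1.0 - np.asarray(losses, dtype=float) / lam, 0.0, 1.0)

def logistic_losses(X, y, w):
    # Per-instance logistic loss for labels y in {0, 1}, computed stably.
    signs = 2.0 * y - 1.0
    return np.logaddexp(0.0, -signs * (X @ w))

def fit_self_paced(X, y, lam=0.5, growth=1.3, rounds=5, steps=200, lr=0.1):
    w = np.zeros(X.shape[1])
    v = np.ones(len(y))  # warm start: every instance counts in the first round
    for _ in range(rounds):
        # Step 1: fix v, update the classifier by weighted gradient descent.
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            w -= lr * (X.T @ (v * (p - y))) / len(y)
        # Step 2: fix w, update the reliability weights in closed form.
        v = reliability_weights(logistic_losses(X, y, w), lam)
        # Raising lam admits less reliable (higher-loss) instances later on.
        lam *= growth
    return w, v

# Toy data: one feature plus a bias column; the last instance is mislabelled.
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.0])
X = np.column_stack([x, np.ones_like(x)])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0])
w, v = fit_self_paced(X, y)
```

In this toy run the mislabelled instance keeps a high loss and is therefore expected to receive a lower weight than the well-fitted clean instances. In the paper's setting the instances would be video segments grouped into bags, which this sketch does not model.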
Related papers
- Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models [7.350203999073509]
Recent studies on AI security have highlighted the vulnerability of Vision-Language Pre-training models to subtle yet intentionally designed perturbations in images and texts.
To the best of our knowledge, this is the first work to explore, through multimodal decision boundaries, the creation of a universal, sample-agnostic perturbation that applies to any image.
arXiv Detail & Related papers (2024-08-06T06:25:39Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [9.145168943972067]
Multiple-instance learning (MIL) provides an effective way to tackle the video anomaly detection problem.
We propose to conduct novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training.
Our theoretical analysis ensures a strong performance guarantee of the proposed algorithm.
arXiv Detail & Related papers (2022-03-24T04:00:49Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years.
We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-frame person detectors which have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)