The Blessings of Unlabeled Background in Untrimmed Videos
- URL: http://arxiv.org/abs/2103.13183v1
- Date: Wed, 24 Mar 2021 13:34:42 GMT
- Title: The Blessings of Unlabeled Background in Untrimmed Videos
- Authors: Yuan Liu, Jingyuan Chen, Zhenfang Chen, Bing Deng, Jianqiang Huang, Hanwang Zhang
- Abstract summary: Weakly-supervised Temporal Action Localization (WTAL) aims to detect the intervals of action instances with only video-level action labels available during training.
The key challenge is how to distinguish the segments of interest from the background segments, which are unlabelled even at the video level.
We propose a Temporal Smoothing PCA-based (TS-PCA) deconfounder, which exploits the unlabelled background to model an observed substitute for the confounder.
- Score: 66.99259967869065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weakly-supervised Temporal Action Localization (WTAL) aims to detect the
intervals of action instances with only video-level action labels available
during training. The key challenge is how to distinguish the segments of
interest from the background segments, which are unlabelled even at the
video level. While previous works treat the background as "curses", we regard
it as "blessings". Specifically, we first use causal analysis to point out that
the common localization errors are due to the unobserved and un-enumerated
confounder that resides ubiquitously in visual recognition. Then, we propose a
Temporal Smoothing PCA-based (TS-PCA) deconfounder, which exploits the
unlabelled background to model an observed substitute for the confounder, to
remove the confounding effect. Note that the proposed deconfounder is
model-agnostic and non-intrusive, and hence can be applied to any WTAL method
without modification. Through extensive experiments on four state-of-the-art
WTAL methods, we show that the deconfounder can improve all of them on the
public datasets: THUMOS-14 and ActivityNet-1.3.
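To make the weakly-supervised setting concrete, here is a minimal NumPy sketch of one common way WTAL models are trained from video-level labels alone: per-segment class scores (a class activation sequence, or CAS) are pooled into a video-level score with top-k mean pooling, and at test time intervals are read off by thresholding segment scores. The function names, the top-k pooling choice, and the fixed threshold are illustrative assumptions, not taken from this paper or from any of the four methods it evaluates.

```python
import numpy as np


def video_level_scores(cas: np.ndarray, k: int) -> np.ndarray:
    """Top-k mean pooling of a (T, C) class activation sequence into C video-level scores.

    Assumption: top-k pooling is one common WTAL choice, used here for illustration only.
    """
    k = max(1, min(k, cas.shape[0]))
    # Sort each class column over time (ascending) and average the k largest activations.
    topk = np.sort(cas, axis=0)[-k:]
    return topk.mean(axis=0)


def extract_intervals(segment_scores: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return [start, end) index pairs of consecutive segments whose score exceeds threshold.

    Assumption: a single fixed threshold; real methods often use several or refine proposals.
    """
    intervals = []
    start = None
    for t, above in enumerate(segment_scores > threshold):
        if above and start is None:
            start = t  # an interval opens at the first segment above threshold
        elif not above and start is not None:
            intervals.append((start, t))  # close the interval before segment t
            start = None
    if start is not None:
        intervals.append((start, len(segment_scores)))  # interval runs to the end
    return intervals


if __name__ == "__main__":
    scores = np.array([0.1, 0.7, 0.8, 0.2, 0.9])
    print(extract_intervals(scores, threshold=0.5))  # [(1, 3), (4, 5)]
```

During training, the pooled scores would be compared with the video-level labels (for example with a binary cross-entropy loss); background segments never receive a direct label, which is the difficulty the abstract describes.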
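The abstract names two ingredients of the TS-PCA deconfounder, temporal smoothing and PCA, used to build an observed substitute for the confounder. The sketch below only illustrates those two ingredients under loudly stated assumptions: per-segment features are smoothed with a centered moving average, their top-k principal components serve as a hypothetical substitute confounder z, and a simple linear adjustment regresses z out of per-segment scores. This is not the authors' TS-PCA formulation; the window size, k, the function names, and the linear adjustment step are all assumptions made for illustration.

```python
import numpy as np


def temporal_smooth(x: np.ndarray, window: int) -> np.ndarray:
    """Centered moving average along the time axis (axis 0), with edge padding."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Convolve each feature channel independently; 'valid' mode restores length T.
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(x.shape[1])],
        axis=1,
    )


def substitute_confounder(features: np.ndarray, k: int, window: int = 5) -> np.ndarray:
    """Hypothetical substitute confounder: top-k PCA scores of temporally smoothed features.

    Assumption: smoothing then PCA on (T, D) segment features; not the paper's exact model.
    """
    smoothed = temporal_smooth(features, window)
    centered = smoothed - smoothed.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # shape (T, k), zero-mean columns


def adjust_scores(scores: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Hypothetical adjustment: remove the part of (T, C) scores explained linearly by z.

    Assumption: a linear least-squares adjustment stands in for "removing the confounding effect".
    """
    design = np.hstack([np.ones((z.shape[0], 1)), z])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    # Subtract only the z-dependent part; the per-class intercept is kept.
    return scores - design[:, 1:] @ coef[1:]
```

Because z has zero-mean columns and the adjustment keeps only the least-squares residual plus the intercept, the adjusted scores are uncorrelated with every column of z by construction. Whether such a linear adjustment matches how the paper removes the confounding effect is not stated in the abstract.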
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
(2024-08-12T03:31:29Z)
- Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
(2024-07-25T06:03:02Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage, with one stage for person bounding box generation and the other for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
(2023-04-01T08:06:43Z)
- Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips [39.29955809641396]
Many background or noisy frames in a first-person video can distract an action recognition model during learning.
Previous works attempted to address this problem by applying temporal attention but failed to consider the global context of the full video.
We propose a simple yet effective Stacked Temporal Attention Module (STAM) to compute temporal attention based on the global knowledge across clips.
(2021-12-02T08:02:35Z)
- Video-based Person Re-identification without Bells and Whistles [49.51670583977911]
Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras.
There exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods.
We present a simple re-Detect and Link (DL) module which can effectively reduce such unexpected noise by applying deep learning-based detection and tracking to the cropped tracklets.
(2021-05-22T10:17:38Z)
- A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels [17.615297975503648]
Anomalous event detection in surveillance videos is a challenging and practical research problem in the image and video processing community.
We propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels.
The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-crime, ShanghaiTech and Ped2.
(2020-08-27T02:14:15Z)
- Labelling unlabelled videos from scratch with multi-modal self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap with ground-truth human labels.
(2020-06-24T12:28:17Z)
- Weakly-supervised Temporal Action Localization by Uncertainty Modeling [34.27514534497615]
Weakly-supervised temporal action localization aims to learn to detect temporal intervals of action classes with only video-level labels.
We present a new perspective on background frames, in which they are modeled as out-of-distribution samples owing to their inconsistency.
(2020-06-12T08:54:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.