Adversarial Background-Aware Loss for Weakly-supervised Temporal
Activity Localization
- URL: http://arxiv.org/abs/2007.06643v1
- Date: Mon, 13 Jul 2020 19:33:24 GMT
- Title: Adversarial Background-Aware Loss for Weakly-supervised Temporal
Activity Localization
- Authors: Kyle Min, Jason J. Corso
- Abstract summary: Temporally localizing activities within untrimmed videos has been extensively studied in recent years.
Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring.
- Score: 40.517438760096056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporally localizing activities within untrimmed videos has been extensively
studied in recent years. Despite recent advances, existing methods for
weakly-supervised temporal activity localization struggle to recognize when an
activity is not occurring. To address this issue, we propose a novel method
named A2CL-PT. Two triplets of the feature space are considered in our
approach: one triplet is used to learn discriminative features for each
activity class, and the other is used to distinguish features where no
activity occurs (i.e., background features) from activity-related features for
each video. To further improve the performance, we build our network using two
parallel branches which operate in an adversarial way: the first branch
localizes the most salient activities of a video and the second one finds other
supplementary activities from non-localized parts of the video. Extensive
experiments performed on THUMOS14 and ActivityNet datasets demonstrate that our
proposed method is effective. Specifically, the average mAP over IoU thresholds
from 0.1 to 0.9 on the THUMOS14 dataset is significantly improved from 27.9% to
30.0%.
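The abstract names the two triplets but not their exact form. The sketch below is a rough, illustrative pair of triplet losses in PyTorch: one pulls each activity feature toward its own class center and away from the nearest other-class center, and one pushes background features away from all class centers. The function name, the cosine-distance formulation, and the margin are assumptions; the paper's actual A2CL-PT loss differs in detail.

```python
import torch
import torch.nn.functional as F

def pair_of_triplet_losses(feats, labels, centers, bg_feats, margin=0.5):
    """Illustrative pair of triplet losses (a stand-in, not the exact
    A2CL-PT formulation).

    feats:    (N, D) activity snippet features
    labels:   (N,)   activity-class index for each feature
    centers:  (C, D) learnable per-class center points
    bg_feats: (M, D) background snippet features from the same videos
    """
    feats = F.normalize(feats, dim=1)
    ctr = F.normalize(centers, dim=1)
    bg = F.normalize(bg_feats, dim=1)

    # Cosine distance from every activity feature to every class center: (N, C).
    dist = 1.0 - feats @ ctr.t()

    # Triplet 1 (discriminative features): each feature should be closer to
    # its own class center than to the nearest other-class center.
    pos = dist.gather(1, labels.view(-1, 1)).squeeze(1)
    neg = dist.scatter(1, labels.view(-1, 1), float('inf')).min(dim=1).values
    loss_cls = F.relu(pos - neg + margin).mean()

    # Triplet 2 (background separation): background features should stay
    # farther from all class centers than activity features are from theirs.
    bg_dist = 1.0 - bg @ ctr.t()
    loss_bg = F.relu(pos.mean() - bg_dist.min(dim=1).values + margin).mean()
    return loss_cls + loss_bg

# Toy usage: random tensors stand in for network outputs.
centers = torch.randn(20, 128, requires_grad=True)
loss = pair_of_triplet_losses(torch.randn(16, 128), torch.randint(0, 20, (16,)),
                              centers, torch.randn(8, 128))
loss.backward()
```

Per the abstract, the two parallel branches would each be trained with such losses, with the second branch seeing the video after the first branch's most salient activations are suppressed so that it mines supplementary activities.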
Related papers
- Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances several video activity understanding tasks.
arXiv Detail & Related papers (2023-04-06T08:48:01Z) - Learning to Refactor Action and Co-occurrence Features for Temporal
Action Localization [74.74339878286935]
Co-occurrence features often dominate the actual action content in videos.
We develop a novel auxiliary task by decoupling these two types of features within a video snippet.
We term our method RefactorNet, which first explicitly factorizes the action content and regularizes its co-occurrence features.
arXiv Detail & Related papers (2022-06-23T06:30:08Z) - End-to-End Semi-Supervised Learning for Video Action Detection [23.042410033982193]
We propose a simple end-to-end approach that effectively utilizes the unlabeled data.
Video action detection requires both action class prediction and spatio-temporal localization of actions.
We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets.
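The entry does not spell out how the unlabeled data enters training. A common end-to-end recipe for this setting, shown below purely as a hedged sketch, is consistency regularization: a supervised classification loss on labeled clips plus a penalty that makes predictions for two augmented views of an unlabeled clip agree. All names (`model`, `augment`, the MSE penalty, the weight `alpha`) are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, clips, labels, unlabeled, augment, alpha=1.0):
    """Illustrative step: supervised loss plus a consistency penalty on
    unlabeled clips. Every piece here is an assumption, not the paper's design."""
    # Supervised term: ordinary classification loss on labeled clips.
    sup = F.cross_entropy(model(clips), labels)

    # Consistency term: two augmented views of the same unlabeled clip
    # should yield matching class distributions.
    p1 = model(augment(unlabeled)).softmax(dim=1)
    with torch.no_grad():                    # second view acts as the target
        p2 = model(augment(unlabeled)).softmax(dim=1)
    cons = F.mse_loss(p1, p2)

    return sup + alpha * cons

# Toy usage: a linear layer over flattened clips stands in for the detector.
model = torch.nn.Linear(32, 5)
augment = lambda x: x + 0.01 * torch.randn_like(x)
loss = semi_supervised_step(model, torch.randn(4, 32), torch.randint(0, 5, (4,)),
                            torch.randn(4, 32), augment)
loss.backward()
```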
arXiv Detail & Related papers (2022-03-08T18:11:25Z) - Weakly Supervised Temporal Action Localization Through Learning Explicit
Subspaces for Action and Context [151.23835595907596]
Weakly-supervised temporal action localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z) - A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action
Localization [12.353250130848044]
We present a novel framework named HAM-Net with a hybrid attention mechanism which includes temporal soft, semi-soft and hard attentions.
Our proposed approach outperforms recent state-of-the-art methods by at least 2.2% mAP at IoU threshold 0.5 on the THUMOS14 dataset.
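The entry names three attention types without defining them. One plausible reading, sketched below under explicit assumptions, derives all three from a single per-snippet score: soft attention keeps continuous weights, semi-soft zeroes out sub-threshold snippets, and hard attention binarizes. The threshold and these exact definitions are guesses, not HAM-Net's published formulation.

```python
import torch

def hybrid_attentions(scores, thresh=0.5):
    """One plausible reading of soft / semi-soft / hard temporal attention,
    derived from per-snippet scores; a guess, not HAM-Net's definitions.

    scores: (T,) raw attention logits, one per video snippet
    """
    soft = torch.sigmoid(scores)        # fully continuous snippet weights
    keep = (soft >= thresh).float()     # which snippets clear the threshold
    semi_soft = soft * keep             # continuous above the threshold,
                                        # zero below it
    hard = keep                         # binary foreground mask
    return soft, semi_soft, hard

soft, semi_soft, hard = hybrid_attentions(torch.randn(10))
```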
arXiv Detail & Related papers (2021-01-03T03:08:18Z) - Revisiting Few-shot Activity Detection with Class Similarity Control [107.79338380065286]
We present a framework for few-shot temporal activity detection based on proposal regression.
Our model is end-to-end trainable, takes into account the frame rate differences between few-shot activities and untrimmed test videos, and can benefit from additional few-shot examples.
arXiv Detail & Related papers (2020-03-31T22:02:38Z) - ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
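Detecting never-seen activities requires scoring segments against some semantic description of the classes. A standard zero-shot recipe, given here only as an assumed sketch rather than ZSTAD's exact head, projects visual segment features into a label-embedding space and ranks classes by cosine similarity, so unseen class names can be scored the same way as seen ones.

```python
import torch
import torch.nn.functional as F

def zero_shot_scores(segment_feats, label_embeds, proj):
    """Illustrative zero-shot scoring: project visual segment features into
    the label-embedding space and rank classes by cosine similarity. A
    common recipe, not necessarily ZSTAD's exact head.

    segment_feats: (S, D) features of S candidate temporal segments
    label_embeds:  (C, E) semantic embeddings of C class names, which may
                   include classes never seen during training
    proj:          module mapping D-dim visual features to the E-dim space
    """
    v = F.normalize(proj(segment_feats), dim=1)
    e = F.normalize(label_embeds, dim=1)
    return v @ e.t()                    # (S, C) similarities

scores = zero_shot_scores(torch.randn(6, 512), torch.randn(21, 300),
                          torch.nn.Linear(512, 300))
```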
arXiv Detail & Related papers (2020-03-12T02:40:36Z) - Weakly Supervised Temporal Action Localization Using Deep Metric
Learning [12.49814373580862]
We propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training.
We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm.
Our approach improves the current state-of-the-art result on THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance on ActivityNet 1.2.
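The entry says the two losses are optimized jointly but gives neither term. The sketch below is a generic stand-in: a positive-weighted BCE on video-level multi-label predictions plus a simple pull/push metric term on snippet features. The balancing scheme and the metric term are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def joint_loss(video_logits, video_labels, feats, labels, lam=1.0, margin=0.5):
    """Generic stand-in for the paper's objective: class-balanced BCE on
    video-level predictions plus a metric (pull/push) term on snippet
    features. Both terms are assumptions.

    video_logits: (B, C) multi-label video-level logits
    video_labels: (B, C) binary video-level labels (floats)
    feats:        (N, D) snippet features; labels: (N,) class per snippet
    """
    # Balanced BCE: up-weight the sparse positive labels.
    pos_weight = (video_labels.numel() - video_labels.sum()) / video_labels.sum().clamp(min=1)
    bce = F.binary_cross_entropy_with_logits(video_logits, video_labels,
                                             pos_weight=pos_weight)

    # Metric term: mean same-class distance should undercut mean
    # different-class distance by at least the margin.
    d = torch.cdist(feats, feats)
    same = labels.view(-1, 1).eq(labels.view(1, -1)).float()
    off_diag = 1.0 - torch.eye(len(labels))
    pos = (d * same * off_diag).sum() / (same * off_diag).sum().clamp(min=1)
    neg = (d * (1.0 - same)).sum() / (1.0 - same).sum().clamp(min=1)
    return bce + lam * F.relu(pos - neg + margin)

loss = joint_loss(torch.randn(4, 20), (torch.rand(4, 20) > 0.9).float(),
                  torch.randn(12, 64), torch.randint(0, 20, (12,)))
```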
arXiv Detail & Related papers (2020-01-21T22:01:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.