Two-Stream Consensus Network for Weakly-Supervised Temporal Action
Localization
- URL: http://arxiv.org/abs/2010.11594v1
- Date: Thu, 22 Oct 2020 10:53:32 GMT
- Title: Two-Stream Consensus Network for Weakly-Supervised Temporal Action
Localization
- Authors: Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua
- Abstract summary: We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
- Score: 94.37084866660238
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and
localize all action instances in an untrimmed video under only video-level
supervision. However, without frame-level annotations, it is challenging for
W-TAL methods to identify false positive action proposals and generate action
proposals with precise temporal boundaries. In this paper, we present a
Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a
frame-level pseudo ground truth is iteratively updated, and used to provide
frame-level supervision for improved model training and false positive action
proposal elimination. Furthermore, we propose a new attention normalization
loss to encourage the predicted attention to act like a binary selection, and
promote the precise localization of action instance boundaries. Experiments
conducted on the THUMOS14 and ActivityNet datasets show that the proposed TSCN
outperforms current state-of-the-art methods, and even achieves comparable
results with some recent fully-supervised methods.
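To make the two components above concrete, the following is a minimal PyTorch sketch of (i) the two-stream consensus step that fuses frame-level RGB and optical-flow attention into a pseudo ground truth, and (ii) an attention normalization term that encourages a binary selection. The mean-fusion rule, the 0.5 threshold, and the top-/bottom-k ratio are illustrative assumptions for exposition, not the authors' reference implementation.

```python
import torch

def attention_normalization_loss(attention: torch.Tensor, s: int = 8) -> torch.Tensor:
    """Push frame-level attention towards a binary selection by maximizing the gap
    between the average of the top-k and the average of the bottom-k attention values.
    attention: shape (T,), values in [0, 1]; k = max(T // s, 1) is an assumed choice."""
    T = attention.shape[0]
    k = max(T // s, 1)
    top_k = attention.topk(k, largest=True).values.mean()
    bottom_k = attention.topk(k, largest=False).values.mean()
    return -(top_k - bottom_k)  # minimized when top-k -> 1 and bottom-k -> 0

def fuse_pseudo_ground_truth(att_rgb: torch.Tensor,
                             att_flow: torch.Tensor,
                             threshold: float = 0.5) -> torch.Tensor:
    """Two-stream consensus: fuse the RGB and flow attention sequences and binarize
    the result into a frame-level pseudo ground truth for the next training iteration."""
    fused = 0.5 * (att_rgb + att_flow)  # simple mean fusion (assumed)
    return (fused > threshold).float()  # 1 = action frame, 0 = background

# Toy usage with random attention sequences of length T = 100.
att_rgb, att_flow = torch.rand(100), torch.rand(100)
pseudo_gt = fuse_pseudo_ground_truth(att_rgb, att_flow)
loss_norm = attention_normalization_loss(att_rgb) + attention_normalization_loss(att_flow)
```

In the iterative refinement scheme, the pseudo ground truth produced at one iteration would supply frame-level supervision for both streams in the next iteration (e.g., a binary cross-entropy term on the attention), alongside the video-level classification loss.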
Related papers
- Weakly-Supervised Temporal Action Localization with Bidirectional
Semantic Consistency Constraint [83.36913240873236]
Weakly Supervised Temporal Action Localization (WTAL) aims to classify actions and localize their temporal boundaries in untrimmed videos.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z)
- Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experiment results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z)
- Temporal Action Proposal Generation with Background Constraint [25.783837570359267]
Temporal action proposal generation (TAPG) is a challenging task that aims to locate action instances with temporal boundaries in untrimmed videos.
To evaluate the confidence of proposals, existing works typically predict proposal action scores supervised by the temporal Intersection-over-Union (tIoU) between each proposal and the ground truth (a minimal tIoU computation is sketched below).
In this paper, we propose a general auxiliary Background Constraint to further suppress low-quality proposals.
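As a point of reference for the tIoU supervision mentioned above, here is a small self-contained helper; the function name and the example segments are illustrative, not taken from the paper.

```python
def temporal_iou(proposal: tuple[float, float], ground_truth: tuple[float, float]) -> float:
    """Return tIoU = |intersection| / |union| of two temporal segments (start, end)."""
    (ps, pe), (gs, ge) = proposal, ground_truth
    intersection = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - intersection
    return intersection / union if union > 0 else 0.0

# Example: proposal [12.0, 20.0] vs. ground truth [15.0, 25.0] -> 5 / 13, about 0.38.
print(temporal_iou((12.0, 20.0), (15.0, 25.0)))
```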
arXiv Detail & Related papers (2021-12-15T09:20:49Z)
- Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021 [33.840281113206444]
This report presents an overview of our solution used in the submission to the 2021 HACS Temporal Action Localization Challenge.
We use Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals.
We also adopt an additional module to transfer the knowledge from trimmed videos to untrimmed videos.
Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing sets of the supervised and weakly-supervised temporal action localization tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:18:21Z)
- Action Shuffling for Weakly Supervised Temporal Localization [22.43209053892713]
This paper analyzes the order-sensitive and location-insensitive properties of actions, and incorporates them into a self-augmented learning framework to improve weakly supervised action localization performance.
arXiv Detail & Related papers (2021-05-10T09:05:58Z)
- Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context [151.23835595907596]
WS-TAL methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence scores used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Bottom-Up Temporal Action Localization with Mutual Regularization [107.39785866001868]
State-of-the-art solutions for TAL involve evaluating the frame-level probabilities of three action-indicating phases.
We introduce two regularization terms to mutually regularize the learning procedure.
Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet v1.3.
arXiv Detail & Related papers (2020-02-18T03:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences.