Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization
- URL: http://arxiv.org/abs/2104.02357v1
- Date: Tue, 6 Apr 2021 08:31:10 GMT
- Authors: Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian
- Abstract summary: We introduce an adaptive mutual supervision framework (AMS) for weakly-supervised temporal action localization.
The proposed AMS method significantly outperforms the state-of-the-art methods.
- Score: 92.96802448718388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised temporal action localization aims to localize actions in
untrimmed videos with only video-level action category labels. Most previous
methods ignore the incompleteness of Class Activation Sequences (CAS) and thus
suffer from trivial localization results. To solve this issue, we introduce
an adaptive mutual supervision framework (AMS) with two branches, where the
base branch adopts CAS to localize the most discriminative action regions,
while the supplementary branch localizes the less discriminative action regions
through a novel adaptive sampler. The adaptive sampler dynamically updates the
input of the supplementary branch with a sampling weight sequence negatively
correlated with the CAS from the base branch, thereby prompting the
supplementary branch to localize the action regions underestimated by the base
branch. To promote mutual enhancement between these two branches, we construct
mutual location supervision. Each branch leverages location pseudo-labels
generated from the other branch as localization supervision. By alternately
optimizing the two branches in multiple iterations, we progressively complete
action regions. Extensive experiments on THUMOS14 and ActivityNet1.2
demonstrate that the proposed AMS method significantly outperforms the
state-of-the-art methods.
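The adaptive sampler and the mutual location supervision described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the CAS is already normalized to [0, 1], uses 1 - CAS (plus a small eps) as the negatively correlated sampling weight, realizes the sampler as inverse-CDF temporal resampling, and derives location pseudo-labels with a fixed threshold. The helper names (`sampling_weights`, `adaptive_resample`, `location_pseudo_labels`, `mutual_location_losses`) and the 0.5 threshold are hypothetical.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def sampling_weights(cas, eps=1e-3):
    """Assumption: weight = 1 - CAS (+ eps), so low-CAS snippets get high weight."""
    return [1.0 - c + eps for c in cas]


def adaptive_resample(features, weights, num_samples=None):
    """Inverse-CDF temporal resampling (an assumed realization of the sampler).

    Evenly spaced points on the cumulative-weight axis are mapped back to
    snippet indices, so high-weight (low-CAS) snippets are picked more often.
    """
    n = num_samples or len(features)
    cdf = list(accumulate(weights))
    total = cdf[-1]
    indices = [bisect_left(cdf, (k + 0.5) / n * total) for k in range(n)]
    return indices, [features[i] for i in indices]


def location_pseudo_labels(cas, threshold=0.5):
    """Binary per-snippet location labels (hypothetical fixed threshold)."""
    return [1 if c >= threshold else 0 for c in cas]


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted scores and 0/1 labels."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(pred, target)
    ) / len(pred)


def mutual_location_losses(cas_base, cas_supp, threshold=0.5):
    """Each branch is supervised by pseudo-labels generated from the other branch.

    Assumes both CAS sequences are already aligned to the original timeline.
    """
    loss_base = bce(cas_base, location_pseudo_labels(cas_supp, threshold))
    loss_supp = bce(cas_supp, location_pseudo_labels(cas_base, threshold))
    return loss_base, loss_supp
```

With CAS values [0.9, 0.9, 0.1, 0.1], every resampled position falls on the two low-CAS snippets (indices [2, 2, 3, 3]), so the supplementary branch sees more of the regions the base branch underestimates. Because the supplementary branch works on resampled input, a real implementation would also have to map its CAS back to the original timeline before computing the mutual losses; the sketch simply assumes that alignment.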
Related papers
- ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization [31.314383098734922]
This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.
It proposes a novel framework termed ADM-Loc, which stands for Actionness Distribution Modeling for point-supervised action localization.
(2023-11-27T15:24:54Z)
- Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization [77.19173283023012]
Weakly-supervised temporal action localization learns to detect and classify action instances with only category labels.
Most methods adopt off-the-shelf Classification-Based Pre-training (CBP) to generate video features for action localization.
(2022-12-19T10:02:50Z)
- Estimation of Reliable Proposal Quality for Temporal Action Detection [71.5989469643732]
We propose a new method that gives insights into moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality.
For the moment perspective, a Boundary Evaluate Module (BEM) is designed that focuses on local appearance and motion evolution to estimate boundary quality.
For the region perspective, we introduce Region Evaluate Module (REM) which uses a new and efficient sampling method for proposal feature representation.
(2022-04-25T14:33:49Z)
- Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
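The Fine-grained Temporal Contrastive Learning entry mentions LCS contrasting under a differentiable dynamic-programming formulation. A common way to make such a recurrence differentiable is to replace the hard max with a log-sum-exp smooth max, as in soft-DTW; the sketch below applies that generic relaxation to an LCS-style recurrence over a snippet similarity matrix. It illustrates the technique only and is not the paper's FSD or LCS objective; the function names, the temperature `gamma`, and the precomputed similarity matrix are assumptions. Plain Python is used for readability; a training implementation would write the same recurrence in an autodiff framework.

```python
import math


def smooth_max(values, gamma):
    """Numerically stable log-sum-exp relaxation of max; tends to max as gamma -> 0."""
    m = max(values)
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def soft_lcs(sim, gamma=0.1):
    """LCS-style alignment score from a similarity matrix sim[i][j] (hypothetical helper).

    Hard-max version: r[i][j] = max(r[i-1][j-1] + sim, r[i-1][j], r[i][j-1]),
    which equals the classic LCS length when sim is 1 for matches, 0 otherwise.
    """
    n, m = len(sim), len(sim[0])
    r = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i][j] = smooth_max(
                [r[i - 1][j - 1] + sim[i - 1][j - 1], r[i - 1][j], r[i][j - 1]],
                gamma,
            )
    return r[n][m]


a, b = "ABCBDAB", "BDCABA"
binary_sim = [[1.0 if x == y else 0.0 for y in b] for x in a]
```

With binary similarity and a small `gamma`, `soft_lcs(binary_sim, gamma=1e-3)` is close to 4, the classic LCS length of the two strings; a larger `gamma` yields a smoother score that upper-bounds the hard one, trading exactness for gradients that spread over many alignment paths.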
(2022-03-31T05:13:50Z)
- Domain Adaptive Semantic Segmentation with Regional Contrastive Consistency Regularization [19.279884432843822]
We propose a novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR) for domain adaptive semantic segmentation.
Our core idea is to pull together similar regional features extracted from the same location of different images, while pushing apart features from different locations of the two images.
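The pull/push idea in the RCCR entry can be illustrated with an InfoNCE-style loss in which the region at location k of one image is the positive for the region at location k of the other, and all other locations are negatives. This is a minimal sketch under assumptions: the InfoNCE form, cosine similarity, the temperature `tau`, and the helper names are illustrative choices, not RCCR's published loss, region pooling, or sampling scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv + 1e-12)


def regional_contrastive_loss(regions_a, regions_b, tau=0.1):
    """regions_a[k] and regions_b[k] are features pooled from location k of two images.

    Same-location pairs are positives; different locations act as negatives.
    """
    total = 0.0
    for k, anchor in enumerate(regions_a):
        logits = [cosine(anchor, other) / tau for other in regions_b]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[k]  # -log softmax of the positive
    return total / len(regions_a)
```

When same-location features already agree, the loss is near zero; when each region is instead most similar to a different location, the loss is large, so minimizing it pulls same-location features together and pushes different-location features apart.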
(2021-10-11T11:45:00Z)
- Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization [66.66545680550782]
We present a framework named FAC-Net, which appends three branches: a class-wise foreground classification branch, a class-agnostic attention branch, and a multiple instance learning branch.
First, our class-wise foreground classification branch regularizes the relation between actions and foreground to maximize the foreground-background separation.
In addition, the class-agnostic attention branch and the multiple instance learning branch are adopted to regularize the foreground-action consistency and help learn a meaningful foreground.
(2021-08-14T12:34:44Z)
- Action Shuffling for Weakly Supervised Temporal Localization [22.43209053892713]
This paper analyzes the order-sensitive and location-insensitive properties of actions.
It incorporates them into a self-augmented learning framework to improve weakly supervised action localization performance.
(2021-05-10T09:05:58Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection and to promote precise localization of action instance boundaries.
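One plausible form of such an attention normalization loss, assumed here for illustration and not necessarily TSCN's exact definition, contrasts the mean of the l lowest attention values with the mean of the l highest. Minimizing it pushes low attentions toward 0 and high ones toward 1, i.e. toward a binary selection. The helper name and the ratio `s` that sets l = max(1, T // s) are hypothetical.

```python
def attention_normalization_loss(attention, s=8):
    """Mean of the l lowest attentions minus mean of the l highest, l = max(1, T // s).

    Assumed form: attention values are per-snippet scores in [0, 1].
    """
    ranked = sorted(attention)
    l = max(1, len(attention) // s)
    return sum(ranked[:l]) / l - sum(ranked[-l:]) / l
```

A perfectly binary sequence such as [0, 0, 1, 1] (with s = 2) reaches the minimum value of -1, while a uniform sequence scores 0, so the loss rewards attention that separates snippets sharply.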
(2020-10-22T10:53:32Z)