Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
- URL: http://arxiv.org/abs/2203.02925v3
- Date: Thu, 10 Mar 2022 12:13:12 GMT
- Title: Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
- Authors: Linjiang Huang, Liang Wang, Hongsheng Li
- Abstract summary: Weakly supervised temporal action localization aims to localize temporal boundaries of actions and simultaneously identify their categories with only video-level category labels.
Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation.
Our method seeks to mine the representative snippets in each video for propagating information between video snippets to generate better pseudo labels.
- Score: 36.86505596138256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised temporal action localization aims to localize temporal
boundaries of actions and simultaneously identify their categories with only
video-level category labels. Many existing methods seek to generate pseudo
labels for bridging the discrepancy between classification and localization,
but usually only make use of limited contextual information for pseudo label
generation. To alleviate this problem, we propose a representative snippet
summarization and propagation framework. Our method seeks to mine the
representative snippets in each video for propagating information between video
snippets to generate better pseudo labels. For each video, its own
representative snippets and the representative snippets from a memory bank are
propagated to update the input features in an intra- and inter-video manner.
The pseudo labels are generated from the temporal class activation maps of the
updated features to rectify the predictions of the main branch. Our method
obtains superior performance in comparison to the existing methods on two
benchmarks, THUMOS14 and ActivityNet1.3, achieving gains as high as 1.2% in
terms of average mAP on THUMOS14.
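The abstract describes the pipeline only at a high level: summarize representative snippets, propagate them within a video and from a memory bank, build pseudo labels from the temporal class activation map (T-CAM) of the updated features, and use them to rectify the main branch. The pure-Python sketch below is one plausible reading of that pipeline, not the authors' implementation. Several parts are assumptions: summarization is done with a soft k-means (EM-style) loop, propagation is dot-product attention to the prototypes with an assumed mixing weight `alpha`, the memory bank is a per-class dictionary with a momentum (EMA) update, intra- and inter-video outputs are simply averaged, and rectification is a cross-entropy against the pseudo labels. All function names (`summarize`, `propagate`, `update_bank`, `pseudo_labels`, `rectify_loss`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def summarize(feats, k, iters=5):
    """Hypothetical summarizer: soft k-means (EM-style) over snippet features.
    Returns k prototype vectors standing in for 'representative snippets'."""
    dim = len(feats[0])
    protos = [list(f) for f in feats[:k]]  # deterministic init: first k snippets
    for _ in range(iters):
        # E-step: soft assignment of every snippet to every prototype
        resp = [softmax([dot(f, p) for p in protos]) for f in feats]
        # M-step: each prototype becomes the responsibility-weighted mean
        protos = []
        for j in range(k):
            w = [r[j] for r in resp]
            total = sum(w)
            protos.append([sum(wi * f[d] for wi, f in zip(w, feats)) / total
                           for d in range(dim)])
    return protos

def propagate(feats, protos, alpha=0.5):
    """Each snippet attends to the prototypes; the attended message is mixed
    back into the snippet feature (alpha is an assumed mixing weight)."""
    out = []
    for f in feats:
        attn = softmax([dot(f, p) for p in protos])
        msg = [sum(a * p[d] for a, p in zip(attn, protos)) for d in range(len(f))]
        out.append([(1 - alpha) * x + alpha * m for x, m in zip(f, msg)])
    return out

def update_bank(bank, label, protos, momentum=0.9):
    """Hypothetical per-class memory bank updated with an EMA over prototypes."""
    if label not in bank:
        bank[label] = [list(p) for p in protos]
    else:
        bank[label] = [[momentum * b + (1 - momentum) * p for b, p in zip(bp, pp)]
                       for bp, pp in zip(bank[label], protos)]

def tcam(feats, class_weights):
    """Temporal class activation map: T x C scores, one row per snippet."""
    return [[dot(f, w) for w in class_weights] for f in feats]

def pseudo_labels(feats, label, bank, class_weights, k=2):
    own = summarize(feats, k)
    bank_protos = bank.get(label, own)     # fall back to own prototypes if bank is empty
    intra = propagate(feats, own)          # intra-video propagation
    inter = propagate(feats, bank_protos)  # inter-video propagation
    updated = [[(a + b) / 2 for a, b in zip(x, y)] for x, y in zip(intra, inter)]
    update_bank(bank, label, own)
    # Per-snippet class distribution from the T-CAM of the updated features
    return [softmax(row) for row in tcam(updated, class_weights)]

def rectify_loss(main_cam, targets):
    """Cross-entropy between the main branch's per-snippet class distribution
    and the pseudo labels, which are treated as fixed targets."""
    loss = 0.0
    for logits, q in zip(main_cam, targets):
        p = softmax(logits)
        loss -= sum(qi * math.log(pi) for qi, pi in zip(q, p))
    return loss / len(main_cam)

# Toy usage: T=4 snippets with D=2 features, C=2 classes, one video of class 0.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # toy per-class classifier weights
bank = {}
targets = pseudo_labels(feats, label=0, bank=bank, class_weights=weights)
loss = rectify_loss(tcam(feats, weights), targets)
```

Pairing bank prototypes with new prototypes by index is a simplification. A fuller version would need to match prototypes across videos and handle videos with multiple labels.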
Related papers
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
Date: 2023-10-09T08:27:05Z
- Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling [19.683714649646603]
Weakly-supervised action localization aims to recognize and localize action instances in untrimmed videos with only video-level labels.
Most existing models rely on multiple instance learning (MIL), where predictions of unlabeled instances are supervised by classifying labeled bags.
We propose a novel attention-based hierarchically-structured latent model to learn the temporal variations of feature semantics.
Date: 2023-08-19T08:45:49Z
- Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint [83.36913240873236]
Weakly supervised temporal action localization (WTAL) aims to classify the actions in a video and localize their temporal boundaries.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z) - Weakly-Supervised Temporal Action Localization by Inferring Salient
Snippet-Feature [26.7937345622207]
Weakly-supervised temporal action localization aims to simultaneously locate action regions and identify action categories in untrimmed videos.
Pseudo label generation is a promising strategy for this challenging problem, but current methods ignore the natural temporal structure of the video.
We propose a novel weakly-supervised temporal action localization method by inferring salient snippet-feature.
arXiv Detail & Related papers (2023-03-22T06:08:34Z) - Timestamp-Supervised Action Segmentation from the Perspective of
Clustering [12.661218632080207]
Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model.
We propose a novel framework from the perspective of clustering, consisting of two parts.
One part, iterative clustering, propagates the pseudo-labels to ambiguous intervals by clustering and thereby updates the pseudo-label sequences used to train the model.
Date: 2022-12-22T13:35:00Z
- Unsupervised Pre-training for Temporal Action Localization Tasks [76.01985780118422]
We propose a self-supervised pretext task, called Pseudo Action Localization (PAL), to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL).
Specifically, we first randomly select temporal regions, each of which contains multiple clips, from one video as pseudo actions and then paste them onto different temporal positions of the other two videos.
The pretext task is to align the features of pasted pseudo action regions from two synthetic videos and maximize the agreement between them.
Date: 2022-03-25T12:13:43Z
- Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification [84.72303377833732]
Unsupervised object re-identification aims to learn discriminative representations for object retrieval without any annotations.
We propose to estimate pseudo label similarities between consecutive training generations with clustering consensus and refine pseudo labels with temporally propagated and ensembled pseudo labels.
The proposed pseudo label refinery strategy is simple yet effective and can be seamlessly integrated into existing clustering-based unsupervised re-identification methods.
Date: 2021-06-11T02:42:42Z
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
Date: 2020-12-26T07:35:35Z
- Weakly Supervised Temporal Action Localization with Segment-Level Labels [140.68096218667162]
Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting, in which segments are labeled wherever annotators observe actions happening.
We devise a partial segment loss, which can be regarded as loss sampling, to learn integral action parts from labeled segments.
Date: 2020-07-03T10:32:19Z
- Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks [25.342482374259017]
We present a method for weakly-supervised action localization based on graph convolutions.
Our method utilizes similarity graphs that encode appearance and motion, and pushes the state of the art on THUMOS '14, ActivityNet 1.2, and Charades for weakly supervised action localization.
Date: 2020-02-04T18:21:10Z