ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal
Action Localization
- URL: http://arxiv.org/abs/2311.15916v1
- Date: Mon, 27 Nov 2023 15:24:54 GMT
- Title: ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal
Action Localization
- Authors: Elahe Vahdani, Yingli Tian
- Abstract summary: This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.
It proposes a novel framework termed ADM-Loc, which stands for Actionness Distribution Modeling for point-supervised action localization.
- Score: 31.314383098734922
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper addresses the challenge of point-supervised temporal action
detection, in which only one frame per action instance is annotated in the
training set. Self-training aims to provide supplementary supervision for the
training process by generating pseudo-labels (action proposals) from a base
model. However, most current methods generate action proposals by applying
manually designed thresholds to action classification probabilities and
treating adjacent snippets as independent entities. As a result, these methods
struggle to generate complete action proposals, exhibit sensitivity to
fluctuations in action classification scores, and generate redundant and
overlapping action proposals. This paper proposes a novel framework termed
ADM-Loc, which stands for Actionness Distribution Modeling for point-supervised
action Localization. ADM-Loc generates action proposals by fitting a composite
distribution, comprising both Gaussian and uniform distributions, to the action
classification signals. This fitting process is tailored to each action class
present in the video and is applied separately for each action instance,
ensuring the distinctiveness of their distributions. ADM-Loc significantly
enhances the alignment between the generated action proposals and ground-truth
action instances and offers high-quality pseudo-labels for self-training.
Moreover, to model action boundary snippets, it enforces consistency in action
classification scores during training by employing Gaussian kernels, supervised
with the proposed loss functions. ADM-Loc outperforms the state-of-the-art
point-supervised methods on THUMOS14 and ActivityNet-v1.2 datasets.
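The composite-distribution idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the class-specific action score over snippets is treated as a weight on each snippet index, and it fits a two-component mixture (one Gaussian plus one uniform over the whole video) with a weighted EM loop. The function name `fit_gaussian_uniform`, the initial values, and the choice of `mu ± k * sigma` as the proposal interval are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_gaussian_uniform(scores, point, n_iter=200, k=2.0):
    """Fit pi * N(mu, sigma) + (1 - pi) * U(0, T) to a 1-D score signal.

    scores: per-snippet action scores (non-negative) for one action class.
    point:  index of the annotated frame/snippet, used to initialize mu.
    Returns (mu, sigma, pi, (start, end)); the interval is an assumed
    proposal rule, not the one used in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    T = len(scores)
    t = np.arange(T, dtype=float)
    w = scores / scores.sum()            # snippet weights, sum to 1

    mu, sigma, pi = float(point), max(T / 10.0, 1.0), 0.5
    unif = 1.0 / T                       # uniform density over the video
    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian component for each snippet
        gauss = np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi * gauss / (pi * gauss + (1.0 - pi) * unif)
        # M-step: weighted updates of the Gaussian parameters and mixing weight
        wr = w * r
        mass = wr.sum()
        mu = (wr * t).sum() / mass
        sigma = max(np.sqrt((wr * (t - mu) ** 2).sum() / mass), 1e-3)
        pi = float(np.clip(mass, 1e-6, 1.0 - 1e-6))

    start = max(0.0, mu - k * sigma)
    end = min(T - 1.0, mu + k * sigma)
    return mu, sigma, pi, (start, end)

# Synthetic signal: a flat background plus one bump centered at snippet 40.
t = np.arange(100)
scores = 0.1 + 0.9 * np.exp(-0.5 * ((t - 40) / 5.0) ** 2)
mu, sigma, pi, (start, end) = fit_gaussian_uniform(scores, point=38)
```

In this sketch the uniform component absorbs the background score level, so the Gaussian is fitted mainly to the bump around the annotated point rather than being stretched by low scores elsewhere in the video.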
Related papers
- Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization [31.82121743586165]
Generalizable Action Proposal generator (GAP) is built in a query-based architecture and trained with a proposal-level objective.
Based on this architecture, we propose an Action-aware Discrimination loss to enhance the category-agnostic dynamic information of actions.
Our experiments show that our GAP achieves state-of-the-art performance on two challenging ZSTAL benchmarks.
arXiv Detail & Related papers (2024-08-25T09:07:06Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization [26.506893363676678]
This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised Action localization.
POTLoc is designed to identify and track continuous action structures via a self-training strategy.
It outperforms the state-of-the-art point-supervised methods on THUMOS'14 and ActivityNet-v1.2 datasets.
arXiv Detail & Related papers (2023-10-20T15:28:06Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization [98.66318678030491]
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
We propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages.
arXiv Detail & Related papers (2023-05-29T02:48:04Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization [36.90693762365237]
Weakly-supervised temporal action localization aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training.
We propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods.
Our framework entails three segment-centric components: (i) dynamic segment sampling for compensating the contribution of short actions; (ii) intra- and inter-segment attention for modeling action dynamics and capturing temporal dependencies; (iii) pseudo instance-level supervision for improving action boundary prediction.
arXiv Detail & Related papers (2022-03-29T01:59:26Z)
- Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization [92.96802448718388]
We introduce an adaptive mutual supervision framework (AMS) for temporal action localization.
The proposed AMS method significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-06T08:31:10Z)
- Modeling Multi-Label Action Dependencies for Temporal Action Localization [53.53490517832068]
Real-world videos contain many complex actions with inherent relationships between action classes.
We propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos.
We show improved performance over state-of-the-art methods on multi-label action localization benchmarks.
arXiv Detail & Related papers (2021-03-04T13:37:28Z)
- Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization [94.37084866660238]
We present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated.
We propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries.
arXiv Detail & Related papers (2020-10-22T10:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.