Dilation-Erosion for Single-Frame Supervised Temporal Action
Localization
- URL: http://arxiv.org/abs/2212.06348v1
- Date: Tue, 13 Dec 2022 03:05:13 GMT
- Title: Dilation-Erosion for Single-Frame Supervised Temporal Action
Localization
- Authors: Bin Wang, Yan Song, Fanming Wang, Yang Zhao, Xiangbo Shu, Yan Rui
- Abstract summary: We present the Snippet Classification model and the Dilation-Erosion module.
The Dilation-Erosion module mines pseudo snippet-level ground-truth, hard backgrounds and evident backgrounds.
Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method.
- Score: 28.945067347089825
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To balance the annotation labor and the granularity of supervision,
single-frame annotation has been introduced in temporal action localization. It
provides a rough temporal location for an action but implicitly overstates the
supervision from the annotated-frame during training, leading to the confusion
between actions and backgrounds, i.e., action incompleteness and background
false positives. To tackle the two challenges, in this work, we present the
Snippet Classification model and the Dilation-Erosion module. In the
Dilation-Erosion module, we expand the potential action segments with a loose
criterion to alleviate the problem of action incompleteness and then remove the
background from the potential action segments to alleviate the problem of
background false positives. Relying on the single-frame annotation and the output of
the snippet classification, the Dilation-Erosion module mines pseudo
snippet-level ground-truth, hard backgrounds and evident backgrounds, which in
turn further trains the Snippet Classification model. It forms a cyclic
dependency. Furthermore, we propose a new embedding loss to aggregate the
features of action instances with the same label and separate the features of
actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate
the effectiveness of the proposed method. Code has been made publicly available
(https://github.com/LingJun123/single-frame-TAL).
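The dilation-erosion idea can be illustrated as binary morphology on a 1D sequence of per-snippet action scores: dilation loosely expands potential action segments (to fight incompleteness), erosion keeps only confident cores, and the gap between the two yields hard backgrounds. This is a minimal sketch of one plausible reading of the module, not the paper's implementation; the thresholding, structuring-element size `k`, and the assignment of the three pseudo-label sets are assumptions.

```python
import numpy as np

def binary_dilate(mask, k):
    """Expand a boolean snippet mask by k snippets on each side."""
    out = mask.copy()
    for s in range(1, k + 1):
        out[s:] |= mask[:-s]
        out[:-s] |= mask[s:]
    return out

def binary_erode(mask, k):
    """Shrink a boolean snippet mask by k snippets on each side."""
    return ~binary_dilate(~mask, k)

def dilation_erosion(action_scores, threshold=0.5, k=1):
    """Mine pseudo snippet-level labels from per-snippet action scores.

    Returns three boolean masks: pseudo action ground-truth (eroded
    cores), hard backgrounds (ambiguous boundary region), and evident
    backgrounds (far from any potential action).
    """
    fg = action_scores >= threshold      # loose potential-action mask
    dilated = binary_dilate(fg, k)       # expand: counter action incompleteness
    eroded = binary_erode(fg, k)         # shrink: keep only confident cores
    pseudo_action = eroded
    evident_bg = ~dilated
    hard_bg = dilated & ~eroded
    return pseudo_action, hard_bg, evident_bg

scores = np.array([0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1])
pa, hb, eb = dilation_erosion(scores, threshold=0.5, k=1)
# The three masks partition the timeline: cores, boundaries, clear background.
```

In the paper's cyclic scheme, these mined labels would then supervise the next round of the Snippet Classification model; the snippet containing the single-frame annotation could additionally be forced into the pseudo-action set.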
Related papers
- Weakly-Supervised Temporal Action Localization with Bidirectional
Semantic Consistency Constraint [83.36913240873236]
Weakly-Supervised Temporal Action Localization (WTAL) aims to classify actions and localize their temporal boundaries in videos.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z) - Weakly-Supervised Temporal Action Localization by Inferring Salient
Snippet-Feature [26.7937345622207]
Weakly-supervised temporal action localization aims to locate action regions and identify action categories in untrimmed videos simultaneously.
Pseudo label generation is a promising strategy to solve the challenging problem, but the current methods ignore the natural temporal structure of the video.
We propose a novel weakly-supervised temporal action localization method by inferring salient snippet-feature.
arXiv Detail & Related papers (2023-03-22T06:08:34Z) - Forcing the Whole Video as Background: An Adversarial Learning Strategy
for Weakly Temporal Action Localization [6.919243767837342]
We present an adversarial learning strategy to break the limitation of mining pseudo background snippets.
A novel temporal enhancement network is designed to facilitate the model to construct temporal relation of affinity snippets.
arXiv Detail & Related papers (2022-07-14T05:13:50Z) - SegTAD: Precise Temporal Action Detection via Semantic Segmentation [65.01826091117746]
We formulate the task of temporal action detection in a novel perspective of semantic segmentation.
Owing to the 1-dimensional property of TAD, we are able to convert the coarse-grained detection annotations to fine-grained semantic segmentation annotations for free.
We propose an end-to-end framework SegTAD composed of a 1D semantic segmentation network (1D-SSN) and a proposal detection network (PDN)
arXiv Detail & Related papers (2022-03-03T06:52:13Z) - Action Shuffling for Weakly Supervised Temporal Localization [22.43209053892713]
This paper analyzes the order-sensitive and location-insensitive properties of actions.
It embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance.
arXiv Detail & Related papers (2021-05-10T09:05:58Z) - ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal
Action Localization [18.56421375743287]
We propose an action-context modeling network termed ACM-Net.
It integrates a three-branch attention module to measure the likelihood of each temporal point being action instance, context, or non-action background, simultaneously.
Our method can outperform current state-of-the-art methods, and even achieve comparable performance with fully-supervised methods.
arXiv Detail & Related papers (2021-04-07T07:39:57Z) - Weakly Supervised Temporal Action Localization Through Learning Explicit
Subspaces for Action and Context [151.23835595907596]
Weakly-supervised methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z) - D2-Net: Weakly-Supervised Action Localization via Discriminative
Embeddings and Denoised Activations [172.05295776806773]
This work proposes a weakly-supervised temporal action localization framework, called D2-Net.
Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings.
Our D2-Net performs favorably in comparison to the existing methods on two datasets.
arXiv Detail & Related papers (2020-12-11T16:01:56Z) - Temporal Action Detection with Multi-level Supervision [116.55596693897388]
We introduce the Semi-supervised Action Detection (SSAD) task with a mixture of labeled and unlabeled data.
We analyze different types of errors in the proposed SSAD baselines which are directly adapted from the semi-supervised classification task.
We incorporate weakly-labeled data into SSAD and propose Omni-supervised Action Detection (OSAD) with three levels of supervision.
arXiv Detail & Related papers (2020-11-24T04:45:17Z) - Weakly Supervised Temporal Action Localization with Segment-Level Labels [140.68096218667162]
Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting: segments are labeled when annotators observe actions happening within them.
We devise a partial segment loss regarded as a loss sampling to learn integral action parts from labeled segments.
arXiv Detail & Related papers (2020-07-03T10:32:19Z)