Weakly Supervised Temporal Action Localization with Segment-Level Labels
- URL: http://arxiv.org/abs/2007.01598v1
- Date: Fri, 3 Jul 2020 10:32:19 GMT
- Title: Weakly Supervised Temporal Action Localization with Segment-Level Labels
- Authors: Xinpeng Ding, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang and
Tongliang Liu
- Abstract summary: Temporal action localization presents a trade-off between test performance and annotation-time cost.
We introduce a new segment-level supervision setting: segments are labeled when annotators observe an action happening within them.
We devise a partial segment loss, regarded as a form of loss sampling, to learn integral action parts from the labeled segments.
- Score: 140.68096218667162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action localization presents a trade-off between test performance
and annotation-time cost. Fully supervised methods achieve good performance
with time-consuming boundary annotations. Weakly supervised methods with
cheaper video-level category label annotations result in worse performance. In
this paper, we introduce a new segment-level supervision setting: segments are labeled when annotators observe an action happening within them. We incorporate this
segment-level supervision along with a novel localization module in the
training. Specifically, we devise a partial segment loss, regarded as a form of loss sampling, to learn integral action parts from the labeled segments. Since the labeled segments are only parts of actions, the model tends to overfit as training proceeds. To tackle this problem, we first obtain a similarity
matrix from discriminative features guided by a sphere loss. Then, a
propagation loss is devised based on the matrix to act as a regularization
term, allowing labels to propagate implicitly to unlabeled segments during training. Experiments validate that our method can outperform video-level supervision methods with almost the same annotation time.
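As a concrete illustration, below is a minimal PyTorch-style sketch of how the two training signals described in the abstract could be combined: a partial segment loss that only samples snippets inside labeled segments, and a propagation loss built from a feature-similarity matrix. The tensor shapes, function names, softmax scoring, and the weighting term lam are illustrative assumptions, not the authors' implementation.
```python
# Hedged sketch only: shapes and formulations are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def partial_segment_loss(snippet_logits, segment_labels, segment_mask):
    """Cross-entropy applied only to snippets inside labeled segments,
    i.e. a form of loss sampling; unlabeled snippets contribute nothing."""
    # snippet_logits: (T, C); segment_labels: (T,); segment_mask: (T,) in {0, 1}
    per_snippet = F.cross_entropy(snippet_logits, segment_labels, reduction="none")
    return (per_snippet * segment_mask).sum() / segment_mask.sum().clamp(min=1)

def propagation_loss(features, snippet_scores):
    """Regularizer from a feature-similarity matrix: snippets with similar
    features are pushed toward similar action scores, which implicitly
    propagates supervision to unlabeled segments."""
    # features: (T, D), assumed L2-normalized; snippet_scores: (T, C)
    sim = (features @ features.t()).clamp(min=0)        # (T, T) similarity matrix
    dist = torch.cdist(snippet_scores, snippet_scores)  # pairwise score distances
    return (sim * dist).mean()

def total_loss(snippet_logits, features, segment_labels, segment_mask, lam=0.1):
    scores = snippet_logits.softmax(dim=-1)
    return (partial_segment_loss(snippet_logits, segment_labels, segment_mask)
            + lam * propagation_loss(features, scores))
```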
Related papers
- Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment [33.74853437611066]
Weakly-supervised action segmentation is a task of learning to partition a long video into several action segments, where training videos are only accompanied by transcripts.
Most existing methods need to infer pseudo segmentation for training by serial alignment between all frames and the transcript.
We propose a novel Action-Transition-Aware Boundary Alignment framework to efficiently and effectively filter out noisy boundaries and detect transitions.
arXiv Detail & Related papers (2024-03-28T08:39:44Z)
- Proposal-based Temporal Action Localization with Point-level Supervision [29.98225940694062]
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos.
We propose a novel method that localizes actions by generating and evaluating action proposals of flexible duration.
Experiments show that our proposed method achieves competitive or superior performance to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T08:27:05Z)
- Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint [83.36913240873236]
Weakly Supervised Temporal Action Localization (WTAL) aims to classify actions and localize their temporal boundaries in a video.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z)
- Temporal Segment Transformer for Action Segmentation [54.25103250496069]
We propose an attention-based approach, which we call the temporal segment transformer, for joint segment relation modeling and denoising.
The main idea is to denoise segment representations using attention between segment and frame representations, and also use inter-segment attention to capture temporal correlations between segments.
We show that this novel architecture achieves state-of-the-art accuracy on the popular 50Salads, GTEA and Breakfast benchmarks.
arXiv Detail & Related papers (2023-02-25T13:05:57Z)
- Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation [27.735478880660164]
We propose a new framework for timestamp supervised temporal action segmentation.
We introduce a teacher model parallel to the segmentation model to help stabilize the process of model optimization.
Our method outperforms the state-of-the-art method and performs comparably against the fully-supervised methods at a much lower annotation cost.
arXiv Detail & Related papers (2022-07-02T02:00:55Z)
- Video Activity Localisation with Uncertainties in Temporal Boundary [74.7263952414899]
Methods for video activity localisation over time assume implicitly that activity temporal boundaries are determined and precise.
In unscripted natural videos, different activities transition smoothly, so it is intrinsically ambiguous to label precisely when an activity starts and ends.
We introduce Elastic Moment Bounding (EMB) to accommodate flexible and adaptive activity temporal boundaries.
arXiv Detail & Related papers (2022-06-26T16:45:56Z)
- Unsupervised Action Segmentation with Self-supervised Feature Learning and Co-occurrence Parsing [32.66011849112014]
Temporal action segmentation is the task of classifying each frame in a video with an action label.
In this work we explore a self-supervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos.
We develop CAP, a novel co-occurrence action parsing algorithm that can not only capture the correlation among sub-actions underlying the structure of activities, but also estimate the temporal trajectory of the sub-actions in an accurate and general way.
arXiv Detail & Related papers (2021-05-29T00:29:40Z)
- SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation [88.22349093672975]
We design a weakly supervised point cloud segmentation algorithm that only requires clicking on one point per instance to indicate its location for annotation.
With over-segmentation for pre-processing, we extend these location annotations into segments as seg-level labels.
We show that our seg-level supervised method (SegGroup) achieves comparable results with the fully annotated point-level supervised methods.
arXiv Detail & Related papers (2020-12-18T13:23:34Z)
- On Evaluating Weakly Supervised Action Segmentation Methods [79.42955857919497]
We focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches.
We train each method on the Breakfast dataset 5 times and report the average and standard deviation of the results (see the sketch after this list).
Our experiments show that the standard deviation over these repetitions is between 1 and 2.5% and significantly affects the comparison between different approaches.
arXiv Detail & Related papers (2020-05-19T20:30:31Z)
- Weakly Supervised Temporal Action Localization Using Deep Metric Learning [12.49814373580862]
We propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training.
We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm.
Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
arXiv Detail & Related papers (2020-01-21T22:01:17Z)
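The evaluation entry above reports the average and standard deviation over five training runs. A minimal sketch of that repeated-runs reporting, assuming a hypothetical train_and_evaluate(seed) function standing in for one full training run:
```python
# Hedged sketch: `train_and_evaluate` is a hypothetical stand-in for one full
# training run that returns an accuracy score for a given random seed.
import statistics

def report_over_runs(train_and_evaluate, n_runs=5):
    scores = [train_and_evaluate(seed=s) for s in range(n_runs)]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # a 1-2.5% spread can change method rankings
    return mean, std
```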
This list is automatically generated from the titles and abstracts of the papers on this site.