HR-Pro: Point-supervised Temporal Action Localization via Hierarchical
Reliability Propagation
- URL: http://arxiv.org/abs/2308.12608v3
- Date: Sat, 27 Jan 2024 05:47:39 GMT
- Title: HR-Pro: Point-supervised Temporal Action Localization via Hierarchical
Reliability Propagation
- Authors: Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong
Sang
- Abstract summary: Point-supervised Temporal Action Localization (PSTAL) is an emerging research direction for label-efficient learning.
We propose a Hierarchical Reliability Propagation framework, which consists of two reliability-aware stages: Snippet-level Discrimination Learning and Instance-level Completeness Learning.
Our HR-Pro achieves state-of-the-art performance on multiple challenging benchmarks, including an impressive average mAP of 60.3% on THUMOS14.
- Score: 40.52832708232682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point-supervised Temporal Action Localization (PSTAL) is an emerging research
direction for label-efficient learning. However, current methods mainly focus
on optimizing the network either at the snippet-level or the instance-level,
neglecting the inherent reliability of point annotations at both levels. In
this paper, we propose a Hierarchical Reliability Propagation (HR-Pro)
framework, which consists of two reliability-aware stages: Snippet-level
Discrimination Learning and Instance-level Completeness Learning, both stages
explore the efficient propagation of high-confidence cues in point annotations.
For snippet-level learning, we introduce an online-updated memory to store
reliable snippet prototypes for each class. We then employ a Reliability-aware
Attention Block to capture both intra-video and inter-video dependencies of
snippets, resulting in more discriminative and robust snippet representation.
For instance-level learning, we propose a point-based proposal generation
approach as a means of connecting snippets and instances, which produces
high-confidence proposals for further optimization at the instance level.
Through multi-level reliability-aware learning, we obtain more reliable
confidence scores and more accurate temporal boundaries of predicted proposals.
Our HR-Pro achieves state-of-the-art performance on multiple challenging
benchmarks, including an impressive average mAP of 60.3% on THUMOS14. Notably,
our HR-Pro largely surpasses all previous point-supervised methods, and even
outperforms several competitive fully supervised methods. Code will be
available at https://github.com/pipixin321/HR-Pro.
Related papers
- Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher [14.715398100791559]
Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain.
In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, will bring suboptimal performance on the target domain.
arXiv Detail & Related papers (2024-07-10T15:56:24Z) - Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation [7.005068872406135]
Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems.
These approaches frequently involve complex training pipelines and a substantial computational burden.
We propose a PrevMatch framework that effectively mitigates the limitations by maximizing the utilization of the temporal knowledge obtained during the training process.
arXiv Detail & Related papers (2024-05-31T03:54:59Z) - BECLR: Batch Enhanced Contrastive Few-Shot Learning [1.450405446885067]
Unsupervised few-shot learning aspires to bridge this gap by discarding the reliance on annotations at training time.
We propose a novel Dynamic Clustered mEmory (DyCE) module to promote a highly separable latent representation space.
We then tackle the, somehow overlooked yet critical, issue of sample bias at the few-shot inference stage.
arXiv Detail & Related papers (2024-02-04T10:52:43Z) - Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label
Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method exhibits comparable performance to weakly and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z) - Towards End-to-end Semi-supervised Learning for One-stage Object
Detection [88.56917845580594]
This paper focuses on the semi-supervised learning for the advanced and popular one-stage detection network YOLOv5.
We propose a novel teacher-student learning recipe called OneTeacher with two innovative designs, namely Multi-view Pseudo-label Refinement (MPR) and Decoupled Semi-supervised Optimization (DSO)
In particular, MPR improves the quality of pseudo-labels via augmented-view refinement and global-view filtering, and DSO handles the joint optimization conflicts via structure tweaks and task-specific pseudo-labeling.
arXiv Detail & Related papers (2023-02-22T11:35:40Z) - Active Learning with Effective Scoring Functions for Semi-Supervised
Temporal Action Localization [15.031156121516211]
This paper focuses on a rarely investigated yet practical task named semi-supervised TAL.
We propose an effective active learning method, named AL-STAL.
Experiment results show that AL-STAL outperforms the existing competitors and achieves satisfying performance compared with fully-supervised learning.
arXiv Detail & Related papers (2022-08-31T13:39:38Z) - Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific featureriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE)
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
Besides, SIMPLE formulates human detection and pose estimation as a unified point learning framework to complement each other in single-network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.