Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track
- URL: http://arxiv.org/abs/2106.10829v1
- Date: Mon, 21 Jun 2021 03:36:36 GMT
- Title: Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track
- Authors: Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan
- Abstract summary: The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
- Score: 78.64815984927425
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This technical report presents our solution to the HACS Temporal Action
Localization Challenge 2021, Weakly-Supervised Learning Track. The goal of
weakly-supervised temporal action localization is to temporally locate and
classify actions of interest in untrimmed videos given only video-level labels.
We adopt the two-stream consensus network (TSCN) as the main framework in this
challenge. The TSCN consists of a two-stream base model training procedure and
a pseudo ground truth learning procedure. The base model training encourages
each stream to make reliable predictions from a single modality (i.e., RGB or
optical flow); a pseudo ground truth is then generated from the fusion of the
two streams' predictions and in turn used as supervision to train the base
models. On the HACS
v1.1.1 dataset, without fine-tuning the feature-extraction I3D models, our
method achieves 22.20% on the validation set and 21.68% on the testing set in
terms of average mAP. Our solution ranked 2nd in this challenge, and we
hope our method can serve as a baseline for future academic research.
Related papers
- Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement [28.370473108391426]
This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement.
The goal is to develop a self-supervised physiological measurement model for heart rate (HR) estimation using unlabeled facial videos.
Our solutions achieved a remarkable RMSE score of 8.85277 on the test dataset, securing 2nd place in Track 1 of the challenge.
arXiv Detail & Related papers (2024-06-07T13:53:02Z) - Adaptive Rentention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Test-Time Zero-Shot Temporal Action Localization [58.84919541314969]
ZS-TAL seeks to identify and locate actions in untrimmed videos unseen during training.
Training-based ZS-TAL approaches assume the availability of labeled data for supervised learning.
We introduce a novel method that performs Test-Time adaptation for Temporal Action localization (T3AL)
arXiv Detail & Related papers (2024-04-08T11:54:49Z) - DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising
Diffusion Models [53.67562579184457]
This paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex dependencies.
We present the first attempt to generalize the popular denoising diffusion models to STGs, leading to a novel non-autoregressive framework called DiffSTG.
Our approach combines the intrinsic spatio-temporal learning capabilities of STNNs with the uncertainty measurements of diffusion models.
arXiv Detail & Related papers (2023-01-31T13:42:36Z) - Semi-supervised Training for Knowledge Base Graph Self-attention
Networks on Link Prediction [20.64973530280006]
This paper investigates the information aggregation coefficient (self-attention) of adjacent nodes and redesigns the self-attention mechanism of the GAT structure.
Inspired by human thinking habits, we designed a semi-supervised self-training method over pre-trained models.
Experimental results show that our proposed self-attention mechanism and semi-supervised self-training method can effectively improve the performance of the link prediction task.
arXiv Detail & Related papers (2022-09-03T07:27:28Z) - Contextualized Spatio-Temporal Contrastive Learning with
Self-Supervision [106.77639982059014]
We present the ConST-CL framework to effectively learn spatio-temporally fine-grained representations.
We first design a region-based self-supervised task which requires the model to learn to transform instance representations from one view to another guided by context features.
We then introduce a simple design that effectively reconciles the simultaneous learning of both holistic and local representations.
arXiv Detail & Related papers (2021-12-09T19:13:41Z) - International Workshop on Continual Semi-Supervised Learning:
Introduction, Benchmarks and Baselines [20.852277473776617]
The aim of this paper is to formalize a new continual semi-supervised learning (CSSL) paradigm.
The paper introduces two new benchmarks specifically designed to assess CSSL on two important computer vision tasks.
We describe the Continual Activity Recognition (CAR) and Continual Crowd Counting (CCC) challenges built upon those benchmarks, the baseline models proposed for the challenges, and describe a simple CSSL baseline.
arXiv Detail & Related papers (2021-10-27T17:34:40Z) - 2nd Place Solution for SODA10M Challenge 2021 -- Continual Detection
Track [35.06282647572304]
We adapt ResNet50-FPN as the baseline and try several improvements for the final submission model.
We find that task-specific replay scheme, learning rate scheduling, model calibration, and using original image scale helps to improve performance for both large and small objects in images.
arXiv Detail & Related papers (2021-10-25T15:58:19Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles in the feature-level.
Our method produces state-of-the-art results on the C-Driving dataset.
arXiv Detail & Related papers (2021-06-07T08:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.