Relation Modeling in Spatio-Temporal Action Localization
- URL: http://arxiv.org/abs/2106.08061v2
- Date: Wed, 16 Jun 2021 07:00:12 GMT
- Title: Relation Modeling in Spatio-Temporal Action Localization
- Authors: Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao
- Abstract summary: This paper presents our solution to the AVA-Kinetics Crossover Challenge of the ActivityNet workshop at CVPR 2021.
Our solution utilizes multiple types of relation modeling methods for spatio-temporal action detection and adopts a training strategy to integrate multiple relation modeling in end-to-end training over the two large-scale video datasets.
We finally achieve 40.67 mAP on the test set of AVA-Kinetics.
- Score: 25.09128518931016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our solution to the AVA-Kinetics Crossover Challenge of
the ActivityNet workshop at CVPR 2021. Our solution utilizes multiple types of
relation modeling methods for spatio-temporal action detection and adopts a
training strategy to integrate multiple relation modeling in end-to-end
training over the two large-scale video datasets. Learning with memory bank and
finetuning for long-tailed distribution are also investigated to further
improve the performance. In this paper, we detail the implementations of our
solution and provide experimental results and corresponding discussions. We
finally achieve 40.67 mAP on the test set of AVA-Kinetics.
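The abstract mentions learning with a memory bank but gives no details. As a rough illustration only, and not the authors' implementation, one common design in action detection (popularized by long-term feature banks) caches actor features from neighboring clips of the same video so that each actor can attend to that longer-range context. The Python sketch below assumes features are plain lists of floats; `FeatureBank`, `write`, `read`, `attend`, and the `window` parameter are hypothetical names chosen for this example.

```python
import math
from collections import defaultdict


class FeatureBank:
    """Hypothetical memory bank: video_id -> {clip timestamp: [actor feature vectors]}."""

    def __init__(self):
        self._store = defaultdict(dict)

    def write(self, video_id, t, feats):
        # Store (or overwrite) the actor features computed for clip t.
        self._store[video_id][t] = [list(f) for f in feats]

    def read(self, video_id, t, window):
        # Gather features from clips whose timestamp is within `window` of t,
        # in temporal order. Unknown videos yield an empty list.
        clips = self._store.get(video_id, {})
        out = []
        for ts in sorted(clips):
            if abs(ts - t) <= window:
                out.extend(clips[ts])
        return out


def attend(query, memory):
    """Scaled dot-product attention of one actor feature over bank features."""
    if not memory:
        return list(query)  # no context available: fall back to the query itself
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memory]
    peak = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Context vector: attention-weighted sum of the memory features.
    return [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(d)]
```

For instance, after writing features for clips 0, 1, and 5 of a video, `bank.read(video_id, 1, window=1)` returns only the features of clips 0 and 1, and `attend` turns them into a single context vector. How that context is fused with the actor feature (addition, concatenation, or further relation layers) is left open here.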
Related papers
- Action Recognition Using Temporal Shift Module and Ensemble Learning [0.0]
The paper presents the first-rank solution for the Multi-Modal Action Recognition Challenge, part of the Multi-Modal Visual Pattern Recognition Workshop at ICPR 2024.
The competition aimed to recognize human actions using a diverse dataset of 20 action classes, collected from multi-modal sources.
Our solution achieved a perfect top-1 accuracy on the test set, demonstrating the effectiveness of the proposed approach in recognizing human actions across 20 classes.
Date: 2025-01-29T10:36:55Z
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299]
The work addresses ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
Date: 2024-10-17T15:09:35Z
- The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024 [27.30100635072298]
Temporal action localisation (TAL) focuses on identifying and classifying actions within specific time intervals throughout a video sequence.
We employ a data augmentation technique by expanding the training dataset using overlapping labels from the Something-SomethingV2 dataset.
For feature extraction, we utilize state-of-the-art models, including UMT, VideoMAEv2 for video features, and BEATs and CAV-MAE for audio features.
Date: 2024-10-08T01:07:21Z
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
Date: 2024-06-14T07:16:18Z
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
Date: 2023-08-10T08:43:20Z
- An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction [27.10815774845461]
We propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd scenarios.
The Interaction component aims to characterize the difficulty of scenario domains, while the diversity of a scenario domain is captured in the Diversity score.
Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source,target) generalization tasks.
Date: 2022-11-02T01:39:30Z
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the model-based reinforcement learning (MBRL) agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories solving prior tasks from the storage.
Date: 2021-10-25T20:02:57Z
- Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
Date: 2021-09-14T12:53:35Z
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
Date: 2021-06-21T03:36:36Z
- Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without the need to re-tune hyperparameters.
Ada-Segment brings 2.7% panoptic quality (PQ) improvement on COCO val split from the vanilla baseline, achieving the state-of-the-art 48.5% PQ on COCO test-dev split and 32.9% PQ on ADE20K dataset.
Date: 2020-12-07T11:43:10Z
- Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs [30.36381338938319]
We present two solutions for modeling continuous-time dynamics using neural ordinary differential equations (ODEs).
Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data.
We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
Date: 2020-06-29T17:21:43Z