Relation Modeling in Spatio-Temporal Action Localization
- URL: http://arxiv.org/abs/2106.08061v2
- Date: Wed, 16 Jun 2021 07:00:12 GMT
- Title: Relation Modeling in Spatio-Temporal Action Localization
- Authors: Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang,
Shiwei Zhang, Mingqian Tang, Yue Gao
- Abstract summary: This paper presents our solution to the AVA-Kinetics Crossover Challenge of ActivityNet workshop at CVPR 2021.
Our solution utilizes multiple types of relation methods for relation-temporal action detection and adopts a training strategy to integrate multiple relation modeling in end-to-end training over the two large-scale video datasets.
We finally achieve 40.67 mAP on the test set of AVA-Kinetics.
- Score: 25.09128518931016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our solution to the AVA-Kinetics Crossover Challenge of
ActivityNet workshop at CVPR 2021. Our solution utilizes multiple types of
relation modeling methods for spatio-temporal action detection and adopts a
training strategy to integrate multiple relation modeling in end-to-end
training over the two large-scale video datasets. Learning with memory bank and
finetuning for long-tailed distribution are also investigated to further
improve the performance. In this paper, we detail the implementations of our
solution and provide experiments results and corresponding discussions. We
finally achieve 40.67 mAP on the test set of AVA-Kinetics.
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation [31.970739018426645]
In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images.
This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model.
arXiv Detail & Related papers (2024-05-19T04:57:17Z) - Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z) - An Empirical Study on Multi-Domain Robust Semantic Segmentation [42.79166534691889]
We train a unified model that is expected to perform well across domains on several popularity segmentation datasets.
Our solution ranks 2nd on RVC 2022 semantic segmentation task, with a dataset only 1/3 size of the 1st model used.
arXiv Detail & Related papers (2022-12-08T12:04:01Z) - Towards using Few-Shot Prompt Learning for Automating Model Completion [0.0]
We propose a simple yet a novel approach to improve completion in domain modeling activities.
Our approach exploits the power of large language models by using few-shot prompt learning without the need to train or fine-tune those models.
arXiv Detail & Related papers (2022-12-07T02:11:26Z) - An Information-Theoretic Approach for Estimating Scenario Generalization
in Crowd Motion Prediction [27.10815774845461]
We propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd scenarios.
The Interaction component aims to characterize the difficulty of scenario domains, while the diversity of a scenario domain is captured in the Diversity score.
Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source,target) generalization tasks.
arXiv Detail & Related papers (2022-11-02T01:39:30Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2rd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z) - Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without the need of re-tuning hyper parameters.
Ada-Segment brings 2.7% panoptic quality (PQ) improvement on COCO val split from the vanilla baseline, achieving the state-of-the-art 48.5% PQ on COCO test-dev split and 32.9% PQ on ADE20K dataset.
arXiv Detail & Related papers (2020-12-07T11:43:10Z) - Model-based Reinforcement Learning for Semi-Markov Decision Processes
with Neural ODEs [30.36381338938319]
We present two solutions for modeling continuous-time dynamics using neural ordinary differential equations (ODEs)
Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data.
We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
arXiv Detail & Related papers (2020-06-29T17:21:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.