ReAct: Temporal Action Detection with Relational Queries
- URL: http://arxiv.org/abs/2207.07097v1
- Date: Thu, 14 Jul 2022 17:46:37 GMT
- Title: ReAct: Temporal Action Detection with Relational Queries
- Authors: Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li and
Dacheng Tao
- Abstract summary: This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
- Score: 84.76646044604055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work aims at advancing temporal action detection (TAD) using an
encoder-decoder framework with action queries, similar to DETR, which has shown
great success in object detection. However, the framework suffers from several
problems if directly applied to TAD: the insufficient exploration of
inter-query relation in the decoder, the inadequate classification training due
to a limited number of training samples, and the unreliable classification
scores at inference. To this end, we first propose a relational attention
mechanism in the decoder, which guides the attention among queries based on
their relations. Moreover, we propose two losses to facilitate and stabilize
the training of action classification. Lastly, we propose to predict the
localization quality of each action query at inference in order to distinguish
high-quality queries. The proposed method, named ReAct, achieves
state-of-the-art performance on THUMOS14, with much lower computational costs
than previous methods. Besides, extensive ablation studies are conducted to
verify the effectiveness of each proposed component. The code is available at
https://github.com/sssste/React.
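The last idea in the abstract — predicting a localization-quality score per action query and using it to distinguish high-quality queries at inference — can be sketched as a simple score fusion. The function below is an illustrative assumption, not the paper's actual implementation: it assumes a classification head and a separate quality head per query, and ranks queries by classification confidence weighted by predicted quality.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rank_action_queries(class_logits, quality_logits):
    """Rank action queries by classification confidence weighted by a
    predicted localization quality (hypothetical heads, for illustration).

    class_logits:   per-query lists of raw class logits
    quality_logits: per-query scalar logits from a quality head
    Returns a list of (fused_score, query_index, class_index), best first.
    """
    ranked = []
    for qi, (logits, q) in enumerate(zip(class_logits, quality_logits)):
        probs = softmax(logits)
        quality = 1.0 / (1.0 + math.exp(-q))  # sigmoid -> [0, 1]
        score = max(probs) * quality          # fuse the two signals
        label = probs.index(max(probs))
        ranked.append((score, qi, label))
    ranked.sort(reverse=True)                 # well-localized queries first
    return ranked
```

The point of the fusion is that a query with a confident class score but a poorly localized segment gets down-weighted, so it no longer outranks a moderately confident but well-localized one.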
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- CoRec: An Easy Approach for Coordination Recognition [8.618336635685859]
We propose a pipeline model, COordination RECognizer (CoRec).
It consists of two components: coordinator and conjunct boundary detector.
Experiments show that CoRec positively impacts downstream tasks, improving the yield of state-of-the-art Open IE models.
arXiv Detail & Related papers (2023-11-30T17:11:27Z)
- Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection [48.429555904690595]
We introduce spatially decoupled DETR, which includes a task-aware query generation module and a disentangled feature learning process.
We demonstrate that our approach achieves a significant improvement on the MSCOCO dataset compared to previous work.
arXiv Detail & Related papers (2023-10-24T15:54:11Z)
- PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement [59.6260680005195]
We present a novel Person Search framework based on the Diffusion model, PSDiff.
PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.
Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv Detail & Related papers (2023-09-20T08:16:39Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.
We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations.
Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z)
- DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection [17.326702469604676]
Few-shot object detection aims at detecting novel objects rapidly from extremely few examples of previously unseen classes.
Most existing approaches employ the Faster R-CNN as basic detection framework.
We propose a simple yet effective architecture named Decoupled Faster R-CNN (DeFRCN).
arXiv Detail & Related papers (2021-08-20T06:12:55Z)
- Modulating Localization and Classification for Harmonized Object Detection [40.82723262074911]
We propose a mutual learning framework to modulate the two tasks.
In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy.
We achieve a significant performance gain over the baseline detectors on the COCO dataset.
arXiv Detail & Related papers (2021-03-16T10:36:02Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.