OadTR: Online Action Detection with Transformers
- URL: http://arxiv.org/abs/2106.11149v1
- Date: Mon, 21 Jun 2021 14:39:35 GMT
- Title: OadTR: Online Action Detection with Transformers
- Authors: Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang
- Abstract summary: We propose a new encoder-decoder framework based on Transformers, named OadTR, to address the non-parallelism and vanishing gradients of RNN-based online action detection.
OadTR can recognize current actions by encoding historical information and predicting future context simultaneously.
- Score: 40.227281499219444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent approaches for online action detection tend to apply Recurrent
Neural Networks (RNNs) to capture long-range temporal structure. However, RNNs
suffer from non-parallelism and vanishing gradients, and are therefore hard to
optimize. In this paper, we propose a new encoder-decoder framework based on
Transformers, named OadTR, to tackle these problems. The encoder attached with
a task token aims to capture the relationships and global interactions between
historical observations. The decoder extracts auxiliary information by
aggregating anticipated future clip representations. Therefore, OadTR can
recognize current actions by encoding historical information and predicting
future context simultaneously. We extensively evaluate the proposed OadTR on
three challenging datasets: HDD, TVSeries, and THUMOS14. The experimental
results show that OadTR achieves higher training and inference speeds than
current RNN based approaches, and significantly outperforms the
state-of-the-art methods in terms of both mAP and mcAP. Code is available at
https://github.com/wangxiang1230/OadTR.
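
The encoder/decoder split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that): it uses plain Python, a single attention head, and no learned projections, and the names `attend`, `oadtr_like_forward`, `task_token`, and `future_queries` are hypothetical. It only shows the data flow: historical clip features plus a task token pass through self-attention, a few future queries cross-attend to the encoded history, and the two results are concatenated for a downstream classifier.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (a simplification)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


def oadtr_like_forward(history, task_token, future_queries):
    """Return the task-token output concatenated with the pooled future context."""
    # Encoder: historical clip features and the task token attend to each other.
    tokens = history + [task_token]
    encoded = attend(tokens, tokens, tokens)
    task_out = encoded[-1]  # representation gathered at the task token

    # Decoder: hypothetical future queries cross-attend to the encoded history.
    future = attend(future_queries, encoded, encoded)

    # Aggregate the anticipated future clips by averaging (an assumption).
    dim = len(task_out)
    future_avg = [sum(f[j] for f in future) / len(future) for j in range(dim)]

    # Concatenate both views; a classifier would consume this vector.
    return task_out + future_avg


random.seed(0)
dim = 4
history = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
task_token = [0.0] * dim
future_queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

fused = oadtr_like_forward(history, task_token, future_queries)
print(len(fused))  # 8: dim values from the task token plus dim from the future context
```

Because every query attends to all tokens at once, the whole history is processed in one pass rather than step by step, which is the parallelism advantage over RNNs that the abstract refers to.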
Related papers
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection [26.03892270020559]
We present a scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement.
Experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-16T13:17:07Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting [1.9950682531209158]
We provide an approach to link time series characteristics with RNN components via the versatile metric of distance correlation.
We empirically show that the RNN activation layers learn the lag structures of time series well.
We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes.
arXiv Detail & Related papers (2023-07-28T22:32:08Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- Unsupervised Representation Learning via Neural Activation Coding [66.65837512531729]
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications.
We show that NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks.
arXiv Detail & Related papers (2021-12-07T21:59:45Z)
- TCTN: A 3D-Temporal Convolutional Transformer Network for Spatiotemporal Predictive Learning [1.952097552284465]
We propose an algorithm named 3D-temporal convolutional transformer (TCTN), where a transformer-based encoder with temporal convolutional layers is employed to capture short-term and long-term dependencies.
Our proposed algorithm is easy to implement and trains much faster than RNN-based methods thanks to the parallel mechanism of the Transformer.
arXiv Detail & Related papers (2021-12-02T10:05:01Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
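The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.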
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- Volterra Neural Networks (VNNs) [24.12314339259243]
We propose a Volterra filter-inspired Network architecture to reduce the complexity of Convolutional Neural Networks.
We show an efficient parallel implementation of this Volterra Neural Network (VNN) along with its remarkable performance.
The proposed approach is evaluated on the UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state-of-the-art CNN approaches.
arXiv Detail & Related papers (2019-10-21T19:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.