ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer
- URL: http://arxiv.org/abs/2107.05887v1
- Date: Tue, 13 Jul 2021 07:38:08 GMT
- Title: ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer
- Authors: Eslam Mohamed and Ahmad El-Sallab
- Abstract summary: We propose a Spatio-Temporal Transformer-based architecture for object detection from a sequence of temporal frames.
We employ full attention mechanisms to exploit feature correlations across both dimensions.
Results show a significant 5% mAP improvement on the KITTI MOD dataset.
- Score: 2.4366811507669124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose ST-DETR, a Spatio-Temporal Transformer-based architecture for
object detection from a sequence of temporal frames. We treat the temporal
frames as sequences in both space and time and employ full attention
mechanisms to exploit feature correlations across both dimensions.
This treatment lets us handle the frame sequence as temporal traces of object
features over every spatial location. We explore two possible
approaches: early aggregation of spatial features over the temporal dimension,
and late temporal aggregation of object-query spatial features. Moreover,
we propose a novel Temporal Positional Embedding technique to encode the
time-sequence information. To evaluate our approach, we choose the Moving Object
Detection (MOD) task, since it is a perfect candidate to showcase the importance
of the temporal dimension. Results show a significant 5% mAP improvement on the
KITTI MOD dataset over the 1-step spatial baseline.
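The abstract does not spell out the Temporal Positional Embedding or the early-aggregation scheme. A minimal sketch of one plausible reading, assuming a standard sinusoidal encoding applied to the frame index and a simple flattening of per-frame spatial tokens into one space-time sequence (the function names and the sinusoidal choice are our assumptions, not details from the paper):

```python
import numpy as np

def temporal_positional_embedding(num_frames: int, d_model: int) -> np.ndarray:
    """One embedding vector per frame, indexed by time.

    Assumption: mirrors the standard Transformer sinusoidal scheme,
    applied to the frame index instead of the token position.
    """
    positions = np.arange(num_frames)[:, None]          # (T, 1)
    dims = np.arange(d_model)[None, :]                  # (1, D)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                    # (T, D)
    emb = np.zeros((num_frames, d_model))
    emb[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    emb[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return emb

def early_aggregate(frame_features: np.ndarray) -> np.ndarray:
    """Early spatio-temporal aggregation (hypothetical sketch).

    frame_features: (T, H*W, D) spatial feature tokens for T frames.
    Adds the per-frame temporal embedding (broadcast over spatial
    locations), then flattens time and space into a single token
    sequence so full attention can run over both dimensions.
    """
    T, HW, D = frame_features.shape
    stamped = frame_features + temporal_positional_embedding(T, D)[:, None, :]
    return stamped.reshape(T * HW, D)                   # (T*H*W, D)
```

In this reading, each spatial location carries a time stamp, so a downstream attention layer can follow an object's feature trace across frames; the late-aggregation variant would instead fuse per-frame object-query features after spatial attention.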
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT).
We use historical embedding features to model the representation of ReID and detection features in sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - Tracking Objects and Activities with Attention for Temporal Sentence Grounding [51.416914256782505]
Temporal sentence grounding (TSG) aims to localize the temporal segment that is semantically aligned with a natural language query in an untrimmed video.
We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate the multi-modal targets and search space, and (B) a Temporal Sentence Tracker to track the multi-modal targets' behavior and predict the query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z) - PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection [28.879484515844375]
We introduce a progressive scheme that injects both temporal and spatial information for an integrated enhancement.
PTSEFormer follows an end-to-end fashion to avoid heavy post-processing procedures while achieving 88.1% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2022-09-06T06:32:57Z) - Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z) - SpOT: Spatiotemporal Modeling for 3D Object Tracking [68.12017780034044]
3D multi-object tracking aims to consistently identify all mobile entities over time.
Current 3D tracking methods rely on abstracted information and limited history.
We develop a holistic representation of scenes that leverages both spatial and temporal information.
arXiv Detail & Related papers (2022-07-12T21:45:49Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatial and temporal information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z) - LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection and ignore the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z) - Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos [2.4923006485141284]
This paper addresses the problem of exploiting temporal information in available videos to improve object classification.
We propose a two-stage object detector called FANet based on short-term aggregation of detection features.
arXiv Detail & Related papers (2020-04-01T13:52:03Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC)
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence based on the sequence and instant state information to make the generated trajectories keep spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.