DR-TANet: Dynamic Receptive Temporal Attention Network for Street Scene
Change Detection
- URL: http://arxiv.org/abs/2103.00879v1
- Date: Mon, 1 Mar 2021 10:01:35 GMT
- Title: DR-TANet: Dynamic Receptive Temporal Attention Network for Street Scene
Change Detection
- Authors: Shuo Chen, Kailun Yang, Rainer Stiefelhagen
- Abstract summary: This paper proposes temporal attention and explores the impact of the dependency-scope size of temporal attention on the performance of change detection.
On the street scene datasets `GSV', `TSUNAMI' and `VL-CMU-CD', our approach gains excellent performance, establishing new state-of-the-art scores without bells and whistles.
- Score: 35.29786193920396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Street scene change detection continues to capture researchers' interests in
the computer vision community. It aims to identify the changed regions of the
paired street-view images captured at different times. The state-of-the-art
network based on the encoder-decoder architecture leverages the feature maps at
the corresponding level between two channels to gain sufficient information of
changes. Still, the efficiency of feature extraction, of feature correlation
calculation, and even of the whole network requires further improvement. This
paper proposes temporal attention and explores the impact of the dependency-scope
size of temporal attention on the performance of change detection. In addition,
based on the Temporal Attention Module (TAM), we introduce a more efficient and
light-weight version - Dynamic Receptive Temporal Attention Module (DRTAM) and
propose the Concurrent Horizontal and Vertical Attention (CHVA) to improve the
accuracy of the network on specific challenging entities. On street scene
datasets `GSV', `TSUNAMI' and `VL-CMU-CD', our approach gains excellent
performance, establishing new state-of-the-art scores without bells and
whistles, while maintaining high efficiency applicable in autonomous vehicles.
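
To make the dependency-scope idea concrete, here is a minimal PyTorch sketch of local temporal attention in the spirit of the TAM: each position in the t0 feature map attends to a k x k neighborhood (the dependency scope) of the t1 feature map. The class name, the window-based formulation, and the hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalTemporalAttention(nn.Module):
    """Minimal sketch (assumed names/layout): queries from the t0 feature
    map attend to a k x k neighborhood (the 'dependency scope') in t1."""

    def __init__(self, channels, window=7):
        super().__init__()
        self.window = window
        self.q = nn.Conv2d(channels, channels, 1)  # queries from t0
        self.k = nn.Conv2d(channels, channels, 1)  # keys from t1
        self.v = nn.Conv2d(channels, channels, 1)  # values from t1
        self.scale = channels ** -0.5

    def forward(self, feat_t0, feat_t1):
        b, c, h, w = feat_t0.shape
        q = self.q(feat_t0).view(b, c, 1, h * w)             # (B, C, 1, HW)
        # Gather each position's k x k neighborhood from the t1 map.
        k = F.unfold(self.k(feat_t1), self.window,
                     padding=self.window // 2).view(b, c, self.window ** 2, h * w)
        v = F.unfold(self.v(feat_t1), self.window,
                     padding=self.window // 2).view(b, c, self.window ** 2, h * w)
        attn = (q * k).sum(dim=1, keepdim=True) * self.scale  # (B, 1, k*k, HW)
        attn = attn.softmax(dim=2)                            # over the neighborhood
        out = (attn * v).sum(dim=2)                           # (B, C, HW)
        return out.view(b, c, h, w)
```

Enlarging `window` widens the dependency scope at quadratic cost in the neighborhood size, which is the efficiency trade-off the paper's DRTAM and CHVA variants are designed to address.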
Related papers
- Elastic Interaction Energy-Informed Real-Time Traffic Scene Perception [8.429178814528617]
A network training strategy named EIEGSeg, based on a topology-aware energy loss function, is proposed.
EIEGSeg is designed for multi-class segmentation in real-time traffic scene perception.
Our results demonstrate that EIEGSeg consistently improves performance, especially on real-time, lightweight networks.
arXiv Detail & Related papers (2023-10-02T01:30:42Z)
- Remote Sensing Image Change Detection with Graph Interaction [1.8579693774597708]
We propose a bitemporal image graph interaction network for remote sensing change detection, named BGINet-CD.
Our model demonstrates superior performance compared to other state-of-the-art (SOTA) methods on the GZ CD dataset.
arXiv Detail & Related papers (2023-07-05T03:32:49Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition [2.705905918316948]
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years.
We propose a novel Sparse Weighted Temporal Attention (SWTA) module to utilize sparsely sampled video frames for obtaining global weighted temporal attention.
The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets.
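
The summary suggests a simple pattern: sample frames sparsely, score them, and aggregate with softmax weights. Below is a hedged sketch of that pattern, not the paper's exact SWTA module; the class name, the uniform-stride sampling, and the linear scorer are our assumptions.

```python
import torch
import torch.nn as nn

class SparseTemporalAttention(nn.Module):
    """Illustrative sketch: attend over a sparse subset of frame
    features to form one weighted temporal summary."""

    def __init__(self, feat_dim, n_samples=4):
        super().__init__()
        self.n_samples = n_samples
        self.score = nn.Linear(feat_dim, 1)  # per-frame attention logit

    def forward(self, frame_feats):
        # frame_feats: (B, T, D) per-frame feature vectors
        b, t, d = frame_feats.shape
        # Sparse sampling: keep n_samples frames at uniform temporal stride.
        idx = torch.linspace(0, t - 1, self.n_samples,
                             device=frame_feats.device).long()
        sampled = frame_feats[:, idx]                 # (B, N, D)
        weights = self.score(sampled).softmax(dim=1)  # (B, N, 1)
        return (weights * sampled).sum(dim=1)         # (B, D)
```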
arXiv Detail & Related papers (2022-12-07T00:33:40Z)
- DetectorNet: Transformer-enhanced Spatial Temporal Graph Neural Network for Traffic Prediction [4.302265301004301]
Detectors with high coverage have direct and far-reaching benefits for road users in route planning and avoiding traffic congestion.
Utilizing these data presents unique challenges, including the dynamic temporal correlation and the dynamic spatial correlation caused by changes in road conditions.
We propose DetectorNet enhanced by Transformer to address these challenges.
arXiv Detail & Related papers (2021-10-19T03:47:38Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to conduct dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
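
A minimal sketch of the dynamic-filter idea this summary describes, under a single-modality simplification: a small head generates a per-sample depthwise kernel from global context and applies it via grouped convolution. The class name and the depthwise/grouped-conv formulation are ours, not MFGNet's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterConv(nn.Module):
    """Sketch of dynamic convolution: a head generates a per-sample
    k x k depthwise kernel from the input map, then applies it."""

    def __init__(self, channels, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        self.gen = nn.Linear(channels, channels * k * k)  # filter generator

    def forward(self, x):
        b, c, h, w = x.shape
        ctx = x.mean(dim=(2, 3))                        # (B, C) global context
        kernels = self.gen(ctx).view(b * c, 1, self.k, self.k)
        # Grouped-conv trick: fold batch into channels so every sample
        # is convolved with its own generated depthwise kernel.
        out = F.conv2d(x.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```

In MFGNet two such generators (visible and thermal) would run independently, one per modality stream.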
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
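
The CME idea, reduced to a hedged sketch: derive a channel-wise gate from a temporal feature difference and reweight channels, in the spirit of squeeze-and-excitation. The difference-based motion summary, class name, and reduction ratio are our assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    """Sketch: a gate vector derived from adjacent-frame feature
    differences emphasizes channels carrying dynamic information."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_t, feat_t1):
        # feat_t, feat_t1: (B, C, H, W) features of adjacent frames
        motion = (feat_t1 - feat_t).mean(dim=(2, 3))  # (B, C) motion summary
        gate = self.fc(motion)                        # (B, C), values in (0, 1)
        return feat_t * gate[:, :, None, None]        # reweight motion channels
```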
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- Spatio-temporal Modeling for Large-scale Vehicular Networks Using Graph Convolutional Networks [110.80088437391379]
A graph-based framework called SMART is proposed to model and keep track of the statistics of vehicle-to-infrastructure (V2I) communication latency across a large geographical area.
We develop a graph reconstruction-based approach using a graph convolutional network integrated with a deep Q-networks algorithm.
Our results show that the proposed method can significantly improve both the accuracy and efficiency of modeling, as well as the latency performance, of large vehicular networks.
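
As a point of reference for the graph-convolutional component, here is the textbook single GCN propagation step, H' = ReLU(A_hat H W); this is a generic formulation, not SMART's exact architecture, and the node/adjacency layout is assumed.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Generic sketch of one graph-convolution step for propagating
    per-node statistics (e.g., latency) over a road graph."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops
        deg = adj.sum(dim=1).clamp(min=1)
        a_hat = adj / deg.sqrt()[:, None] / deg.sqrt()[None, :]  # D^-1/2 A D^-1/2
        return torch.relu(a_hat @ self.lin(h))
```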
arXiv Detail & Related papers (2021-03-13T06:56:29Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
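
One common way to realize "dynamic" fusion of two branches is per-pixel softmax weighting; the sketch below shows that pattern under our own naming, without claiming it matches DS-Net's actual fusion design.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Sketch: per-pixel softmax weights decide how much the spatial
    vs. temporal branch contributes to the fused feature map."""

    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, spatial_feat, temporal_feat):
        w = self.weight(torch.cat([spatial_feat, temporal_feat], dim=1))
        w = w.softmax(dim=1)                       # (B, 2, H, W) fusion weights
        return (w[:, :1] * spatial_feat +          # weighted combination
                w[:, 1:] * temporal_feat)
```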
arXiv Detail & Related papers (2020-12-09T06:42:30Z)