SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving
- URL: http://arxiv.org/abs/2405.18857v1
- Date: Wed, 29 May 2024 08:12:51 GMT
- Title: SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving
- Authors: Yiming Cui, Cheng Han, Dongfang Liu,
- Abstract summary: Current models usually aggregate features from the neighboring frames to enhance the object representations for the task heads.
These methods rely on the information from the future frames and suffer from high computational complexity.
We introduce a stepwise spatial global-local aggregation network to solve these problems.
- Score: 27.731481134782577
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Visual-based perception is the key module for autonomous driving. Among those visual perception tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current models usually aggregate features from the neighboring frames to enhance the object representations for the task heads to generate more accurate predictions. Though getting better performance, these methods rely on the information from the future frames and suffer from high computational complexity. Meanwhile, the aggregation process is not reconfigurable during the inference time. These issues make most of the existing models infeasible for online applications. To solve these problems, we introduce a stepwise spatial global-local aggregation network. Our proposed models mainly contain three parts: 1). Multi-stage stepwise network gradually refines the predictions and object representations from the previous stage; 2). Spatial global-local aggregation fuses the local information from the neighboring frames and global semantics from the current frame to eliminate the feature degradation; 3). Dynamic aggregation strategy stops the aggregation process early based on the refinement results to remove redundancy and improve efficiency. Extensive experiments on the ImageNet VID benchmark validate the effectiveness and efficiency of our proposed models.
Related papers
- EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation [17.0226030258296]
Associating driver attention with driving scene across two fields of views is a hard cross-domain perception problem.
Previous methods typically focus on a single view or map attention to the scene via estimated gaze.
We propose a novel method for end-to-end scene-associated driver attention estimation, called EraWNet.
arXiv Detail & Related papers (2024-08-16T07:12:47Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape
Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT)
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - Attention-based Spatial-Temporal Graph Convolutional Recurrent Networks
for Traffic Forecasting [12.568905377581647]
Traffic forecasting is one of the most fundamental problems in transportation science and artificial intelligence.
Existing methods cannot accurately model both long-term and short-term temporal correlations simultaneously.
We propose a novel spatial-temporal neural network framework, which consists of a graph convolutional recurrent module (GCRN) and a global attention module.
arXiv Detail & Related papers (2023-02-25T03:37:00Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid ( SCP), and hierarchical region of interest extractor (HRoIE)
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z) - Enhancing Object Detection for Autonomous Driving by Optimizing Anchor
Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
arXiv Detail & Related papers (2021-04-08T16:58:31Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object
Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.