Technical Report for Argoverse Challenges on Unified Sensor-based
Detection, Tracking, and Forecasting
- URL: http://arxiv.org/abs/2311.15615v1
- Date: Mon, 27 Nov 2023 08:25:23 GMT
- Title: Technical Report for Argoverse Challenges on Unified Sensor-based
Detection, Tracking, and Forecasting
- Authors: Zhepeng Wang, Feng Chen, Kanokphan Lertniphonphan, Siwei Chen, Jinyao
Bao, Pengfei Zheng, Jinbao Zhang, Kaer Huang, Tao Zhang
- Abstract summary: We propose a unified network that incorporates three tasks: detection, tracking, and forecasting.
This solution adopts a strong Bird's Eye View (BEV) encoder with spatial and temporal fusion and generates unified representations for multiple tasks.
We achieved 1st place in Detection, Tracking, and Forecasting on the E2E Forecasting track in Argoverse Challenges at CVPR 2023 WAD.
- Score: 14.44580354496143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This report presents our Le3DE2E solution for unified sensor-based detection,
tracking, and forecasting in Argoverse Challenges at CVPR 2023 Workshop on
Autonomous Driving (WAD). We propose a unified network that incorporates three
tasks: detection, tracking, and forecasting. This solution adopts a
strong Bird's Eye View (BEV) encoder with spatial and temporal fusion and
generates unified representations for multiple tasks. The solution was tested on
the Argoverse 2 sensor dataset to evaluate the detection, tracking, and
forecasting of 26 object categories. We achieved 1st place in Detection,
Tracking, and Forecasting on the E2E Forecasting track in Argoverse Challenges
at CVPR 2023 WAD.
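The abstract describes one shared BEV representation feeding three task heads. A minimal sketch of that layout is shown below; all names, shapes, and the mean-based temporal fusion are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def bev_encode(frames):
    """Fuse a temporal stack of per-frame feature maps into one BEV grid.
    frames: (T, H, W, C) array; temporal fusion is a simple mean here."""
    return frames.mean(axis=0)                      # (H, W, C)

def detection_head(bev):
    """Per-cell objectness score."""
    return bev.mean(axis=-1)                        # (H, W)

def tracking_head(bev):
    """Per-cell identity embedding (first half of the channels)."""
    return bev[..., : bev.shape[-1] // 2]           # (H, W, C // 2)

def forecasting_head(bev, horizon=6):
    """Per-cell future (x, y) offsets over `horizon` time steps."""
    h, w, _ = bev.shape
    return np.zeros((h, w, horizon, 2)) + bev.mean()

frames = np.random.rand(4, 8, 8, 16)   # T=4 frames, 8x8 BEV grid, 16 channels
bev = bev_encode(frames)               # shared representation
det = detection_head(bev)              # task 1: detection
trk = tracking_head(bev)               # task 2: tracking
fcst = forecasting_head(bev)           # task 3: forecasting
```

The point of the sketch is only the data flow: all three heads read the same fused BEV tensor, which is what makes the representation "unified" across tasks.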
Related papers
- First Place Solution to the ECCV 2024 ROAD++ Challenge @ ROAD++ Spatiotemporal Agent Detection 2024 [12.952512012601874]
The task of Track 1 is agent detection, which aims to construct an "agent tube" for agents in consecutive video frames.
Our solutions focus on the challenges in this task, including extreme-size objects, low-light conditions, class imbalance, and fine-grained classification.
We rank first in the test set of Track 1 for the ROAD++ Challenge 2024, and achieve 30.82% average video-mAP.
arXiv Detail & Related papers (2024-10-30T14:52:43Z)
- V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results [142.5704093410454]
The V3Det Challenge 2024 aims to push the boundaries of object detection research.
The challenge consists of two tracks: Vast Vocabulary Object Detection and Open Vocabulary Object Detection.
We aim to inspire future research directions in vast vocabulary and open-vocabulary object detection.
arXiv Detail & Related papers (2024-06-17T16:58:51Z)
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state of the art on the Argoverse 2 Sensor and Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z) - OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising [49.86409475232849]
Trajectory prediction is fundamental in computer vision and autonomous driving.
Existing approaches in this field often assume precise and complete observational data.
We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z)
- Technical Report for Argoverse Challenges on 4D Occupancy Forecasting [32.43324720856606]
Our solution consists of a strong LiDAR-based Bird's Eye View (BEV) encoder with temporal fusion and a two-stage decoder.
The solution was tested on the Argoverse 2 sensor dataset to evaluate the occupancy state 3 seconds in the future.
Our solution achieved an 18% lower L1 error (3.57) than the baseline and took 1st place on the 4D Occupancy Forecasting task in the Argoverse Challenges at CVPR 2023.
arXiv Detail & Related papers (2023-11-27T09:40:53Z)
- The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 [71.80200746293505]
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs).
arXiv Detail & Related papers (2023-11-23T21:01:14Z)
- Object Semantics Give Us the Depth We Need: Multi-task Approach to Aerial Depth Completion [1.2239546747355885]
We propose a novel approach to jointly execute the two tasks in a single pass.
The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features.
Experimental results show that the proposed multi-task network outperforms its single-task counterpart.
arXiv Detail & Related papers (2023-04-25T03:21:32Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves object detection and tracking.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles [17.12321292167318]
Object detection and tracking must address new challenges, such as achieving consistent results across camera views.
This work presents a new Global Association Graph Model with a Link Prediction approach to predict existing tracklets' locations and link detections with tracklets.
Our model improves the detection accuracy of a standard 3D object detector in the nuScenes detection challenge.
arXiv Detail & Related papers (2022-04-19T22:50:36Z)
- The State of Aerial Surveillance: A Survey [62.198765910573556]
This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective.
The main objects of interest are humans: single or multiple subjects are to be detected, identified, tracked, re-identified, and have their behavior analyzed.
arXiv Detail & Related papers (2022-01-09T20:13:27Z)
- SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation [28.372067223801203]
In autonomous driving, a LiDAR-based object detector should perform reliably at different geographic locations and under various weather conditions.
While recent 3D detection research focuses on improving performance within a single domain, our study reveals that the performance of modern detectors can drop drastically across domains.
We present Semantic Point Generation (SPG), a general approach to enhance the reliability of LiDAR detectors against domain shifts.
arXiv Detail & Related papers (2021-08-15T10:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.