HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene
- URL: http://arxiv.org/abs/2404.04653v2
- Date: Mon, 6 May 2024 12:34:34 GMT
- Title: HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene
- Authors: Ziang Guo, Stepan Perminov, Mikhail Konenkov, Dzmitry Tsetserukou
- Abstract summary: HawkDrive is a novel vision system with hardware and software solutions.
The hardware, which utilizes stereo vision perception, is paired with the Nvidia Jetson Xavier AGX edge computing device.
Our software for low-light enhancement, depth estimation, and semantic segmentation is a transformer-based neural network.
- Score: 2.5022287664959446
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many established vision perception systems for autonomous driving scenarios ignore the influence of light conditions, one of the key elements for driving safety. To address this problem, we present HawkDrive, a novel perception system with hardware and software solutions. Hardware that utilizes stereo vision perception, which has been demonstrated to be a more reliable way of estimating depth information than monocular vision, is partnered with the edge computing device Nvidia Jetson Xavier AGX. Our software for low light enhancement, depth estimation, and semantic segmentation tasks is a transformer-based neural network. Our software stack, which enables fast inference and noise reduction, is packaged into system modules in Robot Operating System 2 (ROS2). Our experimental results have shown that the proposed end-to-end system is effective in improving the depth estimation and semantic segmentation performance. Our dataset and codes will be released at https://github.com/ZionGo6/HawkDrive.
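The abstract describes the software stack as ROS2 modules chaining low-light enhancement, depth estimation, and semantic segmentation on the stereo input. Below is a minimal sketch of how such a pipeline could be wired as a single rclpy node; the node name, topic names, and the placeholder stage methods are assumptions for illustration, not the released code.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


class HawkDrivePerceptionNode(Node):
    """Hypothetical single-node view of the enhance -> depth / segmentation chain."""

    def __init__(self):
        super().__init__('hawkdrive_perception')
        # Stereo left image in; depth and segmentation maps out (topic names assumed).
        self.create_subscription(Image, '/stereo/left/image_raw', self.on_image, 10)
        self.depth_pub = self.create_publisher(Image, '/hawkdrive/depth', 10)
        self.seg_pub = self.create_publisher(Image, '/hawkdrive/segmentation', 10)

    def on_image(self, msg: Image) -> None:
        # In the paper each stage is a transformer model; here they are stubs.
        enhanced = self.enhance_low_light(msg)
        self.depth_pub.publish(self.estimate_depth(enhanced))
        self.seg_pub.publish(self.segment(enhanced))

    # Placeholder stages standing in for the transformer-based models.
    def enhance_low_light(self, img: Image) -> Image:
        return img

    def estimate_depth(self, img: Image) -> Image:
        return img

    def segment(self, img: Image) -> Image:
        return img


def main():
    rclpy.init()
    rclpy.spin(HawkDrivePerceptionNode())


if __name__ == '__main__':
    main()
```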
Related papers
- Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference [43.474068248379815]
We propose a shared encoder trained on multiple computer vision tasks critical for urban navigation.
We introduce a multi-scale feature network for pose estimation to improve depth learning.
Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities.
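As a rough illustration of the shared-encoder idea above (not the authors' architecture), the sketch below shows one backbone emitting multi-scale features that separate heads consume for depth and pose; all layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn


class SharedMultiTaskBackbone(nn.Module):
    """One encoder, multi-scale features, separate task heads (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.depth_head = nn.Conv2d(64, 1, 1)    # dense depth from the deepest features
        self.pose_head = nn.Linear(32 + 64, 6)   # 6-DoF pose from pooled multi-scale features

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        pooled = torch.cat([f1.mean(dim=(2, 3)), f2.mean(dim=(2, 3))], dim=1)
        return {"depth": self.depth_head(f2), "pose": self.pose_head(pooled)}


outputs = SharedMultiTaskBackbone()(torch.randn(1, 3, 128, 256))
```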
arXiv Detail & Related papers (2024-09-16T08:54:03Z) - Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
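The photometric error mentioned above is the standard self-supervision signal in such pipelines: the source frame is warped into the target view using the predicted depth and ego-motion, and the reconstruction is compared with the real target frame. The sketch below shows a common SSIM + L1 formulation of that error with the view warping abstracted away; the constants and weighting are conventional choices, not taken from PPGeo.

```python
import torch
import torch.nn.functional as F


def photometric_error(target: torch.Tensor, warped: torch.Tensor,
                      alpha: float = 0.85) -> torch.Tensor:
    """SSIM + L1 photometric error between (B, 3, H, W) images in [0, 1]."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    l1 = (target - warped).abs().mean(1, keepdim=True)

    # Local statistics via 3x3 average pooling (a lightweight SSIM variant).
    mu_t, mu_w = F.avg_pool2d(target, 3, 1, 1), F.avg_pool2d(warped, 3, 1, 1)
    var_t = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_t ** 2
    var_w = F.avg_pool2d(warped ** 2, 3, 1, 1) - mu_w ** 2
    cov = F.avg_pool2d(target * warped, 3, 1, 1) - mu_t * mu_w

    ssim = ((2 * mu_t * mu_w + c1) * (2 * cov + c2)) / \
           ((mu_t ** 2 + mu_w ** 2 + c1) * (var_t + var_w + c2))
    dssim = ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)

    # Weighted combination of structural and absolute differences.
    return (alpha * dssim + (1 - alpha) * l1).mean()
```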
arXiv Detail & Related papers (2023-01-03T08:52:49Z) - Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z) - Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
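For reference, the displacement error quoted above is typically computed as the mean L2 distance between predicted and ground-truth future waypoints; a minimal version (not tied to the paper's exact evaluation protocol) looks like this:

```python
import numpy as np


def displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance in metres between predicted and ground-truth waypoints.

    pred, gt: arrays of shape (T, 2) holding (x, y) positions per future timestep.
    """
    return float(np.linalg.norm(pred - gt, axis=1).mean())


# Example: a constant 0.5 m offset at every waypoint gives a 0.5 m error.
print(displacement_error(np.array([[1.0, 0.5], [2.0, 0.5]]),
                         np.array([[1.0, 0.0], [2.0, 0.0]])))
```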
arXiv Detail & Related papers (2022-10-13T05:56:20Z) - YOLOP: You Only Look Once for Panoptic Driving Perception [21.802146960999394]
We present a panoptic driving perception network (YOLOP) to perform traffic object detection, drivable area segmentation and lane detection simultaneously.
It is composed of one encoder for feature extraction and three decoders to handle the specific tasks.
Our model performs extremely well on the challenging BDD100K dataset, achieving state-of-the-art on all three tasks in terms of accuracy and speed.
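The one-encoder / three-decoder layout described above can be sketched as follows; the layers are placeholders and do not reproduce YOLOP's actual backbone or head designs.

```python
import torch
import torch.nn as nn


class PanopticDrivingNet(nn.Module):
    """One shared encoder feeding three task decoders (illustrative placeholder layers)."""

    def __init__(self, num_anchors: int = 3, num_classes: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(),
        )
        # Detection decoder: per-anchor box (4), objectness (1), and class scores.
        self.detect_head = nn.Conv2d(128, num_anchors * (5 + num_classes), 1)
        # Dense decoders: drivable-area and lane masks (background/foreground).
        self.drivable_head = nn.Conv2d(128, 2, 1)
        self.lane_head = nn.Conv2d(128, 2, 1)

    def forward(self, x):
        feats = self.encoder(x)
        return {
            "detection": self.detect_head(feats),
            "drivable_area": self.drivable_head(feats),
            "lane": self.lane_head(feats),
        }


outputs = PanopticDrivingNet()(torch.randn(1, 3, 384, 640))
```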
arXiv Detail & Related papers (2021-08-25T14:19:42Z) - Provident Vehicle Detection at Night for Advanced Driver Assistance Systems [3.7468898363447654]
We present a complete system capable of providently detecting oncoming vehicles at nighttime based on the light artifacts they cause.
We quantify the time benefit that the provident vehicle detection system provides compared to an in-production computer vision system.
arXiv Detail & Related papers (2021-07-23T15:27:17Z) - Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
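Estimating uncertainty from a single forward pass is commonly done with an evidential output head; the sketch below shows the general deep evidential regression pattern (Normal-Inverse-Gamma parameters and their closed-form uncertainties), which is related to, but not necessarily identical with, the paper's Hybrid Evidential Fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EvidentialHead(nn.Module):
    """Predicts Normal-Inverse-Gamma parameters so uncertainty falls out of one pass."""

    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 4)  # gamma, nu, alpha, beta

    def forward(self, feats: torch.Tensor):
        gamma, nu, alpha, beta = self.fc(feats).chunk(4, dim=-1)
        nu = F.softplus(nu) + 1e-6
        alpha = F.softplus(alpha) + 1.0 + 1e-6   # keep alpha > 1 so variances stay finite
        beta = F.softplus(beta) + 1e-6
        aleatoric = beta / (alpha - 1.0)          # expected data noise
        epistemic = beta / (nu * (alpha - 1.0))   # uncertainty in the prediction itself
        return gamma, aleatoric, epistemic


# One forward pass yields the prediction together with both uncertainty estimates.
mean, data_noise, model_uncertainty = EvidentialHead(256)(torch.randn(1, 256))
```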
arXiv Detail & Related papers (2021-05-20T17:52:37Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Sensor Fusion of Camera and Cloud Digital Twin Information for Intelligent Vehicles [26.00647601539363]
We introduce a novel sensor fusion methodology, integrating camera image and Digital Twin knowledge from the cloud.
The best matching result, with a 79.2% accuracy under a 0.7 Intersection over Union (IoU) threshold, is obtained with the depth image serving as an additional feature source.
Game engine-based simulation results also reveal that the visual guidance system could significantly improve driving safety when cooperating with the cloud Digital Twin system.
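To make the 79.2% @ 0.7 IoU figure concrete, the matching criterion amounts to thresholding the overlap between a camera detection and its Digital Twin counterpart; a minimal sketch (box format and function names assumed) follows.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def is_match(detected_box, twin_box, threshold=0.7):
    """A detection counts as correctly matched when overlap exceeds the threshold."""
    return iou(detected_box, twin_box) >= threshold
```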
arXiv Detail & Related papers (2020-07-08T18:09:54Z) - MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z) - End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning [34.61415516112297]
An end-to-end approach might simplify the system and avoid extensive human engineering effort.
A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning.
The learned end-to-end perception model is able to solve the detection, tracking, localization and mapping problems altogether with minimal human engineering effort.
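Sequential latent representation learning, as described above, generally maintains a recurrent latent state that is updated from each new observation; a minimal sketch of that pattern (not the paper's exact model, all sizes arbitrary) is shown below.

```python
import torch
import torch.nn as nn


class SequentialLatentEncoder(nn.Module):
    """Maintains a latent state z_t updated from each observation (illustrative sketch)."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.obs_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, 4, 2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, 4, 2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.GRUCell(64, latent_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        """frames: (T, B, 3, H, W) -> final latent state of shape (B, latent_dim)."""
        z = torch.zeros(frames.size(1), self.rnn.hidden_size)
        for x_t in frames:  # fold each observation into the latent state
            z = self.rnn(self.obs_encoder(x_t), z)
        return z


latent = SequentialLatentEncoder()(torch.randn(4, 1, 3, 128, 128))
```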
arXiv Detail & Related papers (2020-03-21T05:37:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.