Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers
- URL: http://arxiv.org/abs/2408.12575v2
- Date: Mon, 30 Sep 2024 13:30:54 GMT
- Title: Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers
- Authors: Antonyo Musabini, Ivan Novikov, Sana Soula, Christel Leonet, Lihao Wang, Rachid Benmokhtar, Fabian Burger, Thomas Boulay, Xavier Perrotton
- Abstract summary: Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range.
This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT).
MT F-CVT positions objects within 25 m x 25 m real open-road scenes with an average error of only 20 cm.
- Score: 1.1227698959066101
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance Systems (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area, from distinguishing vacant slots' entry lines to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multi-head attention to create a detailed Bird's-Eye View (BEV) grid feature map. Features are processed by both a segmentation decoder and a Polygon-YOLO-based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within 25 m x 25 m real open-road scenes with an average error of only 20 cm. Our larger model achieves an F1-score of 0.89. Moreover, the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection results similar to the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: https://streamable.com/jjw54x.
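As a rough illustration only, here is a minimal PyTorch sketch of the pipeline the abstract describes: learned BEV queries attend to flattened multi-camera features through multi-head attention, and the shared BEV grid feeds a segmentation head and a Polygon-YOLO-style detection head. Module names, channel widths, and grid sizes are assumptions, not the authors' implementation.

```python
# Minimal sketch of an MT F-CVT-style multi-task BEV network (illustrative only;
# module names and sizes are assumptions, not the authors' implementation).
import torch
import torch.nn as nn

class CrossViewBEV(nn.Module):
    """Fuse per-camera image features into a BEV grid with multi-head attention."""
    def __init__(self, dim=128, bev_size=25, heads=4):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bev_size = bev_size

    def forward(self, cam_feats):            # (B, n_cams * HW, dim) flattened features
        B = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)
        bev, _ = self.attn(q, cam_feats, cam_feats)  # each BEV cell attends to all views
        return bev.transpose(1, 2).reshape(B, -1, self.bev_size, self.bev_size)

class MultiTaskHeads(nn.Module):
    """Shared BEV features -> segmentation map + polygon/box regression grid."""
    def __init__(self, dim=128, n_classes=3, n_box_params=9):
        super().__init__()
        self.seg = nn.Conv2d(dim, n_classes, 1)     # parking-slot / vehicle masks
        self.det = nn.Conv2d(dim, n_box_params, 1)  # Polygon-YOLO-style regression

    def forward(self, bev):
        return self.seg(bev), self.det(bev)

# Toy forward pass: 4 fisheye cameras, 16x16 feature maps of width 128.
feats = torch.randn(2, 4 * 16 * 16, 128)
bev = CrossViewBEV()(feats)
seg, det = MultiTaskHeads()(bev)
print(bev.shape, seg.shape, det.shape)  # (2,128,25,25) (2,3,25,25) (2,9,25,25)
```

The key design point is that both decoders read the same BEV feature map, so the cross-view fusion is trained by both tasks at once.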
Related papers
- Holistic Parking Slot Detection with Polygon-Shaped Representations [1.1649926489639983]
We propose the one-step Holistic Parking Slot Network (HPS-Net), a tailor-made adaptation of the You Only Look Once v4 (YOLOv4) algorithm.
Experiments show that HPS-Net can detect various vacant parking slots with an F1-score of 0.92.
It achieves a real-time detection speed of 17 FPS on Nvidia Drive AGX Xavier.
arXiv Detail & Related papers (2023-10-17T23:37:23Z) - Multi-target multi-camera vehicle tracking using transformer-based
- Multi-target multi-camera vehicle tracking using transformer-based camera link model and spatial-temporal information [29.34298951501007]
Multi-target multi-camera tracking (MTMCT) of vehicles, i.e. tracking vehicles across multiple cameras, is a crucial application for the development of smart cities and intelligent traffic systems.
The main challenges of vehicle MTMCT include the intra-class variability of the same vehicle and the inter-class similarity between different vehicles.
We propose a transformer-based camera link model with spatial and temporal filtering to conduct cross-camera tracking.
arXiv Detail & Related papers (2023-01-18T22:27:08Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - PersFormer: 3D Lane Detection via Perspective Transformer and the
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [109.03773439461615]
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
arXiv Detail & Related papers (2022-03-21T16:12:53Z) - City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones [28.922703073971466]
This paper describes our solution to the Track 3 multi-camera vehicle tracking task in the 2021 AI City Challenge (AICITY21).
The framework includes the following components.
Mature detection and vehicle re-identification models are used to extract targets and appearance features.
Based on the characteristics of crossroads, a Tracklet Filter Strategy and a Direction Based Temporal Mask are proposed.
arXiv Detail & Related papers (2021-05-14T03:01:17Z) - SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround
- SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z) - OmniDet: Surround View Cameras based Multi-task Visual Perception
- OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving [10.3540046389057]
This work presents a multi-task visual perception network on unrectified fisheye images.
It consists of six primary tasks necessary for an autonomous driving system.
We demonstrate that the jointly trained model performs better than the respective single task versions.
arXiv Detail & Related papers (2021-02-15T10:46:24Z) - Multiview Detection with Feature Perspective Transformation [59.34619548026885]
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z) - MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous
- MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z) - Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z) - Road Curb Detection and Localization with Monocular Forward-view Vehicle
- Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens.
Our approach is able to estimate the vehicle-to-curb distance in real time with a mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)