Universal Semantic Segmentation for Fisheye Urban Driving Images
- URL: http://arxiv.org/abs/2002.03736v2
- Date: Mon, 24 Aug 2020 13:02:09 GMT
- Title: Universal Semantic Segmentation for Fisheye Urban Driving Images
- Authors: Yaozu Ye, Kailun Yang, Kaite Xiang, Juan Wang and Kaiwei Wang
- Abstract summary: We propose a seven-degrees-of-freedom (DoF) augmentation method to transform rectilinear images into fisheye images.
In the training process, rectilinear images are transformed into fisheye images in seven DoF, which simulates fisheye images taken by cameras at different positions, with different orientations and focal lengths.
The results show that training with the seven-DoF augmentation can improve the model's accuracy and robustness on fisheye data with different distortions.
- Score: 6.56742346304883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation is a critical method in the field of autonomous
driving. When performing semantic image segmentation, a wider field of view
(FoV) helps to obtain more information about the surrounding environment,
making automatic driving safer and more reliable, which could be offered by
fisheye cameras. However, large public fisheye datasets are not available, and
fisheye images captured by a large-FoV fisheye camera come with severe
distortion, so commonly used semantic segmentation models cannot be applied
directly. In this paper, a seven-degrees-of-freedom (DoF) augmentation method
is proposed to transform rectilinear images into fisheye images in a more
comprehensive way. In the training process, rectilinear images are transformed
into fisheye images in seven DoF, which simulates the fisheye images taken by
cameras at different positions, with different orientations and focal lengths.
The results show that training with the seven-DoF augmentation improves the
model's accuracy and robustness on fisheye data with different distortions. This
seven-DoF augmentation provides a universal semantic segmentation solution for
fisheye cameras in different autonomous driving applications. Also, we provide
specific parameter settings of the augmentation for autonomous driving.
Finally, we tested our universal semantic segmentation model on real fisheye
images and obtained satisfactory results. The code and configurations are
released at https://github.com/Yaozhuwa/FisheyeSeg.
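To make the augmentation concrete, below is a minimal sketch of a rectilinear-to-fisheye warp, assuming an equidistant projection model (r = f·θ) and OpenCV/NumPy. The function name, parameter ranges, and the pinhole focal length are illustrative assumptions, and the virtual camera position is simplified to an image-plane shift; the authors' full seven-DoF implementation is in the repository linked above.

```python
# Hedged sketch of a rectilinear-to-fisheye augmentation (not the authors' code).
import numpy as np
import cv2


def rectilinear_to_fisheye(img, f_fisheye, rot=(0.0, 0.0, 0.0),
                           shift=(0.0, 0.0), f_rect=500.0):
    """Warp a rectilinear (pinhole) image into an equidistant fisheye view.

    f_fisheye : focal length of the virtual fisheye camera in pixels (1 DoF)
    rot       : Rodrigues rotation vector of the virtual camera (3 DoF)
    shift     : principal-point offset in pixels, a crude stand-in for the
                camera-position DoF of the full seven-DoF model
    f_rect    : focal length assumed for the source pinhole image
    """
    h, w = img.shape[:2]
    cx, cy = w / 2.0 + shift[0], h / 2.0 + shift[1]

    # Pixel grid of the output fisheye image.
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    x, y = u - cx, v - cy
    r = np.sqrt(x ** 2 + y ** 2)

    # Equidistant fisheye model: r = f * theta  =>  incidence angle theta = r / f.
    theta = r / f_fisheye
    phi = np.arctan2(y, x)

    # Unit viewing ray for every fisheye pixel.
    rays = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

    # Rotate the rays by the virtual camera orientation (3 DoF).
    R, _ = cv2.Rodrigues(np.asarray(rot, dtype=np.float64).reshape(3, 1))
    rays = rays @ R.T

    # Re-project the rays into the source pinhole image.
    z = np.clip(rays[..., 2], 1e-6, None)
    map_x = (f_rect * rays[..., 0] / z + w / 2.0).astype(np.float32)
    map_y = (f_rect * rays[..., 1] / z + h / 2.0).astype(np.float32)

    out = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                    borderMode=cv2.BORDER_CONSTANT, borderValue=0)
    out[rays[..., 2] <= 0] = 0  # rays pointing behind the camera are invalid
    return out


if __name__ == "__main__":
    # Sample a random focal length and orientation per training image so the
    # network sees a different simulated fisheye distortion every iteration.
    img = cv2.imread("rectilinear.png")  # hypothetical input image
    f = np.random.uniform(150.0, 350.0)
    rot = np.random.uniform(-0.2, 0.2, size=3)
    cv2.imwrite("fisheye_aug.png", rectilinear_to_fisheye(img, f, rot))
```

For segmentation training, the label map would be warped with the same parameters but nearest-neighbor interpolation so that class indices are preserved.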
Related papers
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z) - Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z) - FisheyePP4AV: A privacy-preserving method for autonomous vehicles on
fisheye camera images [1.534667887016089]
In many parts of the world, vast amounts of data collected on public roadways are increasingly used for autonomous driving.
Effective solutions are urgently needed to detect and anonymize pedestrian faces and nearby car license plates in real road-driving scenarios.
In this work, we pay particular attention to protecting privacy while still complying with several regulations for fisheye camera images taken by driverless vehicles.
arXiv Detail & Related papers (2023-09-07T15:51:31Z) - SimFIR: A Simple Framework for Fisheye Image Rectification with
Self-supervised Representation Learning [105.01294305972037]
We introduce SimFIR, a framework for fisheye image rectification based on self-supervised representation learning.
To learn fine-grained distortion representations, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer.
The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations.
arXiv Detail & Related papers (2023-08-17T15:20:17Z) - Sector Patch Embedding: An Embedding Module Conforming to The Distortion
Pattern of Fisheye Image [23.73394258521532]
We propose a novel patch embedding method called Sector Patch Embedding (SPE), conforming to the distortion pattern of the fisheye image.
With SPE, the classification top-1 accuracy of ViT and PVT is improved by 0.75% and 2.8%, respectively.
Our method can be easily adopted to other Transformer-based models.
arXiv Detail & Related papers (2023-03-26T07:20:02Z) - FishDreamer: Towards Fisheye Semantic Completion via Unified Image
Outpainting and Segmentation [33.71849096992972]
This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV).
We introduce FishDreamer, which builds on ViTs enhanced with a novel Polar-aware Cross Attention (PCA) module to leverage dense context and guide semantically consistent content generation.
arXiv Detail & Related papers (2023-03-24T07:34:25Z) - FisheyeEX: Polar Outpainting for Extending the FoV of Fisheye Lens [84.12722334460022]
The fisheye lens is gaining increasing applications in computational photography and assisted driving because of its wide field of view (FoV).
In this paper, we present a FisheyeEX method that extends the FoV of the fisheye lens by outpainting the invalid regions.
The results demonstrate that our approach significantly outperforms the state-of-the-art methods, gaining around 27% more content beyond the original fisheye image.
arXiv Detail & Related papers (2022-06-12T21:38:50Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised
Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround
View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z) - Generalized Object Detection on Fisheye Cameras for Autonomous Driving:
Dataset, Representations and Baseline [5.1450366450434295]
We explore better representations like oriented bounding box, ellipse, and generic polygon for object detection in fisheye images.
We design a novel curved bounding box model that has optimal properties for fisheye distortion models.
It is the first detailed study on object detection on fisheye cameras for autonomous driving scenarios.
arXiv Detail & Related papers (2020-12-03T18:00:16Z) - Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by
Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)