FisheyeHDK: Hyperbolic Deformable Kernel Learning for Ultra-Wide
Field-of-View Image Recognition
- URL: http://arxiv.org/abs/2203.07255v1
- Date: Mon, 14 Mar 2022 16:37:54 GMT
- Authors: Ola Ahmad and Freddy Lecue
- Abstract summary: Conventional convolutional neural networks (CNNs) trained on narrow Field-of-View (FoV) images are the state-of-the-art approach for object recognition tasks.
Some methods adapt CNNs to ultra-wide FoV images by learning deformable kernels.
We demonstrate that learning the shape of convolution kernels in non-Euclidean spaces outperforms existing deformable kernel methods.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Conventional convolutional neural networks (CNNs) trained on narrow
Field-of-View (FoV) images are the state-of-the-art approaches for object
recognition tasks. Some methods have proposed adapting CNNs to ultra-wide
FoV images by learning deformable kernels. However, they are limited by
Euclidean geometry, and their accuracy degrades under the strong distortions
caused by fisheye projections. In this work, we demonstrate that learning the
shape of convolution kernels in non-Euclidean spaces outperforms existing
deformable kernel methods. In particular, we propose a new approach that learns
deformable kernel parameters (positions) in hyperbolic space. FisheyeHDK is a
hybrid CNN architecture combining hyperbolic and Euclidean convolution layers
for position and feature learning. First, we provide an intuition for
hyperbolic space on wide-FoV images. Using synthetic distortion profiles, we
demonstrate the effectiveness of our approach. We select two datasets of
perspective images, Cityscapes and BDD100K 2020, which we transform into
fisheye equivalents at different scaling factors (analogous to focal lengths).
Finally, we provide an experiment on data collected by a real fisheye camera.
Validations and experiments show that our approach improves on existing
deformable kernel methods for CNN adaptation to fisheye images.
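The core idea of the abstract, learning kernel sampling positions in hyperbolic space and decoding them back to the Euclidean image plane, can be sketched with the Poincaré-ball exponential and logarithmic maps. This is an illustrative reconstruction, not the authors' code: the curvature, the `0.3` scale factor, and the 3x3 grid initialization are all assumptions made here for the sketch.

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Maps a Euclidean tangent vector v to a point strictly inside the ball.
    """
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-9)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def log_map_zero(x, c=1.0):
    """Logarithmic map at the origin: ball point -> Euclidean tangent vector."""
    norm = np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-9, 1.0 - 1e-6)
    return np.arctanh(np.sqrt(c) * norm) * x / (np.sqrt(c) * norm)

# Hypothetical kernel-position parameters: start from a regular 3x3 grid,
# embed it in the Poincare ball (where it would be optimized), then decode
# back to Euclidean sampling offsets for a deformable convolution.
grid = np.stack(np.meshgrid([-1, 0, 1], [-1, 0, 1]), axis=-1).reshape(9, 2).astype(float)
ball_positions = exp_map_zero(0.3 * grid)      # parameters living inside the ball
offsets = log_map_zero(ball_positions) / 0.3   # decoded Euclidean offsets
```

Since the two maps are inverses at the origin, the decoded offsets recover the initial grid here; during training, gradient updates would move `ball_positions` inside the ball, so decoded offsets bend toward the distortion pattern.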
Related papers
- Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving [4.720434481945155]
This study investigates the effectiveness of modern Deformable Convolutional Neural Networks (DCNNs) for semantic segmentation tasks.
Our experiments focus on segmenting the WoodScape fisheye image dataset into ten distinct classes, assessing the Deformable Networks' ability to capture intricate spatial relationships.
The significant improvement in mIoU score resulting from integrating Deformable CNNs demonstrates their effectiveness in handling the geometric distortions present in fisheye imagery.
arXiv Detail & Related papers (2024-07-23T17:02:24Z)
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z)
- Convolution kernel adaptation to calibrated fisheye [45.90423821963144]
Convolution kernels are the basic structural component of convolutional neural networks (CNNs).
We propose a method that leverages camera calibration to deform the convolution kernel accordingly and adapt it to the distortion.
We show how, with just a brief fine-tuning stage on a small dataset, we improve the performance of the network for the calibrated fisheye camera.
arXiv Detail & Related papers (2024-02-02T14:44:50Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, the Rectangling Rectification Network (RecRecNet).
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution yields a compression gain of up to 13.7% bit-rate savings compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
- Adaptable Deformable Convolutions for Semantic Segmentation of Fisheye Images in Autonomous Driving Systems [4.231909978425546]
We show that a CNN trained on standard images can be readily adapted to fisheye images.
Our adaptation protocol mainly relies on modifying the support of the convolutions by using their deformable equivalents on top of pre-existing layers.
arXiv Detail & Related papers (2021-02-19T22:47:44Z)
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems.
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
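The per-pixel ray idea behind Neural Ray Surfaces can be illustrated with a small unprojection routine: each pixel carries its own viewing ray, so lifting to 3D requires no closed-form camera model. This is a hedged sketch, not the paper's code; the function name and the pinhole-style sanity check below are assumptions made for illustration.

```python
import numpy as np

def lift_with_ray_surface(rays, depth):
    """Lift pixels to 3D using per-pixel rays (H, W, 3) and along-ray depth (H, W).

    A generic camera in the spirit of neural ray surfaces: every pixel owns
    its own viewing ray, so no global pinhole or fisheye model is assumed.
    """
    rays = rays / np.linalg.norm(rays, axis=-1, keepdims=True)  # unit rays
    return rays * depth[..., None]                              # (H, W, 3) points

# Sanity check with pinhole-like rays: for focal length f and principal
# point at the image center, pixel (u, v) sees along (u - cx, v - cy, f).
H, W, f = 4, 4, 2.0
u, v = np.meshgrid(np.arange(W), np.arange(H))
rays = np.stack([u - W / 2, v - H / 2, np.full_like(u, f, dtype=float)], axis=-1)
points = lift_with_ray_surface(rays.astype(float), np.ones((H, W)))
```

In the paper's setting the ray field would be predicted by a convolutional network rather than built analytically; the depth here is interpreted as distance along the ray, so unit depth places every lifted point at unit range.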
This list is automatically generated from the titles and abstracts of the papers in this site.