Adapting CNNs for Fisheye Cameras without Retraining
- URL: http://arxiv.org/abs/2404.08187v1
- Date: Fri, 12 Apr 2024 01:36:00 GMT
- Title: Adapting CNNs for Fisheye Cameras without Retraining
- Authors: Ryan Griffiths, Donald G. Dansereau
- Abstract summary: In many applications it is beneficial to use non-conventional cameras, such as fisheye cameras, that have a larger field of view.
The issue arises that these large-FOV images cannot be rectified to a perspective projection without significant cropping of the original image.
We propose Rectified Convolutions (RectConv), a new approach for adapting pre-trained convolutional networks to operate on non-perspective images.
- Score: 3.683202928838613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The majority of image processing approaches assume images are in, or can be rectified to, a perspective projection. However, in many applications it is beneficial to use non-conventional cameras, such as fisheye cameras, that have a larger field of view (FOV). The issue arises that these large-FOV images cannot be rectified to a perspective projection without significant cropping of the original image. To address this issue we propose Rectified Convolutions (RectConv), a new approach for adapting pre-trained convolutional networks to operate on non-perspective images without any retraining. Replacing the convolutional layers of the network with RectConv layers allows the network to see both rectified patches and the entire FOV. We demonstrate RectConv adapting multiple pre-trained networks to perform segmentation and detection on fisheye imagery from two publicly available datasets. Our approach requires no additional data or training, and operates directly on the native image as captured from the camera. We believe this work is a step toward adapting the vast resources available for perspective images to operate across a broad range of camera geometries.
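To make the idea concrete, here is a minimal sketch of a rectified-convolution layer. This is not the authors' implementation: the per-pixel sampling grid (which would be built from the fisheye calibration) is assumed precomputed, and the `RectConvSketch` name and the patch-gathering strategy via `grid_sample` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RectConvSketch(nn.Module):
    """Minimal sketch of a rectified convolution: reuse the weights of a
    pretrained Conv2d, but gather each k x k input patch along a per-pixel
    sampling grid that locally rectifies the fisheye distortion. Building
    the grid from the camera calibration is omitted (assumed precomputed)."""

    def __init__(self, pretrained_conv: nn.Conv2d, sample_grid: torch.Tensor):
        super().__init__()
        self.weight = pretrained_conv.weight  # frozen pretrained kernel
        self.bias = pretrained_conv.bias
        # sample_grid: (H*W*k*k, 2) normalized locations placing a locally
        # rectified k x k patch around every output pixel (hypothetical).
        self.register_buffer("grid", sample_grid)

    def forward(self, x):
        n, c, h, w = x.shape
        k = self.weight.shape[-1]
        grid = self.grid.view(1, h * w * k * k, 1, 2).expand(n, -1, -1, -1)
        patches = F.grid_sample(x, grid, align_corners=False)  # (N,C,HWkk,1)
        patches = patches.view(n, c, h * w, k * k).permute(0, 1, 3, 2)
        patches = patches.reshape(n, c * k * k, h * w)
        # Apply the frozen kernel as a matmul (equivalent to unfold + matmul)
        out = self.weight.view(self.weight.shape[0], -1) @ patches
        if self.bias is not None:
            out = out + self.bias.view(1, -1, 1)
        return out.view(n, -1, h, w)
```

Swapping each `nn.Conv2d` of a pretrained model for such a layer is what would let the network see locally rectified patches while still covering the full FOV.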
Related papers
- Convolution kernel adaptation to calibrated fisheye [45.90423821963144]
Convolution kernels are the basic structural component of convolutional neural networks (CNNs).
We propose a method that leverages the calibration of cameras to deform the convolution kernel accordingly and adapt to the distortion (sketched below).
We show how, with just a brief fine-tuning stage on a small dataset, we improve the performance of the network for the calibrated fisheye.
arXiv Detail & Related papers (2024-02-02T14:44:50Z)
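One way to read "deforming the convolution kernel according to calibration" is as a deformable convolution whose offsets are computed once from the camera model rather than learned. A rough sketch under that assumption, using torchvision's `deform_conv2d` (the offset precomputation from the fisheye model is omitted):

```python
import torch
from torchvision.ops import deform_conv2d

def calibrated_deform_conv(x, conv, offsets):
    """Apply a pretrained Conv2d with fixed, calibration-derived offsets.
    offsets: (N, 2*k*k, H, W), precomputed from the fisheye model so each
    k x k tap lands where an undistorted grid would (hypothetical input)."""
    return deform_conv2d(
        x, offsets, conv.weight, conv.bias,
        padding=(conv.kernel_size[0] // 2, conv.kernel_size[1] // 2))
```

The paper additionally fine-tunes briefly on a small dataset; the sketch covers only the deformation mechanism itself.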
- FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization [57.98448472585241]
We propose a method to train a stereo depth estimation model on widely available pinhole data.
We show strong generalization ability of our approach on both indoor and outdoor datasets.
arXiv Detail & Related papers (2024-01-24T20:07:59Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- FisheyeHDK: Hyperbolic Deformable Kernel Learning for Ultra-Wide Field-of-View Image Recognition [0.3655021726150367]
Conventional convolutional neural networks (CNNs) trained on narrow Field-of-View (FoV) images are the state-of-the-art approaches for object recognition tasks.
Some methods proposed the adaptation of CNNs to ultra-wide FoV images by learning deformable kernels.
We demonstrate that learning the shape of convolution kernels in non-Euclidean spaces is better than existing deformable kernel methods.
arXiv Detail & Related papers (2022-03-14T16:37:54Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show our approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- Adaptable Deformable Convolutions for Semantic Segmentation of Fisheye Images in Autonomous Driving Systems [4.231909978425546]
We show that a CNN trained on standard images can be readily adapted to fisheye images.
Our adaptation protocol mainly relies on modifying the support of the convolutions by using their deformable equivalents on top of pre-existing layers, as sketched below.
arXiv Detail & Related papers (2021-02-19T22:47:44Z)
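A rough sketch of what "deformable equivalents on top of pre-existing layers" could look like in PyTorch follows; the zero-initialised offset branch (so the wrapped layer starts out identical to the original convolution) is an assumption, not necessarily the paper's exact protocol.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAdapter(nn.Module):
    """Wrap a pretrained Conv2d with its deformable equivalent: copy the
    pretrained weights, and add a small offset-prediction conv initialised
    to zero so the module initially behaves exactly like the original layer.
    Assumes groups=1 and dilation=1 for brevity."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        k = conv.kernel_size
        self.offset = nn.Conv2d(conv.in_channels, 2 * k[0] * k[1],
                                kernel_size=k, stride=conv.stride,
                                padding=conv.padding)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        self.deform = DeformConv2d(conv.in_channels, conv.out_channels, k,
                                   stride=conv.stride, padding=conv.padding,
                                   bias=conv.bias is not None)
        self.deform.weight.data.copy_(conv.weight.data)  # reuse pretrained weights
        if conv.bias is not None:
            self.deform.bias.data.copy_(conv.bias.data)

    def forward(self, x):
        return self.deform(x, self.offset(x))
```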
- Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and a parameterization of the latent image by a CNN (a simplified sketch follows this entry).
arXiv Detail & Related papers (2021-02-04T08:52:46Z)
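The full projected GSURE involves the pseudo-inverse and projection of the degradation operator, which are omitted here. A safe, self-contained illustration is the denoising special case (identity operator), where SURE gives an unbiased estimate of the MSE from the noisy image alone, with the divergence term approximated by a single Monte-Carlo probe:

```python
import torch

def sure_denoising_loss(f, y, sigma, eps=1e-3):
    """Monte-Carlo SURE for denoising (the H = identity special case of
    GSURE): estimates the MSE of f(y) against the unknown clean image using
    only the noisy image y and the noise level sigma. In the paper's
    framework, f would be a CNN parameterizing the latent image."""
    fy = f(y)
    b = torch.randn_like(y)                        # Hutchinson probe
    div = (b * (f(y + eps * b) - fy)).sum() / eps  # ~ divergence of f at y
    n = y.numel()
    return ((fy - y) ** 2).sum() / n + 2 * sigma ** 2 * div / n - sigma ** 2
```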
- BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [21.29622194272066]
We focus on bird's-eye-view (BEV) semantic segmentation, a task that predicts pixel-wise semantic segmentation in BEV from side RGB images.
There are two main challenges to this task: the view transformation from side view to bird's eye view, as well as transfer learning to unseen domains.
Our novel two-stage perception pipeline explicitly predicts pixel depths and combines them with pixel semantics in an efficient manner, as sketched below.
arXiv Detail & Related papers (2020-06-19T23:30:11Z)
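A minimal sketch of the second stage of such a pipeline: back-project each pixel with its predicted depth and the camera intrinsics, then rasterise the pixel semantics onto a bird's-eye-view grid. The grid size, cell resolution, and last-write-wins rasterisation are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def semantics_to_bev(depth, labels, K, bev_size=200, cell=0.25):
    """Lift per-pixel semantic labels into 3D using predicted depth and
    intrinsics K, then rasterise onto a BEV grid (cell metres per cell).
    depth, labels: (H, W); K: (3, 3) camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-frame 3D points: X = depth * K^-1 [u, v, 1]
    pts = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    pts = pts * depth.reshape(1, -1)
    x, z = pts[0], pts[2]                        # ground-plane coordinates
    gx = (x / cell + bev_size / 2).astype(int)   # lateral cell index
    gz = (z / cell).astype(int)                  # forward cell index
    bev = np.zeros((bev_size, bev_size), dtype=labels.dtype)
    ok = (gx >= 0) & (gx < bev_size) & (gz >= 0) & (gz < bev_size)
    bev[gz[ok], gx[ok]] = labels.reshape(-1)[ok] # last-write-wins rasterisation
    return bev
```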