DarSwin: Distortion Aware Radial Swin Transformer
- URL: http://arxiv.org/abs/2304.09691v5
- Date: Wed, 24 Jul 2024 13:17:07 GMT
- Title: DarSwin: Distortion Aware Radial Swin Transformer
- Authors: Akshaya Athwale, Arman Afrasiyabi, Justin Lagüe, Ichrak Shili, Ola Ahmad, Jean-François Lalonde,
- Abstract summary: We present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses.
We also present DarSwin-Unet, which extends DarSwin to an encoder-decoder architecture suitable for pixel-level tasks.
- Score: 15.110063221436155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets with significant gains when trained on bounded levels of distortions (very low, low, medium, and high) and tested on all, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin, to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/
Related papers
- DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture [13.412728770638465]
We present an encoder-decoder model that adapts to distortions in wide-angle lenses by leveraging the physical characteristics defined by the radial distortion profile.
In contrast to the original model, which only performs classification tasks, we introduce a U-Net architecture, DarSwin-Unet, designed for pixel level tasks.
Our approach enhances the model capability to handle pixel-level tasks in wide-angle fisheye images, making it more effective for real-world applications.
arXiv Detail & Related papers (2024-07-24T14:52:18Z) - Convolution kernel adaptation to calibrated fisheye [45.90423821963144]
Convolution kernels are the basic structural component of convolutional neural networks (CNNs)
We propose a method that leverages the calibration of cameras to deform the convolution kernel accordingly and adapt to the distortion.
We show how, with just a brief fine-tuning stage in a small dataset, we improve the performance of the network for the calibrated fisheye.
arXiv Detail & Related papers (2024-02-02T14:44:50Z) - CNN Injected Transformer for Image Exposure Correction [20.282217209520006]
Previous exposure correction methods based on convolutions often produce exposure deviation in images.
We propose a CNN Injected Transformer (CIT) to harness the individual strengths of CNN and Transformer simultaneously.
In addition to the hybrid architecture design for exposure correction, we apply a set of carefully formulated loss functions to improve the spatial coherence and rectify potential color deviations.
arXiv Detail & Related papers (2023-09-08T14:53:00Z) - RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline
Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, i.e., Rectangling Rectification Network (RecRecNet)
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z) - Multitask AET with Orthogonal Tangent Regularity for Dark Object
Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from
Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lens should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z) - Wide-angle Image Rectification: A Survey [86.36118799330802]
wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z) - A Deep Ordinal Distortion Estimation Approach for Distortion Rectification [62.72089758481803]
We propose a novel distortion rectification approach that can obtain more accurate parameters with higher efficiency.
We design a local-global associated estimation network that learns the ordinal distortion to approximate the realistic distortion distribution.
Considering the redundancy of distortion information, our approach only uses a part of distorted image for the ordinal distortion estimation.
arXiv Detail & Related papers (2020-07-21T10:03:42Z) - UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a
Generic Framework for Handling Common Camera Distortion Models [8.484676769284578]
We propose a generic scale-aware self-supervised pipeline for estimating depth, euclidean distance, and visual odometry from unrectified monocular videos.
The proposed algorithm is evaluated further on the KITTI rectified dataset, and we achieve state-of-the-art results.
arXiv Detail & Related papers (2020-07-13T20:35:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.