DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture
- URL: http://arxiv.org/abs/2407.17328v1
- Date: Wed, 24 Jul 2024 14:52:18 GMT
- Title: DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture
- Authors: Akshaya Athwale, Ichrak Shili, Émile Bergeron, Ola Ahmad, Jean-François Lalonde,
- Abstract summary: We present an encoder-decoder model that adapts to distortions in wide-angle lenses by leveraging the physical characteristics defined by the radial distortion profile.
In contrast to the original model, which only performs classification tasks, we introduce a U-Net architecture, DarSwin-Unet, designed for pixel level tasks.
Our approach enhances the model capability to handle pixel-level tasks in wide-angle fisheye images, making it more effective for real-world applications.
- Score: 13.412728770638465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wide-angle fisheye images are becoming increasingly common for perception tasks in applications such as robotics, security, and mobility (e.g. drones, avionics). However, current models often either ignore the distortions in wide-angle images or are not suitable to perform pixel-level tasks. In this paper, we present an encoder-decoder model based on a radial transformer architecture that adapts to distortions in wide-angle lenses by leveraging the physical characteristics defined by the radial distortion profile. In contrast to the original model, which only performs classification tasks, we introduce a U-Net architecture, DarSwin-Unet, designed for pixel level tasks. Furthermore, we propose a novel strategy that minimizes sparsity when sampling the image for creating its input tokens. Our approach enhances the model capability to handle pixel-level tasks in wide-angle fisheye images, making it more effective for real-world applications. Compared to other baselines, DarSwin-Unet achieves the best results across different datasets, with significant gains when trained on bounded levels of distortions (very low, low, medium, and high) and tested on all, including out-of-distribution distortions. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses.
Related papers
- HEAL-SWIN: A Vision Transformer On The Sphere [4.379414115481346]
High-resolution wide-angle fisheye images are becoming more important for robotics applications such as autonomous driving.
We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation grid used in astrophysics and cosmology.
In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead.
arXiv Detail & Related papers (2023-07-14T12:46:59Z) - DarSwin: Distortion Aware Radial Swin Transformer [15.110063221436155]
We present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses.
We also present DarSwin-Unet, which extends DarSwin to an encoder-decoder architecture suitable for pixel-level tasks.
arXiv Detail & Related papers (2023-04-19T14:32:56Z) - RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline
Model and DoF-based Curriculum Learning [62.86400614141706]
We propose a new learning model, i.e., Rectangling Rectification Network (RecRecNet)
Our model can flexibly warp the source structure to the target domain and achieves an end-to-end unsupervised deformation.
Experiments show the superiority of our solution over the compared methods on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-01-04T15:12:57Z) - Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z) - Multitask AET with Orthogonal Tangent Regularity for Dark Object
Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z) - Plug-In Inversion: Model-Agnostic Inversion for Vision with Data
Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper- parameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z) - Single Image Automatic Radial Distortion Compensation Using Deep
Convolutional Network [0.12891210250935145]
We present a novel method for single-image automatic lens distortion compensation based on deep convolutional neural networks.
The method is capable of real-time performance and accuracy using two highest-order coefficients of the radial distortion model operating in the application domain of sports broadcast.
arXiv Detail & Related papers (2021-12-14T13:04:03Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Adaptable Deformable Convolutions for Semantic Segmentation of Fisheye
Images in Autonomous Driving Systems [4.231909978425546]
We show that a CNN trained on standard images can be readily adapted to fisheye images.
Our adaptation protocol mainly relies on modifying the support of the convolutions by using their deformable equivalents on top of pre-existing layers.
arXiv Detail & Related papers (2021-02-19T22:47:44Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from
Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lens should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.