Sector Patch Embedding: An Embedding Module Conforming to The Distortion
Pattern of Fisheye Image
- URL: http://arxiv.org/abs/2303.14645v1
- Date: Sun, 26 Mar 2023 07:20:02 GMT
- Title: Sector Patch Embedding: An Embedding Module Conforming to The Distortion
Pattern of Fisheye Image
- Authors: Dianyi Yang, Jiadong Tang, Yu Gao, Yi Yang, Mengyin Fu
- Abstract summary: We propose a novel patch embedding method called Sector Patch Embedding (SPE), conforming to the distortion pattern of the fisheye image.
The classification top-1 accuracy of ViT and PVT is improved by 0.75% and 2.8% with SPE, respectively.
Our method can be easily adapted to other Transformer-based models.
- Score: 23.73394258521532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fisheye cameras suffer from image distortion while having a large
field of view (LFOV). This leads to poor performance on some fisheye vision
tasks. One solution is to optimize current vision algorithms for fisheye
images. However, most CNN-based and Transformer-based methods lack the
capability to leverage distortion information efficiently. In this work, we
propose a novel patch embedding method called Sector Patch Embedding (SPE),
conforming to the distortion pattern of the fisheye image. Furthermore, we put
forward a synthetic fisheye dataset based on ImageNet-1K and explore the
performance of several Transformer models on the dataset. The classification
top-1 accuracy of ViT and PVT is improved by 0.75% and 2.8% with SPE,
respectively. The experiments show that the proposed sector patch embedding
method can better perceive distortion and extract features from fisheye
images. Our method can be easily adapted to other Transformer-based models.
Source code is at https://github.com/IN2-ViAUn/Sector-Patch-Embedding.
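The abstract only sketches the idea at a high level. As a rough, hypothetical illustration (not the authors' implementation; every function and parameter name below is invented), sector-shaped patches can be formed by binning pixels into radial rings and angular sectors around the image center, then pooling and projecting each bin into a token, analogous to the square-patch embedding of a standard ViT:

```python
import numpy as np

def sector_patch_embedding(img, n_rings=4, n_sectors=8, embed_dim=16, rng=None):
    """Hypothetical sketch of sector patch embedding: instead of square
    patches, pixels are grouped into sector-shaped patches that follow
    the radial distortion pattern of a fisheye image."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)            # radius of each pixel
    theta = np.arctan2(ys - cy, xs - cx)      # angle in [-pi, pi]

    # Bin pixels into (ring, sector) cells: rings along the radius,
    # sectors along the angle, so patches align with the distortion.
    r_max = r.max() + 1e-6
    ring = np.minimum((r / r_max * n_rings).astype(int), n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors

    # Mean-pool each sector patch, then project it into the embedding
    # space (a stand-in for a model's learned linear projection).
    rng = np.random.default_rng(0) if rng is None else rng
    proj = rng.standard_normal(embed_dim)
    tokens = np.zeros((n_rings * n_sectors, embed_dim))
    for i in range(n_rings):
        for j in range(n_sectors):
            mask = (ring == i) & (sector == j)
            if mask.any():
                tokens[i * n_sectors + j] = img[mask].mean() * proj
    return tokens  # one token per sector patch: (n_rings * n_sectors, embed_dim)
```

In a real model the pooling-plus-projection step would be learned end to end; the point of the sketch is only the polar binning, which keeps each patch's shape consistent with how fisheye distortion stretches the image.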
Related papers
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z)
- SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning [105.01294305972037]
We introduce SimFIR, a framework for fisheye image rectification based on self-supervised representation learning.
To learn fine-grained distortion representations, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer.
The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations.
arXiv Detail & Related papers (2023-08-17T15:20:17Z)
- A Stronger Stitching Algorithm for Fisheye Images based on Deblurring and Registration [3.6417475195085602]
We devise a stronger stitching algorithm for fisheye images by combining traditional image processing methods with deep learning.
In the fisheye image correction stage, we propose the Attention-based Activation Free Network (ANAFNet) to deblur fisheye images corrected by the calibration method.
In the image registration part, we propose ORB-FREAK-GMS (OFG), a comprehensive image matching algorithm, to improve the accuracy of image registration.
arXiv Detail & Related papers (2023-07-22T06:54:16Z)
- FisheyeEX: Polar Outpainting for Extending the FoV of Fisheye Lens [84.12722334460022]
The fisheye lens is gaining increasing application in computational photography and assisted driving because of its wide field of view (FoV).
In this paper, we present a FisheyeEX method that extends the FoV of the fisheye lens by outpainting the invalid regions.
The results demonstrate that our approach significantly outperforms the state-of-the-art methods, gaining around 27% more content beyond the original fisheye image.
arXiv Detail & Related papers (2022-06-12T21:38:50Z)
- Three things everyone should know about Vision Transformers [67.30250766591405]
Transformer architectures have rapidly gained traction in computer vision.
We offer three insights based on simple and easy-to-implement variants of vision transformers.
We evaluate the impact of these design choices using the ImageNet-1k dataset, and confirm our findings on the ImageNet-v2 test set.
arXiv Detail & Related papers (2022-03-18T08:23:03Z)
- Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning [50.95116994162883]
Vision transformers have been thought of as a promising alternative to convolutional neural networks for visual recognition.
This paper presents hierarchically cascaded transformers that exploit intrinsic image structures through spectral tokens pooling.
HCTransformers surpass the DINO baseline by a large margin of 9.7% in 5-way 1-shot accuracy and 9.17% in 5-way 5-shot accuracy on miniImageNet.
arXiv Detail & Related papers (2022-03-17T03:49:58Z)
- Patch Slimming for Efficient Vision Transformers [107.21146699082819]
We study the efficiency problem for visual transformers by excavating redundant calculation in given networks.
We present a novel patch slimming approach that discards useless patches in a top-down paradigm.
Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of vision transformers.
arXiv Detail & Related papers (2021-06-05T09:46:00Z)
- Fisheye Distortion Rectification from Deep Straight Lines [34.61402494687801]
We present a novel line-aware rectification network (LaRecNet) to address the problem of fisheye distortion rectification.
Our model achieves state-of-the-art performance in terms of both geometric accuracy and image quality.
In particular, the images rectified by LaRecNet achieve the highest peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) compared with the ground truth.
arXiv Detail & Related papers (2020-03-25T13:20:00Z)
- Universal Semantic Segmentation for Fisheye Urban Driving Images [6.56742346304883]
We propose a seven-degrees-of-freedom (DoF) augmentation method to transform rectilinear images into fisheye images.
In the training process, rectilinear images are transformed into fisheye images in seven DoF, which simulates the fisheye images taken by cameras of different positions, orientations and focal lengths.
The results show that training with the seven-DoF augmentation improves the model's accuracy and robustness on differently distorted fisheye data.
arXiv Detail & Related papers (2020-01-31T11:19:00Z)
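For context on what such an augmentation does, here is a minimal, hypothetical sketch (not the paper's seven-DoF method; function and parameter names are invented) that warps a rectilinear image into an equidistant fisheye image using the classical projection models r_u = f·tan(θ) for the pinhole camera and r_f = f·θ for the fisheye:

```python
import numpy as np

def rectilinear_to_fisheye(img, f=None):
    """Hypothetical sketch: warp a rectilinear image into an
    equidistant-fisheye image. A ray at angle theta from the optical
    axis lands at r_u = f*tan(theta) in the rectilinear image but at
    r_f = f*theta in the equidistant fisheye image."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    f = f if f is not None else w / 2.0       # assumed focal length
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    r_f = np.hypot(dy, dx)                    # fisheye radius at each output pixel
    theta = r_f / f                           # equidistant model: r_f = f * theta

    # Radial stretch factor mapping fisheye radius back to the
    # rectilinear source radius; 1.0 at the exact center.
    scale = f * np.tan(theta) / np.maximum(r_f, 1e-6)
    scale[r_f == 0] = 1.0

    # Nearest-neighbour sampling of the rectilinear source.
    src_y = np.clip(np.round(cy + dy * scale).astype(int), 0, h - 1)
    src_x = np.clip(np.round(cx + dx * scale).astype(int), 0, w - 1)
    out = img[src_y, src_x]
    out[theta >= np.pi / 2] = 0               # rays at/beyond 90 deg have no source
    return out
```

The paper's augmentation additionally randomizes camera position, orientation, and focal length (seven DoF in total); the sketch fixes all of those and shows only the core radial remapping.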
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.