Image Reconstruction using Enhanced Vision Transformer
- URL: http://arxiv.org/abs/2307.05616v1
- Date: Tue, 11 Jul 2023 02:14:18 GMT
- Title: Image Reconstruction using Enhanced Vision Transformer
- Authors: Nikhil Verma, Deepkamal Kaur, Lydia Chau
- Abstract summary: We propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting.
The model proposed in this project is based on Vision Transformer (ViT) that takes 2D images as input and outputs embeddings.
We incorporate four additional optimization techniques in the framework to improve the model reconstruction capability.
- Score: 0.08594140167290097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Removing noise from images is a challenging and fundamental problem in the
field of computer vision. Images captured by modern cameras are inevitably
degraded by noise which limits the accuracy of any quantitative measurements on
those images. In this project, we propose a novel image reconstruction
framework which can be used for tasks such as image denoising, deblurring or
inpainting. The model proposed in this project is based on Vision Transformer
(ViT) that takes 2D images as input and outputs embeddings which can be used
for reconstructing denoised images. We incorporate four additional optimization
techniques in the framework to improve the model reconstruction capability,
namely Locality Sensitive Attention (LSA), Shifted Patch Tokenization (SPT),
Rotary Position Embeddings (RoPE), and an adversarial loss function inspired by
Generative Adversarial Networks (GANs). LSA, SPT and RoPE enable the
transformer to learn from the dataset more efficiently, while the adversarial
loss function enhances the resolution of the reconstructed images. Based on our
experiments, the proposed architecture outperforms the benchmark U-Net model by
more than 3.5% structural similarity (SSIM) for the reconstruction tasks of
image denoising and inpainting. The proposed enhancements further show an
improvement of ~5% SSIM over the benchmark for both tasks.
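The abstract names Rotary Position Embeddings (RoPE) among the optimization techniques. As a rough illustration only (not the authors' implementation, whose integration details are not given here), a minimal NumPy sketch of RoPE applied to a sequence of token embeddings; the helper name `apply_rope` and the toy shapes are this sketch's own choices:

```python
import numpy as np

def apply_rope(x):
    """Apply rotary position embeddings to a (seq_len, dim) array.

    Each even/odd feature pair is rotated by an angle proportional to the
    token's position, so attention scores depend on relative position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Inverse frequencies as in the standard RoPE formulation.
    inv_freq = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))
rotated = apply_rope(tokens)

# RoPE is a pure rotation, so each token embedding keeps its norm.
norms_equal = np.allclose(np.linalg.norm(rotated, axis=1),
                          np.linalg.norm(tokens, axis=1))

# Attention scores between rotated queries/keys depend only on the
# relative offset: score(m, n) == score(m + s, n + s).
q = np.tile(rng.standard_normal(16), (8, 1))
k = np.tile(rng.standard_normal(16), (8, 1))
scores = apply_rope(q) @ apply_rope(k).T
relative_ok = np.allclose(scores[2, 5], scores[3, 6])
```

The norm-preservation and relative-offset checks are the two properties that make RoPE attractive for vision transformers: positions are encoded without adding learned parameters or distorting token magnitudes.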
Related papers
- Multi-Scale Representation Learning for Image Restoration with State-Space Model [13.622411683295686]
We propose a novel Multi-Scale State-Space Model-based (MS-Mamba) for efficient image restoration.
Our proposed method achieves new state-of-the-art performance while maintaining low computational complexity.
arXiv Detail & Related papers (2024-08-19T16:42:58Z)
- GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement [51.97726804507328]
We propose a novel approach for 3D mesh reconstruction from multi-view images.
Our method takes inspiration from large reconstruction models that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.
arXiv Detail & Related papers (2024-06-09T05:19:24Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions [26.09373405194564]
We present an efficient image processing transformer architecture with hierarchical attentions, called IPTV2.
We adopt a focal context self-attention (FCSA) and a global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields.
Our proposed IPT-V2 achieves state-of-the-art results on various image processing tasks, covering denoising, deblurring and deraining, and obtains a much better trade-off between performance and computational complexity than previous methods.
arXiv Detail & Related papers (2024-03-31T10:01:20Z)
- Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration [91.65248635837145]
Under-Display Camera (UDC) is an emerging technology that achieves a full-screen display by hiding the camera under the display panel.
In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise.
We propose a Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images.
arXiv Detail & Related papers (2024-03-09T13:11:59Z)
- LIR: A Lightweight Baseline for Image Restoration [4.187190284830909]
The inherent characteristics of the Image Restoration task are often overlooked in many works.
We propose a Lightweight Baseline network for Image Restoration called LIR to efficiently restore the image and remove degradations.
Our LIR achieves state-of-the-art Structural Similarity Index Measure (SSIM) and comparable Peak Signal-to-Noise Ratio (PSNR) performance to state-of-the-art models.
arXiv Detail & Related papers (2024-02-02T12:39:47Z)
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z)
- Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations [69.25683256447044]
We introduce an attack approach designed to effectively degrade the reconstruction quality of Learned Image Compression (LIC).
We generate adversarial examples by introducing a Frobenius norm-based loss function to maximize the discrepancy between original images and reconstructed adversarial examples.
Experiments conducted on the Kodak dataset using various LIC models demonstrate the effectiveness of the attack.
arXiv Detail & Related papers (2023-06-01T20:21:05Z)
- Dual Perceptual Loss for Single Image Super-Resolution Using ESRGAN [13.335546116599494]
This paper proposes a method called Dual Perceptual Loss (DP Loss) that replaces the original perceptual loss for single image super-resolution reconstruction.
Due to the complementary property between the VGG features and the ResNet features, the proposed DP Loss considers the advantages of learning two features simultaneously.
The qualitative and quantitative analysis on benchmark datasets demonstrates the superiority of our proposed method over state-of-the-art super-resolution methods.
arXiv Detail & Related papers (2022-01-17T12:42:56Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.