Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- URL: http://arxiv.org/abs/2305.02541v2
- Date: Fri, 3 Nov 2023 21:48:40 GMT
- Title: Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- Authors: Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong
- Abstract summary: A higher compression rate causes more loss of visual signal in the higher-frequency spectrum, which carries the details in pixel space.
A Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information for enhancing reconstruction quality.
A Cross-attention Autoregressive Transformer (CAT) is proposed to obtain more precise semantic attributes in texts.
- Score: 27.149365819904745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Popular VQ-VAE models reconstruct images by learning a discrete
codebook, but their reconstruction quality degrades rapidly as the compression
rate rises. One major reason is that a higher compression rate causes more loss
of visual signal in the higher-frequency spectrum, which carries the details in
pixel space. In this paper, a
Frequency Complement Module (FCM) architecture is proposed to capture the
missing frequency information for enhancing reconstruction quality. The FCM can
be easily incorporated into the VQ-VAE structure, and we refer to the new model
as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL)
is introduced to guide the FCMs to balance between various frequencies
dynamically for optimal reconstruction. FA-VAE is further extended to the
text-to-image synthesis task, and a Cross-attention Autoregressive Transformer
(CAT) is proposed to obtain more precise semantic attributes in texts.
Extensive reconstruction experiments with different compression rates are
conducted on several benchmark datasets, and the results demonstrate that the
proposed FA-VAE restores details more faithfully than SOTA methods. CAT also
shows improved generation quality with better image-text semantic alignment.
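To make the frequency argument in the abstract concrete, here is a minimal Python/NumPy sketch. It is not FA-VAE, the FCM, or the paper's Dynamic Spectrum Loss. It assumes average pooling followed by nearest-neighbor upsampling as a crude stand-in for a higher compression rate, measures reconstruction error per radial frequency band of the 2-D FFT amplitude spectrum, and weights each band by its share of the current error as one hypothetical way to "balance between various frequencies dynamically". The helper names `down_up`, `radial_band_errors`, and `dynamic_spectrum_loss` are illustrative only.

```python
import numpy as np


def down_up(img: np.ndarray, factor: int) -> np.ndarray:
    """Crude compression proxy (hypothetical): average-pool by `factor`,
    then upsample back to the original size by nearest-neighbor repetition."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    pooled = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)


def radial_band_errors(x: np.ndarray, y: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """Mean absolute difference between the 2-D FFT amplitude spectra of x and y
    inside each radial frequency band (band 0 = lowest frequencies).
    Assumes the image is large enough that no band is empty."""
    diff = np.abs(np.abs(np.fft.fft2(x)) - np.abs(np.fft.fft2(y)))
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    radius = np.sqrt(fy ** 2 + fx ** 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bands + 1)
    errors = np.empty(n_bands)
    for b in range(n_bands):
        in_band = (radius >= edges[b]) & (radius < edges[b + 1])
        errors[b] = diff[in_band].mean()
    return errors


def dynamic_spectrum_loss(x: np.ndarray, y: np.ndarray, n_bands: int = 4):
    """Hypothetical frequency-balanced loss (NOT the paper's DSL): each band is
    weighted by its share of the total spectral error, so the bands that are
    currently worst reconstructed dominate. Returns (loss, weights)."""
    errors = radial_band_errors(x, y, n_bands)
    total = errors.sum()
    if total == 0:
        weights = np.full(n_bands, 1.0 / n_bands)
    else:
        weights = errors / total  # would be a stop-gradient constant in training
    return float(np.sum(weights * errors)), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))  # synthetic stand-in for an image
    for factor in (2, 4, 8):
        recon = down_up(img, factor)
        mse = float(np.mean((img - recon) ** 2))
        loss, weights = dynamic_spectrum_loss(img, recon)
        print(f"factor={factor} mse={mse:.4f} loss={loss:.2f} "
              f"band_errors={np.round(radial_band_errors(img, recon), 2)}")
```

Running the script prints, for each pooling factor, the pixel MSE and the per-band spectral errors, so you can inspect which bands absorb the loss as the proxy compression increases. In a training framework the band weights would normally be treated as constants (a stop-gradient) so that only the weighted errors are optimized; that choice is likewise an assumption of this sketch.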
Related papers
- SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model [27.462224078883786]
We propose a generative SC for wireless image transmission (denoted as SC-CDM).
We aim to redesign the Swin Transformer as a new backbone for efficient semantic feature extraction and compression.
We further increase the Peak Signal-to-Noise Ratio (PSNR) by over 17% on top of CNN-based DeepJSCC.
arXiv Detail & Related papers (2024-10-03T01:01:04Z)
- AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation [99.57024606542416]
We propose an adaptive all-in-one image restoration network based on frequency mining and modulation.
Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands.
The proposed model achieves adaptive reconstruction by accentuating the informative frequency subbands according to different input degradations.
arXiv Detail & Related papers (2024-03-21T17:58:14Z)
- Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement [14.653248860008981]
We propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement.
ImpFreqUp exploits a DCT-domain prior derived through an implicit DCT transform, and accurately reconstructs the DCT-domain loss via a coarse-to-fine transfer.
HIR is introduced to facilitate cross-collaboration and information compensation between the scales, thereby further refining the feature maps and improving the visual quality of the final output.
arXiv Detail & Related papers (2024-03-18T08:13:26Z)
- mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra [4.721572768262729]
Speech super-resolution (SSR) aims to recover a high resolution (HR) speech from its corresponding low resolution (LR) counterpart.
Recent SSR methods focus more on the reconstruction of the magnitude spectrogram, ignoring the importance of phase reconstruction.
We propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT).
arXiv Detail & Related papers (2023-05-18T16:49:46Z)
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477]
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images with general resolution (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model the high-order contextual information with less than half computations of naive SA.
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
arXiv Detail & Related papers (2022-10-28T08:13:33Z)
- Hybrid Parallel Imaging and Compressed Sensing MRI Reconstruction with GRAPPA Integrated Multi-loss Supervised GAN [2.7110495144693374]
This paper proposes a novel Generative Adversarial Network (GAN) namely RECGAN-GR supervised with multi-modal losses for de-aliasing the reconstructed image.
The proposed work significantly improves image quality even when little data is retained, enabling 5x or 10x faster acquisition.
arXiv Detail & Related papers (2022-09-19T07:26:45Z)
- ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction.
It can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data.
We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to, and is 2.4x and 48x faster than, the state-of-the-art perceptual SR methods RankSRGAN and SRFlow, respectively.
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Adaptive Gradient Balancing for Undersampled MRI Reconstruction and Image-to-Image Translation [60.663499381212425]
We enhance the image quality by using a Wasserstein Generative Adversarial Network combined with a novel Adaptive Gradient Balancing technique.
In MRI, our method minimizes artifacts while maintaining a high-quality reconstruction that produces sharper images than other techniques.
arXiv Detail & Related papers (2021-04-05T13:05:22Z)
- Focal Frequency Loss for Image Reconstruction and Synthesis [125.7135706352493]
We show that narrowing gaps in the frequency domain can further improve image reconstruction and synthesis quality.
We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize.
arXiv Detail & Related papers (2020-12-23T17:32:04Z)
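The Focal Frequency Loss entry in this list describes letting a model focus on frequency components that are hard to synthesize. A minimal NumPy sketch of that idea follows, assuming the weight at each frequency is the spectral distance raised to a power `alpha`, scaled into [0, 1] by its maximum and treated as a constant. The function name and defaults are illustrative, and the paper's exact formulation (for example patch-based or batch-level variants) may differ.

```python
import numpy as np


def focal_frequency_loss(real: np.ndarray, fake: np.ndarray, alpha: float = 1.0) -> float:
    """Focal-frequency-style loss for two 2-D arrays (a sketch, not reference code).

    Each frequency's squared spectral distance is scaled by a weight that grows
    with that distance, so poorly matched ("hard") frequencies dominate.
    """
    f_real = np.fft.fft2(real, norm="ortho")  # orthonormal FFT: energy-preserving
    f_fake = np.fft.fft2(fake, norm="ortho")
    dist = np.abs(f_real - f_fake) ** 2       # squared distance per frequency
    # Assumed weighting: |F_real - F_fake| ** alpha, scaled into [0, 1] by its max.
    # In a training framework this matrix would be detached (no gradient).
    weight = np.sqrt(dist) ** alpha
    max_w = weight.max()
    if max_w > 0:
        weight = weight / max_w
    return float(np.mean(weight * dist))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.random((32, 32))
    fake = real + 0.1 * rng.standard_normal((32, 32))
    pixel_mse = float(np.mean((real - fake) ** 2))
    print("focal frequency loss:", focal_frequency_loss(real, fake))
    print("pixel MSE:", pixel_mse)
```

With `alpha=0` every weight is 1 and, because the orthonormal FFT preserves energy (Parseval's theorem), the loss equals the pixel-space MSE; larger `alpha` shrinks the contribution of frequencies that are already well matched, so the loss never exceeds the pixel MSE.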
This list is automatically generated from the titles and abstracts of the papers in this site.