Frequency-Domain Fusion Transformer for Image Inpainting
- URL: http://arxiv.org/abs/2506.18437v1
- Date: Mon, 23 Jun 2025 09:19:04 GMT
- Title: Frequency-Domain Fusion Transformer for Image Inpainting
- Authors: Sijin He, Guangfeng Lin, Tao Li, Yajun Chen
- Abstract summary: This paper proposes a Transformer-based image inpainting method incorporating frequency-domain fusion. Experimental results demonstrate that the proposed method effectively improves the quality of image inpainting by preserving more high-frequency information.
- Score: 6.4194162137514725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image inpainting plays a vital role in restoring missing image regions and supporting high-level vision tasks, but traditional methods struggle with complex textures and large occlusions. Although Transformer-based approaches have demonstrated strong global modeling capabilities, they often fail to preserve high-frequency details due to the low-pass nature of self-attention and suffer from high computational costs. To address these challenges, this paper proposes a Transformer-based image inpainting method incorporating frequency-domain fusion. Specifically, an attention mechanism combining wavelet transform and Gabor filtering is introduced to enhance multi-scale structural modeling and detail preservation. Additionally, a learnable frequency-domain filter based on the fast Fourier transform is designed to replace the feedforward network, enabling adaptive noise suppression and detail retention. The model adopts a four-level encoder-decoder structure and is guided by a novel loss strategy to balance global semantics and fine details. Experimental results demonstrate that the proposed method effectively improves the quality of image inpainting by preserving more high-frequency information.
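The abstract describes the frequency-domain components only at a high level. As a rough illustration, the sketch below shows one way a learnable FFT-based filter could stand in for a transformer feedforward network, in the spirit of the description above; the module name, the fixed spatial size, the residual connection and the normalisation are assumptions made for the sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn


class LearnableFrequencyFilter(nn.Module):
    """Hypothetical stand-in for a transformer feedforward block: filter a
    (B, C, H, W) feature map with a learnable complex mask in the Fourier
    domain. A sketch in the spirit of the abstract, not the paper's code."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One complex weight (stored as a real/imag pair) per channel and
        # rFFT frequency bin; small init so the residual path dominates early.
        self.weight = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1, 2) * 0.02
        )
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        residual = x
        # Per-frequency, per-channel reweighting: adaptive suppression or
        # amplification of individual frequency bands.
        x_freq = torch.fft.rfft2(x, dim=(-2, -1), norm="ortho")
        x_freq = x_freq * torch.view_as_complex(self.weight)
        x = torch.fft.irfft2(x_freq, s=(h, w), dim=(-2, -1), norm="ortho")
        x = x + residual
        # Channel-wise LayerNorm (move channels to the last dimension).
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


# Usage on an illustrative feature map size; shapes are not from the paper.
if __name__ == "__main__":
    ffn = LearnableFrequencyFilter(channels=64, height=32, width=32)
    out = ffn(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The per-frequency complex weights let the network attenuate noisy bands and retain informative high-frequency bands, which is the adaptive noise suppression and detail retention behaviour the abstract refers to.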
Related papers
- Global Modeling Matters: A Fast, Lightweight and Effective Baseline for Efficient Image Restoration [9.2933763571933]
Pyramid Wavelet-Fourier Network (PW-FNet) is an efficient, lightweight baseline for image restoration.
PW-FNet uses a multi-input multi-output structure to achieve multi-scale, multi-frequency-band decomposition (a generic wavelet band-split sketch appears after this list).
Experiments on tasks such as image deraining, raindrop removal, image super-resolution, motion deblurring, image dehazing and underwater/low-light enhancement demonstrate that PW-FNet not only surpasses state-of-the-art methods in restoration quality but also achieves superior efficiency.
arXiv Detail & Related papers (2025-07-18T05:15:04Z)
- Learning Multi-scale Spatial-frequency Features for Image Denoising [58.883244886588336]
We propose a novel multi-scale adaptive dual-domain network (MADNet) for image denoising.
We use image pyramid inputs to restore noise-free results from low-resolution images.
To enable the interaction of high-frequency and low-frequency information, we design an adaptive spatial-frequency learning unit.
arXiv Detail & Related papers (2025-06-19T13:28:09Z)
- A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning [0.12499537119440242]
A lightweight transformer architecture is proposed to reduce the dimensionality of the encoder layers and employ a distilled version of GPT-2 as the decoder.
A knowledge distillation strategy is used to transfer knowledge from a more complex teacher model to improve the performance of the lightweight network.
Experimental results demonstrate that the proposed approach significantly improves caption quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-11T06:24:02Z)
- Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation.
Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions.
Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
- Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation [27.576174611043367]
Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks.
However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily.
In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM.
arXiv Detail & Related papers (2025-03-02T08:11:26Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction [18.014481087171657]
The correction of exposure-related issues is a pivotal component in enhancing the quality of images.
This paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks.
Our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.
arXiv Detail & Related papers (2023-09-03T14:09:14Z)
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images.
Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training [112.96224800952724]
We propose cascaded modulation GAN (CM-GAN) to generate plausible image structures when dealing with large holes in complex images.
In each decoder block, global modulation is first applied to perform coarse, semantic-aware structure synthesis; spatial modulation is then applied to the output of global modulation to further adjust the feature map in a spatially adaptive fashion.
In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios.
arXiv Detail & Related papers (2022-03-22T16:13:27Z)
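Several of the entries above (PW-FNet, MADNet, the wavelet-driven MIM work) and the proposed inpainting method all rely on splitting features into wavelet sub-bands so that high-frequency detail can be processed separately from low-frequency structure. The sketch below shows a generic single-level Haar decomposition in PyTorch; it illustrates the technique in general (even spatial dimensions are assumed) and is not code from any of the listed papers.

```python
import torch
import torch.nn.functional as F


def haar_dwt2d(x: torch.Tensor):
    """Single-level 2D Haar wavelet transform of a (B, C, H, W) tensor with
    even H and W. Returns the (LL, LH, HL, HH) sub-bands at half resolution.
    Generic band-split sketch, not taken from any paper listed above."""
    # Orthonormal 2x2 Haar analysis filters.
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).unsqueeze(1)  # (4, 1, 2, 2)

    b, c, h, w = x.shape
    kernels = kernels.to(dtype=x.dtype, device=x.device).repeat(c, 1, 1, 1)
    # Depthwise strided convolution applies all four filters to every channel.
    bands = F.conv2d(x, kernels, stride=2, groups=c)       # (B, 4C, H/2, W/2)
    bands = bands.view(b, c, 4, h // 2, w // 2)
    return bands[:, :, 0], bands[:, :, 1], bands[:, :, 2], bands[:, :, 3]


# Usage: in a frequency-aware network, the LH/HL/HH sub-bands would typically
# feed a detail branch while the LL band feeds a coarse, global branch.
if __name__ == "__main__":
    ll_band, lh_band, hl_band, hh_band = haar_dwt2d(torch.randn(1, 3, 64, 64))
    print(ll_band.shape)  # torch.Size([1, 3, 32, 32])
```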